Overview
Create workflows programmatically using the Kadoa SDK, CLI, MCP Server, or REST API:
Create workflows with natural language prompts
Use existing schemas or define custom ones
Configure monitoring and scheduling options
Prerequisites
Before you begin, you’ll need:
A Kadoa account
Your API key
For the Node SDK: npm install @kadoa/node-sdk (or yarn add @kadoa/node-sdk); for the Python SDK: uv add kadoa-sdk
Authentication
The examples in this guide use the Node SDK; equivalent examples are available for the Python SDK, CLI, MCP Server, and REST API.
import { KadoaClient } from '@kadoa/node-sdk';

const client = new KadoaClient({
  apiKey: 'YOUR_API_KEY'
});

const status = await client.status();
console.log(status);
console.log(status.user);
Choose how you want to extract data from websites:
Auto-Detection
Auto-detection uses AI to identify and extract whatever data is on the page. Auto-detection isn't available through the REST API directly; there you must pass a data schema.
// SDK: AI automatically detects and extracts data
const result = await client.extraction.run({
  urls: ["https://sandbox.kadoa.com/ecommerce"],
  name: "Auto Product Extraction",
  limit: 10,
});
console.log(result.data);
Custom Schema
Define exactly what fields you want to extract for precise control:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Structured Product Extraction",
    extraction: (builder) =>
      builder
        .entity("Product")
        .field("title", "Product name", "STRING", {
          example: "iPhone 15 Pro",
        })
        .field("price", "Price in USD", "MONEY")
        .field("inStock", "Availability", "BOOLEAN")
        .field("rating", "Rating 1-5", "NUMBER")
        .field("releaseDate", "Launch date", "DATE"),
  })
  .create();

const result = await workflow.run({ limit: 10 });

// Use destructuring for cleaner access
const { data } = await result.fetchData({});
console.log(data);
Available Data Types: STRING, NUMBER, BOOLEAN, DATE, DATETIME, MONEY, IMAGE, LINK, OBJECT, ARRAY
See all data types →
PDF Page Selection
When extracting from PDF URLs, you can specify which pages to process:
// POST https://api.kadoa.com/v4/workflows
{
  "urls": ["https://example.com/report.pdf"],
  "name": "PDF Extraction",
  "entity": "Data",
  "fields": [
    {
      "name": "content",
      "dataType": "STRING",
      "description": "Extracted content"
    }
  ],
  "pageNumbers": [1, 2, 3] // Extract only pages 1, 2, and 3
}
If pageNumbers is omitted, all pages are processed.
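The optional-pages behavior can be sketched as a small helper that assembles the request body shown above. buildPdfWorkflowBody is an illustrative name, not part of the Kadoa SDK:

```typescript
// Sketch: assemble the workflow request body for a PDF extraction.
// buildPdfWorkflowBody is a hypothetical helper, not an SDK function.
interface WorkflowField {
  name: string;
  dataType: string;
  description: string;
}

interface PdfWorkflowBody {
  urls: string[];
  name: string;
  entity: string;
  fields: WorkflowField[];
  pageNumbers?: number[];
}

function buildPdfWorkflowBody(
  urls: string[],
  name: string,
  fields: WorkflowField[],
  pageNumbers?: number[]
): PdfWorkflowBody {
  const body: PdfWorkflowBody = { urls, name, entity: "Data", fields };
  // Omit pageNumbers entirely to process all pages.
  if (pageNumbers && pageNumbers.length > 0) {
    body.pageNumbers = pageNumbers;
  }
  return body;
}
```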
Raw Content
Extract unstructured content as HTML, Markdown, or plain text:
// Extract as Markdown
const extraction = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/news"],
    name: "Article Content",
    extraction: (builder) => builder.raw("MARKDOWN"),
  })
  .create();

const run = await extraction.run({ limit: 10 });
const data = await run.fetchData({});
console.log(data);
Available Formats:
HTML - Raw HTML content
MARKDOWN - Markdown formatted text
PAGE_URL - URLs of extracted pages
Classification
Automatically categorize content into predefined classes:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/news"],
    name: "Article Classifier",
    extraction: (builder) =>
      builder
        .entity("Article")
        .field("title", "Headline", "STRING", {
          example: "Tech Company Announces New Product",
        })
        .field("content", "Article text", "STRING", {
          example: "The article discusses the latest innovations...",
        })
        .classify("sentiment", "Content tone", [
          { title: "Positive", definition: "Optimistic tone" },
          { title: "Negative", definition: "Critical tone" },
          { title: "Neutral", definition: "Balanced tone" },
        ])
        .classify("category", "Article topic", [
          { title: "Technology", definition: "Tech news" },
          { title: "Business", definition: "Business news" },
          { title: "Politics", definition: "Political news" },
        ]),
  })
  .create();

// Note: 'limit' caps the number of records extracted, not the number fetched
const result = await workflow.run({ limit: 10, variables: {} });
console.log(result.jobId);
const data = await result.fetchData({ limit: 10 });
console.log(data);
Prompts
Every workflow is driven by a prompt: a plain-language description of what to extract and how to get there. The AI agent handles navigation, pagination, clicking, and forms automatically. For details, see Prompts .
Legacy navigationMode values (single-page, paginated-page, page-and-detail) are still accepted by the API for backwards compatibility.
Using a Prompt
For multi-step or complex extractions, write a prompt in plain language. The AI agent follows your prompt to navigate, interact, and extract:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Product Catalog",
    userPrompt: "Extract all products. Click 'Next' to continue through pagination. For each product, open the detail page and extract the full description and specifications.",
    extraction: (builder) =>
      builder
        .entity("Product")
        .field("title", "Product name", "STRING", {
          example: "Wireless Headphones",
        })
        .field("price", "Product price", "MONEY")
        .field("description", "Full description", "STRING")
        .field("specifications", "Technical specs", "STRING"),
  })
  .create();

const result = await workflow.run({ limit: 50 });
Learn how to write effective prompts →
Complex multi-step extractions can take significantly longer to complete (often around an hour). We recommend not waiting for results synchronously.
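One common pattern is to kick off the run and then poll for completion with capped exponential backoff instead of awaiting the result inline. The pollUntil helper below is an illustrative utility, not part of the Kadoa SDK:

```typescript
// Generic polling helper with capped exponential backoff.
// pollUntil is a hypothetical utility, not a Kadoa SDK function.
async function pollUntil<T>(
  check: () => Promise<T | null>, // resolves to a value when done, null while pending
  opts: { initialDelayMs?: number; maxDelayMs?: number; timeoutMs?: number } = {}
): Promise<T> {
  const {
    initialDelayMs = 1_000,
    maxDelayMs = 60_000,
    timeoutMs = 2 * 60 * 60 * 1_000, // give long-running jobs up to two hours
  } = opts;
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  for (;;) {
    const result = await check();
    if (result !== null) return result;
    if (Date.now() + delay > deadline) throw new Error("Polling timed out");
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, maxDelayMs); // back off, up to the cap
  }
}
```

With the SDK you might pass a check that looks up the job by result.jobId and returns the data once the job reports completion, returning null while it is still running.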
Mode-Specific Examples
For simple extractions without writing a prompt, you can also use specific navigationMode values. These are useful for straightforward cases and for backwards compatibility with existing workflows.
Single Page Navigation
Extract data from a single page, such as a job posting or product page:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/careers-simple"],
    name: "Job Posting Monitor",
    navigationMode: "single-page",
    extraction: (builder) =>
      builder
        .entity("Job Posting")
        .field("jobTitle", "Job title", "STRING", {
          example: "Senior Software Engineer",
        })
        .field("department", "Department or team", "STRING", {
          example: "Engineering",
        })
        .field("location", "Job location", "STRING", {
          example: "San Francisco, CA",
        }),
  })
  .setInterval({ interval: "DAILY" })
  .create();

console.log("Workflow created:", workflow.workflowId);
const result = await workflow.run({ limit: 10, variables: {} });
console.log(result.jobId);
List Navigation
Navigate through paginated lists to extract multiple items:
// schemaId references an existing schema created earlier
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Product Catalog Monitor",
    navigationMode: "paginated-page",
    extraction: () => ({ schemaId }),
  })
  .setInterval({ interval: "HOURLY" })
  .create();

// Run the workflow
const result = await workflow.run({ limit: 10 });
const response = await result.fetchData({});
console.log("Extracted items:", response.data);
List + Details Navigation
Navigate through a list and then open each item for detailed extraction:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Product Details Extractor",
    navigationMode: "page-and-detail",
    extraction: (builder) =>
      builder
        .entity("Product")
        .field("title", "Product name", "STRING", {
          example: "Wireless Headphones",
        })
        .field("price", "Product price", "MONEY")
        .field("description", "Full description", "STRING", {
          example: "Premium noise-cancelling headphones...",
        })
        .field("specifications", "Technical specs", "STRING", {
          example: "Battery life: 30 hours, Bluetooth 5.0...",
        }),
  })
  .create();

const result = await workflow.run({ limit: 10 });
const productDetails = await result.fetchData({});
console.log(productDetails.data);
All Pages (Crawler) Navigation
Crawl the entire site, or up to maxPages pages if specified, and extract matching entities from the discovered pages.
The starting URL must display the entity you want to extract.
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Product Catalog Crawler",
    navigationMode: "all-pages",
    extraction: (builder) =>
      builder
        .entity("Product")
        .field("title", "Product name", "STRING", {
          example: "Sennheiser HD 6XX",
        })
        .field("price", "Product price", "MONEY")
        .field("reviews", "Number of reviews", "STRING", {
          example: "155 reviews",
        }),
  })
  .create();

const result = await workflow.run({ limit: 10 });
const response = await result.fetchData({});
console.log(response.data);
All URLs must share the exact same hostname. For example, https://example.com and https://example.com/products are valid, but mixing https://example.com with https://www.example.com or https://shop.example.com fails.
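The same-hostname rule can be checked up front before creating a crawler workflow. sameHostname below is an illustrative helper, not an SDK function:

```typescript
// Verify that all seed URLs share the exact same hostname,
// as required by all-pages (crawler) navigation.
// sameHostname is a hypothetical helper, not part of the Kadoa SDK.
function sameHostname(urls: string[]): boolean {
  if (urls.length === 0) return false;
  const first = new URL(urls[0]).hostname;
  // www.example.com and shop.example.com are distinct hostnames and fail this check
  return urls.every((u) => new URL(u).hostname === first);
}
```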
For crawler parameters (maxPages, maxDepth, pathsFilterIn, pathsFilterOut), see Crawling .
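As an illustrative sketch only (field placement and the sample values are assumptions; see Crawling for the authoritative request shape), a crawler workflow request might combine these parameters:

```json
// POST https://api.kadoa.com/v4/workflows
{
  "urls": ["https://example.com/products"],
  "name": "Filtered Product Crawl",
  "navigationMode": "all-pages",
  "entity": "Product",
  "fields": [
    { "name": "title", "dataType": "STRING", "description": "Product name" }
  ],
  "maxPages": 200,
  "maxDepth": 3,
  "pathsFilterIn": ["/products"],
  "pathsFilterOut": ["/blog"]
}
```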
Raw Data Mode (No Schema)
Crawl a website and retrieve raw page artifacts (HTML, Markdown, screenshots) without defining an entity or schema. Useful for LLM ingestion, site archival, or content analysis.
For output options and parameters, see Crawling .
// POST https://api.kadoa.com/v4/workflows
{
  "urls": ["https://example.com"],
  "name": "Site Archive",
  "outputOptions": {
    "includeHtml": true,
    "includeMarkdown": true,
    "includeScreenshots": false,
    "includeJson": false
  },
  "maxPages": 500,
  "maxDepth": 5
}
Writing Prompts
Write clear, step-by-step prompts for complex extraction tasks. The AI agent follows your prompt autonomously, handling clicks, forms, pagination, and file downloads.
Learn how to write effective prompts →
Next Steps