Overview
This guide shows you how to create workflows programmatically using either the Kadoa SDK or the REST API. You’ll learn how to:
- Create workflows with different navigation modes
- Use existing schemas or define custom ones
- Set up AI Navigation with natural language instructions
- Configure monitoring and scheduling options
Prerequisites
Before you begin, you’ll need:
- A Kadoa account
- Your API key
- For SDK usage: `npm install @kadoa/node-sdk` or `yarn add @kadoa/node-sdk` (Node), or `pip install kadoa-sdk` (Python)
Authentication
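Every request is authenticated with your API key. The TypeScript sketch below shows the general idea; the base URL (https://api.kadoa.com) and the x-api-key header name are assumptions here, so confirm the exact values against the API reference.

```typescript
// Minimal authenticated request helper (Node 18+, global fetch).
// ASSUMPTIONS: the base URL and the x-api-key header name are illustrative only.
const KADOA_API_KEY = process.env.KADOA_API_KEY ?? "";

export async function kadoaRequest(path: string, method = "GET", body?: unknown): Promise<unknown> {
  const res = await fetch(`https://api.kadoa.com${path}`, {
    method,
    headers: {
      "x-api-key": KADOA_API_KEY,
      "Content-Type": "application/json",
    },
    body: body ? JSON.stringify(body) : undefined,
  });
  if (!res.ok) throw new Error(`Kadoa API error: ${res.status}`);
  return res.json();
}
```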
Extraction Methods
Choose how you want to extract data from websites:

Auto-Detection
Auto-detection uses AI to identify and extract what’s on the page. If you’re using the REST API directly, auto-detection isn’t available and you need to pass a data schema.
Custom Schema
Define exactly what fields you want to extract for precise control (see the sketch after this list). The supported data types are:
- `STRING` - Text content
- `NUMBER` - Numeric values
- `BOOLEAN` - True/false values
- `DATE` - Date values
- `DATETIME` - Date and time values
- `MONEY`/`CURRENCY` - Monetary values
- `IMAGE` - Image URLs
- `LINK` - Hyperlinks
- `OBJECT` - Nested objects
- `ARRAY` - Lists of items
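As a rough illustration, a custom schema for a product listing might look like the sketch below. The entity-plus-fields structure is described later in this guide, but treat the per-field property names (name, description, dataType) as assumptions and verify them against the API reference.

```typescript
// Illustrative custom schema: an entity name plus typed fields.
// ASSUMPTION: the per-field property names (name, description, dataType) are a sketch.
const productSchema = {
  entity: "product",
  fields: [
    { name: "title", description: "Product title", dataType: "STRING" },
    { name: "price", description: "Current price", dataType: "MONEY" },
    { name: "inStock", description: "Whether the item is available", dataType: "BOOLEAN" },
    { name: "image", description: "Main product image", dataType: "IMAGE" },
    { name: "productUrl", description: "Link to the product page", dataType: "LINK" },
  ],
};
```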
Raw Content Extraction
Extract unstructured content as HTML, Markdown, or plain text (see the sketch after this list):
- `HTML` - Raw HTML content
- `MARKDOWN` - Markdown formatted text
- `PAGE_URL` - URLs of extracted pages
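For example, a sketch of raw-content fields using the types above (the surrounding field shape is again an assumption):

```typescript
// Illustrative raw-content fields; MARKDOWN and PAGE_URL come from the list above.
const rawContentFields = [
  { name: "pageContent", description: "Page body as Markdown", dataType: "MARKDOWN" },
  { name: "sourceUrl", description: "URL of the extracted page", dataType: "PAGE_URL" },
];
```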
Classification
Automatically categorize content into predefined classes.

Navigation Modes

Kadoa supports five navigation modes to handle different website structures:

| Mode | Value | Best For |
|---|---|---|
| Single Page | single-page | Extract data from a single page |
| List | paginated-page | Navigate through lists with pagination |
| List + Details | page-and-detail | Navigate lists then open each item for details |
| All Pages | all-pages | Crawl all pages or up to maxPages pages and extract matching entities |
| AI Navigation | agentic-navigation | AI-driven navigation using natural language |
Navigation Mode Examples
Single Page Extraction
Extract data from a single page, such as a job posting or product page:
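A rough TypeScript sketch of creating a single-page workflow via the REST API. The single-page value comes from the navigation modes table above; the endpoint path, header name, and body field names (urls, navigationMode, and so on) are assumptions, so verify them against the API reference.

```typescript
// Sketch: create a single-page workflow for a job posting.
// ASSUMPTIONS: endpoint path, header name, and body field names are illustrative.
async function createSinglePageWorkflow(): Promise<unknown> {
  const res = await fetch("https://api.kadoa.com/v4/workflows", {
    method: "POST",
    headers: {
      "x-api-key": process.env.KADOA_API_KEY ?? "",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      urls: ["https://example.com/jobs/senior-engineer"],
      navigationMode: "single-page", // value from the navigation modes table
      entity: "job posting",
      fields: [
        { name: "title", description: "Job title", dataType: "STRING" },
        { name: "location", description: "Job location", dataType: "STRING" },
        { name: "postedAt", description: "Posting date", dataType: "DATE" },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Kadoa API error: ${res.status}`);
  return res.json();
}
```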
List Navigation

Navigate through paginated lists to extract multiple items:
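A sketch of the request body for a paginated list, sent the same way as the single-page example above. Only the paginated-page mode value comes from this guide; the other property names are illustrative.

```typescript
// Sketch: request body for a paginated list workflow (property names are assumptions).
const listWorkflow = {
  urls: ["https://example.com/products?page=1"],
  navigationMode: "paginated-page", // value from the navigation modes table
  entity: "product",
  fields: [
    { name: "title", description: "Product name", dataType: "STRING" },
    { name: "price", description: "Listed price", dataType: "MONEY" },
  ],
};
```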
List + Details Navigation

Navigate through a list and then open each item for detailed extraction:
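A sketch of a list-plus-detail request body; page-and-detail is the documented mode value, while the other property names are assumptions. Fields that only appear on the detail page (such as a full description) are simply part of the same schema.

```typescript
// Sketch: request body for a list + details workflow (property names are assumptions).
const listAndDetailWorkflow = {
  urls: ["https://example.com/products"],
  navigationMode: "page-and-detail", // value from the navigation modes table
  entity: "product",
  fields: [
    { name: "title", description: "Product name", dataType: "STRING" },
    { name: "price", description: "Listed price", dataType: "MONEY" },
    { name: "description", description: "Full description from the detail page", dataType: "STRING" },
    { name: "images", description: "All product images", dataType: "ARRAY" },
  ],
};
```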
All Pages (Crawler) Navigation

Crawl all pages, or up to `maxPages` pages if specified, and extract matching entities from discovered pages.
The starting URL must display the entity you want to extract.
All URLs must share the exact same hostname. For example, https://example.com and https://example.com/products are valid, but mixing https://example.com with https://www.example.com or https://shop.example.com will be rejected.

| Parameter | Description | Default |
|---|---|---|
| maxPages | Maximum pages to crawl (1-100,000). Crawling stops when reached. | 10,000 |
| maxDepth | Maximum crawl depth from starting URL (1-200) | 50 |
| pathsFilterIn | Regex patterns to include specific paths (e.g., ["/products/.*"]) | None |
| pathsFilterOut | Regex patterns to exclude specific paths (e.g., ["/admin/.*"]) | None |
The crawler visits all pages, or up to `maxPages` pages if specified, and extracts entities matching your schema from those pages.
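A sketch combining the all-pages mode with the crawl parameters from the table above; exactly where those parameters sit in the request body is an assumption, so check the API reference.

```typescript
// Sketch: request body for an all-pages (crawler) workflow.
// maxPages, maxDepth, pathsFilterIn, and pathsFilterOut are the documented parameters;
// their placement in the body, and the other property names, are assumptions.
const crawlerWorkflow = {
  urls: ["https://example.com/products"], // must display the entity you want to extract
  navigationMode: "all-pages",
  entity: "product",
  fields: [
    { name: "title", description: "Product name", dataType: "STRING" },
    { name: "price", description: "Listed price", dataType: "MONEY" },
  ],
  maxPages: 500,
  maxDepth: 5,
  pathsFilterIn: ["/products/.*"],
  pathsFilterOut: ["/admin/.*"],
};
```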
AI Navigation

AI Navigation enables autonomous website navigation through natural language instructions. The AI understands your intent and navigates complex websites automatically. Learn more about AI Navigation →

Schema Options
AI Navigation supports three approaches:
- Existing Schema (`schemaId`) - Reference a pre-built schema from your account
- Custom Schema (`entity` + `fields`) - Define specific fields and data types
- Auto-Detected Schema (no schema) - Let AI determine what data to extract
AI Navigation with Existing Schema
Use a pre-built schema by referencing its ID:
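A sketch using the documented agentic-navigation mode and schemaId; the property carrying the natural language instruction (here called prompt) is an assumption.

```typescript
// Sketch: AI Navigation referencing a pre-built schema by its ID.
const agenticWithExistingSchema = {
  urls: ["https://example.com"],
  navigationMode: "agentic-navigation",
  schemaId: "your-schema-id", // pre-built schema from your account
  prompt: "Open the careers section and collect every open engineering role", // property name is an assumption
};
```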
AI Navigation with Custom Schema

Define your own schema for precise data extraction:
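A sketch pairing agentic-navigation with an inline entity and fields definition; property names other than the documented ones are assumptions.

```typescript
// Sketch: AI Navigation with an inline custom schema (entity + fields).
const agenticWithCustomSchema = {
  urls: ["https://example.com"],
  navigationMode: "agentic-navigation",
  prompt: "Find the pricing page and extract every plan", // property name is an assumption
  entity: "pricing plan",
  fields: [
    { name: "planName", description: "Name of the plan", dataType: "STRING" },
    { name: "monthlyPrice", description: "Monthly price", dataType: "MONEY" },
    { name: "features", description: "Included features", dataType: "ARRAY" },
  ],
};
```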
AI Navigation with Auto-Detected Schema

Let AI determine what data to extract based on your instructions:
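A sketch of the configuration with no schema at all, so the AI decides which fields to extract; property names other than navigationMode are assumptions.

```typescript
// Sketch: AI Navigation with no schema -- the AI decides which fields to extract.
const agenticAutoDetect = {
  urls: ["https://example.com"],
  navigationMode: "agentic-navigation",
  prompt: "Go to the blog and extract the latest articles", // property name is an assumption
};
```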
Using Variables in AI Navigation

Variables allow dynamic workflows that reference values defined in your dashboard. Create variables in the UI first, then reference them in API requests (see the sketch after this list):
- Create variables in the dashboard UI (e.g., productTypes)
- Reference them using `@variableName` syntax in your prompt
- The backend automatically interpolates variables using account values
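A sketch of a prompt referencing the productTypes variable from the steps above using the @variableName syntax; the variable must already exist in your dashboard, and the prompt property name is an assumption.

```typescript
// Sketch: the @productTypes reference is interpolated server-side with the
// value stored for that variable in your dashboard.
const agenticWithVariables = {
  urls: ["https://example.com"],
  navigationMode: "agentic-navigation",
  prompt: "Browse the catalog and extract all items matching @productTypes", // property name is an assumption
};
```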
Scheduling & Running Workflows
Scheduling Options
Configure when your workflow runs (a configuration sketch follows the list):
- `ONLY_ONCE` - Run once
- `HOURLY`, `DAILY`, `WEEKLY`, `MONTHLY` - Standard intervals
- `REAL_TIME` - Continuous monitoring (Self-service is limited to 10 workflows; Enterprise is unlimited)
- `CUSTOM` - Use cron expressions
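A sketch of attaching a schedule to a workflow; the interval value comes from the list above, while the property name used to carry it (here updateInterval) is an assumption.

```typescript
// Sketch: scheduling a workflow to run daily.
// ASSUMPTION: the "updateInterval" property name is illustrative; DAILY comes from
// the scheduling options above.
const scheduledWorkflow = {
  urls: ["https://example.com/products"],
  navigationMode: "paginated-page",
  entity: "product",
  fields: [{ name: "title", description: "Product name", dataType: "STRING" }],
  updateInterval: "DAILY",
};
```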
Manual Execution
Run workflows on demand:
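A rough sketch of triggering a run over the REST API; the endpoint path is an assumption, so confirm it against the API reference.

```typescript
// Sketch: trigger a workflow run on demand.
// ASSUMPTION: the /run endpoint path is illustrative.
async function runWorkflow(workflowId: string): Promise<unknown> {
  const res = await fetch(`https://api.kadoa.com/v4/workflows/${workflowId}/run`, {
    method: "POST",
    headers: { "x-api-key": process.env.KADOA_API_KEY ?? "" },
  });
  if (!res.ok) throw new Error(`Failed to start workflow: ${res.status}`);
  return res.json();
}
```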
Checking Workflow Status

When using the API, poll the workflow status to know when extraction is complete (a polling sketch follows the list):
- `IN_PROGRESS` - Extraction is running
- `COMPLETED` - Data is ready to retrieve
- `FAILED` - Extraction failed (check the errors field)
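A polling sketch built around the documented status values; the endpoint path and the name of the status property in the response are assumptions.

```typescript
// Sketch: poll a workflow until it leaves IN_PROGRESS.
// ASSUMPTIONS: endpoint path and the "status" property name are illustrative.
async function waitForCompletion(workflowId: string): Promise<void> {
  for (;;) {
    const res = await fetch(`https://api.kadoa.com/v4/workflows/${workflowId}`, {
      headers: { "x-api-key": process.env.KADOA_API_KEY ?? "" },
    });
    const workflow = (await res.json()) as { status?: string };
    if (workflow.status === "COMPLETED") return; // data is ready to retrieve
    if (workflow.status === "FAILED") {
      throw new Error("Extraction failed -- check the errors field");
    }
    await new Promise((resolve) => setTimeout(resolve, 10_000)); // wait 10s between polls
  }
}
```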
Pagination Handling
Automatically navigate through multiple pages of results.

Advanced Configuration
Proxy Locations
Specify the geographic location for scraping (sketch below):
- `US` - United States
- `GB` - United Kingdom
- `DE` - Germany
- `NL` - Netherlands
- `CA` - Canada
- `auto` - Automatic selection
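A small sketch of pinning the proxy location; the country codes come from the list above, while the property name used to carry them (here location) is an assumption.

```typescript
// Sketch: requesting a specific proxy location for a workflow.
// ASSUMPTION: the "location" property name is illustrative; codes come from the list above.
const geoPinnedWorkflow = {
  urls: ["https://example.de/products"],
  navigationMode: "single-page",
  entity: "product",
  fields: [{ name: "title", description: "Product name", dataType: "STRING" }],
  location: "DE", // or "US", "GB", "NL", "CA", "auto"
};
```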