
Overview

This guide shows you how to create workflows programmatically using either the Kadoa SDK or REST API. You’ll learn how to:
  • Create workflows with different navigation modes
  • Use existing schemas or define custom ones
  • Set up AI Navigation with natural language instructions
  • Configure monitoring and scheduling options

Prerequisites

Before you begin, you’ll need:
  • A Kadoa account
  • Your API key
  • For SDK: npm install @kadoa/node-sdk or yarn add @kadoa/node-sdk

Authentication

import { KadoaClient } from '@kadoa/node-sdk';

const client = new KadoaClient({
  apiKey: 'your-api-key'
});
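
Hard-coding the key works for a quick test, but you will probably want to load it from the environment instead. The helper below is a minimal sketch of that idea; `requireApiKey` and the `KADOA_API_KEY` variable name are hypothetical and not part of the SDK.

```typescript
// Hypothetical helper: reads the API key from an environment-like object
// and fails fast when it is missing. KADOA_API_KEY is an assumed name.
type Env = { [key: string]: string | undefined };

function requireApiKey(env: Env, variable: string = 'KADOA_API_KEY'): string {
  const value = (env[variable] || '').trim();
  if (value === '') {
    throw new Error(`Missing API key: set the ${variable} environment variable`);
  }
  return value;
}

// Usage with the client shown above:
// const client = new KadoaClient({ apiKey: requireApiKey(process.env) });
```

Failing at startup with a clear message is usually easier to debug than an authentication error on the first request.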

Extraction Methods

Choose how you want to extract data from websites:

Auto-Detection

Auto-detection is a convenience feature available in the SDK. When using the API directly, you must either provide a schema definition or use AI Navigation mode.
// SDK: AI automatically detects and extracts data
const result = await client.extraction.run({
  urls: ['https://sandbox.kadoa.com/ecommerce'],
  name: 'Auto Product Extraction'
});

// Fetch the extracted data
const response = await result.fetchData({});
console.log(response.data);

Custom Schema

Define exactly what fields you want to extract for precise control:
const workflow = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/ecommerce'],
    name: 'Structured Product Extraction',
    extraction: builder => builder
      .entity('Product')
      .field('title', 'Product name', 'STRING', { example: 'iPhone 15 Pro' })
      .field('price', 'Price in USD', 'MONEY')
      .field('inStock', 'Availability', 'BOOLEAN')
      .field('rating', 'Rating 1-5', 'NUMBER')
      .field('releaseDate', 'Launch date', 'DATE')
  })
  .create();

const result = await workflow.run();

// Use destructuring for cleaner access
const { data } = await result.fetchData({});
console.log(data);
Available Data Types:
  • STRING - Text content
  • NUMBER - Numeric values
  • BOOLEAN - True/false values
  • DATE - Date values
  • DATETIME - Date and time values
  • MONEY / CURRENCY - Monetary values
  • IMAGE - Image URLs
  • LINK - Hyperlinks
  • OBJECT - Nested objects
  • ARRAY - Lists of items
See all data types →
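
If you keep field definitions as data (for example, to share them between workflows), a small helper can apply them to the builder. This is only a sketch: `FieldSpec`, `FieldBuilder`, and `applyFields` are hypothetical names, and the builder shape is inferred from the `.field(name, description, type, options)` calls in the example above rather than taken from the SDK's type definitions.

```typescript
// Data types from the list above. MONEY and CURRENCY are listed together
// in this guide, so both are included here.
type FieldType =
  | 'STRING' | 'NUMBER' | 'BOOLEAN' | 'DATE' | 'DATETIME'
  | 'MONEY' | 'CURRENCY' | 'IMAGE' | 'LINK' | 'OBJECT' | 'ARRAY';

interface FieldSpec {
  name: string;
  description: string;
  type: FieldType;
  example?: string;
}

// Assumed structural shape of the builder, inferred from the examples:
// field(...) returns a builder that can be chained further.
interface FieldBuilder {
  field(
    name: string,
    description: string,
    type: FieldType,
    options?: { example: string }
  ): FieldBuilder;
}

// Hypothetical helper: applies each spec to the builder in order.
function applyFields(builder: FieldBuilder, specs: FieldSpec[]): FieldBuilder {
  let current = builder;
  for (const spec of specs) {
    current = spec.example === undefined
      ? current.field(spec.name, spec.description, spec.type)
      : current.field(spec.name, spec.description, spec.type, { example: spec.example });
  }
  return current;
}

const productFields: FieldSpec[] = [
  { name: 'title', description: 'Product name', type: 'STRING', example: 'iPhone 15 Pro' },
  { name: 'price', description: 'Price in USD', type: 'MONEY' },
  { name: 'inStock', description: 'Availability', type: 'BOOLEAN' },
];

// Usage, assuming the builder from the example above:
// extraction: builder => applyFields(builder.entity('Product'), productFields)
```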

Raw Content Extraction

Extract unstructured page content as raw HTML or Markdown, optionally alongside the URL of each extracted page:
// Extract as Markdown
const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/news'],
    name: 'Article Content',
    extraction: builder => builder.raw('MARKDOWN')
  })
  .create();

// Extract multiple formats
const multiFormat = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/news'],
    name: 'Multi-format',
    extraction: builder => builder
      .raw(['HTML', 'MARKDOWN', 'PAGE_URL'])
  })
  .create();
Available Formats:
  • HTML - Raw HTML content
  • MARKDOWN - Markdown formatted text
  • PAGE_URL - URLs of extracted pages

Classification

Automatically categorize content into predefined classes:
const workflow = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/news'],
    name: 'Article Classifier',
    extraction: builder => builder
      .entity('Article')
      .field('title', 'Headline', 'STRING', { example: 'Tech Company Announces New Product' })
      .field('content', 'Article text', 'STRING', { example: 'The article discusses the latest innovations...' })
      .classify('sentiment', 'Content tone', [
        { title: 'Positive', definition: 'Optimistic tone' },
        { title: 'Negative', definition: 'Critical tone' },
        { title: 'Neutral', definition: 'Balanced tone' }
      ])
      .classify('category', 'Article topic', [
        { title: 'Technology', definition: 'Tech news' },
        { title: 'Business', definition: 'Business news' },
        { title: 'Politics', definition: 'Political news' }
      ])
  })
  .create();

Navigation Modes

Kadoa supports four navigation modes to handle different website structures:

Mode           | Value              | Best For
Single Page    | single-page        | Extract data from a single page
List           | paginated-page     | Navigate through lists with pagination
List + Details | page-and-detail    | Navigate lists then open each item for details
AI Navigation  | agentic-navigation | AI-driven navigation using natural language

Learn more about Navigation Modes →
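
When the mode comes from configuration or user input, it can help to validate it before creating a workflow. The sketch below uses the four values listed above; `NAVIGATION_MODES`, `NavigationMode`, and `parseNavigationMode` are hypothetical names, not SDK exports.

```typescript
// The four navigation mode values listed above.
const NAVIGATION_MODES = [
  'single-page',
  'paginated-page',
  'page-and-detail',
  'agentic-navigation',
] as const;

type NavigationMode = typeof NAVIGATION_MODES[number];

// Hypothetical helper: narrows an arbitrary string to a NavigationMode,
// throwing on anything that is not one of the four known values.
function parseNavigationMode(value: string): NavigationMode {
  for (const mode of NAVIGATION_MODES) {
    if (mode === value) return mode;
  }
  throw new Error(
    `Unknown navigation mode "${value}". Expected one of: ${NAVIGATION_MODES.join(', ')}`
  );
}

// Usage: navigationMode: parseNavigationMode(configuredMode)
```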

Single Page Extraction

Extract data from a single page, such as a job posting or product page:
const workflow = await client
  .extract({
    urls: ['https://example.com/careers/job-123'],
    name: 'Job Posting Monitor',
    navigationMode: 'single-page',
    extraction: builder => builder
      .entity('Job Posting')
      .field('jobTitle', 'Job title', 'STRING', { example: 'Senior Software Engineer' })
      .field('department', 'Department or team', 'STRING', { example: 'Engineering' })
      .field('location', 'Job location', 'STRING', { example: 'San Francisco, CA' })
  })
  .setInterval({ interval: 'DAILY' })
  .create();

console.log('Workflow created:', workflow.id);

List Navigation

Navigate through paginated lists to extract multiple items:
const workflow = await client
  .extract({
    urls: ['https://example.com/products'],
    name: 'Product Catalog Monitor',
    navigationMode: 'paginated-page',
    schemaId: 'YOUR_SCHEMA_ID', // Use existing schema
    limit: 100
  })
  .setInterval({ interval: 'HOURLY' })
  .create();

// Run the workflow
const result = await workflow.run();
const response = await result.fetchData({});
console.log('Extracted items:', response.data);

List + Details Navigation

Navigate through a list and then open each item for detailed extraction:
const workflow = await client
  .extract({
    urls: ['https://example.com/products'],
    name: 'Product Details Extractor',
    navigationMode: 'page-and-detail',
    extraction: builder => builder
      .entity('Product')
      .field('title', 'Product name', 'STRING', { example: 'Wireless Headphones' })
      .field('price', 'Product price', 'MONEY')
      .field('description', 'Full description', 'STRING', { example: 'Premium noise-cancelling headphones...' })
      .field('specifications', 'Technical specs', 'STRING', { example: 'Battery life: 30 hours, Bluetooth 5.0...' }),
    limit: 50
  })
  .create();

const result = await workflow.run();
const productDetails = await result.fetchData({});
console.log(productDetails.data);

AI Navigation

AI Navigation enables autonomous website navigation through natural language instructions. The AI understands your intent and navigates complex websites automatically. Learn more about AI Navigation →

Schema Options

AI Navigation supports three approaches:
  1. Existing Schema (schemaId) - Reference a pre-built schema from your account
  2. Custom Schema (entity + fields) - Define specific fields and data types
  3. Auto-Detected Schema (no schema) - Let AI determine what data to extract

AI Navigation with Existing Schema

Use a pre-built schema by referencing its ID:
const workflow = await client
  .extract({
    urls: ['https://example.com'],
    name: 'AI Job Scraper',
    navigationMode: 'agentic-navigation',
    schemaId: '507f1f77bcf86cd799439020',
    userPrompt: `Navigate to the careers section, find all
      engineering job postings, and extract the job details
      including requirements and benefits. Make sure to
      click 'Load More' if present.`,
    limit: 100
  })
  .create();

const result = await workflow.run();
const response = await result.fetchData({});
console.log(response.data);

AI Navigation with Custom Schema

Define your own schema for precise data extraction:
const workflow = await client
  .extract({
    urls: ['https://example.com'],
    name: 'AI Job Scraper',
    navigationMode: 'agentic-navigation',
    extraction: builder => builder
      .entity('Job Posting')
      .field('jobTitle', 'Job title', 'STRING', { example: 'Product Manager' })
      .field('description', 'Job description', 'STRING', { example: 'We are looking for an experienced PM...' })
      .field('requirements', 'Job requirements', 'STRING', { example: '5+ years of product management experience...' })
      .field('benefits', 'Benefits offered', 'STRING', { example: 'Health insurance, 401k, unlimited PTO...' }),
    userPrompt: `Navigate to the careers section, find all
      engineering job postings, and extract the job details
      including requirements and benefits. Make sure to
      click 'Load More' if present.`,
    limit: 100
  })
  .create();

const result = await workflow.run();
const jobPostings = await result.fetchData({});
console.log(jobPostings.data);

AI Navigation with Auto-Detected Schema

Let AI determine what data to extract based on your instructions:
const result = await client.extraction.run({
  urls: ['https://example.com'],
  name: 'AI Blog Scraper',
  navigationMode: 'agentic-navigation',
  userPrompt: `Find all blog posts from 2024. For each post,
    extract the title, author, publication date, main content,
    and any tags or categories. Also check if there are
    comments and extract the comment count.`
});

// AI automatically detects and extracts relevant fields
const { data } = await result.fetchData({});
console.log(data);

Using Variables in AI Navigation

Variables allow dynamic workflows that reference values defined in your dashboard. Create variables in the UI first, then reference them in API requests:
const workflow = await client
  .extract({
    urls: ['https://example.com'],
    name: 'Dynamic Product Search',
    navigationMode: 'agentic-navigation',
    userPrompt: `Navigate to search and loop through
      '@productTypes', press search, and extract
      product details for all results.`
  })
  .create();
Variable Workflow:
  1. Create variables in the dashboard UI (e.g., productTypes)
  2. Reference them using @variableName syntax in your prompt
  3. The backend automatically substitutes each variable with the value stored in your account
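
Because variables must exist in the dashboard before a prompt can use them, it can be useful to list the references a prompt contains and check them first. The helper below is a minimal sketch; `findPromptVariables` is hypothetical, and the assumption that variable names consist of letters, digits, and underscores is mine, not something this guide specifies.

```typescript
// Hypothetical helper: returns the distinct @variableName references in a
// prompt, in order of first appearance. Assumes names are made of letters,
// digits, and underscores and do not start with a digit.
function findPromptVariables(promptText: string): string[] {
  const pattern = /@([A-Za-z_][A-Za-z0-9_]*)/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(promptText)) !== null) {
    if (names.indexOf(match[1]) === -1) names.push(match[1]);
  }
  return names;
}

const searchPrompt = `Navigate to search and loop through
  '@productTypes', press search, and extract
  product details for all results.`;

console.log(findPromptVariables(searchPrompt)); // [ 'productTypes' ]
```

Note that this simple pattern also matches the part after "@" in an email address, so prompts containing addresses would need a stricter check.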

Scheduling & Running Workflows

Scheduling Options

Configure when your workflow runs:
const workflow = await client
  .extract({
    urls: ['https://example.com/products'],
    name: 'Scheduled Extraction',
    // ... extraction configuration
  })
  .setInterval({
    interval: 'CUSTOM',
    schedules: ['0 9 * * MON-FRI', '0 18 * * MON-FRI']
  })
  .create();

// Workflow runs automatically on schedule
console.log('Scheduled workflow:', workflow.id);
Available intervals:
  • ONLY_ONCE - Run once
  • HOURLY, DAILY, WEEKLY, MONTHLY - Standard intervals
  • REAL_TIME - Continuous monitoring (Enterprise only)
  • CUSTOM - Use cron expressions
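
A malformed cron expression is easy to miss until a run fails to trigger. The sketch below builds the `CUSTOM` argument for `setInterval()` and rejects expressions that do not have five fields. `customSchedule` is a hypothetical helper, and the five-field requirement is an assumption based on the examples above; it counts fields only and does not validate their values.

```typescript
// Hypothetical helper: builds the { interval, schedules } object passed to
// setInterval() in the example above. Only the number of cron fields is
// checked; the values inside each field are not validated.
function customSchedule(
  expressions: string[]
): { interval: 'CUSTOM'; schedules: string[] } {
  if (expressions.length === 0) {
    throw new Error('At least one cron expression is required');
  }
  const schedules = expressions.map((expression) => {
    const fields = expression.trim().split(/\s+/);
    if (fields.length !== 5) {
      throw new Error(`Expected 5 cron fields, got ${fields.length}: "${expression}"`);
    }
    // Normalize whitespace so each field is separated by a single space.
    return fields.join(' ');
  });
  return { interval: 'CUSTOM', schedules };
}

// Usage: .setInterval(customSchedule(['0 9 * * MON-FRI', '0 18 * * MON-FRI']))
```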

Manual Execution

Run workflows on demand:
// Run existing workflow
const workflow = await client.workflow('workflow-id');
const result = await workflow.run();
const response = await result.fetchData({});
console.log(response.data);

// Or run extraction directly
const extractionResult = await client.extraction.run({
  urls: ['https://example.com/page'],
  // ... configuration
});
const { data } = await extractionResult.fetchData({});
console.log(data);

Checking Workflow Status

When calling the REST API directly, poll the workflow status to know when extraction is complete. With the SDK, run() handles status checking for you:
// SDK handles status checking automatically
const workflow = await client.workflow('workflow-id');
const result = await workflow.run(); // Returns run result, not data

// Fetch the extracted data
const response = await result.fetchData({});
console.log('Data:', response.data);

// Or check status manually
const status = await workflow.getStatus();
console.log('Status:', status);
Workflow States:
  • IN_PROGRESS - Extraction is running
  • COMPLETED - Data is ready to retrieve
  • FAILED - Extraction failed (check errors field)
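
If you do need to poll yourself, a loop over the three states above might look like the following sketch. `waitForCompletion` and `sleep` are hypothetical helpers. The `getState` callback is a stand-in for however you obtain the state string (for example, from the status returned by the API), since the exact response shape is not shown in this guide.

```typescript
type WorkflowState = 'IN_PROGRESS' | 'COMPLETED' | 'FAILED';

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: calls getState() until it reports COMPLETED,
// throws on FAILED, and gives up after maxAttempts polls.
async function waitForCompletion(
  getState: () => Promise<WorkflowState>,
  intervalMs: number = 5000,
  maxAttempts: number = 60
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const state = await getState();
    if (state === 'COMPLETED') return;
    if (state === 'FAILED') {
      throw new Error('Extraction failed; check the errors field');
    }
    // Wait between polls, but not after the final attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Workflow still in progress after ${maxAttempts} polls`);
}
```

Capping the number of attempts keeps a stuck workflow from blocking the caller indefinitely; the defaults here are arbitrary and should be tuned to your workloads.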

Pagination Handling

Automatically navigate through multiple pages of results:
const result = await client.extraction.run({
  urls: ['https://example.com/products'],
  navigationMode: 'paginated-page',
  pagination: {
    enabled: true,
    maxPages: 10 // Limit number of pages
  }
});

// Iterate through pages
for await (const page of result.fetchDataPages()) {
  console.log('Page data:', page.data);
  console.log('Page number:', page.pagination.page);
}

// Or get all data at once
const allData = await result.fetchAllData();

Advanced Configuration

Proxy Locations

Specify geographic location for scraping:
const workflow = await client
  .extract({
    urls: ['https://example.com'],
    // ... configuration
  })
  .setLocation({
    type: 'manual',
    isoCode: 'US'
  })
  .create();
Available locations:
  • US - United States
  • GB - United Kingdom
  • DE - Germany
  • NL - Netherlands
  • CA - Canada
  • auto - Automatic selection

Preview Mode

Skip manual review and activate workflows immediately:
const workflow = await client
  .extract({
    // ... configuration
  })
  .bypassPreview() // Skip review step
  .create();

// Workflow is immediately active

Next Steps
