Overview

Create workflows programmatically using the Kadoa SDK, CLI, MCP Server, or REST API:
  • Create workflows with natural language prompts
  • Use existing schemas or define custom ones
  • Configure monitoring and scheduling options

Prerequisites

Before you begin, you’ll need:
  • A Kadoa account
  • Your API key
  • For SDK: npm install @kadoa/node-sdk or yarn add @kadoa/node-sdk (Node.js), or uv add kadoa-sdk (Python)

Authentication

import { KadoaClient } from '@kadoa/node-sdk';

const client = new KadoaClient({
  apiKey: 'YOUR_API_KEY'
});

const status = await client.status();
console.log(status);
console.log(status.user);
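
Hard-coding the key as shown above is fine for a quick test. A common alternative is to read it from the environment. The following is a minimal sketch, assuming the key is stored in a variable named KADOA_API_KEY; both that name and the readApiKey helper are illustrative and not part of the SDK.

```typescript
// Hypothetical helper: not part of the Kadoa SDK.
// Reads an API key from an environment-like object and fails fast if it is missing.
function readApiKey(
  env: Record<string, string | undefined>,
  name: string = "KADOA_API_KEY", // assumed variable name
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing API key: set the ${name} environment variable`);
  }
  return value;
}
```

In Node.js you would pass process.env to this helper and hand the returned string to new KadoaClient({ apiKey }).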

Extraction Methods

Choose how you want to extract data from websites:

Auto-Detection

Auto-detect uses AI to detect and extract what’s on the page. If you’re using the REST API directly, auto-detection isn’t available and you need to pass a data schema.
// SDK: AI automatically detects and extracts data
const result = await client.extraction.run({
  urls: ["https://sandbox.kadoa.com/ecommerce"],
  name: "Auto Product Extraction",
  limit: 10,
});

console.log(result.data);

Custom Schema

Define exactly what fields you want to extract for precise control:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Structured Product Extraction",
    extraction: (builder) =>
      builder
        .entity("Product")
        .field("title", "Product name", "STRING", {
          example: "iPhone 15 Pro",
        })
        .field("price", "Price in USD", "MONEY")
        .field("inStock", "Availability", "BOOLEAN")
        .field("rating", "Rating 1-5", "NUMBER")
        .field("releaseDate", "Launch date", "DATE"),
  })
  .create();

const result = await workflow.run({ limit: 10 });

// Use destructuring for cleaner access
const { data } = await result.fetchData({});
console.log(data);
Available Data Types: STRING, NUMBER, BOOLEAN, DATE, DATETIME, MONEY, IMAGE, LINK, OBJECT, ARRAY
See all data types →
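
If field definitions come from configuration or user input, it can help to check the type name before building a schema. This is a small sketch, not SDK code: the names are copied from the list above, and the DATA_TYPES constant and isDataType guard are hypothetical helpers.

```typescript
// Data type names copied from the list above.
const DATA_TYPES = [
  "STRING", "NUMBER", "BOOLEAN", "DATE", "DATETIME",
  "MONEY", "IMAGE", "LINK", "OBJECT", "ARRAY",
] as const;

type DataType = (typeof DATA_TYPES)[number];

// Hypothetical type guard: narrows an arbitrary string to a known data type name.
// The comparison is case-sensitive, matching the upper-case names in the list.
function isDataType(value: string): value is DataType {
  return (DATA_TYPES as readonly string[]).indexOf(value) !== -1;
}
```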

PDF Page Selection

When extracting from PDF URLs, you can specify which pages to process:
REST API
// POST https://api.kadoa.com/v4/workflows
{
  "urls": ["https://example.com/report.pdf"],
  "name": "PDF Extraction",
  "entity": "Data",
  "fields": [
    {
      "name": "content",
      "dataType": "STRING",
      "description": "Extracted content"
    }
  ],
  "pageNumbers": [1, 2, 3]  // Extract only pages 1, 2, and 3
}
If pageNumbers is omitted, all pages are processed.
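
The optional pageNumbers key can be handled when assembling the request body. The sketch below only builds the body shown above and sends nothing. It assumes page numbers are positive integers starting at 1, as the example suggests; buildPdfWorkflowBody is a hypothetical helper, not part of the API or SDK.

```typescript
// Field and body shapes mirror the JSON example above.
interface FieldSpec {
  name: string;
  dataType: string;
  description: string;
}

interface PdfWorkflowBody {
  urls: string[];
  name: string;
  entity: string;
  fields: FieldSpec[];
  pageNumbers?: number[];
}

// Hypothetical helper: not part of the Kadoa SDK or API.
function buildPdfWorkflowBody(
  url: string,
  name: string,
  entity: string,
  fields: FieldSpec[],
  pages?: number[],
): PdfWorkflowBody {
  const body: PdfWorkflowBody = { urls: [url], name, entity, fields };
  if (pages !== undefined && pages.length > 0) {
    for (const page of pages) {
      // Assumption: page numbers are positive integers starting at 1.
      if (!Number.isInteger(page) || page < 1) {
        throw new Error(`Invalid page number: ${page}`);
      }
    }
    // Drop duplicates and sort ascending so the request is predictable.
    body.pageNumbers = pages
      .filter((page, index) => pages.indexOf(page) === index)
      .sort((a, b) => a - b);
  }
  // With no pages given, the key is left out entirely, so all pages are processed.
  return body;
}
```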

Raw Content Extraction

Extract unstructured content as HTML or Markdown, or collect the URLs of the extracted pages:
// Extract as Markdown
const extraction = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/news"],
    name: "Article Content",
    extraction: (builder) => builder.raw("MARKDOWN"),
  })
  .create();

const run = await extraction.run({ limit: 10 });
const data = await run.fetchData({});
console.log(data);
Available Formats:
  • HTML - Raw HTML content
  • MARKDOWN - Markdown formatted text
  • PAGE_URL - URLs of extracted pages

Classification

Automatically categorize content into predefined classes:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/news"],
    name: "Article Classifier",
    extraction: (builder) =>
      builder
        .entity("Article")
        .field("title", "Headline", "STRING", {
          example: "Tech Company Announces New Product",
        })
        .field("content", "Article text", "STRING", {
          example: "The article discusses the latest innovations...",
        })
        .classify("sentiment", "Content tone", [
          { title: "Positive", definition: "Optimistic tone" },
          { title: "Negative", definition: "Critical tone" },
          { title: "Neutral", definition: "Balanced tone" },
        ])
        .classify("category", "Article topic", [
          { title: "Technology", definition: "Tech news" },
          { title: "Business", definition: "Business news" },
          { title: "Politics", definition: "Political news" },
        ]),
  })
  .create();
// Note: `limit` here caps the number of records extracted, not the number fetched
const result = await workflow.run({ limit: 10, variables: {} });
console.log(result.jobId);
const data = await result.fetchData({ limit: 10 });
console.log(data);
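
Once classified records are fetched, a typical next step is to tally them per class. The sketch below assumes each record is a plain object whose sentiment and category properties hold the chosen class title as a string. That record shape is an assumption, not something confirmed here, and countBy is a hypothetical helper.

```typescript
// Hypothetical helper: not part of the Kadoa SDK.
// Counts how many records carry each value of the given key.
function countBy(
  records: Array<Record<string, unknown>>,
  key: string,
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const record of records) {
    const raw = record[key];
    // Records without a string label are grouped under a placeholder bucket.
    const label = typeof raw === "string" ? raw : "(missing)";
    counts[label] = (counts[label] ?? 0) + 1;
  }
  return counts;
}

// Made-up sample records, for illustration only.
const sampleArticles = [
  { title: "A", sentiment: "Positive", category: "Technology" },
  { title: "B", sentiment: "Neutral", category: "Business" },
  { title: "C", sentiment: "Positive", category: "Technology" },
  { title: "D", category: "Politics" },
];

const sentimentCounts = countBy(sampleArticles, "sentiment");
```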

Prompts

Every workflow is driven by a prompt: a plain-language description of what to extract and how to get there. The AI agent handles navigation, pagination, clicking, and forms automatically. For details, see Prompts.
Legacy navigationMode values (single-page, paginated-page, page-and-detail) are still accepted by the API for backwards compatibility.

Using a Prompt

For multi-step or complex extractions, write a prompt in plain language. The AI agent follows your prompt to navigate, interact, and extract:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Product Catalog",
    userPrompt: "Extract all products. Click 'Next' to continue through pagination. For each product, open the detail page and extract the full description and specifications.",
    extraction: (builder) =>
      builder
        .entity("Product")
        .field("title", "Product name", "STRING", {
          example: "Wireless Headphones",
        })
        .field("price", "Product price", "MONEY")
        .field("description", "Full description", "STRING")
        .field("specifications", "Technical specs", "STRING"),
  })
  .create();

const result = await workflow.run({ limit: 50 });
Learn how to write effective prompts →
Complex multi-step extractions can take significantly longer to complete, usually around an hour. Avoid waiting for results synchronously.
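
One way to avoid blocking on a long run is to check back periodically with an increasing delay. The following is a generic sketch of that pattern, not SDK code: pollUntil and backoffDelayMs are hypothetical helpers, and the check callback stands in for whatever status lookup you use, which is not shown on this page.

```typescript
// Hypothetical helpers: not part of the Kadoa SDK.

// Exponential backoff: baseMs, 2x, 4x, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Calls `check` until it resolves to something other than undefined,
// sleeping between attempts and giving up once timeoutMs would be exceeded.
async function pollUntil<T>(
  check: () => Promise<T | undefined>,
  options: { baseMs: number; maxMs: number; timeoutMs: number },
): Promise<T> {
  const startedAt = Date.now();
  for (let attempt = 0; ; attempt++) {
    const result = await check();
    if (result !== undefined) return result;
    const delay = backoffDelayMs(attempt, options.baseMs, options.maxMs);
    if (Date.now() - startedAt + delay > options.timeoutMs) {
      throw new Error("Timed out waiting for the job to finish");
    }
    await new Promise<void>((resolve) => setTimeout(resolve, delay));
  }
}
```

For a run expected to take about an hour, a base delay of a minute or so with a cap of several minutes keeps the number of status checks low.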

Mode-Specific Examples

For simple extractions without writing a prompt, you can also use specific navigationMode values. These are useful for straightforward cases and for backwards compatibility with existing workflows.

Single Page Extraction

Extract data from a single page, such as a job posting or product page:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/careers-simple"],
    name: "Job Posting Monitor",
    navigationMode: "single-page",
    extraction: (builder) =>
      builder
        .entity("Job Posting")
        .field("jobTitle", "Job title", "STRING", {
          example: "Senior Software Engineer",
        })
        .field("department", "Department or team", "STRING", {
          example: "Engineering",
        })
        .field("location", "Job location", "STRING", {
          example: "San Francisco, CA",
        }),
  })
  .setInterval({ interval: "DAILY" })
  .create();

console.log("Workflow created:", workflow.workflowId);
const result = await workflow.run({ limit: 10, variables: {} });
console.log(result.jobId);

List Navigation

Navigate through paginated lists to extract multiple items:
const schemaId = "YOUR_SCHEMA_ID"; // ID of an existing schema to reuse
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Product Catalog Monitor",
    navigationMode: "paginated-page",
    extraction: () => ({ schemaId }),
  })
  .setInterval({ interval: "HOURLY" })
  .create();

// Run the workflow
const result = await workflow.run({ limit: 10 });
const response = await result.fetchData({});
console.log("Extracted items:", response.data);

List + Details Navigation

Navigate through a list and then open each item for detailed extraction:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Product Details Extractor",
    navigationMode: "page-and-detail",
    extraction: (builder) =>
      builder
        .entity("Product")
        .field("title", "Product name", "STRING", {
          example: "Wireless Headphones",
        })
        .field("price", "Product price", "MONEY")
        .field("description", "Full description", "STRING", {
          example: "Premium noise-cancelling headphones...",
        })
        .field("specifications", "Technical specs", "STRING", {
          example: "Battery life: 30 hours, Bluetooth 5.0...",
        }),
  })
  .create();

const result = await workflow.run({ limit: 10 });
const productDetails = await result.fetchData({});
console.log(productDetails.data);

All Pages (Crawler) Navigation

Crawl every page, or up to maxPages pages if specified, and extract matching entities from the pages discovered.
The starting URL must display the entity you want to extract.
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Product Catalog Crawler",
    navigationMode: "all-pages",
    extraction: (builder) =>
      builder
        .entity("Product")
        .field("title", "Product name", "STRING", {
          example: "Sennheiser HD 6XX",
        })
        .field("price", "Product price", "MONEY")
        .field("reviews", "Number of reviews", "STRING", {
          example: "155 reviews",
        }),
  })
  .create();

const result = await workflow.run({ limit: 10 });
const response = await result.fetchData({});
console.log(response.data);
All URLs must share the exact same hostname. For example, https://example.com and https://example.com/products are valid, but mixing https://example.com with https://www.example.com or https://shop.example.com fails.
For crawler parameters (maxPages, maxDepth, pathsFilterIn, pathsFilterOut), see Crawling.
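
The hostname rule above can be checked locally before creating a workflow. This is a minimal sketch using the standard URL class; sameHostname is a hypothetical helper, and it compares hostnames only, ignoring scheme, port, and path.

```typescript
// Hypothetical helper: not part of the Kadoa SDK.
// Returns true when every URL in the list has exactly the same hostname.
// Throws a TypeError (from the URL constructor) if any entry is not a valid URL.
function sameHostname(urls: string[]): boolean {
  const hosts = urls.map((url) => new URL(url).hostname);
  return hosts.every((host) => host === hosts[0]);
}

const validSet = sameHostname([
  "https://example.com",
  "https://example.com/products",
]);

const mixedSet = sameHostname([
  "https://example.com",
  "https://www.example.com",
]);
```

Here validSet is true and mixedSet is false, matching the valid and failing combinations described above.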

Raw Data Mode (No Schema)

Crawl a website and retrieve raw page artifacts (HTML, Markdown, screenshots) without defining an entity or schema. Useful for LLM ingestion, site archival, or content analysis. For output options and parameters, see Crawling.
// POST https://api.kadoa.com/v4/workflows
{
  "urls": ["https://example.com"],
  "name": "Site Archive",
  "outputOptions": {
    "includeHtml": true,
    "includeMarkdown": true,
    "includeScreenshots": false,
    "includeJson": false
  },
  "maxPages": 500,
  "maxDepth": 5
}

Writing Prompts

Write clear, step-by-step prompts for complex extraction tasks. The AI agent follows your prompt autonomously, handling clicks, forms, pagination, and file downloads. Learn how to write effective prompts →

Next Steps