SDKs - Kadoa API

The SDK currently supports Node.js/TypeScript. A Python SDK is coming soon.

Prerequisites

To get the most out of this guide, you’ll need to:

Create a Kadoa account
Get your API key

1. Install

npm install @kadoa/node-sdk

2. Extract Data

import { KadoaClient } from '@kadoa/node-sdk';

const client = new KadoaClient({
  apiKey: 'your-api-key'
});

// AI automatically detects and extracts data
const result = await client.extraction.run({
  urls: ['https://sandbox.kadoa.com/ecommerce'],
  name: 'Product Extraction'
});

// Pass the data as a separate argument so it is not stringified to "[object Object]"
console.log('Extracted data:', result.data);
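The quickstart hardcodes the API key for brevity. A minimal sketch of reading it from the environment instead is shown here; the variable name `KADOA_API_KEY` and the helper `requireApiKey` are assumptions made for illustration, not part of the SDK.

```typescript
// Minimal sketch: resolve the API key from the environment instead of source code.
// KADOA_API_KEY is an assumed variable name, not one documented by the SDK.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env, variable = 'KADOA_API_KEY'): string {
  const value = env[variable]?.trim();
  if (!value) {
    // Fail early with a clear message rather than sending an empty key
    throw new Error(`Missing ${variable}: set it before creating the client`);
  }
  return value;
}

// In Node.js you would pass process.env, for example:
//   const client = new KadoaClient({ apiKey: requireApiKey(process.env) });
```

Failing at startup keeps a missing key from surfacing later as an authentication error in the middle of an extraction.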

Choose Your Extraction Method

Auto-Detection - Let AI find and extract data automatically (recommended)
Custom Schema - Define exactly what fields you want
Raw Content - Get HTML, Markdown, or plain text
Classification - Categorize content automatically

Extraction Methods

Auto-Detection

The fastest way to extract data. AI automatically identifies structured content:

const result = await client.extraction.run({
  urls: ['https://sandbox.kadoa.com/ecommerce'],
  name: 'Auto Product Extraction'
});

// Data is available directly in result.data
console.log(result.data);

Custom Schema (Builder API)

Define exactly what you want to extract with type-safe field definitions:

const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/ecommerce'],
    name: 'Product Extraction',
    extraction: builder => builder
      .schema('Product')
      .field('title', 'Product name', 'STRING', {
        example: 'Wireless Headphones'
      })
      .field('price', 'Product price in USD', 'MONEY')
      .field('inStock', 'Availability status', 'BOOLEAN')
      .field('rating', 'Star rating 1-5', 'NUMBER')
  })
  .bypassPreview()
  .setInterval({ interval: 'ONLY_ONCE' })
  .create();

const result = await extraction.run();

// Fetch the extracted data
const data = await result.fetchData({});
console.log(data.data); // Array of extracted items
// [{ title: "Dell XPS", price: "$999", inStock: true, rating: 4.5 }, ...]

See all available field types →
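The sample output in the Custom Schema example shows `price` arriving as a display string such as "$999". The following sketch shows one way to type those rows and convert the price for numeric filtering. The `ProductRow` interface and both helpers are hypothetical, and the row shape is inferred only from that sample comment, so real MONEY values may look different.

```typescript
// Row shape assumed from the sample output in the docs; real MONEY values may differ.
interface ProductRow {
  title: string;
  price: string; // e.g. "$999"
  inStock: boolean;
  rating: number;
}

// Convert a display price such as "$1,299.50" to a number; returns null if unparseable.
function parsePrice(price: string): number | null {
  const cleaned = price.replace(/[^0-9.-]/g, '');
  if (cleaned === '') return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

// Keep rows that are in stock and priced at or below maxPrice.
function inStockUnder(rows: ProductRow[], maxPrice: number): ProductRow[] {
  return rows.filter((row) => {
    const price = parsePrice(row.price);
    return row.inStock && price !== null && price <= maxPrice;
  });
}

const sample: ProductRow[] = [
  { title: 'Dell XPS', price: '$999', inStock: true, rating: 4.5 },
  { title: 'Laptop Pro', price: '$1,299.50', inStock: true, rating: 4.8 },
  { title: 'Old Tablet', price: '$199', inStock: false, rating: 3.9 },
];

console.log(inStockUnder(sample, 1000).map((row) => row.title)); // [ 'Dell XPS' ]
```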

Raw Content Extraction

Extract raw HTML, Markdown, or URLs without structure:

// Extract as Markdown
const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/news'],
    name: 'Article Content',
    extraction: builder => builder.raw('markdown')
  })
  .create();

// Extract multiple formats (separate variable name so both examples can share a scope)
const multiFormatExtraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/news'],
    name: 'Multi-format',
    extraction: builder => builder.raw(['html', 'markdown', 'url'])
  })
  .create();

Available Formats:

html - Raw HTML
markdown - Markdown formatted content
url - Page URLs
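If the format list comes from user input or configuration, it can help to validate it before building the extraction. This is a minimal sketch that assumes the three format names listed above are the complete set; `RAW_FORMATS`, `isRawFormat`, and `parseRawFormats` are illustrative names, not SDK exports.

```typescript
// The three format names listed in the docs, assumed here to be the complete set.
const RAW_FORMATS = ['html', 'markdown', 'url'] as const;
type RawFormat = (typeof RAW_FORMATS)[number];

// Type guard: narrows an arbitrary string to a known raw format.
function isRawFormat(value: string): value is RawFormat {
  return RAW_FORMATS.some((format) => format === value);
}

// Parse a comma-separated list such as "html, markdown" into unique, validated formats.
function parseRawFormats(input: string): RawFormat[] {
  const requested = input
    .split(',')
    .map((part) => part.trim().toLowerCase())
    .filter((part) => part !== '');
  const invalid = requested.filter((part) => !isRawFormat(part));
  if (invalid.length > 0) {
    throw new Error(`Unsupported raw format(s): ${invalid.join(', ')}`);
  }
  // Deduplicate while preserving the order of first appearance
  return Array.from(new Set(requested.filter(isRawFormat)));
}

console.log(parseRawFormats('HTML, markdown, html')); // [ 'html', 'markdown' ]
```

The resulting array could then be passed to `builder.raw(...)` as in the multi-format example.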

Classification

Automatically categorize content into predefined classes:

const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/news'],
    name: 'Article Classifier',
    extraction: builder => builder
      .schema('Article')
      .field('title', 'Article headline', 'STRING', { example: 'Breaking News: AI Advances' })
      .classify('sentiment', 'Content sentiment', [
        { title: 'Positive', definition: 'Optimistic or favorable tone' },
        { title: 'Negative', definition: 'Critical or unfavorable tone' },
        { title: 'Neutral', definition: 'Balanced or objective tone' }
      ])
      .classify('category', 'Article category', [
        { title: 'Technology', definition: 'Tech news and updates' },
        { title: 'Business', definition: 'Business and finance' },
        { title: 'Politics', definition: 'Political news' },
        { title: 'Sports', definition: 'Sports coverage' }
      ])
  })
  .create();
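Once classified rows are fetched, a common next step is counting how many items fall under each label. The sketch below assumes each row carries the chosen label title as a plain string under the classification field name; that shape is not confirmed by this guide, and `ClassifiedArticle` and `tally` are hypothetical names.

```typescript
// Assumed row shape: each classification field holds the selected label title.
interface ClassifiedArticle {
  title: string;
  sentiment: string;
  category: string;
}

// Count rows per label for any field selected by `pick`.
function tally<T>(rows: T[], pick: (row: T) => string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const row of rows) {
    const label = pick(row);
    counts[label] = (counts[label] ?? 0) + 1;
  }
  return counts;
}

const articles: ClassifiedArticle[] = [
  { title: 'Chipmaker beats forecasts', sentiment: 'Positive', category: 'Business' },
  { title: 'New model released', sentiment: 'Positive', category: 'Technology' },
  { title: 'Outage hits cloud provider', sentiment: 'Negative', category: 'Technology' },
];

console.log(tally(articles, (article) => article.sentiment)); // { Positive: 2, Negative: 1 }
```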

Hybrid Extraction

Combine structured fields with raw content:

const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/ecommerce'],
    name: 'Hybrid Extraction',
    extraction: builder => builder
      .schema('Product')
      .field('title', 'Product name', 'STRING', { example: 'Laptop Pro' })
      .field('price', 'Price', 'MONEY')
      .raw('html')  // Also include raw HTML
  })
  .create();

Real-time Notifications

Get instant WebSocket notifications when data changes:

const client = new KadoaClient({
  apiKey: 'your-api-key',
  enableRealtime: true
});

// Listen to all events
client.realtime?.onEvent((event) => {
  console.log('Event:', event);
  // Handle: EXTRACTION_STARTED, EXTRACTION_COMPLETED, DATA_CHANGED, etc.
});

// Check connection
if (client.isRealtimeConnected()) {
  console.log('Connected to real-time updates');
}

// Run extraction with notifications
const result = await client.extraction.run({
  urls: ['https://sandbox.kadoa.com/ecommerce'],
  notifications: {
    events: 'all', // or ['EXTRACTION_COMPLETED', 'DATA_CHANGED']
    channels: {
      WEBSOCKET: true
    }
  }
});

Available Events:

EXTRACTION_STARTED - Extraction begins
EXTRACTION_COMPLETED - Extraction finished
DATA_CHANGED - New data detected
VALIDATION_COMPLETED - Validation finished
ERROR - Error occurred
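With several event types arriving on one callback, a small router keeps the handling logic separated per event. This is a sketch under an assumption: the guide lists event names but not the event object, so the `type` field on `RealtimeEvent` below is hypothetical, as are `createEventRouter` and the handler names.

```typescript
type KadoaEventType =
  | 'EXTRACTION_STARTED'
  | 'EXTRACTION_COMPLETED'
  | 'DATA_CHANGED'
  | 'VALIDATION_COMPLETED'
  | 'ERROR';

// Hypothetical payload shape: the `type` field is an assumption, not documented here.
interface RealtimeEvent {
  type: KadoaEventType;
  payload?: unknown;
}

type EventHandlers = Partial<Record<KadoaEventType, (event: RealtimeEvent) => void>>;

// Returns a single callback that forwards each event to its handler, if one is registered.
function createEventRouter(
  handlers: EventHandlers,
  onUnhandled: (event: RealtimeEvent) => void = () => {},
): (event: RealtimeEvent) => boolean {
  return (event) => {
    const handler = handlers[event.type];
    if (!handler) {
      onUnhandled(event);
      return false; // no handler registered for this event type
    }
    handler(event);
    return true;
  };
}

const actions: string[] = [];
const route = createEventRouter(
  {
    DATA_CHANGED: () => { actions.push('refresh cache'); },
    ERROR: () => { actions.push('alert on-call'); },
  },
  (event) => { actions.push(`ignored ${event.type}`); },
);

route({ type: 'DATA_CHANGED' });
route({ type: 'EXTRACTION_STARTED' });
console.log(actions); // [ 'refresh cache', 'ignored EXTRACTION_STARTED' ]
```

If the real event object matches this shape, `route` could be passed to `client.realtime?.onEvent(...)`; otherwise adapt the `type` lookup to the actual field.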

Workflow Scheduling

Create reusable workflows that run on a schedule:

const workflow = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/ecommerce'],
    name: 'Scheduled Extraction',
    extraction: builder => builder
      .schema('Product')
      .field('name', 'Product name', 'STRING', { example: 'Smart Watch' })
      .field('price', 'Price', 'MONEY')
  })
  .bypassPreview() // Skip manual review
  .setLocation({
    type: 'auto' // Use automatic location selection
  })
  .setInterval({
    interval: 'DAILY' // ONLY_ONCE, HOURLY, DAILY, WEEKLY, MONTHLY
  })
  .create();

// Run manually when needed
const result = await workflow.run();

// Or let it run on schedule
console.log('Workflow created:', workflow.id);

Pagination Handling

Automatically navigate through multiple pages:

const result = await client.extraction.run({
  urls: ['https://sandbox.kadoa.com/ecommerce/pagination'],
  pagination: {
    enabled: true,
    maxPages: 10 // Limit number of pages
  }
});

// Fetch data with pagination
const data = await result.fetchData({
  page: 1,
  limit: 50
});

console.log(`Total items: ${data.pagination.total}`);
console.log(`Page ${data.pagination.page} of ${data.pagination.totalPages}`);

// Or iterate through all pages
for await (const page of result.fetchDataPages()) {
  console.log('Page data:', page.data);
  console.log('Page number:', page.pagination.page);
}

// Or get everything at once
const allData = await result.fetchAllData();
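The page-by-page loop is useful when you want to stop early instead of loading everything. The sketch below illustrates that pattern with a local stand-in for `result.fetchDataPages()`, so it runs without the SDK. The `Page` shape is assumed from the fields used in the example above, and `mockPages` and `collect` are hypothetical helpers.

```typescript
// Page shape assumed from the fields used in the docs example (data, pagination.page, pagination.totalPages).
interface Page<T> {
  data: T[];
  pagination: { page: number; totalPages: number };
}

// Stand-in for result.fetchDataPages(): yields pages of at most `limit` items.
async function* mockPages<T>(items: T[], limit: number): AsyncGenerator<Page<T>> {
  const totalPages = Math.max(1, Math.ceil(items.length / limit));
  for (let page = 1; page <= totalPages; page++) {
    yield {
      data: items.slice((page - 1) * limit, page * limit),
      pagination: { page, totalPages },
    };
  }
}

// Collect rows page by page, stopping as soon as `maxItems` rows are gathered.
async function collect<T>(pages: AsyncIterable<Page<T>>, maxItems = Infinity): Promise<T[]> {
  const rows: T[] = [];
  for await (const page of pages) {
    for (const row of page.data) {
      if (rows.length >= maxItems) return rows; // early exit also ends the iteration
      rows.push(row);
    }
  }
  return rows;
}

collect(mockPages([1, 2, 3, 4, 5], 2), 3).then((rows) => {
  console.log(rows); // [ 1, 2, 3 ]
});
```

Because `collect` accepts any `AsyncIterable`, the same function should work with the SDK's page iterator if its pages expose a `data` array as shown in the example.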

Reuse Existing Schemas

Reference previously created schemas:

// Use an existing schema by ID
const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/ecommerce'],
    name: 'Use Existing Schema',
    extraction: builder => builder.useSchema('schema-id-123')
  })
  .create();

Getting Help

If you’re stuck:

Check the examples in this documentation
Browse the GitHub examples
Search GitHub Issues
Contact support at support@kadoa.com
