We currently support Node.js/TypeScript. Python SDK coming soon.

Prerequisites

To get the most out of this guide, you’ll need to:

1. Install

npm install @kadoa/node-sdk

2. Extract Data

import { KadoaClient } from '@kadoa/node-sdk';

const client = new KadoaClient({
  apiKey: 'your-api-key'
});

// AI automatically detects and extracts data
const result = await client.extraction.run({
  urls: ['https://sandbox.kadoa.com/ecommerce'],
  name: 'Product Extraction'
});

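// Pass the data as a separate argument so objects are printed in full
// instead of being coerced to "[object Object]" inside a template string
console.log('Extracted data:', result.data);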
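
The snippet above hard-codes the API key for brevity. One common alternative is to read it from the environment and fail early when it is missing. The sketch below is a hypothetical helper, not part of the SDK, and the variable name KADOA_API_KEY is an assumption made for illustration.

```typescript
// Hypothetical helper: KADOA_API_KEY is an assumed variable name, not one
// the SDK is documented to read on its own.
type Env = Record<string, string | undefined>;

function resolveApiKey(env: Env, name = 'KADOA_API_KEY'): string {
  // Treat an unset, empty, or whitespace-only value as missing.
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing ${name}; set it before creating the client`);
  }
  return value;
}

// In a Node.js script you would typically pass process.env here.
const apiKey = resolveApiKey({ KADOA_API_KEY: ' demo-key ' });
console.log(`Key loaded (${apiKey.length} characters)`);
```

The result can then be passed to the constructor shown earlier, for example new KadoaClient({ apiKey: resolveApiKey(process.env) }).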

Choose Your Extraction Method


Auto-Detection

The fastest way to extract data. AI automatically identifies structured content:
const result = await client.extraction.run({
  urls: ['https://sandbox.kadoa.com/ecommerce'],
  name: 'Auto Product Extraction'
});

// Data is available directly in result.data
console.log(result.data);

Custom Schema (Builder API)

Define exactly what you want to extract with type-safe field definitions:
const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/ecommerce'],
    name: 'Product Extraction',
    extraction: builder => builder
      .schema('Product')
      .field('title', 'Product name', 'STRING', {
        example: 'Wireless Headphones'
      })
      .field('price', 'Product price in USD', 'MONEY')
      .field('inStock', 'Availability status', 'BOOLEAN')
      .field('rating', 'Star rating 1-5', 'NUMBER')
  })
  .bypassPreview()
  .setInterval({ interval: 'ONLY_ONCE' })
  .create();

const result = await extraction.run();

// Fetch the extracted data
const data = await result.fetchData({});
console.log(data.data); // Array of extracted items
// [{ title: "Dell XPS", price: "$999", inStock: true, rating: 4.5 }, ...]
See all available field types →
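
The sample output above shows the price as a display string ("$999"). If you need numeric values downstream, a small post-processing step can convert them. This is a minimal sketch that assumes the row shape from that sample comment; ProductRow and parsePrice are illustrative names, and the SDK's actual return types and MONEY formatting may differ.

```typescript
// Row shape assumed from the sample output in the guide; the SDK's actual
// return types may differ.
interface ProductRow {
  title: string;
  price: string; // e.g. "$999"
  inStock: boolean;
  rating: number;
}

// Convert a display price such as "$1,299.50" to a number, or null when the
// string holds no parsable amount. Handles only "1,234.56"-style formatting
// and ignores currency symbols and signs.
function parsePrice(raw: string): number | null {
  const cleaned = raw.replace(/[^0-9.]/g, '');
  if (cleaned === '' || cleaned === '.') return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

const rows: ProductRow[] = [
  { title: 'Dell XPS', price: '$999', inStock: true, rating: 4.5 },
  { title: 'Laptop Pro', price: '$1,299.50', inStock: false, rating: 4.1 },
];

// Sum the prices of in-stock items, skipping any price that fails to parse.
const inStockTotal = rows
  .filter((row) => row.inStock)
  .reduce((sum, row) => sum + (parsePrice(row.price) ?? 0), 0);

console.log('In-stock total:', inStockTotal);
```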

Raw Content Extraction

Extract raw HTML, Markdown, or URLs without structure:
// Extract as Markdown
const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/news'],
    name: 'Article Content',
    extraction: builder => builder.raw('markdown')
  })
  .create();

// Extract multiple formats
const multiFormatExtraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/news'],
    name: 'Multi-format',
    extraction: builder => builder.raw(['html', 'markdown', 'url'])
  })
  .create();
Available Formats:
  • html - Raw HTML
  • markdown - Markdown formatted content
  • url - Page URLs

Classification

Automatically categorize content into predefined classes:
const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/news'],
    name: 'Article Classifier',
    extraction: builder => builder
      .schema('Article')
      .field('title', 'Article headline', 'STRING', { example: 'Breaking News: AI Advances' })
      .classify('sentiment', 'Content sentiment', [
        { title: 'Positive', definition: 'Optimistic or favorable tone' },
        { title: 'Negative', definition: 'Critical or unfavorable tone' },
        { title: 'Neutral', definition: 'Balanced or objective tone' }
      ])
      .classify('category', 'Article category', [
        { title: 'Technology', definition: 'Tech news and updates' },
        { title: 'Business', definition: 'Business and finance' },
        { title: 'Politics', definition: 'Political news' },
        { title: 'Sports', definition: 'Sports coverage' }
      ])
  })
  .create();
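
Once classified rows come back, a typical next step is to tally them per class. The sketch below works on plain objects and assumes each classification field holds the title of the chosen class (for example "Positive"). That row shape is an assumption made for illustration, not a documented SDK type.

```typescript
// Assumed row shape: each classify() field holds the title of the chosen
// class. This is an illustration, not a documented SDK type.
interface ArticleRow {
  title: string;
  sentiment: string;
  category: string;
}

// Count rows per label for one classification field.
function countByLabel(
  rows: ArticleRow[],
  field: 'sentiment' | 'category',
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    const label = row[field];
    counts.set(label, (counts.get(label) ?? 0) + 1);
  }
  return counts;
}

const articles: ArticleRow[] = [
  { title: 'Chip sales surge', sentiment: 'Positive', category: 'Technology' },
  { title: 'Markets slide', sentiment: 'Negative', category: 'Business' },
  { title: 'New phone launched', sentiment: 'Positive', category: 'Technology' },
];

const sentimentCounts = countByLabel(articles, 'sentiment');
console.log('Positive articles:', sentimentCounts.get('Positive'));
```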

Hybrid Extraction

Combine structured fields with raw content:
const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/ecommerce'],
    name: 'Hybrid Extraction',
    extraction: builder => builder
      .schema('Product')
      .field('title', 'Product name', 'STRING', { example: 'Laptop Pro' })
      .field('price', 'Price', 'MONEY')
      .raw('html')  // Also include raw HTML
  })
  .create();

Real-time Notifications

Get instant WebSocket notifications when data changes:
const client = new KadoaClient({
  apiKey: 'your-api-key',
  enableRealtime: true
});

// Listen to all events
client.realtime?.onEvent((event) => {
  console.log('Event:', event);
  // Handle: EXTRACTION_STARTED, EXTRACTION_COMPLETED, DATA_CHANGED, etc.
});

// Check connection
if (client.isRealtimeConnected()) {
  console.log('Connected to real-time updates');
}

// Run extraction with notifications
const result = await client.extraction.run({
  urls: ['https://sandbox.kadoa.com/ecommerce'],
  notifications: {
    events: 'all', // or ['EXTRACTION_COMPLETED', 'DATA_CHANGED']
    channels: {
      WEBSOCKET: true
    }
  }
});
Available Events:
  • EXTRACTION_STARTED - Extraction begins
  • EXTRACTION_COMPLETED - Extraction finished
  • DATA_CHANGED - New data detected
  • VALIDATION_COMPLETED - Validation finished
  • ERROR - Error occurred
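
A handler usually branches on the event name. The sketch below models the five names listed above as a union type so the compiler flags any case left unhandled. The RealtimeEvent shape (a type field plus an optional message) is an assumption, since the actual payload is not shown in this guide.

```typescript
// Event names as listed in the guide.
type RealtimeEventName =
  | 'EXTRACTION_STARTED'
  | 'EXTRACTION_COMPLETED'
  | 'DATA_CHANGED'
  | 'VALIDATION_COMPLETED'
  | 'ERROR';

// Assumed minimal event shape; the real payload may carry other fields.
interface RealtimeEvent {
  type: RealtimeEventName;
  message?: string;
}

function describeEvent(event: RealtimeEvent): string {
  switch (event.type) {
    case 'EXTRACTION_STARTED':
      return 'Extraction started';
    case 'EXTRACTION_COMPLETED':
      return 'Extraction finished';
    case 'DATA_CHANGED':
      return 'New data detected';
    case 'VALIDATION_COMPLETED':
      return 'Validation finished';
    case 'ERROR':
      return `Error: ${event.message ?? 'no details provided'}`;
    default: {
      // Compile-time exhaustiveness check: adding a new name to the union
      // without a matching case makes this assignment fail to type-check.
      const unreachable: never = event.type;
      return `Unhandled event: ${String(unreachable)}`;
    }
  }
}

console.log(describeEvent({ type: 'DATA_CHANGED' }));
console.log(describeEvent({ type: 'ERROR', message: 'timeout' }));
```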

Workflow Scheduling

Create reusable workflows that run on a schedule:
const workflow = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/ecommerce'],
    name: 'Scheduled Extraction',
    extraction: builder => builder
      .schema('Product')
      .field('name', 'Product name', 'STRING', { example: 'Smart Watch' })
      .field('price', 'Price', 'MONEY')
  })
  .bypassPreview() // Skip manual review
  .setLocation({
    type: 'auto' // Use automatic location selection
  })
  .setInterval({
    interval: 'DAILY' // ONLY_ONCE, HOURLY, DAILY, WEEKLY, MONTHLY
  })
  .create();

// Run manually when needed
const result = await workflow.run();

// Or let it run on schedule
console.log('Workflow created:', workflow.id);
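
If the interval comes from a config file or command-line flag, it helps to validate it before building the workflow. The following sketch checks a string against the five values named in the comment above. parseInterval is a hypothetical helper, and the list may not cover every interval the service accepts.

```typescript
// Interval names taken from the comment in the scheduling example; the
// service may accept others that are not listed here.
const INTERVALS = ['ONLY_ONCE', 'HOURLY', 'DAILY', 'WEEKLY', 'MONTHLY'] as const;
type Interval = (typeof INTERVALS)[number];

// Hypothetical helper: normalize user input ("daily", "only once") and
// reject anything outside the known list.
function parseInterval(input: string): Interval {
  const normalized = input.trim().toUpperCase().replace(/[\s-]+/g, '_');
  const match = INTERVALS.find((interval) => interval === normalized);
  if (match === undefined) {
    throw new Error(
      `Unknown interval "${input}"; expected one of ${INTERVALS.join(', ')}`,
    );
  }
  return match;
}

console.log(parseInterval('daily'));
console.log(parseInterval('only once'));
```

The validated value can then be passed along as setInterval({ interval: parseInterval(userInput) }), where userInput stands for whatever string your configuration supplies.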

Pagination Handling

Automatically navigate through multiple pages:
const result = await client.extraction.run({
  urls: ['https://sandbox.kadoa.com/ecommerce/pagination'],
  pagination: {
    enabled: true,
    maxPages: 10 // Limit number of pages
  }
});

// Fetch data with pagination
const data = await result.fetchData({
  page: 1,
  limit: 50
});

console.log(`Total items: ${data.pagination.total}`);
console.log(`Page ${data.pagination.page} of ${data.pagination.totalPages}`);

// Or iterate through all pages
for await (const page of result.fetchDataPages()) {
  console.log('Page data:', page.data);
  console.log('Page number:', page.pagination.page);
}

// Or get everything at once
const allData = await result.fetchAllData();
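
To see how page, limit, total, and totalPages relate, or to stub paged results in unit tests without calling the API, the arithmetic can be reproduced locally. This is a stand-in written for illustration: the data and pagination field names mirror the snippet above, while the limit field and the paginate function are assumptions rather than SDK features.

```typescript
// Field names (data, pagination.page, total, totalPages) mirror those used
// in the guide's snippet; limit and everything else are local stand-ins.
interface Page<T> {
  data: T[];
  pagination: { page: number; limit: number; total: number; totalPages: number };
}

// Slice an in-memory list the way a 1-indexed paged endpoint would.
function paginate<T>(items: T[], page: number, limit: number): Page<T> {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be an integer >= 1');
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be an integer >= 1');
  }
  const total = items.length;
  const totalPages = Math.ceil(total / limit);
  const start = (page - 1) * limit;
  return {
    data: items.slice(start, start + limit),
    pagination: { page, limit, total, totalPages },
  };
}

// 120 items at 50 per page gives three pages; the last one is partial.
const items = Array.from({ length: 120 }, (_, i) => `item-${i + 1}`);
const second = paginate(items, 2, 50);
console.log(`Page ${second.pagination.page} of ${second.pagination.totalPages}`);
console.log('First item on page:', second.data[0]);
```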

Reuse Existing Schemas

Reference previously created schemas:
// Use an existing schema by ID
const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/ecommerce'],
    name: 'Use Existing Schema',
    extraction: builder => builder.useSchema('schema-id-123')
  })
  .create();

Getting Help

If you’re stuck:
  1. Check the examples in this documentation
  2. Browse the GitHub examples
  3. Search GitHub Issues
  4. Contact support at support@kadoa.com