Data Schemas

Working with Schemas

Define the structure of the data you want to extract inline, using the builder API:

// Assumes `client` is an already-initialized SDK client.
const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/ecommerce'],
    name: 'Product Extraction',
    extraction: builder => builder
      .entity('Product')
      .field('title', 'Product name', 'STRING', { example: 'Laptop' })
      .field('price', 'Product price', 'MONEY')
      .field('inStock', 'Availability', 'BOOLEAN')
      .field('rating', 'Star rating 1-5', 'NUMBER')
  })
  .create();
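To make the chained calls easier to read, here is a minimal sketch of how a fluent builder can accumulate field definitions. `ToyEntityBuilder` and its `build()` method are hypothetical stand-ins written for this illustration, not the SDK's builder. The output shape copies the explicit `fields` entries used elsewhere on this page; whether the real builder produces exactly this shape is an assumption.

```typescript
// Field shape copied from the explicit `fields` entries used on this page.
interface RegularField {
  name: string;
  description: string;
  fieldType: 'SCHEMA';
  dataType: string;
  example?: string;
}

// Toy stand-in for illustration only; this is NOT the SDK's builder.
class ToyEntityBuilder {
  private entityName = '';
  private readonly fields: RegularField[] = [];

  entity(entityName: string): this {
    this.entityName = entityName;
    return this;
  }

  field(
    name: string,
    description: string,
    dataType: string,
    options: { example?: string } = {}
  ): this {
    // Each call appends one regular field definition and returns the builder.
    this.fields.push({ name, description, fieldType: 'SCHEMA', dataType, ...options });
    return this;
  }

  build(): { entity: string; fields: RegularField[] } {
    // Return a copy so later builder calls cannot mutate the result.
    return { entity: this.entityName, fields: this.fields.slice() };
  }
}

const productDefinition = new ToyEntityBuilder()
  .entity('Product')
  .field('title', 'Product name', 'STRING', { example: 'Laptop' })
  .field('price', 'Product price', 'MONEY')
  .build();

console.log(JSON.stringify(productDefinition, null, 2));
```

Each `.field(...)` call returns the builder itself, which is what allows the calls to be chained in a single expression.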

Reusable Schemas

For consistent data extraction across multiple workflows, you can create and manage schemas separately using the Schema Management API.

Schema Management API

The Schema Management API allows you to create, retrieve, and delete schemas programmatically. Saved schemas can be reused across multiple extractions, ensuring consistent data structure.

When to Use Saved Schemas

Use saved schemas when you:

Extract the same data structure from multiple websites
Want to maintain consistent field definitions across workflows
Need to programmatically manage schema lifecycle
Share schemas across different parts of your application

For one-off extractions, inline schema definitions (shown above) are simpler and don’t require separate schema management.
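As a rough illustration of the first case, the sketch below prepares one set of extraction options per website, all pointing at the same saved schema. `optionsForSites` is a hypothetical helper and the store URLs are placeholders; only the options shape (`urls`, `name`, `extraction: { schemaId }`) is taken from the "Use a Saved Schema" example on this page.

```typescript
// Options shape taken from the "Use a Saved Schema" example on this page.
interface SavedSchemaExtractionOptions {
  urls: string[];
  name: string;
  extraction: { schemaId: string };
}

// Hypothetical helper (not part of the SDK): builds one options object
// per site, each referencing the same saved schema ID.
function optionsForSites(
  schemaId: string,
  sites: { label: string; url: string }[]
): SavedSchemaExtractionOptions[] {
  return sites.map(site => ({
    urls: [site.url],
    name: `${site.label} Extraction`,
    extraction: { schemaId },
  }));
}

// Placeholder schema ID and URLs, for illustration only.
const sharedOptions = optionsForSites('schema-id-123', [
  { label: 'Store A', url: 'https://store-a.example.com/products' },
  { label: 'Store B', url: 'https://store-b.example.com/products' },
]);

console.log(sharedOptions.map(o => o.name));
```

Each resulting object could then be passed to `client.extract(...)`, so every workflow shares one set of field definitions.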

Create a Schema

const schema = await client.schema.create({
  name: 'Product Schema',
  entity: 'Product',
  fields: [
    {
      name: 'title',
      description: 'Product name',
      fieldType: 'SCHEMA',
      dataType: 'STRING',
      example: 'iPhone 15 Pro'
    },
    {
      name: 'price',
      description: 'Product price',
      fieldType: 'SCHEMA',
      dataType: 'MONEY'
    },
    {
      name: 'inStock',
      description: 'Availability',
      fieldType: 'SCHEMA',
      dataType: 'BOOLEAN'
    },
    {
      name: 'rating',
      description: 'Star rating',
      fieldType: 'SCHEMA',
      dataType: 'NUMBER'
    }
  ]
});

console.log('Schema created:', schema.id);

Get a Schema

Retrieve an existing schema by ID:

const schema = await client.schema.get('schema-id-123');

console.log(schema.name);     // 'Product Schema'
console.log(schema.entity);   // 'Product'
console.log(schema.fields);   // Array of field definitions

Delete a Schema

Remove a schema when it’s no longer needed:

await client.schema.delete('schema-id-123');

Deleting a schema does not affect existing workflows or extractions that were created using it.

Use a Saved Schema

Reference a saved schema in your extraction:

// `schema` is a saved schema, e.g. the object returned by client.schema.create().
const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/ecommerce'],
    name: 'Product Extraction',
    extraction: { schemaId: schema.id }
  })
  .create();

const result = await extraction.run();

Field Types

Schemas support three types of fields:

Regular fields - Structured data extraction (shown above)
Classification fields - Categorize content into predefined labels
Metadata fields - Extract raw page content (HTML, Markdown, URLs)
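One way to keep the three kinds apart in application code is a discriminated union keyed on `fieldType`. The type names below are hypothetical and inferred from the examples on this page (`'SCHEMA'` for regular fields, `'CLASSIFICATION'`, `'METADATA'`); they are a sketch, not the SDK's own type definitions.

```typescript
// Hypothetical types inferred from the field objects shown on this page.
interface RegularField {
  name: string;
  description: string;
  fieldType: 'SCHEMA';
  dataType: string;
  example?: string;
}

interface ClassificationField {
  name: string;
  description: string;
  fieldType: 'CLASSIFICATION';
  categories: { title: string; definition: string }[];
}

interface MetadataField {
  name: string;
  description: string;
  fieldType: 'METADATA';
  metadataKey: 'HTML' | 'MARKDOWN' | 'PAGE_URL';
}

type SchemaField = RegularField | ClassificationField | MetadataField;

// Narrowing on `fieldType` gives access to the kind-specific properties.
function describeField(field: SchemaField): string {
  switch (field.fieldType) {
    case 'SCHEMA':
      return `${field.name}: regular (${field.dataType})`;
    case 'CLASSIFICATION':
      return `${field.name}: classification (${field.categories.map(c => c.title).join(', ')})`;
    case 'METADATA':
      return `${field.name}: metadata (${field.metadataKey})`;
  }
}

const sampleFields: SchemaField[] = [
  { name: 'price', description: 'Product price', fieldType: 'SCHEMA', dataType: 'MONEY' },
  {
    name: 'category',
    description: 'Article category',
    fieldType: 'CLASSIFICATION',
    categories: [
      { title: 'Technology', definition: 'Tech news and updates' },
      { title: 'Business', definition: 'Business and finance' },
    ],
  },
  { name: 'pageUrl', description: 'Page URL', fieldType: 'METADATA', metadataKey: 'PAGE_URL' },
];

const summaries = sampleFields.map(describeField);
console.log(summaries);
```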

Available Data Types

For regular fields, set dataType to one of: STRING • NUMBER • BOOLEAN • DATE • DATETIME • MONEY • IMAGE • LINK • OBJECT • ARRAY
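If data types arrive as plain strings (for example from a config file), a small guard can catch typos before a schema is created. `DATA_TYPES` and `isDataType` are hypothetical names for this sketch; the list of values is the one given above. The check is case-sensitive, which is an assumption of the sketch rather than documented API behavior.

```typescript
// The data types listed on this page, as a readonly tuple.
const DATA_TYPES = [
  'STRING', 'NUMBER', 'BOOLEAN', 'DATE', 'DATETIME',
  'MONEY', 'IMAGE', 'LINK', 'OBJECT', 'ARRAY',
] as const;

type DataType = typeof DATA_TYPES[number];

// Hypothetical guard (not part of the SDK): narrows a string to DataType.
function isDataType(value: string): value is DataType {
  return (DATA_TYPES as readonly string[]).indexOf(value) !== -1;
}

const candidates = ['MONEY', 'money', 'CURRENCY'];
const accepted = candidates.filter(isDataType);

console.log(accepted);
```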

Classification Fields

Categorize extracted content into predefined labels:

const schema = await client.schema.create({
  name: 'Article Schema',
  entity: 'Article',
  fields: [
    {
      name: 'title',
      description: 'Article headline',
      fieldType: 'SCHEMA',
      dataType: 'STRING',
      example: 'Breaking News'
    },
    {
      name: 'category',
      description: 'Article category',
      fieldType: 'CLASSIFICATION',
      categories: [
        { title: 'Technology', definition: 'Tech news and updates' },
        { title: 'Business', definition: 'Business and finance' },
        { title: 'Sports', definition: 'Sports coverage' }
      ]
    }
  ]
});
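When the label set lives elsewhere in your application, the `categories` array can be generated rather than written by hand. The helper below is a hypothetical sketch, not an SDK function; the field shape is copied from the example above, and the non-empty check is an assumption of this sketch.

```typescript
interface Category {
  title: string;
  definition: string;
}

// Shape copied from the classification field in the example above.
interface ClassificationField {
  name: string;
  description: string;
  fieldType: 'CLASSIFICATION';
  categories: Category[];
}

// Hypothetical helper (not part of the SDK): builds a classification
// field from [title, definition] pairs.
function classificationField(
  name: string,
  description: string,
  labels: [string, string][]
): ClassificationField {
  if (labels.length === 0) {
    // Assumption of this sketch: at least one label is required.
    throw new Error(`Field "${name}" needs at least one category`);
  }
  const categories = labels.map(([title, definition]) => ({ title, definition }));
  return { name, description, fieldType: 'CLASSIFICATION', categories };
}

const categoryField = classificationField('category', 'Article category', [
  ['Technology', 'Tech news and updates'],
  ['Business', 'Business and finance'],
  ['Sports', 'Sports coverage'],
]);

console.log(categoryField.categories.map(c => c.title));
```

The returned object can be placed directly in the `fields` array of a schema definition.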

Metadata Fields (Raw Content)

Extract raw page content alongside structured data:

const schema = await client.schema.create({
  name: 'Article with Raw Content',
  entity: 'Article',
  fields: [
    {
      name: 'title',
      description: 'Article headline',
      fieldType: 'SCHEMA',
      dataType: 'STRING',
      example: 'Latest News'
    },
    {
      name: 'rawMarkdown',
      description: 'Page content as Markdown',
      fieldType: 'METADATA',
      metadataKey: 'MARKDOWN'
    },
    {
      name: 'rawHtml',
      description: 'Page HTML source',
      fieldType: 'METADATA',
      metadataKey: 'HTML'
    },
    {
      name: 'pageUrl',
      description: 'Page URL',
      fieldType: 'METADATA',
      metadataKey: 'PAGE_URL'
    }
  ]
});

Available metadataKey values: HTML • MARKDOWN • PAGE_URL
