The Schema Management API lets you create, retrieve, and delete schemas programmatically. Saved schemas can be reused across multiple extractions, which keeps the extracted data structure consistent.

When to Use Saved Schemas

Use saved schemas when you:
  • Extract the same data structure from multiple websites
  • Want to maintain consistent field definitions across workflows
  • Need to programmatically manage schema lifecycle
  • Share schemas across different parts of your application
For one-off extractions, inline schema definitions are simpler and don’t require separate schema management.

Create a Schema

const schema = await client.schema.create({
  name: 'Product Schema',
  entity: 'Product',
  fields: [
    {
      name: 'title',
      description: 'Product name',
      fieldType: 'SCHEMA',
      dataType: 'STRING',
      example: 'iPhone 15 Pro'
    },
    {
      name: 'price',
      description: 'Product price',
      fieldType: 'SCHEMA',
      dataType: 'MONEY'
    },
    {
      name: 'inStock',
      description: 'Availability',
      fieldType: 'SCHEMA',
      dataType: 'BOOLEAN'
    },
    {
      name: 'rating',
      description: 'Star rating',
      fieldType: 'SCHEMA',
      dataType: 'NUMBER'
    }
  ]
});

console.log('Schema created:', schema.id);
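
The field objects above repeat the same `fieldType: 'SCHEMA'` boilerplate. A minimal sketch of a factory that removes it follows. `regularField`, `RegularField`, and `DataType` are hypothetical names for illustration and are not part of the SDK; the data type names are the ones listed under Available Data Types.

```typescript
// Hypothetical helper types, not SDK exports.
type DataType =
  | 'STRING' | 'NUMBER' | 'BOOLEAN' | 'DATE' | 'DATETIME'
  | 'MONEY' | 'IMAGE' | 'LINK' | 'OBJECT' | 'ARRAY';

interface RegularField {
  name: string;
  description: string;
  fieldType: 'SCHEMA';
  dataType: DataType;
  example?: string;
}

// Builds one regular field; `example` is only set when provided.
function regularField(
  name: string,
  description: string,
  dataType: DataType,
  example?: string
): RegularField {
  const field: RegularField = { name, description, fieldType: 'SCHEMA', dataType };
  if (example !== undefined) {
    field.example = example;
  }
  return field;
}

// Same four fields as the Product Schema above.
const productFields: RegularField[] = [
  regularField('title', 'Product name', 'STRING', 'iPhone 15 Pro'),
  regularField('price', 'Product price', 'MONEY'),
  regularField('inStock', 'Availability', 'BOOLEAN'),
  regularField('rating', 'Star rating', 'NUMBER'),
];
```

The resulting objects have the same shape as the entries of the `fields` array shown above, so the array is intended to be passed as `fields` in the create call.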

Get a Schema

Retrieve an existing schema by ID:
const schema = await client.schema.get('schema-id-123');

console.log(schema.name);     // 'Product Schema'
console.log(schema.entity);   // 'Product'
console.log(schema.fields);   // Array of field definitions

Delete a Schema

Remove a schema when it’s no longer needed:
await client.schema.delete('schema-id-123');
Deleting a schema does not affect existing workflows or extractions that were created using it.

Use a Saved Schema

Reference a saved schema in your extraction:
const extraction = await client
  .extract({
    urls: ['https://sandbox.kadoa.com/ecommerce'],
    name: 'Product Extraction',
    extraction: { schemaId: schema.id }
  })
  .create();

const result = await extraction.run();
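
Because the extraction only references the schema by ID, one saved schema can back several extractions. The sketch below builds one options object per site. `extractionOptionsFor` is a hypothetical helper, the option shape is copied from the call above, and `https://example.com/products` is a placeholder URL.

```typescript
// Shape copied from the extract() call above; not an official SDK type.
interface ExtractionOptions {
  urls: string[];
  name: string;
  extraction: { schemaId: string };
}

// Hypothetical helper: one options object per named site, all sharing a schema.
function extractionOptionsFor(
  schemaId: string,
  sites: Record<string, string[]>
): ExtractionOptions[] {
  return Object.entries(sites).map(([name, urls]) => ({
    urls,
    name,
    extraction: { schemaId },
  }));
}

const sharedOptions = extractionOptionsFor('schema-id-123', {
  'Product Extraction': ['https://sandbox.kadoa.com/ecommerce'],
  'Second Shop Extraction': ['https://example.com/products'], // placeholder URL
});

console.log(sharedOptions.map((o) => o.name));
```

Each entry can then be passed to `client.extract(...)` in the same way as the single extraction above.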

Field Types

Schemas support three types of fields:
  1. Regular fields - Structured data extraction (shown above)
  2. Classification fields - Categorize content into predefined labels
  3. Metadata fields - Extract raw page content (HTML, Markdown, URLs)

Available Data Types

For regular fields, specify the dataType: STRING, NUMBER, BOOLEAN, DATE, DATETIME, MONEY, IMAGE, LINK, OBJECT, ARRAY.
See data type details and examples.
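
When field definitions come from configuration files or user input, it can help to check the data type before calling the API. A minimal sketch, assuming the ten names listed above are the complete set; `DATA_TYPES` and `isDataType` are hypothetical names, not SDK exports.

```typescript
// The ten data type names from the list above.
const DATA_TYPES = [
  'STRING', 'NUMBER', 'BOOLEAN', 'DATE', 'DATETIME',
  'MONEY', 'IMAGE', 'LINK', 'OBJECT', 'ARRAY',
] as const;

type DataType = (typeof DATA_TYPES)[number];

// Type guard: narrows an arbitrary string to a known data type name.
// The comparison is case-sensitive.
function isDataType(value: string): value is DataType {
  return (DATA_TYPES as readonly string[]).includes(value);
}

const candidate: string = 'MONEY';
if (isDataType(candidate)) {
  console.log(`${candidate} is a supported data type`);
}
```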

Classification Fields

Categorize extracted content into predefined labels:
const schema = await client.schema.create({
  name: 'Article Schema',
  entity: 'Article',
  fields: [
    {
      name: 'title',
      description: 'Article headline',
      fieldType: 'SCHEMA',
      dataType: 'STRING',
      example: 'Breaking News'
    },
    {
      name: 'category',
      description: 'Article category',
      fieldType: 'CLASSIFICATION',
      categories: [
        { title: 'Technology', definition: 'Tech news and updates' },
        { title: 'Business', definition: 'Business and finance' },
        { title: 'Sports', definition: 'Sports coverage' }
      ]
    }
  ]
});
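
Overlapping or empty labels make a classification harder to interpret, so it may be worth checking the category list before creating the schema. The sketch below is a hypothetical client-side check; the API's own validation rules are not described here, and `findCategoryProblems` is an illustrative name.

```typescript
// Shape copied from the `categories` entries above.
interface Category {
  title: string;
  definition: string;
}

// Hypothetical check: reports empty titles, duplicate titles
// (case-insensitive), and empty definitions.
function findCategoryProblems(categories: Category[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const { title, definition } of categories) {
    const key = title.trim().toLowerCase();
    if (key === '') {
      problems.push('Category with an empty title');
    } else if (seen.has(key)) {
      problems.push(`Duplicate category title: ${title}`);
    }
    seen.add(key);
    if (definition.trim() === '') {
      problems.push(`Missing definition for: ${title}`);
    }
  }
  return problems;
}

const articleCategories: Category[] = [
  { title: 'Technology', definition: 'Tech news and updates' },
  { title: 'Business', definition: 'Business and finance' },
  { title: 'Sports', definition: 'Sports coverage' },
];

console.log(findCategoryProblems(articleCategories));
```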

Metadata Fields (Raw Content)

Extract raw page content alongside structured data:
const schema = await client.schema.create({
  name: 'Article with Raw Content',
  entity: 'Article',
  fields: [
    {
      name: 'title',
      description: 'Article headline',
      fieldType: 'SCHEMA',
      dataType: 'STRING',
      example: 'Latest News'
    },
    {
      name: 'rawMarkdown',
      description: 'Page content as Markdown',
      fieldType: 'METADATA',
      metadataKey: 'MARKDOWN'
    },
    {
      name: 'rawHtml',
      description: 'Page HTML source',
      fieldType: 'METADATA',
      metadataKey: 'HTML'
    },
    {
      name: 'pageUrl',
      description: 'Page URL',
      fieldType: 'METADATA',
      metadataKey: 'PAGE_URL'
    }
  ]
});
Available options: HTML, MARKDOWN, PAGE_URL
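
Since there are only three metadata keys, the metadata fields can be generated from the keys themselves. A minimal sketch, assuming the field names and descriptions used in the example above are acceptable defaults; `metadataFields` and `METADATA_DEFAULTS` are hypothetical names, not part of the SDK.

```typescript
type MetadataKey = 'HTML' | 'MARKDOWN' | 'PAGE_URL';

interface MetadataField {
  name: string;
  description: string;
  fieldType: 'METADATA';
  metadataKey: MetadataKey;
}

// Default names and descriptions, copied from the example above.
const METADATA_DEFAULTS: Record<MetadataKey, { name: string; description: string }> = {
  HTML: { name: 'rawHtml', description: 'Page HTML source' },
  MARKDOWN: { name: 'rawMarkdown', description: 'Page content as Markdown' },
  PAGE_URL: { name: 'pageUrl', description: 'Page URL' },
};

// Hypothetical helper: one METADATA field per requested key, in order.
function metadataFields(...keys: MetadataKey[]): MetadataField[] {
  return keys.map((key): MetadataField => ({
    ...METADATA_DEFAULTS[key],
    fieldType: 'METADATA',
    metadataKey: key,
  }));
}

const rawContentFields = metadataFields('MARKDOWN', 'PAGE_URL');
console.log(rawContentFields.map((f) => f.name));
```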

Best Practices

  1. Use descriptive names - Make schema names clear and specific (e.g., “E-commerce Product Schema” vs “Schema 1”)
  2. Provide examples - Include example values for STRING fields to improve extraction accuracy
  3. Keep schemas focused - Create separate schemas for different entity types rather than combining them
  4. Version your schemas - Include version numbers in schema names when making significant changes (e.g., “Product Schema v2”)
  5. Document field purposes - Write clear descriptions for each field to help future developers understand the schema
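
Practices 2 and 5 can be checked mechanically before a schema is created. The sketch below is a hypothetical lint, not an SDK feature; `lintFields` and the `sku` field are illustrative.

```typescript
// Loose field shape covering only the properties this check reads.
interface FieldLike {
  name: string;
  description?: string;
  dataType?: string;
  example?: string;
}

// Hypothetical lint: flags fields without a description and
// STRING fields without an example value.
function lintFields(fields: FieldLike[]): string[] {
  const warnings: string[] = [];
  for (const field of fields) {
    if (!field.description || field.description.trim() === '') {
      warnings.push(`${field.name}: missing description`);
    }
    if (field.dataType === 'STRING' && !field.example) {
      warnings.push(`${field.name}: STRING field without an example`);
    }
  }
  return warnings;
}

const report = lintFields([
  { name: 'title', description: 'Product name', dataType: 'STRING', example: 'iPhone 15 Pro' },
  { name: 'sku', description: '', dataType: 'STRING' }, // illustrative field
  { name: 'price', description: 'Product price', dataType: 'MONEY' },
]);

console.log(report);
```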

Troubleshooting

Duplicate Schema Names

Each schema name must be unique within your workspace. If you get a duplicate name error:
try {
  await client.schema.create({ name: 'Product Schema', /* ... */ });
} catch (error) {
  if (error instanceof Error && error.message.includes('duplicate')) {
    // Use a different name or update the existing schema
  }
}
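
One way to act on "use a different name" is to bump a version suffix and retry. The sketch below is built on assumptions: the exact error class and message are not documented here, so `isDuplicateNameError` simply mirrors the substring check above (made case-insensitive), and `nextSchemaName` is a hypothetical helper following the "v2" naming convention from Best Practices.

```typescript
// Assumption: duplicate-name failures surface as an Error whose
// message contains "duplicate". Adjust to the actual error shape.
function isDuplicateNameError(error: unknown): boolean {
  return error instanceof Error && error.message.toLowerCase().includes('duplicate');
}

// Hypothetical helper: "Product Schema" becomes "Product Schema v2",
// and "Product Schema v2" becomes "Product Schema v3".
function nextSchemaName(name: string): string {
  const match = /^(.*) v(\d+)$/.exec(name);
  if (match === null) {
    return `${name} v2`;
  }
  const [, base = name, digits = '1'] = match;
  return `${base} v${Number(digits) + 1}`;
}

console.log(nextSchemaName('Product Schema'));
console.log(nextSchemaName('Product Schema v2'));
```

Inside the `catch` block, a caller could test `isDuplicateNameError(error)` and retry the create call with `nextSchemaName(name)`, rethrowing any other error.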

Schema Not Found

Ensure the schema ID is correct and the schema hasn’t been deleted:
try {
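  const schema = await client.schema.get('schema-id-123');
  console.log(schema.name);
} catch (error) {
  // Other failures (network, authentication) also land here, so log the cause
  console.error('Schema not found or deleted:', error);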
}