Schemas

Schemas define the structure of the data you want to extract from websites. Each schema specifies field names, data types, and descriptions. A schema can be reused across multiple workflows. For example, if you need to extract store locations from five different websites, you can create one schema and use it for all of them.

Managing Schemas

Schemas can be created and managed in two ways:

In the UI

Create schemas visually using the schema builder, or select from existing templates.

Via Code

Create and manage schemas programmatically with full control over field definitions.

Data Types

When defining schemas, you specify the data type for each field to ensure accurate extraction and validation. Kadoa supports the following data types:

Data Type	Description	Example Use Cases
STRING	String/text content	Product names, descriptions, article headlines
NUMBER	Numeric values (integers, decimals)	Quantities, ratings, scores, counts
BOOLEAN	True/false values	Availability status, feature flags, yes/no indicators
DATE	Date values	Publication dates, deadlines, event dates
DATETIME	Date and time values	Timestamps, scheduled times, last updated
MONEY	Currency and monetary values	Prices, costs, revenue, discounts
IMAGE	Image URLs and references	Product photos, thumbnails, profile pictures
LINK	URLs and hyperlinks	Product pages, external links, social media
OBJECT	Nested/complex JSON structures	Structured metadata, complex configurations
ARRAY	Lists/arrays of values	Tags, categories, multiple images, feature lists

Choose the appropriate data type to ensure your data is extracted and validated correctly.
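
To make the idea of typed fields concrete, here is a minimal sketch in plain Python. It does not use the Kadoa SDK: `DataType`, `SchemaField`, `make_field`, the field names, and the URLs are all hypothetical names chosen for illustration. It only shows how a field combines a name, one of the data types listed above, and a description, and how a single schema definition might be shared across several workflows.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    """The data types listed in the table above."""
    STRING = "STRING"
    NUMBER = "NUMBER"
    BOOLEAN = "BOOLEAN"
    DATE = "DATE"
    DATETIME = "DATETIME"
    MONEY = "MONEY"
    IMAGE = "IMAGE"
    LINK = "LINK"
    OBJECT = "OBJECT"
    ARRAY = "ARRAY"


@dataclass(frozen=True)
class SchemaField:
    # Hypothetical shape: a field name, a data type, and a description.
    name: str
    data_type: DataType
    description: str


def make_field(name: str, data_type: str, description: str) -> SchemaField:
    """Build a field, rejecting type names that are not in the table."""
    try:
        resolved = DataType[data_type.upper()]
    except KeyError:
        raise ValueError(f"Unsupported data type: {data_type!r}") from None
    return SchemaField(name, resolved, description)


# One schema definition (illustrative field names).
store_location_schema = [
    make_field("storeName", "STRING", "Name of the store"),
    make_field("address", "STRING", "Street address"),
    make_field("storePage", "LINK", "URL of the store's detail page"),
    make_field("isOpen", "BOOLEAN", "Whether the store is currently open"),
]

# The same schema reused for several sites (placeholder URLs).
workflows = {
    url: store_location_schema
    for url in ("https://example.com/stores", "https://example.org/locations")
}
```

Validating the type name at construction time means a typo such as `FLOAT` fails immediately rather than surfacing later as an extraction problem.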

Data Type Formats

Some data types return structured values:

Data Type	Format	Example
MONEY	`{"amount": number, "currencyCode": string}`	`$124.50` → `{"amount": 12450, "currencyCode": "USD"}`

The amount field is always in the smallest currency unit (e.g., cents for USD, pence for GBP).
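
Because the amount is expressed in the smallest currency unit, it usually needs converting before display. The sketch below shows one way to do that; the `MINOR_UNIT_DIGITS` mapping and the `money_to_decimal` helper are assumptions for illustration, not part of Kadoa, and the mapping covers only a few currencies.

```python
from decimal import Decimal

# Assumed number of minor-unit digits per currency (following ISO 4217).
# Extend this mapping for the currencies you actually handle.
MINOR_UNIT_DIGITS = {"USD": 2, "GBP": 2, "EUR": 2, "JPY": 0}


def money_to_decimal(value: dict) -> Decimal:
    """Convert {"amount": ..., "currencyCode": ...} to a major-unit Decimal."""
    digits = MINOR_UNIT_DIGITS[value["currencyCode"]]
    # scaleb shifts the decimal exponent, so 12450 with 2 digits is 124.50.
    return Decimal(int(value["amount"])).scaleb(-digits)


price = money_to_decimal({"amount": 12450, "currencyCode": "USD"})
print(price)  # 124.50
```

Using `Decimal` rather than `float` avoids rounding errors when the converted amounts are later summed or compared.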

Special Field Types

Beyond regular data fields, Kadoa supports special field types for advanced use cases:

Classification Fields

Automatically categorize content into predefined labels. Useful for:

Sentiment analysis (Positive/Negative/Neutral)
Content categorization (Technology/Business/Sports)
Priority classification (High/Medium/Low)
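
As a rough illustration, a classification field can be thought of as a field name plus a fixed set of allowed labels. The `ClassificationField` class below is a hypothetical sketch in plain Python, not the SDK's actual interface.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClassificationField:
    # Hypothetical shape: a field name plus the predefined labels.
    name: str
    labels: Tuple[str, ...]

    def validate(self, value: str) -> str:
        """Return the value if it is one of the predefined labels."""
        if value not in self.labels:
            raise ValueError(f"{value!r} is not one of {self.labels}")
        return value


sentiment = ClassificationField("sentiment", ("Positive", "Negative", "Neutral"))
print(sentiment.validate("Neutral"))  # Neutral
```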

Learn more about classification in the SDK →

Metadata Fields (Raw Content)

Extract raw page content in different formats:

HTML - Raw HTML source code
MARKDOWN - Markdown formatted content
PAGE_URL - Page URL
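
The sketch below illustrates how raw-content fields might sit alongside regular extracted fields in a record. The `attach_metadata` helper, the record layout, and the sample page are assumptions made for illustration. Only the three key names come from the list above.

```python
METADATA_KEYS = ("HTML", "MARKDOWN", "PAGE_URL")


def attach_metadata(record: dict, page: dict, requested: list) -> dict:
    """Return a copy of record with the requested raw-content fields added."""
    unknown = [key for key in requested if key not in METADATA_KEYS]
    if unknown:
        raise ValueError(f"Unknown metadata keys: {unknown}")
    return {**record, **{key: page[key] for key in requested}}


# Placeholder page content for the example.
page = {
    "HTML": "<h1>Acme Store</h1>",
    "MARKDOWN": "# Acme Store",
    "PAGE_URL": "https://example.com/stores/acme",
}
row = attach_metadata({"storeName": "Acme Store"}, page, ["MARKDOWN", "PAGE_URL"])
print(sorted(row))  # ['MARKDOWN', 'PAGE_URL', 'storeName']
```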

Learn more about metadata fields in the SDK →

Need help creating a custom schema? Contact our support team for assistance.
