Schemas define the structure of the data you want to extract from websites. They specify field names, data types, and descriptions. Schemas can be reused across multiple workflows. For example, if you need to extract store locations from five different websites, you can create one schema and use it for all of them.

Managing Schemas

Schemas can be created and managed in two ways:

Data Types

When defining schemas, you specify the data type for each field to ensure accurate extraction and validation. Kadoa supports the following data types:
| Data Type | Description | Example Use Cases |
|-----------|-------------|-------------------|
| STRING | String/text content | Product names, descriptions, article headlines |
| NUMBER | Numeric values (integers, decimals) | Quantities, ratings, scores, counts |
| BOOLEAN | True/false values | Availability status, feature flags, yes/no indicators |
| DATE | Date values | Publication dates, deadlines, event dates |
| DATETIME | Date and time values | Timestamps, scheduled times, last updated |
| MONEY | Currency and monetary values | Prices, costs, revenue, discounts |
| IMAGE | Image URLs and references | Product photos, thumbnails, profile pictures |
| LINK | URLs and hyperlinks | Product pages, external links, social media |
| OBJECT | Nested/complex JSON structures | Structured metadata, complex configurations |
| ARRAY | Lists/arrays of values | Tags, categories, multiple images, feature lists |
Choose the appropriate data type to ensure your data is extracted and validated correctly.
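
As a rough illustration of how field names, data types, and descriptions fit together, here is a minimal Python sketch of a schema for the store-locations example. The `Field` class, its attribute names, and the field names are hypothetical and chosen for illustration only; they are not Kadoa's API. Only the data type names come from the table above.

```python
from dataclasses import dataclass

# Data type names taken from the table above.
DATA_TYPES = {
    "STRING", "NUMBER", "BOOLEAN", "DATE", "DATETIME",
    "MONEY", "IMAGE", "LINK", "OBJECT", "ARRAY",
}


@dataclass(frozen=True)
class Field:
    """One schema field. Attribute names are illustrative assumptions."""
    name: str
    data_type: str
    description: str

    def __post_init__(self):
        # Reject anything that is not one of the documented data types.
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unsupported data type: {self.data_type!r}")


# Hypothetical schema for the store-locations example.
store_location_schema = [
    Field("storeName", "STRING", "Name of the store"),
    Field("address", "STRING", "Street address of the store"),
    Field("isOpen24h", "BOOLEAN", "Whether the store is open around the clock"),
    Field("storePage", "LINK", "URL of the store's detail page"),
]
```

Because the schema is plain data, the same list could in principle be passed to every workflow that extracts store locations.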

Data Type Formats

Some data types return structured values:
| Data Type | Format | Example |
|-----------|--------|---------|
| MONEY | {"amount": number, "currencyCode": string} | $124.50 → {"amount": 12450, "currencyCode": "USD"} |
The amount field is always in the smallest currency unit (e.g., cents for USD, pence for GBP).
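
Since the amount arrives in the smallest currency unit, downstream code usually has to scale it before display. The following is a minimal sketch of that conversion; the helper name `to_major_units` and the small lookup table are assumptions for illustration, not part of Kadoa. The digit counts follow ISO 4217 (for example, JPY has no minor unit).

```python
from decimal import Decimal

# Decimal places of the minor unit per ISO 4217; extend as needed.
MINOR_UNIT_DIGITS = {"USD": 2, "GBP": 2, "EUR": 2, "JPY": 0}


def to_major_units(money: dict) -> Decimal:
    """Convert a MONEY value to the major unit, e.g. cents to dollars."""
    digits = MINOR_UNIT_DIGITS[money["currencyCode"]]
    # scaleb shifts the decimal exponent without any floating-point rounding.
    return Decimal(money["amount"]).scaleb(-digits)


price = {"amount": 12450, "currencyCode": "USD"}
print(to_major_units(price))  # 124.50
```

Using `Decimal` rather than `float` keeps monetary values exact, which matters once amounts are summed or compared.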

Special Field Types

Beyond regular data fields, Kadoa supports special field types for advanced use cases:

Classification Fields

Automatically categorize content into predefined labels. Useful for:
  • Sentiment analysis (Positive/Negative/Neutral)
  • Content categorization (Technology/Business/Sports)
  • Priority classification (High/Medium/Low)
Learn more about classification in the SDK →
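
When consuming classification results, it can help to check that each value really is one of the predefined labels. The sketch below shows one way to do that on the client side; `validate_label` is a hypothetical helper, and it says nothing about how Kadoa performs the classification itself.

```python
def validate_label(value: str, labels: list[str]) -> str:
    """Return the matching predefined label, ignoring case and whitespace."""
    normalized = value.strip().lower()
    for label in labels:
        if label.lower() == normalized:
            return label
    raise ValueError(f"{value!r} is not one of {labels}")


# Labels from the sentiment analysis example above.
SENTIMENT_LABELS = ["Positive", "Negative", "Neutral"]
print(validate_label(" positive ", SENTIMENT_LABELS))  # Positive
```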

Metadata Fields (Raw Content)

Extract raw page content in different formats:
  • HTML - Raw HTML source code
  • MARKDOWN - Markdown formatted content
  • PAGE_URL - Page URL
Learn more about metadata fields in the SDK →

Need help creating a custom schema? Contact our support team for assistance.