Working with Schemas
Define the structure of data you want to extract using the builder API.
Reusable Schemas
For consistent data extraction across multiple workflows, you can create and manage schemas separately using the Schema Management API.
Schema Management API
The Schema Management API allows you to create, retrieve, update, and delete schemas programmatically. Saved schemas can be reused across multiple extractions, which keeps the data structure consistent.
When to Use Saved Schemas
Use saved schemas when you:
- Extract the same data structure from multiple websites
- Want to maintain consistent field definitions across workflows
- Need to programmatically manage schema lifecycle
- Share schemas across different parts of your application
Create a Schema
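The request format for creating a schema is not shown here, so the following is only a minimal sketch of what a schema definition might look like as plain data. It assumes a schema consists of a name, an entity, and a list of fields, each with a `dataType`. The helper `build_schema` and the exact key names are hypothetical and are not taken from the actual API.

```python
from typing import Any


def build_schema(name: str, entity: str, fields: list[dict[str, Any]]) -> dict[str, Any]:
    """Assemble a schema definition as a plain dictionary (hypothetical shape)."""
    if not fields:
        raise ValueError("a schema needs at least one field")
    # Field names identify columns in the extracted data, so reject duplicates.
    names = [f["name"] for f in fields]
    if len(names) != len(set(names)):
        raise ValueError("field names must be unique")
    return {"name": name, "entity": entity, "fields": fields}


# Example definition; the field names are illustrative only.
product_schema = build_schema(
    name="Product Schema",
    entity="Product",
    fields=[
        {"name": "title", "dataType": "STRING"},
        {"name": "price", "dataType": "MONEY"},
        {"name": "inStock", "dataType": "BOOLEAN"},
    ],
)
```

A dictionary like this could then be serialized and sent to whichever endpoint or client method the Schema Management API provides for creation.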
Get a Schema
Retrieve an existing schema by ID.
Delete a Schema
Remove a schema when it’s no longer needed. Deleting a schema does not affect existing workflows or extractions that were created using it.
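The create, retrieve-by-ID, and delete operations can be pictured with a small in-memory stand-in. This is an illustration only: `InMemorySchemaStore` is a hypothetical class, not the real client, and the idea that an extraction keeps its own copy of the schema is an assumption used to model the behaviour described above, not a statement about how the service is implemented.

```python
import copy
import uuid


class InMemorySchemaStore:
    """Hypothetical stand-in for a schema service; not the real API."""

    def __init__(self) -> None:
        self._schemas: dict[str, dict] = {}

    def create(self, definition: dict) -> str:
        # Store a private copy and hand back a generated ID.
        schema_id = str(uuid.uuid4())
        self._schemas[schema_id] = copy.deepcopy(definition)
        return schema_id

    def get(self, schema_id: str) -> dict:
        # Return a copy so callers cannot mutate the stored schema.
        return copy.deepcopy(self._schemas[schema_id])

    def delete(self, schema_id: str) -> None:
        # Raises KeyError if the ID is unknown.
        del self._schemas[schema_id]


store = InMemorySchemaStore()
schema_id = store.create({
    "name": "Product Schema",
    "entity": "Product",
    "fields": [{"name": "title", "dataType": "STRING"}],
})

# Assumption: the extraction copies the schema when it is created,
# so deleting the saved schema afterwards leaves it untouched.
extraction = {"url": "https://example.com", "schema": store.get(schema_id)}
store.delete(schema_id)
```

After the delete, `extraction["schema"]` still holds the full definition, while looking up `schema_id` in the store raises `KeyError`.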
Update a Schema
Modify an existing schema’s name, entity, or fields.
Use a Saved Schema
Reference a saved schema in your extraction.
Field Types
Schemas support three types of fields:
- Regular fields - Structured data extraction (shown above)
- Classification fields - Categorize content into predefined labels
- Metadata fields - Extract raw page content (HTML, Markdown, URLs)
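To make the distinction between the three kinds of field concrete, here is a rough sketch that models each as its own Python dataclass. The class names, attribute names, and the `describe` helper are all hypothetical. They illustrate the concepts in the list above and do not reflect the actual builder API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class RegularField:
    name: str
    data_type: str  # e.g. "STRING" or "NUMBER"


@dataclass
class ClassificationField:
    name: str
    labels: list[str]  # the predefined labels to choose from


@dataclass
class MetadataField:
    name: str
    content: str  # "HTML", "MARKDOWN", or "PAGE_URL"


Field = Union[RegularField, ClassificationField, MetadataField]


def describe(field: Field) -> str:
    """Return a one-line summary of what a field would extract."""
    if isinstance(field, RegularField):
        return f"{field.name}: structured value of type {field.data_type}"
    if isinstance(field, ClassificationField):
        return f"{field.name}: one of {', '.join(field.labels)}"
    if isinstance(field, MetadataField):
        return f"{field.name}: raw {field.content} of the page"
    raise TypeError(f"unsupported field: {field!r}")


fields = [
    RegularField("title", "STRING"),
    ClassificationField("sentiment", ["positive", "neutral", "negative"]),
    MetadataField("rawPage", "MARKDOWN"),
]
for f in fields:
    print(describe(f))
```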
Available Data Types
For regular fields, specify the dataType:
STRING • NUMBER • BOOLEAN • DATE • DATETIME • MONEY • IMAGE • LINK • OBJECT • ARRAY
See data type details and examples →
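If you assemble schema definitions in your own code, it can help to check `dataType` values against the list above before sending anything. The following is a small sketch of such a check. The `DataType` enum and `parse_data_type` function are hypothetical helpers written for this example and are not part of any SDK.

```python
from enum import Enum


class DataType(str, Enum):
    """The data types listed for regular fields."""
    STRING = "STRING"
    NUMBER = "NUMBER"
    BOOLEAN = "BOOLEAN"
    DATE = "DATE"
    DATETIME = "DATETIME"
    MONEY = "MONEY"
    IMAGE = "IMAGE"
    LINK = "LINK"
    OBJECT = "OBJECT"
    ARRAY = "ARRAY"


def parse_data_type(value: str) -> DataType:
    """Normalize a user-supplied type name, rejecting unknown ones."""
    try:
        return DataType(value.strip().upper())
    except ValueError:
        allowed = ", ".join(t.value for t in DataType)
        raise ValueError(
            f"unknown dataType {value!r}; expected one of: {allowed}"
        ) from None


price_type = parse_data_type("money")
```

Normalizing case here is a convenience of the sketch. Whether the actual API accepts lowercase type names is not stated in this document.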
Classification Fields
Categorize extracted content into predefined labels.
Metadata Fields (Raw Content)
Extract raw page content alongside structured data:
HTML • MARKDOWN • PAGE_URL