Data validation lets you define rules that detect anomalies or unexpected results after a workflow run. Use it to catch missing values, outliers, and schema issues before the data is used downstream.
Rule States
Rules are organized by state throughout their lifecycle:
| State | Description |
|---|---|
| PREVIEW | Suggested rule awaiting review and approval |
| ENABLED | Active rule generating validation issues |
| DISABLED | Inactive rule (manually disabled or auto-disabled due to schema changes) |
Kadoa automatically disables a rule when a schema change would cause it to error.
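The lifecycle above can be pictured as a small state machine. The following is only an illustrative sketch, not Kadoa's implementation: the `RuleState` enum, the `ALLOWED` set, and both helper functions are hypothetical names, and only the transitions this page describes are modeled.

```python
from enum import Enum


class RuleState(Enum):
    PREVIEW = "PREVIEW"    # suggested, awaiting review and approval
    ENABLED = "ENABLED"    # active, generates validation issues
    DISABLED = "DISABLED"  # inactive (manually or automatically disabled)


# Assumption: only transitions mentioned on this page are listed.
# Others (for example re-enabling a disabled rule) may exist.
ALLOWED = {
    (RuleState.PREVIEW, RuleState.ENABLED),   # approval
    (RuleState.ENABLED, RuleState.DISABLED),  # manual or schema-driven disable
}


def transition(current: RuleState, target: RuleState) -> RuleState:
    """Return the target state, or raise if this sketch does not model the move."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"transition {current.name} -> {target.name} not modeled")
    return target


def generates_issues(state: RuleState) -> bool:
    """Only ENABLED rules produce validation issues."""
    return state is RuleState.ENABLED
```

For example, `transition(RuleState.PREVIEW, RuleState.ENABLED)` models approval, while `generates_issues(RuleState.PREVIEW)` is `False` because preview rules do not yet report issues.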
Issue Status Indicators
When viewing validation results, issues are marked with status indicators:
| Status | Description |
|---|---|
| NEW | First time the issue appears |
| RESOLVED | Issue no longer present |
Summary chips show the change since the previous run: +n new issues, -n resolved issues.
Rule Structure
Validation rules are expressed as SQL WHERE clauses that identify problematic rows:
-- Check email formats are valid
WHERE email NOT REGEXP '^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
-- All prices should be positive
WHERE price <= 0 OR price IS NULL
-- Product URLs should contain the domain
WHERE url NOT LIKE '%example.com%'
-- Check that publication dates are not in the future
WHERE publication_date > CURRENT_DATE()
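To try out WHERE-clause rules locally, one option is to load a few rows into an in-memory SQLite table and select the rows each clause matches. This is a sketch under assumptions: the sample rows, the `data` table, and the `find_violations` helper are hypothetical, and SQLite is used only for convenience (the SQL dialect Kadoa evaluates rules with is not stated here). The email and date rules are left out because stock SQLite does not support them as written.

```python
import sqlite3

# Hypothetical sample rows: (id, price, url).
rows = [
    (1, 19.99, "https://example.com/p/1"),
    (2, 0.0, "https://example.com/p/2"),
    (3, None, "https://other.test/p/3"),
]

# Rule name -> WHERE clause that selects the problematic rows.
rules = {
    "positive_price": "price <= 0 OR price IS NULL",
    "url_on_domain": "url NOT LIKE '%example.com%'",
}


def find_violations(rows, rules):
    """Return {rule name: ids of the rows matched by the rule's WHERE clause}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE data (id INTEGER, price REAL, url TEXT)")
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
        result = {}
        for name, where in rules.items():
            # The clause is trusted input here; never interpolate untrusted SQL.
            cursor = conn.execute(f"SELECT id FROM data WHERE {where} ORDER BY id")
            result[name] = [row[0] for row in cursor]
        return result
    finally:
        conn.close()


violations = find_violations(rows, rules)
print(violations)  # {'positive_price': [2, 3], 'url_on_domain': [3]}
```

Each rule returns the rows that violate it, so an empty list means the rule passed for that data.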
Key Fields
Defining key fields lets Kadoa track rows across runs for richer insights.
Requirements
- Values should be present for most rows
- Values should be unique per row (no duplicates)
- Prefer stable identifiers (e.g., product ID, URL, SKU)
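The presence and uniqueness requirements can be checked on sample data before committing to a key field. The sketch below is illustrative only: `assess_key_field` and the sample rows are hypothetical, and what counts as "most rows" is left to the reader.

```python
from collections import Counter


def assess_key_field(rows, field):
    """Report coverage (share of rows with a value) and duplicated values."""
    values = [row.get(field) for row in rows]
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "coverage": len(present) / len(rows) if rows else 0.0,
        "duplicates": [v for v, n in counts.items() if n > 1],
    }


# Hypothetical extracted rows.
sample = [
    {"sku": "A1", "name": "Lamp"},
    {"sku": "A2", "name": "Lamp"},
    {"sku": "A2", "name": "Desk"},
    {"sku": None, "name": "Chair"},
]

print(assess_key_field(sample, "sku"))   # {'coverage': 0.75, 'duplicates': ['A2']}
print(assess_key_field(sample, "name"))  # {'coverage': 1.0, 'duplicates': ['Lamp']}
```

A good candidate has coverage close to 1.0 and an empty `duplicates` list; in this sample neither field qualifies.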
How to Pick Key Fields
- If a row cannot be matched via the key, it is treated as a new row
- Common key fields: id, url, link, sku, product_id
Key-Based Insights
When key fields are set, the validation report shows change indicators between runs:
+n: new issues discovered since the previous run
-n: issues resolved since the previous run
Individual issues are labeled as “new” or “resolved” when applicable.
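One way to think about the +n / -n indicators is as a set difference between two runs. The sketch below assumes an issue can be identified by a (rule ID, key value) pair; that representation and the `diff_issues` helper are assumptions for illustration, not a description of how Kadoa matches issues internally.

```python
def diff_issues(previous, current):
    """Compare two runs, each given as a set of (rule_id, row_key) pairs."""
    new = sorted(current - previous)
    resolved = sorted(previous - current)
    return {
        "new": new,
        "resolved": resolved,
        "chips": f"+{len(new)} / -{len(resolved)}",
    }


# Hypothetical issues from two consecutive runs.
previous_run = {("rule-789", "sku-1"), ("rule-789", "sku-2")}
current_run = {("rule-789", "sku-2"), ("rule-789", "sku-3")}

changes = diff_issues(previous_run, current_run)
print(changes["chips"])  # +1 / -1
```

Here the issue on `sku-3` is new, the one on `sku-1` is resolved, and the one on `sku-2` persists, so it carries neither label.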
Validation Results Structure
The validation results include:
{
  "workflowId": "workflow-123",
  "runId": "run-456",
  "issues": [
    {
      "ruleId": "rule-789",
      "ruleName": "Valid email format",
      "status": "NEW",
      "affectedRows": 5,
      "rows": [...]
    }
  ],
  "summary": {
    "totalIssues": 12,
    "newIssues": 5,
    "resolvedIssues": 2
  }
}
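A results payload of this shape can be turned into a short report with standard JSON parsing. The sketch below is a minimal example, assuming the field names shown above; the `describe` helper is hypothetical, and the elided `rows` array is replaced with an empty list so the sample is valid JSON.

```python
import json

raw = """
{
  "workflowId": "workflow-123",
  "runId": "run-456",
  "issues": [
    {"ruleId": "rule-789", "ruleName": "Valid email format",
     "status": "NEW", "affectedRows": 5, "rows": []}
  ],
  "summary": {"totalIssues": 12, "newIssues": 5, "resolvedIssues": 2}
}
"""


def describe(results):
    """Build one headline line for the run plus one line per issue."""
    summary = results["summary"]
    lines = [
        f'{results["runId"]}: {summary["totalIssues"]} issues '
        f'(+{summary["newIssues"]} new, -{summary["resolvedIssues"]} resolved)'
    ]
    for issue in results["issues"]:
        lines.append(
            f'[{issue["status"]}] {issue["ruleName"]}: {issue["affectedRows"]} rows'
        )
    return lines


report = describe(json.loads(raw))
print("\n".join(report))
# run-456: 12 issues (+5 new, -2 resolved)
# [NEW] Valid email format: 5 rows
```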
Rule Approval
Rules are created in PREVIEW status when:
- Auto-suggested after a workflow run
- Generated on-demand via the “Suggest Rules” feature
- Created via the SDK or API with preview status
Preview rules must be approved before they detect validation issues. Approval transitions rules from PREVIEW to ENABLED status.
You can approve rules:
- In the UI: Select rules and click “Approve” (see UI guide)
- Via SDK: Use the bulkApproveRules() method (see SDK guide)
- Via API: Call the bulk approve endpoint (see API reference)
Rule Deletion
Rules can be permanently deleted when they are no longer needed. You can delete rules individually or in bulk.
You can delete rules:
- In the UI: Select rules and click “Delete” (see UI guide)
- Via SDK: Use the bulkDeleteRules() method (see SDK guide)
- Via API: Call the bulk delete endpoint (see API reference)
Deleting rules is permanent. Consider disabling rules instead if you may need them later.
Rule Execution
- Validation is executed at the end of each pipeline run
- Preview rules require approval before detecting validation issues
- Changes take effect on the next run
- Invalid rules auto-disable when schema changes break them