Introduction

Data validation lets you define rules that detect anomalies or unexpected results after a workflow run. Use it to detect missing values, outliers, and schema issues before downstream use.

Prerequisites

  • A completed workflow run with preview data (Kadoa will automatically suggest initial validation rules)

Configure validation

  1. Open the workflow in your dashboard.
  2. In the sidebar, select Issues. You will see 2 tabs there:
    • Rules: Create, view, suggest, and delete validation rules.
    • Results: Review issues found by rules for the selected run.
  3. Select the Rules tab.

Issues Rules sidebar navigation

  4. Choose one of the following:
    • Add new Rule: Manually choose a target column, define a condition, and (optionally) add domain hints for better precision.
    • Suggest Rules: Kadoa auto-suggests rules from your schema and sample data.

Rule creation dialog

Manual rules often provide the most precise results when you know the domain.

Suggested rules

Kadoa suggests validation rules in two ways:
  • Automatically: After a preview run, Kadoa analyzes your data and suggests relevant rules.
  • On demand: Click “AI suggest rules” to generate additional suggestions.

Suggested rules with bulk actions

Suggested rules require approval before activation. Select rules using the checkboxes, then approve or delete them in bulk.
On-demand rule generation doesn’t yet consider existing rules and may suggest overlapping validations. Review suggestions carefully before approval.

Rule execution

  • Validation is executed at the end of each subsequent pipeline run
  • Preview rules require approval before detecting validation issues
  • Changes take effect on the next run
  • Invalid rules are auto-disabled when the schema changes or when they would result in errors

Working with rules

Rule operations

  • Create manually: Select target columns and describe the rule in natural language
  • Generate suggestions: Auto-generate common rules based on data types and sample data
  • View SQL: All rules expose raw SQL for transparency
  • Bulk actions: Select multiple rules to approve or delete at once
  • Auto-disable: Kadoa automatically disables rules when the schema changes or when they would result in errors

Rule states

Preview and active rules sections

Rules are organized by state:
  • Suggested Rules: Preview rules awaiting approval
  • Active and Disabled Rules: Enabled rules, which generate validation issues, and disabled rules, which are inactive

Example rules

Here are examples of natural language inputs and their generated SQL validation rules:
-- Natural language: "Check email formats are valid"
WHERE email NOT REGEXP '^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'

-- Natural language: "All prices should be positive"  
WHERE price <= 0 OR price IS NULL

-- Natural language: "Product URLs should contain the domain"
WHERE url NOT LIKE '%example.com%'

-- Natural language: "Check that publication dates are not in the future"
WHERE publication_date > CURRENT_DATE()
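
To make the semantics of these conditions concrete: each WHERE clause describes the rows that fail the rule, so every matching row becomes an issue. The following is a minimal sketch using Python's built-in sqlite3 module, not Kadoa's actual execution engine. The table name `extracted`, the sample rows, and the rule names are hypothetical, and only the two conditions that run unchanged in SQLite are used.

```python
import sqlite3

# Hypothetical sample data standing in for a run's extracted rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extracted (id INTEGER, price REAL, url TEXT)")
conn.executemany(
    "INSERT INTO extracted VALUES (?, ?, ?)",
    [
        (1, 9.99, "https://example.com/a"),
        (2, 0.0, "https://example.com/b"),
        (3, None, "https://other.org/c"),
    ],
)

# Each condition selects the rows that FAIL the rule (rule names are made up).
rules = {
    "positive_price": "price <= 0 OR price IS NULL",
    "url_contains_domain": "url NOT LIKE '%example.com%'",
}


def find_issues(connection, conditions):
    """Return {rule name: [ids of rows matching the failure condition]}."""
    found = {}
    for name, condition in conditions.items():
        cursor = connection.execute(
            f"SELECT id FROM extracted WHERE {condition} ORDER BY id"
        )
        found[name] = [row[0] for row in cursor]
    return found


issues = find_issues(conn, rules)
print(issues)
```

Row 2 is flagged because its price is zero, and row 3 is flagged by both rules: its price is missing and its URL is on a different domain.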

Raw SQL view for a rule

  • Historical runs: When viewing a past run, you’ll see the rules that were in effect at that time (read-only).

Validation report

After a run finishes, go to Issues → Report.
  • See issues grouped by rule.
  • Click an issue to open the row detail and view all issues associated with that row.
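
The two views described above (issues grouped by rule, and all issues for a single row) amount to indexing the same issue records in two ways. Here is a minimal sketch, assuming a hypothetical record shape of (rule name, row id, offending value); Kadoa's internal representation is not documented here.

```python
from collections import defaultdict

# Hypothetical issue records: (rule name, row id, offending value).
issue_records = [
    ("positive_price", 2, 0.0),
    ("positive_price", 3, None),
    ("url_contains_domain", 3, "https://other.org/c"),
]


def group_issues(records):
    """Index issue records by rule and by row."""
    by_rule = defaultdict(list)  # rule name -> row ids with that issue
    by_row = defaultdict(list)   # row id -> rules the row violates
    for rule, row_id, _value in records:
        by_rule[rule].append(row_id)
        by_row[row_id].append(rule)
    return dict(by_rule), dict(by_row)


by_rule, by_row = group_issues(issue_records)
print(by_rule)    # report view: issues grouped by rule
print(by_row[3])  # row detail view: every issue on row 3
```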

Validation report overview

Results view details

  • Filter by rule: Use the filter to focus on specific rules when multiple are present.
  • Row details: Click an item to open the row and see the offending value and related context.

Row detail view with issues

  • Rule states:
    • PREVIEW: suggested rule awaiting review and approval
    • ENABLED: active rule generating validation issues
    • DISABLED: inactive rule
  • Issue status indicators:
    • NEW: first time the issue appears
    • RESOLVED: issue no longer present
  • Summary chips show change since previous run:
    • +n new issues, -n resolved

Key fields (optional)

Defining key fields lets Kadoa track rows across runs for richer insights.
  • Configure in Schema.
  • Pick one or more fields used to match the same row across runs.
  • Requirements: values should be present for most rows and unique per row (no duplicates).

How to pick key fields

  • Prefer stable identifiers (e.g., product ID, URL, SKU).
  • If a row cannot be matched via the key, it is treated as a new row.
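
Before choosing key fields, it can help to check a candidate against the two requirements above: values present for most rows, and no duplicates. The sketch below is one way to do that on exported data. The function name, the row format (a list of dicts), and the 90% coverage threshold are assumptions made for illustration; Kadoa does not publish an exact threshold for "most rows".

```python
def check_key_fields(rows, key_fields, min_coverage=0.9):
    """Report coverage and duplicates for a candidate key (hypothetical helper)."""
    keys = []
    for row in rows:
        values = tuple(row.get(field) for field in key_fields)
        # A key counts only if every field in it has a value.
        if all(value not in (None, "") for value in values):
            keys.append(values)

    coverage = len(keys) / len(rows) if rows else 0.0

    seen, duplicates = set(), set()
    for key in keys:
        if key in seen:
            duplicates.add(key)
        seen.add(key)

    return {
        "coverage": coverage,
        "duplicates": sorted(duplicates),
        "usable": coverage >= min_coverage and not duplicates,
    }


# Hypothetical sample rows.
products = [
    {"sku": "SKU-1", "name": "Widget"},
    {"sku": "SKU-2", "name": "Widget"},
    {"sku": "SKU-3", "name": "Gadget"},
    {"sku": "SKU-4", "name": "Gizmo"},
]

sku_check = check_key_fields(products, ["sku"])
name_check = check_key_fields(products, ["name"])
print(sku_check["usable"], name_check["usable"])
```

In this sample, `sku` qualifies as a key, while `name` does not because "Widget" appears twice.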

Key‑based insights

When key fields are set, the report shows change indicators between runs:
  • +n: new issues discovered since the previous run
  • -n: issues resolved since the previous run
Individual issues are labeled as “new” or “resolved” when applicable.
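
Conceptually, the +n and -n indicators are set differences between the issues of two consecutive runs, where each issue is identified by its rule and the row's key. The following is a minimal sketch under that assumption; the (rule name, key value) tuple shape and the function name are hypothetical, not Kadoa's actual implementation.

```python
def diff_issues(previous, current):
    """Compare two runs' issues, each a set of (rule name, row key) tuples."""
    new = current - previous       # present now, absent in the previous run
    resolved = previous - current  # present before, absent now
    summary = f"+{len(new)} new, -{len(resolved)} resolved"
    return new, resolved, summary


# Hypothetical issues from two consecutive runs, keyed by SKU.
previous_run = {("positive_price", "SKU-2"), ("positive_price", "SKU-3")}
current_run = {("positive_price", "SKU-3"), ("url_contains_domain", "SKU-4")}

new, resolved, summary = diff_issues(previous_run, current_run)
print(summary)
```

Without a stable key, the same row could not be recognized across runs, and a persisting issue would be counted as both resolved and new.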

Validate now

Use the Validate now button to schedule validation for the current workflow’s latest data. This is available when no specific past run is selected.

Notifications

Enable the notification “New data validation issues detected” to be alerted when anomalies increase compared to the previous run.