Introduction

Data validation lets you define rules that detect anomalies or unexpected results after a workflow run. Use it to catch missing values, outliers, and schema issues before the data is used downstream.

Prerequisites

  • A completed workflow run with preview data

Configure validation

  1. Open the workflow in your dashboard.
  2. In the sidebar, select Issues, which has three tabs:
    • Key columns: Define how rows are tracked across runs.
    • Rules: Create, view, suggest, and delete validation rules.
    • Results: Review issues found by rules for the selected run.
  3. Select the Rules tab.

Issues Rules sidebar navigation

  4. Choose one of the following:
    • Add new Rule: Manually choose a target column, define a condition, and (optionally) add domain hints for better precision.
    • Suggest Rules: Kadoa auto-suggests rules from your schema and sample data.

Rule creation dialog

Manual rules often provide the most precise results when you know the domain.

Rule execution

  • Rules run at the end of each subsequent pipeline run.
  • Adding or removing a rule takes effect on the next run.
  • If data shape changes make a rule invalid, Kadoa disables it automatically.
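
One way to picture the auto-disable behavior is a check that compares the columns a rule depends on with the columns present in the latest run. The sketch below is only an illustration under that assumption: the `Rule` class, the `refresh_rule` function, and the reason text are hypothetical and do not describe Kadoa's internal implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    # Hypothetical rule record, not a Kadoa data structure.
    name: str
    columns: frozenset  # columns the rule's SQL condition refers to
    enabled: bool = True
    disabled_reason: Optional[str] = None


def refresh_rule(rule: Rule, schema_columns: set) -> Rule:
    """Disable the rule if any column it depends on is missing from the schema."""
    missing = sorted(rule.columns - schema_columns)
    if missing:
        rule.enabled = False
        rule.disabled_reason = "missing column(s): " + ", ".join(missing)
    return rule


# The latest run no longer contains a "price" column.
current_schema = {"title", "url"}
price_rule = refresh_rule(Rule("positive_price", frozenset({"price"})), current_schema)
url_rule = refresh_rule(Rule("url_domain", frozenset({"url"})), current_schema)
```

Here `price_rule` ends up disabled with a reason attached, while `url_rule` stays enabled because its column still exists.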

Working with rules

  • Add rule: Select target columns and describe the rule in natural language. Kadoa generates the rule.
  • Suggest rules: Auto-generate common rules based on data types.
  • Delete rules: Remove a single rule or delete all rules.
  • Disabled rules: Rules can be auto-disabled when they no longer apply. A reason is shown when available.
  • Raw rule code: Rules expose raw SQL for transparency.

Example rules

Here are examples of natural language inputs and the SQL conditions generated from them. Each condition matches the rows that violate the rule:
-- Natural language: "Check email formats are valid"
WHERE email NOT REGEXP '^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'

-- Natural language: "All prices should be positive"
WHERE price <= 0 OR price IS NULL

-- Natural language: "Product URLs should contain the domain"
WHERE url NOT LIKE '%example.com%'

-- Natural language: "Check that publication dates are not in the future"
WHERE publication_date > CURRENT_DATE()
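
To see how such conditions select the offending rows, the following sketch runs two of the examples above against a small in-memory SQLite table. The `products` table, its sample rows, and the `flagged_ids` helper are assumptions made for illustration. SQLite is used only because it ships with Python, so this does not show how Kadoa executes rules. The email and date examples are left out because their syntax (`REGEXP`, `CURRENT_DATE()`) is not available in a default SQLite build.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, price REAL, url TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        (1, 19.99, "https://example.com/p/1"),
        (2, 0.0, "https://example.com/p/2"),   # price is not positive
        (3, None, "https://example.com/p/3"),  # price is missing
        (4, 5.00, "https://other.test/p/4"),   # URL is on a different domain
    ],
)

# Natural language description mapped to the generated condition.
rules = {
    "All prices should be positive": "WHERE price <= 0 OR price IS NULL",
    "Product URLs should contain the domain": "WHERE url NOT LIKE '%example.com%'",
}


def flagged_ids(where_clause):
    """Return the ids of rows matched by a rule condition, i.e. the violations."""
    rows = conn.execute(f"SELECT id FROM products {where_clause} ORDER BY id")
    return [row[0] for row in rows]


flagged = {label: flagged_ids(clause) for label, clause in rules.items()}
# flagged["All prices should be positive"] -> [2, 3]
# flagged["Product URLs should contain the domain"] -> [4]
```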

Raw SQL view for a rule

  • Historical runs: When viewing a past run, you’ll see the rules that were in effect at that time (read-only).

Validation report

After a run finishes, go to Issues → Report.
  • See issues grouped by rule.
  • Click an issue to open the row detail and view all issues associated with that row.
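
The two views described above, issues grouped by rule and all issues for one row, amount to indexing the same issue list in two ways. A minimal sketch, assuming a hypothetical issue record of (rule name, row key, offending value), which is not Kadoa's export format:

```python
from collections import defaultdict

# Hypothetical issue records: (rule name, row key, offending value).
issues = [
    ("positive_price", "sku-2", 0.0),
    ("url_domain", "sku-4", "https://other.test/p/4"),
    ("positive_price", "sku-3", None),
    ("valid_email", "sku-3", "not-an-email"),
]

by_rule = defaultdict(list)  # report view: issues grouped by rule
by_row = defaultdict(list)   # row detail view: every rule a row violates
for rule, row_key, value in issues:
    by_rule[rule].append(row_key)
    by_row[row_key].append(rule)
```

Looking up `by_row["sku-3"]` returns both rules that row violates, which mirrors what the row detail shows after you click an issue.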

Validation report overview

Results view details

  • Filter by rule: Use the filter to focus on specific rules when multiple are present.
  • Row details: Click an item to open the row and see the offending value and related context.

Row detail view with issues

  • Status indicators:
    • NEW: first time the issue appears
    • RESOLVED: issue no longer present
  • Summary chips show change since previous run:
    • +n new issues, -n resolved

Key columns (optional)

Defining key columns lets Kadoa track rows across runs for richer insights.
  • Configure in Issues → Key columns.
  • Pick one or more columns used to match the same row across runs.
  • Requirements: values should be present for most rows and unique per row (no duplicates).
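
Before choosing key columns, it can help to check the two requirements above on a sample of your data. The sketch below is one possible way to do that offline. The `key_column_report` function and the sample rows are hypothetical, and treating empty strings as missing values is an assumption.

```python
from collections import Counter


def key_column_report(rows, key_columns):
    """Return the share of rows with a complete key and any duplicated keys."""
    keys = [tuple(row.get(col) for col in key_columns) for row in rows]
    # A key counts as complete only if every part is present (assumption:
    # None and "" both mean "missing").
    complete = [k for k in keys if all(v not in (None, "") for v in k)]
    duplicates = sorted(k for k, n in Counter(complete).items() if n > 1)
    coverage = len(complete) / len(rows) if rows else 0.0
    return coverage, duplicates


rows = [
    {"sku": "A1", "title": "Lamp"},
    {"sku": "A2", "title": "Desk"},
    {"sku": "A2", "title": "Desk (oak)"},  # duplicate SKU
    {"sku": None, "title": "Chair"},       # missing SKU
]
coverage, duplicates = key_column_report(rows, ["sku"])
# coverage -> 0.75, duplicates -> [("A2",)]
```

In this sample, `sku` would be a poor key as it stands: a quarter of the rows have no value and one value repeats.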

Key columns configuration

How to pick key columns

  • Prefer stable identifiers (e.g., product ID, URL, SKU).
  • If a row cannot be matched via the key, it is treated as a new row.

Key‑based insights

When key columns are set, the report shows change indicators between runs:
  • +n: new issues discovered since the previous run
  • -n: issues resolved since the previous run
Individual issues are labeled as “new” or “resolved” when applicable.
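
The change indicators can be understood as a set difference between the issues of two consecutive runs, where each issue is identified by its row key and rule. This is a sketch of that idea only: the `diff_issues` function and the (row key, rule name) representation are assumptions, not Kadoa's actual matching logic.

```python
def diff_issues(previous, current):
    """Compare two runs' issues, each a set of (row key, rule name) pairs."""
    new = current - previous       # present now, absent in the previous run
    resolved = previous - current  # present before, absent now
    return new, resolved


previous_run = {("sku-2", "positive_price"), ("sku-4", "url_domain")}
current_run = {("sku-2", "positive_price"), ("sku-7", "positive_price")}

new, resolved = diff_issues(previous_run, current_run)
summary = f"+{len(new)} new, -{len(resolved)} resolved"
# summary -> "+1 new, -1 resolved"
```

The issue on `sku-2` appears in both runs, so it is labeled neither new nor resolved. Without stable key columns, the same row could not be recognized across runs and such an issue would look new each time.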

Validate now

Use the Validate now button to schedule validation for the current workflow’s latest data. This is available when no specific past run is selected.

Notifications

Enable the notification “New data validation issues detected” to be alerted when anomalies increase compared to the previous run.