Data validation is a critical feature that automatically checks the quality of your extracted data and identifies potential issues before they impact your downstream processes. Kadoa’s validation system helps you maintain high data quality standards and catch problems early.

Why Data Validation Matters

Web scraping can be unpredictable. Websites change their structure, content varies, and network issues can interrupt extraction. Without validation, problems like these can go unnoticed:

  • Missing data: Important fields become empty
  • Format changes: Prices switch from numbers to text
  • Duplicate content: Same data appears multiple times
  • Schema drift: New fields appear or existing ones disappear
  • Quality degradation: Data becomes incomplete or corrupted
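To make these failure modes concrete, here is a minimal Python sketch that compares two runs of the same extraction. It is not Kadoa's implementation: the functions `missing_fields`, `type_changes`, `duplicate_rows`, and `schema_drift` are hypothetical, and the sketch assumes each run is a list of flat records with scalar values.

```python
from typing import Any

Record = dict[str, Any]  # one extracted row; assumes flat, scalar (hashable) values


def missing_fields(rows: list[Record]) -> list[tuple[int, str]]:
    """Missing data: (row index, field) pairs whose value is None or an empty string."""
    return [(i, key) for i, row in enumerate(rows)
            for key, value in row.items() if value is None or value == ""]


def type_changes(previous: list[Record], current: list[Record]) -> dict[str, tuple[set[str], set[str]]]:
    """Format changes: fields whose value types differ between runs, e.g. float -> str."""

    def types_by_field(rows: list[Record]) -> dict[str, set[str]]:
        found: dict[str, set[str]] = {}
        for row in rows:
            for key, value in row.items():
                if value is not None:
                    found.setdefault(key, set()).add(type(value).__name__)
        return found

    before, after = types_by_field(previous), types_by_field(current)
    return {key: (before[key], after[key])
            for key in before.keys() & after.keys() if before[key] != after[key]}


def duplicate_rows(rows: list[Record]) -> list[int]:
    """Duplicate content: indexes of rows that exactly repeat an earlier row."""
    seen: set[tuple[tuple[str, Any], ...]] = set()
    duplicates = []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))  # requires hashable (scalar) values
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


def schema_drift(previous: list[Record], current: list[Record]) -> tuple[set[str], set[str]]:
    """Schema drift: (fields that appeared, fields that disappeared) between runs."""
    before = {key for row in previous for key in row}
    after = {key for row in current for key in row}
    return after - before, before - after
```

For instance, if `price` was a float in the previous run and arrives as text such as "9.99 USD" in the current one, `type_changes` reports `{'price': ({'float'}, {'str'})}`.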

Getting Started with Data Validation

Step 1: Enable Validation for Your Workflow

  1. Navigate to your workflow in the Kadoa dashboard
  2. Open the Data Quality panel on the right side of your screen

Data Quality panel location on the right sidebar

  3. Toggle the “Enable data validation” checkbox
  4. You’ll see a notification that changes take effect after the next workflow run

Data Quality panel with validation enabled showing toggle and notification

💡 Pro Tip: After changing validation settings, click the “Run Now” button to see results immediately instead of waiting for the next run.

Step 2: Set Up Validation Rules

Once validation is enabled, switch to the Rules tab in the Data Quality panel. You have two ways to create rules:

Option A: Use Suggested Rules

  1. Click “Suggest rules” button
  2. Kadoa analyzes your actual data and creates rules for common issues like:
    • Missing or empty values
    • Invalid email formats
    • Inconsistent data types
    • Unusual value ranges
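The documentation does not describe how Kadoa derives its suggestions. As a rough illustration only, one plausible heuristic is to profile each column and propose a rule when most values already satisfy it, so the rule singles out the outliers. The `suggest_rules` function, its 90% `threshold`, and the simplified email pattern below are all assumptions.

```python
import re
from typing import Any

# Deliberately simple email pattern for illustration; real validation is stricter.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def suggest_rules(column: str, values: list[Any], threshold: float = 0.9) -> list[str]:
    """Propose plain-language rules that at least `threshold` of the values already pass."""
    if not values:
        return []
    suggestions = []
    present = [v for v in values if v is not None and v != ""]

    # Missing or empty values: suggest a not-empty rule if most values are filled in.
    if len(present) / len(values) >= threshold:
        suggestions.append(f"{column} must not be empty")
    if not present:
        return suggestions

    # Invalid email formats: suggest a format rule if most values look like emails.
    emails = sum(isinstance(v, str) and EMAIL_PATTERN.match(v) is not None for v in present)
    if emails / len(present) >= threshold:
        suggestions.append(f"{column} must be a valid email address")

    # Inconsistent data types and unusual ranges: numeric columns get type and sign rules.
    numbers = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if len(numbers) / len(present) >= threshold:
        suggestions.append(f"{column} must be numeric")
        if min(numbers) >= 0:
            suggestions.append(f"{column} must not be negative")
    return suggestions
```

With this heuristic, a column where 9 of 10 values are well-formed emails gets an email-format rule, and the one malformed value becomes the anomaly that rule reports.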

Suggested rules interface showing generated validation rules

Option B: Create Custom Rules

  1. Click “Add new” button
  2. Select columns you want to validate using checkboxes
  3. Describe your rule in plain language, such as:
    • “Prices must be positive numbers”
    • “Email addresses must be in a valid format”
    • “Phone numbers should be 10 digits”
  4. Click “Generate rule” and Kadoa will create the validation logic
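The documentation does not say what language Kadoa's generated validation logic is written in. Purely to illustrate the idea, two of the example descriptions above could translate into per-value checks like the Python below. The `Rule` class and the check functions are hypothetical, and treating spaces, dots, dashes, and parentheses as phone-number separators is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Rule:
    """Hypothetical rule shape: a description, the selected columns, and a per-value check."""
    description: str
    columns: list[str]
    check: Callable[[Any], bool]

    def failures(self, rows: list[dict[str, Any]]) -> list[tuple[int, str, Any]]:
        """Return (row index, column, value) for every selected value that fails the check."""
        return [
            (i, col, row.get(col))
            for i, row in enumerate(rows)
            for col in self.columns
            if not self.check(row.get(col))
        ]


def is_positive_number(value: Any) -> bool:
    # Numeric strings such as "19.99" fail on purpose: the rule asks for numbers.
    return isinstance(value, (int, float)) and not isinstance(value, bool) and value > 0


def is_ten_digit_phone(value: Any) -> bool:
    # Strip common separators (whitespace, dashes, dots, parentheses), then count digits.
    digits = re.sub(r"[\s\-.()]", "", "" if value is None else str(value))
    return len(digits) == 10 and digits.isdigit()


PRICE_RULE = Rule("Prices must be positive numbers", ["price"], is_positive_number)
PHONE_RULE = Rule("Phone numbers should be 10 digits", ["phone"], is_ten_digit_phone)
```

Keeping each check as a small predicate over a single value makes the rule easy to read in the code view and lets the same logic run over whichever columns you select.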

Rule code popup showing the actual validation logic

Managing Your Rules

  • View rule details: Click the code icon to see the actual validation logic
  • Delete rules: Use the trash icon to remove rules you no longer need
  • Historical view: When viewing past runs, rules are shown as read-only

Data Quality panel showing rule details

Step 3: Review Validation Results

After your workflow runs, switch to the Report tab to see validation results:

Data Quality Report tab showing validation results and anomalies

Understanding Scores

Each rule shows a percentage score with color coding:

  • 🟢 100%: Perfect - no issues found
  • 🟡 80-99%: Good - minor issues detected
  • 🟠 60-79%: Warning - moderate issues found
  • 🔴 Below 60%: Critical - significant data quality problems
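This page does not define how a score is computed. Assuming it is the percentage of checked values that pass a rule, the score and its band could be derived as in this sketch; `rule_score` and `score_band` are hypothetical names. Scores above 99% but below 100% (for example 99.5%) fall between the documented bands, and this sketch treats them as Good.

```python
def rule_score(passed: int, checked: int) -> float:
    """Assumed definition: percentage of checked values that passed the rule."""
    if checked == 0:
        return 100.0  # nothing was checked, so nothing failed
    return 100.0 * passed / checked


def score_band(score: float) -> str:
    """Map a percentage score to the bands listed above."""
    if score >= 100:
        return "Perfect"
    if score >= 80:
        return "Good"
    if score >= 60:
        return "Warning"
    return "Critical"
```

For example, 45 of 50 values passing gives `rule_score(45, 50) == 90.0`, which `score_band` reports as Good.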

Investigating Issues

  1. Click any rule scoring below 100% to expand its details
  2. View specific anomalies - see exactly which data points failed validation
  3. Click an anomaly to see the full row details
  4. Page through the results - large result sets are paginated automatically
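Paging through anomalies can be pictured with the sketch below, which returns the full rows for one page of failed records along with the total page count. It is illustrative only: `anomaly_page` and its default page size of 25 are assumptions, not Kadoa settings.

```python
import math
from typing import Any

Row = dict[str, Any]


def anomaly_page(
    rows: list[Row], failed_indexes: list[int], page: int, page_size: int = 25
) -> tuple[list[tuple[int, Row]], int]:
    """Return (row index, full row) pairs for one 1-based page of anomalies, plus the page count."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    # An empty result still has one (empty) page.
    total_pages = max(1, math.ceil(len(failed_indexes) / page_size))
    if not 1 <= page <= total_pages:
        raise ValueError(f"page must be between 1 and {total_pages}")
    start = (page - 1) * page_size
    return [(i, rows[i]) for i in failed_indexes[start:start + page_size]], total_pages
```

Returning the full row rather than only the failing value mirrors the idea of clicking an anomaly to see its surrounding record.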

Step 4: Monitor Data Quality Over Time

In Your Data Table

  • Anomaly indicators appear directly on problematic data points
  • Visual highlighting makes issues easy to spot at a glance
  • Click indicators to see why the data was flagged

Data record detail view showing anomaly indicators and explanations

Ongoing Monitoring

  • Check scores regularly - dropping scores often indicate upstream changes
  • Adjust rules as needed - business requirements evolve over time
  • Run validation after website changes - confirm that data quality is still high
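Assuming you record each run's rule scores as a mapping from rule name to percentage, one simple way to spot the drops mentioned above is to compare the latest run with the one before it. This is a hypothetical sketch; `score_drops` and its 10-point `min_drop` threshold are assumptions, not part of Kadoa.

```python
def score_drops(
    history: list[dict[str, float]], min_drop: float = 10.0
) -> dict[str, tuple[float, float]]:
    """Rules whose score fell by at least `min_drop` points between the last two runs.

    `history` is ordered oldest to newest; each entry maps rule name -> score (%).
    """
    if len(history) < 2:
        return {}
    previous, latest = history[-2], history[-1]
    return {
        rule: (previous[rule], score)
        for rule, score in latest.items()
        # Only compare rules that existed in both runs.
        if rule in previous and previous[rule] - score >= min_drop
    }
```

A sudden drop on a single rule, such as a price rule falling from 100% to 72.5%, is usually a stronger hint of a website change than a slow drift across many rules.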

Common Questions & Troubleshooting

“My validation scores are low - what should I do?”

  1. Review the specific anomalies by expanding failed rules in the Report tab
  2. Check if the issues are real problems or false positives
  3. Adjust rules if they’re too strict for your use case
  4. Delete rules that don’t match your business requirements

“I want different validation rules for different data”

  1. Use custom rules with specific column selections
  2. Describe precise requirements in the rule description
  3. Create multiple rules targeting different aspects of your data

“Validation is taking too long”

  • Validation typically completes within seconds
  • Large datasets (10k+ rows) may take up to 10 minutes
  • Contact support if validation consistently times out

“I don’t see validation results”

  • Ensure validation is enabled in the Data Quality panel
  • Run your workflow after enabling validation
  • Check the Report tab after the workflow completes