Advanced Workflows are currently available only for Enterprise customers. To upgrade your account or request a demo, get in touch with our sales team.

Overview

Kadoa’s Advanced Workflows let you set up fully customizable ETL (Extract, Transform, Load) pipelines for unstructured data, whether it comes from websites, documents, CSV files, or email.

Our intuitive Workflow Builder lets you visually define workflows by connecting configurable modules for data sources, processing actions, and destinations.
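
To make the idea of connected modules concrete, here is a minimal Python sketch of how records might flow from one data source, through an ordered list of data actions, into one or more destinations. It is a conceptual model only, not Kadoa's API: the `Workflow` class, the `Source`/`Action`/`Destination` aliases, and the dict-based records are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

# Hypothetical aliases: a record is a plain dict of extracted field values.
Record = dict
Source = Callable[[], Iterable[Record]]           # produces raw records
Action = Callable[[list[Record]], list[Record]]  # transforms a batch of records
Destination = Callable[[list[Record]], None]     # delivers the final records


@dataclass
class Workflow:
    """Conceptual model: one data source, ordered data actions, one or more destinations."""
    source: Source
    actions: list[Action] = field(default_factory=list)
    destinations: list[Destination] = field(default_factory=list)

    def run(self) -> list[Record]:
        records = list(self.source())
        # Each action receives the previous action's output, like connected modules.
        for action in self.actions:
            records = action(records)
        # Every destination receives the same final records.
        for deliver in self.destinations:
            deliver(records)
        return records


# Usage: strip whitespace from one field and deliver into an in-memory list.
delivered: list[Record] = []
workflow = Workflow(
    source=lambda: [{"company": " Acme "}, {"company": "Globex"}],
    actions=[lambda rs: [{**r, "company": r["company"].strip()} for r in rs]],
    destinations=[delivered.extend],
)
result = workflow.run()
```

Keeping each module as a plain function with a uniform input and output makes modules easy to reorder or swap, which mirrors how the Workflow Builder lets you rearrange steps.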


Core Concepts and Components

An advanced workflow consists of three main components:

1. Data Sources

Kadoa provides an extensive range of data sources for unstructured data extraction:

| Operator | Description | Example Use Cases |
| --- | --- | --- |
| Websites | Crawl and scrape websites with automatic content discovery and structuring. | Any workflow that relies on web data. |
| CSV Batch Processing | Bulk-extract data from URLs or file paths listed in CSV files. | Mass-processing filings, documents, or structured URL lists. |
| Documents | Upload filings, regulatory reports, and other documents for processing. | Earnings reports, regulatory data ingestion (e.g., EDGAR filings). |
| Email Integration | Extract content from email attachments and message bodies. | Ingesting analyst reports and financial statements received by email. |
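
CSV Batch Processing starts from a list of URLs or file paths stored in a CSV file. As a rough illustration of that input side only, the Python sketch below collects the URLs from one CSV column. The `url` column name, the de-duplication step, and the `urls_from_csv` helper are assumptions for illustration, not Kadoa's required CSV layout.

```python
import csv
import io


def urls_from_csv(csv_text: str, column: str = "url") -> list[str]:
    """Return unique, non-empty values from one CSV column, preserving their order."""
    # Assumption: the CSV has a header row containing a column named `column`.
    reader = csv.DictReader(io.StringIO(csv_text))
    seen: set[str] = set()
    urls: list[str] = []
    for row in reader:
        url = (row.get(column) or "").strip()
        if url and url not in seen:  # skip blank cells and repeated URLs
            seen.add(url)
            urls.append(url)
    return urls


sample = (
    "url,company\n"
    "https://example.com/a.pdf,Acme\n"
    ",Empty\n"
    "https://example.com/a.pdf,Acme\n"
    "https://example.com/b.pdf,Globex\n"
)
batch = urls_from_csv(sample)
```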

2. Data Actions

Refine and structure the extracted data with built-in transformation operators:

| Operator | Description | Example Use Cases |
| --- | --- | --- |
| Extraction | Extract data from websites or documents based on a defined data schema. | Extracting data from any unstructured input source. |
| Transformation | Format, normalize, and restructure extracted data. | Removing noise, standardizing dates and values. |
| Data Validation | Set validation rules to ensure high-quality, consistent outputs. | Compliance verification, data integrity checks. |
| Enrichment & Mapping | Combine external mappings and enrich data from multiple sources. | Mapping tickers and securities, adding computed fields. |
| Classification | Automatically categorize documents based on content or metadata. | Classifying filings, news articles, and transcripts. |
| Filtering & Deduplication | Keep or remove records that meet specific criteria. | Filtering specific company filings, removing duplicate data. |
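
Data Validation and Filtering & Deduplication are configured in the Workflow Builder, but the underlying idea is easy to show in code. The Python sketch below is a hypothetical in-memory approximation: the rule format (field name mapped to a check function), the `validate` and `deduplicate` helpers, and the example fields `ticker` and `revenue` are invented for illustration.

```python
from typing import Callable

Record = dict


def validate(
    records: list[Record], rules: dict[str, Callable[[object], bool]]
) -> tuple[list[Record], list[tuple[Record, str]]]:
    """Split records into those passing every rule and (record, first_failed_field) pairs."""
    valid, errors = [], []
    for r in records:
        failed = next((f for f, check in rules.items() if not check(r.get(f))), None)
        if failed is None:
            valid.append(r)
        else:
            errors.append((r, failed))
    return valid, errors


def deduplicate(records: list[Record], key_fields: tuple[str, ...]) -> list[Record]:
    """Keep only the first record for each combination of key-field values."""
    seen = set()
    out = []
    for r in records:
        key = tuple(r.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


rows = [
    {"ticker": "ACME", "revenue": 120.0},
    {"ticker": "ACME", "revenue": 120.0},  # duplicate
    {"ticker": "", "revenue": 80.0},       # fails the ticker rule
    {"ticker": "GLBX", "revenue": -5.0},   # fails the revenue rule
]
rules = {
    "ticker": lambda v: isinstance(v, str) and v != "",
    "revenue": lambda v: isinstance(v, (int, float)) and v >= 0,
}
valid, errors = validate(rows, rules)
unique = deduplicate(valid, ("ticker",))
```

Returning the failing records with the name of the rule they broke, rather than silently dropping them, makes it easier to alert a team about data-quality problems after each run.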

3. Data Destinations

Easily deliver processed data to your internal tools, teams, and systems:

| Operator | Description | Example Use Cases |
| --- | --- | --- |
| File Writer | Export data to JSON, CSV, or XLSX files. | Direct integration into internal databases and data warehouses. |
| Pre-Built Connectors | Connect to popular databases, file storage, and applications. | Direct integration into internal databases and data warehouses. |
| API Integrations | Automatically send structured data to your internal or third-party APIs. | Real-time integration into analytics platforms. |
| Notifications & Webhooks | Notify internal teams or trigger automations instantly via webhooks. | Alerting teams about filing updates, detected changes, or data availability. |
| WebSockets | Notify your systems about data updates or relevant changes in real time. | Real-time monitoring. |

Building Your First Advanced Workflow

Below is an example of how to automatically pull PDF documents from a target website, extract structured content from these PDFs, and run this workflow weekly.

Step 1: Configure the Data Source

  • Open the Kadoa Workflow Builder.

  • Select the “Website Scraper” operator as your Data Source.

  • Enter the target URL of the website.

  • In the data editor, configure the scraper to extract the PDF links.
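
The Website Scraper operator is configured to do this in the data editor. To illustrate what "extracting the PDF links" means in practice, here is a standard-library Python sketch that pulls PDF links out of a page's HTML and resolves them to absolute URLs. The `PdfLinkParser` class, the example URLs, and the rule that a PDF link ends in `.pdf` are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links that point to .pdf files."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: PDF links are recognizable by a ".pdf" path suffix (query string ignored).
        if href and href.split("?")[0].lower().endswith(".pdf"):
            absolute = urljoin(self.base_url, href)  # resolve relative links
            if absolute not in self.links:
                self.links.append(absolute)


html = """
<a href="/reports/q1-2024.pdf">Q1</a>
<a href="https://example.com/reports/Q2-2024.PDF?download=1">Q2</a>
<a href="/about">About</a>
"""
parser = PdfLinkParser("https://example.com/investors/")
parser.feed(html)
pdf_links = parser.links
```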

Step 2: Add PDF Extraction

  • Click “Add Action” and choose “Extract PDF”.

  • Map the PDF links collected in Step 1 to the input of this PDF extraction step.

  • Define the structured data fields you need (e.g., financial metrics, company details, dates).
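
Thinking of the fields as a typed record helps keep the extracted data consistent from run to run. A hedged Python sketch of such a schema follows; `FilingRecord`, its three fields, and the `to_record` helper are hypothetical examples, not a format the Workflow Builder requires.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class FilingRecord:
    """Hypothetical target schema for the data extracted from one PDF."""
    company_name: str
    report_date: date
    revenue_musd: float  # revenue in millions of USD


def to_record(raw: dict) -> FilingRecord:
    """Build a FilingRecord from raw extraction output, rejecting missing or empty fields."""
    missing = [f.name for f in fields(FilingRecord) if raw.get(f.name) in (None, "")]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Extra keys in `raw` are ignored; only schema fields are kept.
    return FilingRecord(**{f.name: raw[f.name] for f in fields(FilingRecord)})


record = to_record({
    "company_name": "Acme Corp",
    "report_date": date(2024, 3, 31),
    "revenue_musd": 1234.5,
    "page": 7,
})
```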

Step 3: Apply Transformations (Optional)

  • If necessary, add transformation actions to clean, format, or enrich the extracted data.
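
As an example of the cleanup a transformation step performs (standardizing dates and values), the Python sketch below converts dates to ISO 8601 and parses amounts such as `$1,234.5M`. The accepted date formats, the K/M/B suffix handling, the parenthesized-negative convention, and the function names are all assumptions; note that an ambiguous date such as `04/05/2024` is read in US month/day order here.

```python
import re
from datetime import datetime

# Assumed input formats, tried in order.
DATE_FORMATS = ("%B %d, %Y", "%d.%m.%Y", "%Y-%m-%d", "%m/%d/%Y")
SCALE = {"": 1, "k": 1e3, "m": 1e6, "b": 1e9}  # assumed magnitude suffixes


def normalize_date(text: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def normalize_amount(text: str) -> float:
    """Parse strings like '$1,234.5M' or '(12.0)' into a float."""
    s = text.strip().replace(",", "").replace("$", "")
    negative = s.startswith("(") and s.endswith(")")  # accounting-style negative
    s = s.strip("()")
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)\s*([kKmMbB]?)", s)
    if not match:
        raise ValueError(f"unrecognized amount: {text!r}")
    value = float(match.group(1)) * SCALE[match.group(2).lower()]
    return -value if negative else value
```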

Step 4: Configure the Destination

  • Choose the desired data destination.

  • Configure the destination operator (for example, choose the output format for a File Writer).
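
When choosing an output format for the File Writer, it can help to see the same records as JSON and as CSV. The Python sketch below uses only the standard library; `to_json` and `to_csv` are hypothetical helpers that illustrate the formats and are not Kadoa's File Writer. XLSX output would need a third-party library and is omitted.

```python
import csv
import io
import json


def to_json(records: list[dict]) -> str:
    """Serialize records as a JSON array (values must already be JSON-serializable)."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records: list[dict]) -> str:
    """Serialize records as CSV, using the union of keys (first-seen order) as columns."""
    columns: list[str] = []
    for r in records:
        for key in r:
            if key not in columns:
                columns.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return buf.getvalue()


records = [
    {"company": "Acme", "report_date": "2024-03-31", "revenue": 1234500000.0},
    {"company": "Globex", "report_date": "2024-03-31"},
]
csv_text = to_csv(records)
json_text = to_json(records)
```

CSV needs a fixed set of columns, so records with missing fields produce empty cells, while JSON keeps each record's own keys; that difference is worth considering when the destination is a database with a strict schema.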

Step 5: Schedule the Workflow to Run Weekly

  • Click “Repeat & monitor”.

  • Set the frequency to “Weekly” (e.g., every Monday at 08:00 AM).

  • Enable automated monitoring and notifications to alert your teams about data updates after each run.
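
The Workflow Builder handles scheduling for you. To make "every Monday at 08:00" precise, the sketch below computes the next run time after a given moment in Python. The `next_weekly_run` function is hypothetical, uses naive local datetimes, and ignores time zones and daylight-saving transitions, which a real scheduler has to handle.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, weekday: int = 0, hour: int = 8, minute: int = 0) -> datetime:
    """Next occurrence of the given weekday and time strictly after `now` (weekday 0 = Monday)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Move forward to the target weekday (0 to 6 days ahead).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that moment is not in the future, the next run is one week later.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


# Wednesday, 15 May 2024, 10:00 -> Monday, 20 May 2024, 08:00
upcoming = next_weekly_run(datetime(2024, 5, 15, 10, 0))
```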

Step 6: Execute and Confirm Your Workflow

  • Click “Run now” to execute a sample run of your configured ETL workflow.

  • Review the results and outputs. If they are correct, activate the scheduled workflow so it runs on the whole dataset.


Get Help & Guidance

Need assistance building or optimizing your advanced workflows? We provide dedicated enterprise onboarding and personalized support: