Introduction
Advanced Workflows are currently available only for Enterprise customers. To upgrade your account or request a demo, get in touch with our sales team.
Overview
Kadoa’s Advanced Workflows allow you to set up fully customizable ETL (Extract, Transform, Load) pipelines for unstructured data, no matter the source.
Our intuitive Workflow Builder lets you visually define workflows by connecting configurable modules for data sources, processing actions, and destinations.
Core Concepts and Components
An advanced workflow consists of three main components:
1. Data Sources
Kadoa provides an extensive range of data sources for unstructured data extraction:
Operator | Description | Example Use Cases |
---|---|---|
Websites | Crawl and scrape websites with automatic content discovery and structuring. | Any workflow that depends on web data. |
CSV Batch Processing | Bulk extract data from URLs or file paths listed in CSV files. | Mass-processing filings, documents, or structured URL lists. |
Documents | Upload filings, regulatory reports, and other documents for processing. | Earnings reports, regulatory data ingestion (e.g., EDGAR filings). |
Email Integration | Extract content from email attachments and message bodies. | Ingesting analyst reports, financial statements received by email. |
2. Data Actions
Refine and structure the extracted data with built-in transformation operators:
Operator | Description | Example Use Cases |
---|---|---|
Extraction | Extract data from websites or documents based on a defined data schema. | Extract data from any unstructured input source. |
Transformation | Format, normalize, and restructure extracted data. | Removing noise, standardizing dates and values. |
Data Validation | Set validation rules to ensure high-quality, consistent outputs. | Compliance verification, data integrity checks. |
Enrichment & Mapping | Combine external mappings and enrich data from multiple sources. | Mapping tickers and securities, adding computed fields. |
Classification | Auto-categorize documents based on content or metadata. | Automatic classification of filings, news articles, and transcripts. |
Filtering & Deduplication | Keep or remove data meeting specific criteria. | Filtering specific company filings, removing duplicate data. |
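To make the data actions above more concrete, here is a minimal plain-Python sketch of what transformation, validation, and deduplication do to a handful of records. It is an illustration only, not Kadoa's implementation: the record fields (`company`, `filed`, `revenue`), the accepted date formats, and the function names are all assumptions made for this example.

```python
from datetime import datetime

# Hypothetical records, shaped like the output of an extraction step.
RAW_RECORDS = [
    {"company": " Acme Corp ", "filed": "03/15/2024", "revenue": "1,200"},
    {"company": "Acme Corp", "filed": "2024-03-15", "revenue": "1200"},
    {"company": "Globex", "filed": "2024-04-01", "revenue": ""},
]

# Assumed input date formats; anything else is treated as unparseable.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize(record):
    """Transformation: trim whitespace, standardize dates and numbers."""
    filed = None
    for fmt in DATE_FORMATS:
        try:
            filed = datetime.strptime(record["filed"], fmt).date().isoformat()
            break
        except ValueError:
            continue
    revenue_text = record["revenue"].replace(",", "").strip()
    return {
        "company": record["company"].strip(),
        "filed": filed,
        "revenue": float(revenue_text) if revenue_text else None,
    }


def is_valid(record):
    """Validation: require a company name, a parsed date, and a revenue."""
    return (
        bool(record["company"])
        and record["filed"] is not None
        and record["revenue"] is not None
    )


def deduplicate(records):
    """Deduplication: keep the first record for each (company, filed) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (record["company"], record["filed"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


cleaned = deduplicate([r for r in map(normalize, RAW_RECORDS) if is_valid(r)])
print(cleaned)
```

In this sketch the two Acme Corp rows collapse into one after normalization, and the Globex row is dropped because its revenue is empty.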
3. Data Destinations
Easily deliver processed data to your internal tools, teams, and systems:
Operator | Description | Example Use Cases |
---|---|---|
File Writer | Export data to JSON, CSV, or XLSX files. | Direct integration into internal databases and data warehouses. |
Pre-Built Connectors | Connect to popular databases, file storage, and applications. | Direct integration into internal databases and data warehouses. |
API Integrations | Automatically send structured data to your internal or third-party APIs. | Real-time integration into analytics platforms. |
Notifications & Webhooks | Notify internal teams or trigger automations instantly with webhooks. | Alert teams about filing updates, detected changes, or data availability. |
WebSockets | Notify your systems about data updates or relevant changes in real time. | Real-time monitoring. |
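On the receiving side, a webhook destination usually means your system accepts a JSON body and acts on it. The sketch below shows one way to turn such a body into a short alert message. The payload shape is an assumption made for illustration: the field names `workflow`, `event`, and `records` are hypothetical and do not describe a documented Kadoa payload.

```python
import json


def summarize_event(body: str) -> str:
    """Turn a JSON webhook body into a one-line alert message.

    The field names used here are assumptions for this sketch,
    not a documented payload format.
    """
    payload = json.loads(body)
    workflow = payload.get("workflow", "unknown workflow")
    event = payload.get("event", "unknown event")
    count = len(payload.get("records", []))
    return f"{workflow}: {event} ({count} records)"


# Hypothetical payload, as a webhook might deliver it.
sample = json.dumps({
    "workflow": "weekly-filings",
    "event": "data_updated",
    "records": [{"id": 1}, {"id": 2}],
})
print(summarize_event(sample))
```

Using `.get` with defaults keeps the handler from failing when an optional field is missing, which is a common precaution when the sender's schema may change.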
Building Your First Advanced Workflow
Below is an example of how to automatically pull PDF documents from a target website, extract structured content from these PDFs, and run this workflow weekly.
Step 1: Configure Data Sources
- Open the Kadoa Workflow Builder.
- Select the “Website Scraper” operator as your Data Source.
- Enter the target URL of the website.
- Configure the scraper to extract the PDF links in the data editor.
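Conceptually, extracting PDF links means collecting every link on the page whose path ends in `.pdf` and resolving it against the page URL. The following standard-library sketch shows that idea; it is not how the Website Scraper operator works internally, and the sample HTML, the `example.com` URLs, and the names `PdfLinkCollector` and `collect_pdf_links` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links that point to PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Check the path only, so query strings do not hide the extension.
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.links:
            self.links.append(absolute)


def collect_pdf_links(html, base_url):
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content for illustration.
SAMPLE_HTML = """
<a href="/reports/q1.pdf">Q1</a>
<a href="about.html">About</a>
<a href="https://example.com/reports/Q2.PDF?download=1">Q2</a>
"""
pdf_links = collect_pdf_links(SAMPLE_HTML, "https://example.com/investors/")
print(pdf_links)
```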
Step 2: Add PDF Extraction
- Click “Add Action” and choose “Extract PDF”.
- Map the PDF links collected from Step 1 directly into this PDF extraction step.
- Specify the type of structured data fields you require (e.g., financial metrics, company details, dates).
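It can help to write the fields you need down as an explicit schema before configuring the step. The sketch below models such a schema in Python and checks a record against it. The field names, the `FieldSpec` class, and the `kind` labels are hypothetical and only illustrate the idea; they are not Kadoa's schema format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    kind: str  # assumed labels: "string", "number", or "date"
    required: bool = True


# Hypothetical schema for an earnings-report PDF.
SCHEMA = [
    FieldSpec("company_name", "string"),
    FieldSpec("report_date", "date"),
    FieldSpec("revenue", "number"),
    FieldSpec("notes", "string", required=False),
]


def missing_required(record, schema=SCHEMA):
    """Return the names of required fields that are absent or empty."""
    return [
        spec.name
        for spec in schema
        if spec.required and record.get(spec.name) in (None, "")
    ]


print(missing_required({"company_name": "Acme", "revenue": 10}))
```

Marking fields as optional up front makes it clear which gaps in an extracted record should count as errors and which are acceptable.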
Step 3: Apply Transformations (Optional)
- If necessary, add transformation actions to clean, format, or enrich the extracted data.
Step 4: Configure the Destination
- Choose the desired data destination.
- Configure the destination operator.
Step 5: Schedule the Workflow to Run Weekly
- Click “Repeat & monitor”.
- Set the frequency to “Weekly” (e.g., every Monday at 8:00 AM).
- Enable automated monitoring and notifications to alert your teams on data updates after each run.
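To check what a weekly schedule such as "every Monday at 8:00 AM" implies for the next run time, you can compute it directly. This is a small sketch of the date arithmetic, not of Kadoa's scheduler: the function name `next_weekly_run` is hypothetical, and the example uses naive datetimes, so time zones are ignored.

```python
from datetime import datetime, time, timedelta


def next_weekly_run(now, weekday=0, at=time(8, 0)):
    """Return the next run strictly after `now`; weekday 0 is Monday."""
    days_ahead = (weekday - now.weekday()) % 7
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead), at)
    if candidate <= now:
        # Today's slot has already passed, so move to next week.
        candidate += timedelta(days=7)
    return candidate


# Wednesday, 3 January 2024, at noon: the next Monday 8:00 AM is 8 January.
print(next_weekly_run(datetime(2024, 1, 3, 12, 0)))
```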
Step 6: Execute and Confirm Your Workflow
- Click “Run now” to execute a sample run of your configured ETL workflow.
- Review the results and outputs. If they are correct, activate your scheduled workflow to run on the whole dataset.
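Taken together, the steps above amount to a source, a chain of actions, and a destination. The sketch below models that shape in plain Python so you can reason about a workflow before building it visually. It is a conceptual model under stated assumptions, not the Workflow Builder's internals: `run_workflow`, the sample records, and the CSV destination are all hypothetical.

```python
import csv
import io


def run_workflow(source, actions, destination):
    """Pull records from the source, apply each action, then deliver them."""
    records = source()
    for action in actions:
        records = action(records)
    return destination(list(records))


def sample_source():
    # Hypothetical records standing in for extracted PDF data.
    return [
        {"company": "Acme", "revenue": 1200},
        {"company": "Globex", "revenue": None},
    ]


def drop_missing_revenue(records):
    """Filtering action: keep only records that have a revenue value."""
    return (r for r in records if r["revenue"] is not None)


def to_csv(records):
    """Destination: serialize the records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["company", "revenue"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


output = run_workflow(sample_source, [drop_missing_revenue], to_csv)
print(output)
```

Keeping each stage as a separate function mirrors the modular design described in the overview: any stage can be swapped without touching the others.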
Get Help & Guidance
Need assistance building or optimizing your advanced workflows? We provide dedicated enterprise onboarding and personalized support:
- Contact Enterprise Sales for demos, quotes, and onboarding assistance.
- Contact Technical Support for workflow assistance or technical guidance.