Start crawl

POST /v4/crawl
curl --request POST \
  --url https://api.kadoa.com/v4/crawl/ \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
  "url": "<string>",
  "startUrls": [
    "<string>"
  ],
  "pathsFilterIn": [
    "<string>"
  ],
  "pathsFilterOut": [
    "<string>"
  ],
  "proxyType": "<string>",
  "proxyCountry": "<string>",
  "timeout": 2,
  "maxDepth": 2,
  "maxPages": 2,
  "maxMatches": 2,
  "concurrency": 2,
  "strictDomain": true,
  "loadImages": true,
  "safeMode": false,
  "callbackUrl": "<string>",
  "processDuringCrawl": true,
  "crawlMethod": {
    "type": "<string>"
  },
  "matchThreshold": 0.5,
  "blueprint": [
    {
      "name": "<string>",
      "description": "<string>",
      "selector": "<string>",
      "type": "<string>"
    }
  ],
  "extractionOptions": {
    "extractions": {},
    "schema": "<any>",
    "entity": "<any>",
    "mainContextSelector": "<string>",
    "xhrExtractorConfigs": [
      "<any>"
    ]
  },
  "navigationOptions": {
    "browserActions": [
      {}
    ],
    "preBrowserActions": [
      {}
    ],
    "scrollHtml": true,
    "scrollHtmlTimeout": 1,
    "visualHtml": true,
    "navigationStrategy": "<string>",
    "navigationStrategies": [
      "<string>"
    ],
    "limit": 1,
    "disableNavigation": true,
    "ignoreIframes": true,
    "navigationExploration": {},
    "loadHtmlOnly": true,
    "acceptCookies": true,
    "cachedCookieAccept": true
  },
  "artifactOptions": {
    "screenshot": true,
    "screenshotFull": true,
    "screenshotCache": true,
    "screenshotPublic": true,
    "screenshotLink": "<string>",
    "archivePdf": true
  }
}'
{
  "message": "<string>",
  "sessionId": "<string>",
  "configId": "<string>",
  "error": "<string>"
}
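
For reference, a minimal TypeScript sketch of the same request is shown below. The endpoint, headers, and field names come from the curl example above; the start URLs and limit values are illustrative assumptions.

// Minimal sketch: start a crawl with multiple start URLs.
// Endpoint, headers, and field names follow the reference above;
// the concrete URLs and limits are illustrative assumptions.
const apiKey = process.env.KADOA_API_KEY!; // assumed to hold your API key

const response = await fetch("https://api.kadoa.com/v4/crawl/", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  },
  body: JSON.stringify({
    startUrls: ["https://example.com/products", "https://example.com/news"],
    maxDepth: 3,
    maxPages: 100,
  }),
});

const result = await response.json(); // { message, sessionId, configId?, error }
console.log(result.sessionId);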

Authorizations

x-api-key
string
header
required

API key for authentication

Body

application/json

Schema for starting a crawling session, with support for both a single URL and multiple start URLs

url
string<uri>

Single URL to start crawling (for backward compatibility)

startUrls
string<uri>[]

List of start URLs for the crawl

pathsFilterIn
string[]

Regex patterns to include specific paths

pathsFilterOut
string[]

Regex patterns to exclude specific paths
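
For example, the hypothetical filters below restrict a crawl to product pages and skip login and cart paths; both fields take regular expressions matched against URL paths.

// Hypothetical path filters; the patterns are regular expressions
// matched against URL paths.
const pathFilters = {
  pathsFilterIn: ["^/products/"],        // only crawl paths under /products/
  pathsFilterOut: ["^/login", "^/cart"], // skip login and cart pages
};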

proxyType
string | null

Type of proxy to use

proxyCountry
string | null

Country for proxy selection

timeout
number

Timeout in milliseconds

Required range: x >= 1
maxDepth
number

Maximum crawling depth

Required range: x >= 1
maxPages
number

Maximum number of pages to crawl

Required range: x >= 1
maxMatches
number

Maximum number of matched pages to crawl before stopping

Required range: x >= 1
concurrency
number

Number of concurrent crawlers

Required range: x >= 1
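
Taken together, the limit fields can be combined as in the sketch below. Note that timeout is in milliseconds, so 30000 means 30 seconds; all values are illustrative, not recommendations.

// Illustrative crawl limits (example values, not recommendations).
const crawlLimits = {
  timeout: 30_000,  // 30 seconds; timeout is expressed in milliseconds
  maxDepth: 3,      // follow links at most 3 levels from the start URLs
  maxPages: 500,    // stop after crawling 500 pages
  maxMatches: 100,  // stop once 100 matched pages have been found
  concurrency: 5,   // run 5 crawlers in parallel
};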
strictDomain
boolean
default:true

Whether to stay within the same domain

loadImages
boolean
default:true

Whether to load images during crawling

safeMode
boolean
default:false

Enable safe mode for crawling

callbackUrl
string<uri> | null

Webhook URL for completion notifications
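
When callbackUrl is set, a notification is sent to that URL once the crawl completes. The payload shape is not documented here, so the sketch below (a plain Node HTTP server in TypeScript) only logs whatever body arrives; treat it as an assumption, not a contract.

// Minimal sketch of a receiver for the completion webhook (callbackUrl).
// The notification payload shape is an assumption; this handler just logs it.
import { createServer } from "node:http";

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    console.log("Crawl completion notification:", body);
    res.statusCode = 200;
    res.end();
  });
}).listen(3000);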

processDuringCrawl
boolean

Whether to run preprocessing and extraction during the crawl phase

crawlMethod
object

Crawl method configuration

matchThreshold
number

Match threshold override for blueprint filtering

Required range: 0 <= x <= 1
blueprint
object[]

Blueprint fields applied during crawling
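
A hypothetical blueprint with a threshold override might look like the fragment below. The field names and CSS selectors are illustrative; the object shape (name, description, selector, type) follows the request example at the top of the page.

// Hypothetical blueprint fields plus a matchThreshold override.
// Names, selectors, and type values are illustrative assumptions.
const blueprintConfig = {
  matchThreshold: 0.7, // assumed: pages scoring below this against the blueprint are filtered out
  blueprint: [
    { name: "title", description: "Product title", selector: "h1.product-title", type: "string" },
    { name: "price", description: "Listed price", selector: ".price", type: "string" },
  ],
};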

extractionOptions
object

Extraction-related options derived from legacy launch summary

navigationOptions
object

Navigation-related options derived from legacy launch summary

artifactOptions
object

Artifact capture options derived from legacy launch summary
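
As a combined illustration, the three nested option objects could be supplied as in the fragment below; the keys are taken from the request example at the top of the page, and the values are assumptions.

// Illustrative nested options; keys come from the request example above,
// values are assumptions.
const options = {
  extractionOptions: {
    mainContextSelector: "main#content", // hypothetical selector for the main content area
  },
  navigationOptions: {
    acceptCookies: true, // auto-accept cookie banners
    ignoreIframes: true,
    limit: 10,
  },
  artifactOptions: {
    screenshot: true,
    screenshotFull: false,
    archivePdf: true,
  },
};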

Response

200

Response schema for starting a crawling session

message
string
required

Response message

sessionId
string
required

Session ID

error
string | null
required

Error message if any

configId
string

Config ID (included when a config is created as part of starting the session)
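
A short sketch of handling this response, under the assumption that a non-null error field means the session did not start:

// Sketch of handling the start-crawl response.
// Assumption: a non-null `error` means the session did not start.
interface StartCrawlResponse {
  message: string;
  sessionId: string;
  configId?: string;
  error: string | null;
}

function handleStartCrawl(data: StartCrawlResponse): string {
  if (data.error) {
    throw new Error(`Failed to start crawl: ${data.error}`);
  }
  console.log(data.message, data.configId ?? "(no new config created)");
  return data.sessionId; // sessionId identifies this crawling session
}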