Start crawl

POST /v4/crawl
curl --request POST \
  --url https://api.kadoa.com/v4/crawl/ \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
  "url": "<string>",
  "startUrls": [
    "<string>"
  ],
  "pathsFilterIn": [
    "<string>"
  ],
  "pathsFilterOut": [
    "<string>"
  ],
  "proxyType": "<string>",
  "proxyCountry": "<string>",
  "timeout": 2,
  "maxDepth": 2,
  "maxPages": 2,
  "maxMatches": 2,
  "concurrency": 2,
  "strictDomain": true,
  "loadImages": true,
  "safeMode": false,
  "callbackUrl": "<string>",
  "processDuringCrawl": true,
  "crawlMethod": {
    "type": "<string>"
  },
  "matchThreshold": 0.5,
  "blueprint": [
    {
      "name": "<string>",
      "description": "<string>",
      "selector": "<string>",
      "type": "<string>"
    }
  ],
  "extractionOptions": {
    "extractions": {},
    "schema": "<any>",
    "entity": "<any>",
    "mainContextSelector": "<string>",
    "xhrExtractorConfigs": [
      "<any>"
    ]
  },
  "navigationOptions": {
    "browserActions": [
      {}
    ],
    "preBrowserActions": [
      {}
    ],
    "scrollHtml": true,
    "scrollHtmlTimeout": 1,
    "visualHtml": true,
    "navigationStrategy": "<string>",
    "navigationStrategies": [
      "<string>"
    ],
    "limit": 1,
    "disableNavigation": true,
    "ignoreIframes": true,
    "navigationExploration": {},
    "loadHtmlOnly": true,
    "acceptCookies": true,
    "cachedCookieAccept": true
  },
  "artifactOptions": {
    "screenshot": true,
    "screenshotFull": true,
    "screenshotCache": true,
    "screenshotPublic": true,
    "screenshotLink": "<string>",
    "archivePdf": true
  }
}'
{
  "message": "<string>",
  "sessionId": "<string>",
  "configId": "<string>",
  "error": "<string>"
}
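
For reference, a minimal TypeScript sketch of the same request is shown below. The endpoint, headers, and field names come from the curl example above; the start URLs and limit values are illustrative assumptions.

// Minimal sketch: start a crawl with multiple start URLs.
// Endpoint, headers, and field names follow the reference above;
// the concrete URLs and limits are illustrative assumptions.
const apiKey = process.env.KADOA_API_KEY!; // assumed to hold your API key

const response = await fetch("https://api.kadoa.com/v4/crawl/", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  },
  body: JSON.stringify({
    startUrls: ["https://example.com/products", "https://example.com/news"],
    maxDepth: 3,
    maxPages: 100,
  }),
});

const result = await response.json(); // { message, sessionId, configId?, error }
console.log(result.sessionId);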

Authorizations

x-api-key
string
header
required

API key for authentication

Body

application/json

Schema for starting a crawling session, with support for both a single URL and multiple start URLs

url
string<uri>

Single URL to start crawling (for backward compatibility)

startUrls
string<uri>[]

List of start URLs for the crawl

pathsFilterIn
string[]

Regex patterns to include specific paths

pathsFilterOut
string[]

Regex patterns to exclude specific paths
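
For example, the hypothetical filters below restrict a crawl to product pages and skip login and cart paths; both fields take regular expressions matched against URL paths.

// Hypothetical path filters; the patterns are regular expressions
// matched against URL paths.
const pathFilters = {
  pathsFilterIn: ["^/products/"],        // only crawl paths under /products/
  pathsFilterOut: ["^/login", "^/cart"], // skip login and cart pages
};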

proxyType
string | null

Type of proxy to use

proxyCountry
string | null

Country for proxy selection

timeout
number

Timeout in milliseconds

Required range: x >= 1
maxDepth
number

Maximum crawling depth

Required range: x >= 1
maxPages
number

Maximum number of pages to crawl

Required range: x >= 1
maxMatches
number

Maximum number of matched pages to crawl before stopping

Required range: x >= 1
concurrency
number

Number of concurrent crawlers

Required range: x >= 1
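
Taken together, the limit fields can be combined as in the sketch below. Note that timeout is in milliseconds, so 30000 means 30 seconds; all values are illustrative, not recommendations.

// Illustrative crawl limits (example values, not recommendations).
const crawlLimits = {
  timeout: 30_000,  // 30 seconds; timeout is expressed in milliseconds
  maxDepth: 3,      // follow links at most 3 levels from the start URLs
  maxPages: 500,    // stop after crawling 500 pages
  maxMatches: 100,  // stop once 100 matched pages have been found
  concurrency: 5,   // run 5 crawlers in parallel
};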
strictDomain
boolean
default:true

Whether to stay within the same domain

loadImages
boolean
default:true

Whether to load images during crawling

safeMode
boolean
default:false

Enable safe mode for crawling

callbackUrl
string<uri> | null

Webhook URL for completion notifications
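
When callbackUrl is set, a notification is sent to that URL once the crawl completes. The payload shape is not documented here, so the sketch below (a plain Node HTTP server in TypeScript) only logs whatever body arrives; treat it as an assumption, not a contract.

// Minimal sketch of a receiver for the completion webhook (callbackUrl).
// The notification payload shape is an assumption; this handler just logs it.
import { createServer } from "node:http";

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    console.log("Crawl completion notification:", body);
    res.statusCode = 200;
    res.end();
  });
}).listen(3000);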

processDuringCrawl
boolean

Whether to run preprocessing and extraction during the crawl phase

crawlMethod
object

Crawl method configuration

matchThreshold
number

Match threshold override for blueprint filtering

Required range: 0 <= x <= 1
blueprint
object[]

Blueprint fields applied during crawling
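
A hypothetical blueprint with a threshold override might look like the fragment below. The field names and CSS selectors are illustrative; the object shape (name, description, selector, type) follows the request example at the top of the page.

// Hypothetical blueprint fields plus a matchThreshold override.
// Names, selectors, and type values are illustrative assumptions.
const blueprintConfig = {
  matchThreshold: 0.7, // assumed: pages scoring below this against the blueprint are filtered out
  blueprint: [
    { name: "title", description: "Product title", selector: "h1.product-title", type: "string" },
    { name: "price", description: "Listed price", selector: ".price", type: "string" },
  ],
};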

extractionOptions
object

Extraction-related options derived from legacy launch summary

navigationOptions
object

Navigation-related options derived from legacy launch summary

artifactOptions
object

Artifact capture options derived from legacy launch summary
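
As a combined illustration, the three nested option objects could be supplied as in the fragment below; the keys are taken from the request example at the top of the page, and the values are assumptions.

// Illustrative nested options; keys come from the request example above,
// values are assumptions.
const options = {
  extractionOptions: {
    mainContextSelector: "main#content", // hypothetical selector for the main content area
  },
  navigationOptions: {
    acceptCookies: true, // auto-accept cookie banners
    ignoreIframes: true,
    limit: 10,
  },
  artifactOptions: {
    screenshot: true,
    screenshotFull: false,
    archivePdf: true,
  },
};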

Response

200

Response schema for starting a crawling session

message
string
required

Response message

sessionId
string
required

Session ID

error
string | null
required

Error message if any

configId
string

Config ID (included when a config is created as part of starting the session)
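
A short sketch of handling this response, under the assumption that a non-null error field means the session did not start:

// Sketch of handling the start-crawl response.
// Assumption: a non-null `error` means the session did not start.
interface StartCrawlResponse {
  message: string;
  sessionId: string;
  configId?: string;
  error: string | null;
}

function handleStartCrawl(data: StartCrawlResponse): string {
  if (data.error) {
    throw new Error(`Failed to start crawl: ${data.error}`);
  }
  console.log(data.message, data.configId ?? "(no new config created)");
  return data.sessionId; // sessionId identifies this crawling session
}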