curl --request POST \
--url https://api.kadoa.com/v4/crawl/ \
--header 'Content-Type: application/json' \
--header 'x-api-key: <api-key>' \
--data '
{
"url": "<string>",
"startUrls": [
"<string>"
],
"pathsFilterIn": [
"<string>"
],
"pathsFilterOut": [
"<string>"
],
"proxyType": "<string>",
"proxyCountry": "<string>",
"timeout": 2,
"maxDepth": 2,
"maxPages": 2,
"maxMatches": 2,
"concurrency": 2,
"strictDomain": true,
"loadImages": true,
"safeMode": false,
"callbackUrl": "<string>",
"processDuringCrawl": true,
"crawlMethod": {
"type": "<string>"
},
"matchThreshold": 0.5,
"blueprint": [
{
"name": "<string>",
"description": "<string>",
"selector": "<string>",
"type": "<string>"
}
],
"extractionOptions": {
"extractions": {},
"schema": "<unknown>",
"entity": "<unknown>",
"mainContextSelector": "<string>",
"xhrExtractorConfigs": [
"<unknown>"
]
},
"navigationOptions": {
"browserActions": [
{}
],
"preBrowserActions": [
{}
],
"scrollHtml": true,
"scrollHtmlTimeout": 1,
"visualHtml": true,
"navigationStrategy": "<string>",
"navigationStrategies": [
"<string>"
],
"limit": 1,
"disableNavigation": true,
"ignoreIframes": true,
"navigationExploration": {},
"loadHtmlOnly": true,
"acceptCookies": true,
"cachedCookieAccept": true
},
"artifactOptions": {
"screenshot": true,
"screenshotFull": true,
"screenshotCache": true,
"screenshotPublic": true,
"screenshotLink": "<string>",
"archivePdf": true
},
"rawMode": true,
"outputOptions": {
"includeHtml": true,
"includeMarkdown": true,
"includeScreenshots": true,
"includeJson": true
},
"jobId": "<string>",
"dataKey": "<string>",
"billingSource": "<string>"
}
'{
"message": "<string>",
"sessionId": "<string>",
"error": "<string>",
"configId": "<string>"
}Create a crawling configuration and start a session in one operation (equivalent to v4/crawl)
curl --request POST \
--url https://api.kadoa.com/v4/crawl/ \
--header 'Content-Type: application/json' \
--header 'x-api-key: <api-key>' \
--data '
{
"url": "<string>",
"startUrls": [
"<string>"
],
"pathsFilterIn": [
"<string>"
],
"pathsFilterOut": [
"<string>"
],
"proxyType": "<string>",
"proxyCountry": "<string>",
"timeout": 2,
"maxDepth": 2,
"maxPages": 2,
"maxMatches": 2,
"concurrency": 2,
"strictDomain": true,
"loadImages": true,
"safeMode": false,
"callbackUrl": "<string>",
"processDuringCrawl": true,
"crawlMethod": {
"type": "<string>"
},
"matchThreshold": 0.5,
"blueprint": [
{
"name": "<string>",
"description": "<string>",
"selector": "<string>",
"type": "<string>"
}
],
"extractionOptions": {
"extractions": {},
"schema": "<unknown>",
"entity": "<unknown>",
"mainContextSelector": "<string>",
"xhrExtractorConfigs": [
"<unknown>"
]
},
"navigationOptions": {
"browserActions": [
{}
],
"preBrowserActions": [
{}
],
"scrollHtml": true,
"scrollHtmlTimeout": 1,
"visualHtml": true,
"navigationStrategy": "<string>",
"navigationStrategies": [
"<string>"
],
"limit": 1,
"disableNavigation": true,
"ignoreIframes": true,
"navigationExploration": {},
"loadHtmlOnly": true,
"acceptCookies": true,
"cachedCookieAccept": true
},
"artifactOptions": {
"screenshot": true,
"screenshotFull": true,
"screenshotCache": true,
"screenshotPublic": true,
"screenshotLink": "<string>",
"archivePdf": true
},
"rawMode": true,
"outputOptions": {
"includeHtml": true,
"includeMarkdown": true,
"includeScreenshots": true,
"includeJson": true
},
"jobId": "<string>",
"dataKey": "<string>",
"billingSource": "<string>"
}
'{
"message": "<string>",
"sessionId": "<string>",
"error": "<string>",
"configId": "<string>"
}API key for authentication
Body
Schema for starting a crawling session with support for both single URL and multiple URLs
Single URL to start crawling (for backward compatibility)
List of URLs for crawling
1Regex patterns to include specific paths
Regex patterns to exclude specific paths
Type of proxy to use
Country for proxy selection
Timeout in milliseconds
x >= 1Maximum crawling depth
x >= 1Maximum number of pages to crawl
x >= 1Maximum number of matched pages to crawl before stopping
x >= 1Number of concurrent crawlers
x >= 1Whether to stay within the same domain
Whether to load images during crawling
Enable safe mode for crawling
Webhook URL for completion notifications
Whether to run preprocessing and extraction during the crawl phase
Crawl method configuration
Show child attributes
Match threshold override for blueprint filtering
0 <= x <= 1Blueprint fields applied during crawling
Show child attributes
Extraction-related options derived from legacy launch summary
Show child attributes
Navigation-related options derived from legacy launch summary
Show child attributes
Artifact capture options derived from legacy launch summary
Show child attributes
Whether this is a raw data mode crawl
Output options for raw mode
Show child attributes
Internal: Job ID for workflow tracking
Internal: Data key for Parquet storage path
Internal: Billing source identifier