Crawl all accessible subpages of a website and convert them into structured JSON or markdown.

This guide covers:
- Initiating crawl sessions with single or multiple URLs
- Checking crawl status
- Listing crawled pages
- Accessing page content
## Prerequisites
- Kadoa account with API key
- SDK installed: `npm install @kadoa/node-sdk` (Node) or `uv add kadoa-sdk` (Python)
## 1. Start a Crawl
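A raw-HTTP sketch of starting a crawl session. The endpoint path, auth header, and response field are assumptions for illustration (the SDK wraps the actual call; see the API reference); the `url`/`startUrls` exclusivity mirrors the options table below.

```typescript
// Illustrative sketch -- endpoint path, header name, and response field
// are ASSUMPTIONS, not the documented API. Check the API reference.

interface StartCrawlOptions {
  url?: string;         // single entry point
  startUrls?: string[]; // multiple entry points (same domain)
  maxDepth?: number;
  maxPages?: number;
}

// Validate and build the request body (pure, testable): exactly one of
// `url` or `startUrls` must be provided, per the options table.
function buildStartCrawlBody(opts: StartCrawlOptions): StartCrawlOptions {
  if (Boolean(opts.url) === Boolean(opts.startUrls?.length)) {
    throw new Error("Provide exactly one of `url` or `startUrls`");
  }
  return { ...opts };
}

// Hypothetical call -- base URL and path are placeholders.
async function startCrawl(apiKey: string, opts: StartCrawlOptions): Promise<string> {
  const res = await fetch("https://api.kadoa.com/crawl", {            // assumed endpoint
    method: "POST",
    headers: { "x-api-key": apiKey, "content-type": "application/json" }, // assumed header
    body: JSON.stringify(buildStartCrawlBody(opts)),
  });
  if (!res.ok) throw new Error(`Start crawl failed: ${res.status}`);
  const { sessionId } = (await res.json()) as { sessionId: string };  // assumed field
  return sessionId;
}
```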
Start a crawl session with a single URL or multiple URLs from the same domain. View full API reference →

### Multiple URLs
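The same-domain rule described below can be pre-checked before submitting `startUrls`. A naive sketch — the helper is ours, not part of the SDK, and it compares only the last two host labels, so multi-part public suffixes like `.co.uk` would need a real public-suffix list:

```typescript
// Naive same-registrable-domain check for startUrls (illustrative only,
// not SDK functionality). Compares the last two labels of each hostname,
// so www.example.com and shop.example.com match, while example.com and
// different-site.com do not. Suffixes like .co.uk need a proper list.

function baseDomain(url: string): string {
  const host = new URL(url).hostname;
  return host.split(".").slice(-2).join(".");
}

function sameDomain(startUrls: string[]): boolean {
  const domains = new Set(startUrls.map(baseDomain));
  return domains.size <= 1;
}
```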
Crawl from multiple entry points on the same domain. When using `startUrls`, all URLs must be from the same domain or subdomain: you can mix www.example.com and shop.example.com, but not example.com and different-site.com.

## 2. Check Crawl Status
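Polling for completion might look like the sketch below. The status values come from the listing step in this guide (DONE, CRAWLING, PENDING); the status-fetching function is injected so the loop stays transport-agnostic and makes no claims about the real endpoint:

```typescript
// Poll a crawl session until it reaches a terminal state. Status values
// DONE / CRAWLING / PENDING are from this guide; getStatus is injected
// (e.g. an SDK or fetch call), so no endpoint details are assumed here.

type CrawlStatus = "DONE" | "CRAWLING" | "PENDING";

// Of the documented statuses, only DONE is terminal.
function isTerminal(status: CrawlStatus): boolean {
  return status === "DONE";
}

async function waitForCrawl(
  getStatus: () => Promise<CrawlStatus>,
  intervalMs = 5000,
  maxAttempts = 120,
): Promise<CrawlStatus> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await getStatus();
    if (isTerminal(status)) return status;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("Crawl did not finish within the polling window");
}
```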
Monitor crawl progress to know when extraction is complete. View full API reference →

## 3. List Crawled Pages
Get a paginated list of crawled pages with their statuses. Each page reports a status of DONE, CRAWLING, or PENDING. View full API reference →
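A small sketch of building the paginated listing request; the query parameter names (`page`, `limit`) and the path are assumptions for illustration, not the documented API:

```typescript
// Build the path + query string for listing crawled pages. The path
// shape and the `page`/`limit` parameter names are ASSUMPTIONS for
// illustration -- check the API reference for the real ones.
function listPagesQuery(sessionId: string, page = 1, limit = 100): string {
  const params = new URLSearchParams({ page: String(page), limit: String(limit) });
  return `/crawl/${encodeURIComponent(sessionId)}/pages?${params}`; // assumed path
}
```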
## 4. Retrieve Page Content
Get page content in markdown (LLM-ready) or HTML format. Supported formats: md (markdown) and html. View full API reference →
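Retrieving content then reduces to one GET per page. The `format` value below mirrors the documented `md`/`html` options; the path shape, query parameter, and auth header are assumptions:

```typescript
// Fetch one page's content. The documented formats are md and html; the
// path, query parameter, and header below are ASSUMPTIONS -- check the
// API reference for the real request shape.

type PageFormat = "md" | "html";

// Pure, testable path builder (assumed shape).
function contentPath(sessionId: string, pageId: string, format: PageFormat = "md"): string {
  return `/crawl/${sessionId}/pages/${pageId}/content?format=${format}`;
}

async function getPageContent(
  apiKey: string,
  sessionId: string,
  pageId: string,
  format: PageFormat = "md",
): Promise<string> {
  const res = await fetch(`https://api.kadoa.com${contentPath(sessionId, pageId, format)}`, {
    headers: { "x-api-key": apiKey }, // assumed header
  });
  if (!res.ok) throw new Error(`Fetch content failed: ${res.status}`);
  return res.text();
}
```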
## Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | - | Single URL to crawl (use this or `startUrls`) |
| `startUrls` | string[] | - | Multiple URLs to crawl (use this or `url`) |
| `maxDepth` | number | - | Maximum crawl depth from entry points |
| `maxPages` | number | - | Maximum pages to crawl |
| `maxMatches` | number | - | Stop after N matched pages (with blueprint) |
| `pathsFilterIn` | string[] | - | Regex patterns to include paths |
| `pathsFilterOut` | string[] | - | Regex patterns to exclude paths |
| `proxyType` | string | null | Proxy type: `"dc"` (datacenter) or `"residential"` |
| `proxyCountry` | string | - | ISO country code for proxy location |
| `concurrency` | number | 20 | Number of parallel crawlers |
| `timeout` | number | - | Request timeout in milliseconds |
| `strictDomain` | boolean | true | Stay within the same domain |
| `loadImages` | boolean | true | Load images during crawl |
| `callbackUrl` | string | - | Webhook URL for completion notification |
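Pulling the table together, a sketch of a typed config with the documented defaults filled in (`concurrency` 20, `strictDomain` true, `loadImages` true). The field names come from the table; the interface and helper are ours, not the SDK's:

```typescript
// Field names and defaults are taken from the configuration table; the
// interface and withDefaults helper are illustrative, not SDK exports.

interface CrawlConfig {
  url?: string;
  startUrls?: string[];
  maxDepth?: number;
  maxPages?: number;
  maxMatches?: number;
  pathsFilterIn?: string[];
  pathsFilterOut?: string[];
  proxyType?: "dc" | "residential";
  proxyCountry?: string;
  concurrency?: number;
  timeout?: number;
  strictDomain?: boolean;
  loadImages?: boolean;
  callbackUrl?: string;
}

// Apply the documented defaults; explicit values win.
function withDefaults(config: CrawlConfig): CrawlConfig {
  return { concurrency: 20, strictDomain: true, loadImages: true, ...config };
}

// Example: crawl only documentation paths via a regex include filter.
const docsCrawl = withDefaults({
  url: "https://example.com",
  pathsFilterIn: ["^/docs/"],
  maxPages: 500,
});
```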
## Artifact Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `screenshot` | boolean | false | Capture page screenshots |
| `screenshotFull` | boolean | false | Capture full-page screenshots |
| `archivePdf` | boolean | false | Generate PDF archives |
## Error Handling
| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key | Verify API key in dashboard |
| 402 Payment Required | Insufficient credits | Top up account credits |
| 404 Not Found | Invalid session or page ID | Verify ID exists |
| 429 Too Many Requests | Rate limit exceeded | Reduce request frequency |
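The table above maps directly onto a small error helper. Of the four cases, only 429 is worth retrying automatically (with backoff); the others need an account or input fix. A sketch (the helper and backoff policy are ours):

```typescript
// Map API status codes to the remedies from the error table, and flag
// which are retryable. Only 429 (rate limit) merits automatic retry.
function adviceFor(status: number): { retryable: boolean; advice: string } {
  switch (status) {
    case 401: return { retryable: false, advice: "Verify API key in dashboard" };
    case 402: return { retryable: false, advice: "Top up account credits" };
    case 404: return { retryable: false, advice: "Verify session or page ID exists" };
    case 429: return { retryable: true, advice: "Reduce request frequency; retry with backoff" };
    default:  return { retryable: false, advice: `Unexpected status ${status}` };
  }
}

// Simple exponential backoff delays: 1s, 2s, 4s, ... (our policy choice).
function backoffMs(attempt: number, baseMs = 1000): number {
  return baseMs * 2 ** attempt;
}
```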
## Next Steps
- Set up webhooks for crawl completion notifications
- API Reference →