Crawl all accessible subpages of a website and convert them into structured JSON or markdown.

This guide covers:
- Initiating crawl sessions with single or multiple URLs
- Checking crawl status
- Listing crawled pages
- Accessing page content
## Prerequisites
- Kadoa account with API key
- SDK installed: `npm install @kadoa/node-sdk` (Node) or `uv add kadoa-sdk` (Python)
## 1. Start a Crawl
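A raw-HTTP sketch of starting a crawl session. The endpoint path, auth header, and response field are assumptions for illustration (the SDK wraps the actual call; see the API reference); the `url`/`startUrls` exclusivity mirrors the options table below.

```typescript
// Illustrative sketch -- endpoint path, header name, and response field
// are ASSUMPTIONS, not the documented API. Check the API reference.

interface StartCrawlOptions {
  url?: string;         // single entry point
  startUrls?: string[]; // multiple entry points (same domain)
  maxDepth?: number;
  maxPages?: number;
}

// Validate and build the request body (pure, testable): exactly one of
// `url` or `startUrls` must be provided, per the options table.
function buildStartCrawlBody(opts: StartCrawlOptions): StartCrawlOptions {
  if (Boolean(opts.url) === Boolean(opts.startUrls?.length)) {
    throw new Error("Provide exactly one of `url` or `startUrls`");
  }
  return { ...opts };
}

// Hypothetical call -- base URL and path are placeholders.
async function startCrawl(apiKey: string, opts: StartCrawlOptions): Promise<string> {
  const res = await fetch("https://api.kadoa.com/crawl", {            // assumed endpoint
    method: "POST",
    headers: { "x-api-key": apiKey, "content-type": "application/json" }, // assumed header
    body: JSON.stringify(buildStartCrawlBody(opts)),
  });
  if (!res.ok) throw new Error(`Start crawl failed: ${res.status}`);
  const { sessionId } = (await res.json()) as { sessionId: string };  // assumed field
  return sessionId;
}
```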
Start a crawl session with a single URL or multiple URLs from the same domain. View full API reference →

### Multiple URLs
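The same-domain rule described below can be pre-checked before submitting `startUrls`. A naive sketch — the helper is ours, not part of the SDK, and it compares only the last two host labels, so multi-part public suffixes like `.co.uk` would need a real public-suffix list:

```typescript
// Naive same-registrable-domain check for startUrls (illustrative only,
// not SDK functionality). Compares the last two labels of each hostname,
// so www.example.com and shop.example.com match, while example.com and
// different-site.com do not. Suffixes like .co.uk need a proper list.

function baseDomain(url: string): string {
  const host = new URL(url).hostname;
  return host.split(".").slice(-2).join(".");
}

function sameDomain(startUrls: string[]): boolean {
  const domains = new Set(startUrls.map(baseDomain));
  return domains.size <= 1;
}
```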
Crawl from multiple entry points on the same domain. When using `startUrls`, all URLs must be from the same domain or subdomain: you can mix www.example.com and shop.example.com, but not example.com and different-site.com.

## 2. Check Crawl Status
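Polling for completion might look like the sketch below. The status values come from the listing step in this guide (DONE, CRAWLING, PENDING); the status-fetching function is injected so the loop stays transport-agnostic and makes no claims about the real endpoint:

```typescript
// Poll a crawl session until it reaches a terminal state. Status values
// DONE / CRAWLING / PENDING are from this guide; getStatus is injected
// (e.g. an SDK or fetch call), so no endpoint details are assumed here.

type CrawlStatus = "DONE" | "CRAWLING" | "PENDING";

// Of the documented statuses, only DONE is terminal.
function isTerminal(status: CrawlStatus): boolean {
  return status === "DONE";
}

async function waitForCrawl(
  getStatus: () => Promise<CrawlStatus>,
  intervalMs = 5000,
  maxAttempts = 120,
): Promise<CrawlStatus> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await getStatus();
    if (isTerminal(status)) return status;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("Crawl did not finish within the polling window");
}
```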
Monitor crawl progress to know when extraction is complete. View full API reference →

## 3. List Crawled Pages
Get a paginated list of crawled pages with their statuses. Each page reports a status of DONE, CRAWLING, or PENDING. View full API reference →
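A small sketch of building the paginated listing request; the query parameter names (`page`, `limit`) and the path are assumptions for illustration, not the documented API:

```typescript
// Build the path + query string for listing crawled pages. The path
// shape and the `page`/`limit` parameter names are ASSUMPTIONS for
// illustration -- check the API reference for the real ones.
function listPagesQuery(sessionId: string, page = 1, limit = 100): string {
  const params = new URLSearchParams({ page: String(page), limit: String(limit) });
  return `/crawl/${encodeURIComponent(sessionId)}/pages?${params}`; // assumed path
}
```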
## 4. Retrieve Page Content
Get page content in markdown (LLM-ready) or HTML format. Supported formats: md (markdown) and html. View full API reference →
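Retrieving content then reduces to one GET per page. The `format` value below mirrors the documented `md`/`html` options; the path shape, query parameter, and auth header are assumptions:

```typescript
// Fetch one page's content. The documented formats are md and html; the
// path, query parameter, and header below are ASSUMPTIONS -- check the
// API reference for the real request shape.

type PageFormat = "md" | "html";

// Pure, testable path builder (assumed shape).
function contentPath(sessionId: string, pageId: string, format: PageFormat = "md"): string {
  return `/crawl/${sessionId}/pages/${pageId}/content?format=${format}`;
}

async function getPageContent(
  apiKey: string,
  sessionId: string,
  pageId: string,
  format: PageFormat = "md",
): Promise<string> {
  const res = await fetch(`https://api.kadoa.com${contentPath(sessionId, pageId, format)}`, {
    headers: { "x-api-key": apiKey }, // assumed header
  });
  if (!res.ok) throw new Error(`Fetch content failed: ${res.status}`);
  return res.text();
}
```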
## Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | - | Single URL to crawl (use this or `startUrls`) |
| `startUrls` | string[] | - | Multiple URLs to crawl (use this or `url`) |
| `maxDepth` | number | - | Maximum crawl depth from entry points |
| `maxPages` | number | - | Maximum pages to crawl |
| `maxMatches` | number | - | Stop after N matched pages (with blueprint) |
| `pathsFilterIn` | string[] | - | Regex patterns to include paths |
| `pathsFilterOut` | string[] | - | Regex patterns to exclude paths |
| `proxyType` | string | null | Proxy type: `"dc"` (datacenter) or `"residential"` |
| `proxyCountry` | string | - | ISO country code for proxy location |
| `concurrency` | number | 20 | Number of parallel crawlers |
| `timeout` | number | - | Request timeout in milliseconds |
| `strictDomain` | boolean | true | Stay within the same domain |
| `loadImages` | boolean | true | Load images during crawl |
| `callbackUrl` | string | - | Webhook URL for completion notification |
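Pulling the table together, a sketch of a typed config with the documented defaults filled in (`concurrency` 20, `strictDomain` true, `loadImages` true). The field names come from the table; the interface and helper are ours, not the SDK's:

```typescript
// Field names and defaults are taken from the configuration table; the
// interface and withDefaults helper are illustrative, not SDK exports.

interface CrawlConfig {
  url?: string;
  startUrls?: string[];
  maxDepth?: number;
  maxPages?: number;
  maxMatches?: number;
  pathsFilterIn?: string[];
  pathsFilterOut?: string[];
  proxyType?: "dc" | "residential";
  proxyCountry?: string;
  concurrency?: number;
  timeout?: number;
  strictDomain?: boolean;
  loadImages?: boolean;
  callbackUrl?: string;
}

// Apply the documented defaults; explicit values win.
function withDefaults(config: CrawlConfig): CrawlConfig {
  return { concurrency: 20, strictDomain: true, loadImages: true, ...config };
}

// Example: crawl only documentation paths via a regex include filter.
const docsCrawl = withDefaults({
  url: "https://example.com",
  pathsFilterIn: ["^/docs/"],
  maxPages: 500,
});
```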
## Artifact Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `screenshot` | boolean | false | Capture page screenshots |
| `screenshotFull` | boolean | false | Capture full-page screenshots |
| `archivePdf` | boolean | false | Generate PDF archives |
## Error Handling
| Error | Cause | Solution |
|---|---|---|
| 401 Unauthorized | Invalid API key | Verify API key in dashboard |
| 402 Payment Required | Insufficient credits | Top up account credits |
| 404 Not Found | Invalid session or page ID | Verify ID exists |
| 429 Too Many Requests | Rate limit exceeded | Reduce request frequency |
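The table above maps directly onto a small error helper. Of the four cases, only 429 is worth retrying automatically (with backoff); the others need an account or input fix. A sketch (the helper and backoff policy are ours):

```typescript
// Map API status codes to the remedies from the error table, and flag
// which are retryable. Only 429 (rate limit) merits automatic retry.
function adviceFor(status: number): { retryable: boolean; advice: string } {
  switch (status) {
    case 401: return { retryable: false, advice: "Verify API key in dashboard" };
    case 402: return { retryable: false, advice: "Top up account credits" };
    case 404: return { retryable: false, advice: "Verify session or page ID exists" };
    case 429: return { retryable: true, advice: "Reduce request frequency; retry with backoff" };
    default:  return { retryable: false, advice: `Unexpected status ${status}` };
  }
}

// Simple exponential backoff delays: 1s, 2s, 4s, ... (our policy choice).
function backoffMs(attempt: number, baseMs = 1000): number {
  return baseMs * 2 ** attempt;
}
```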
## Next Steps
- Set up webhooks for crawl completion notifications
- API Reference →