
Crawl API

Crawl API handles same-origin discovery, robots-aware crawling, page caps, and progress tracking through one async contract. It is built for teams that need bounded site walks without owning the queueing and crawl-control layer themselves.

Endpoint

POST /v1/crawl

Poll GET /v1/crawl/:jobId and cancel queued jobs with DELETE /v1/crawl/:jobId.

Credits

1 credit per completed page

Credits are reserved up front based on max_pages and finalized as the crawl completes.
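The reserve-then-finalize arithmetic can be sketched with a hypothetical helper (the function name and return shape are illustrative, not part of the API):

```python
def finalize_credits(max_pages: int, pages_completed: int) -> dict:
    """Illustrative credit math: reserve up front from max_pages,
    charge 1 credit per completed page, release the remainder."""
    reserved = max_pages            # reserved when the job is accepted
    charged = pages_completed       # 1 credit per completed page
    released = reserved - charged   # returned when the crawl finalizes
    return {
        "credits_reserved": reserved,
        "credits_charged": charged,
        "credits_released": released,
    }
```

With max_pages set to 25 and 16 pages completed, 16 credits are charged and the 9 unused reserved credits are released.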

Output

Progress JSON with page status list

Read crawl progress, job status, and per-page status from the status endpoint.

Request parameters

Name              Type                   Required  Description
domain            string                 Yes       Starting domain or URL. HTTPS is assumed if no scheme is given.
max_pages         integer                No        Maximum pages to crawl. Defaults to 50. Range 1-500.
depth             integer                No        Link depth from the seed URL. Defaults to 3. Range 1-5.
include_patterns  string[]               No        Optional glob-style path allowlist.
exclude_patterns  string[]               No        Optional glob-style path denylist.
render_js         boolean                No        Use browser-backed rendering for page fetches when true. Defaults to false.
use_proxy         auto | always | never  No        Proxy strategy for fetches. Defaults to auto.
webhook_url       string                 No        Optional completion webhook target.
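Assembled client-side, the parameters above map to a request body like this sketch (a hypothetical builder; the function name and validation are assumptions layered on the documented defaults and ranges):

```python
def build_crawl_request(domain, *, max_pages=50, depth=3,
                        include_patterns=None, exclude_patterns=None,
                        render_js=False, use_proxy="auto", webhook_url=None):
    """Assemble a POST /v1/crawl body, enforcing the documented ranges."""
    if not 1 <= max_pages <= 500:
        raise ValueError("max_pages must be in 1-500")
    if not 1 <= depth <= 5:
        raise ValueError("depth must be in 1-5")
    if use_proxy not in ("auto", "always", "never"):
        raise ValueError("use_proxy must be auto, always, or never")
    body = {"domain": domain, "max_pages": max_pages, "depth": depth,
            "render_js": render_js, "use_proxy": use_proxy}
    # Optional fields are omitted rather than sent as null.
    if include_patterns:
        body["include_patterns"] = include_patterns
    if exclude_patterns:
        body["exclude_patterns"] = exclude_patterns
    if webhook_url:
        body["webhook_url"] = webhook_url
    return body
```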

Response fields

Name              Type                Description
job_id            string              Crawl job identifier.
request_id        string              Request identifier for tracing the public API call.
trace_id          string              Trace identifier attached to the crawl lifecycle.
status            string              Current job status: queued, running, completed, failed, cancelled, or expired.
domain            string              Normalized crawl seed URL.
max_pages         integer             Configured crawl page budget.
depth             integer             Configured crawl depth.
pages_found       integer             Pages discovered so far.
pages_completed   integer             Pages completed successfully.
pages_failed      integer             Pages that failed or were skipped.
credits_reserved  integer             Credits reserved from the configured crawl budget.
credits_charged   integer             Credits actually charged so far.
webhook_url       string | null       Configured webhook target, if any.
error             object | undefined  Present on failed, cancelled, or expired jobs.
pages             array               Per-page status objects with url, status, title, error_code, and fetched_at.
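One common read of these fields is a progress ratio. The helper below is an illustrative sketch, assuming pages_completed plus pages_failed counts toward pages_found:

```python
def crawl_progress(status: dict) -> float:
    """Fraction of discovered pages that have finished (completed or failed)."""
    found = status.get("pages_found", 0)
    if found == 0:
        return 0.0
    done = status.get("pages_completed", 0) + status.get("pages_failed", 0)
    return done / found
```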

Code examples

The examples below all follow the current Crawl API contract.

Start with the raw HTTP request and poll flow.

bash
curl -X POST "https://api.orbitscraper.com/v1/crawl" \
  -H "x-api-key: ORS_live_1234567890" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "https://docs.example.com",
    "max_pages": 25,
    "depth": 2,
    "include_patterns": ["/blog/**", "/docs/**"],
    "exclude_patterns": ["/account/**"],
    "render_js": false,
    "use_proxy": "auto"
  }'

curl -X GET "https://api.orbitscraper.com/v1/crawl/crawl_123456" \
  -H "x-api-key: ORS_live_1234567890"

curl -X DELETE "https://api.orbitscraper.com/v1/crawl/crawl_123456" \
  -H "x-api-key: ORS_live_1234567890"
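The poll flow can be expressed as a small loop. This is a sketch, not part of the API surface: poll_crawl and fetch_status are hypothetical names, and fetch_status is any callable that performs the GET /v1/crawl/:jobId request and returns the decoded JSON, injected so the loop stays transport-agnostic.

```python
import time

# Statuses after which the job will not change again.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}

def poll_crawl(fetch_status, job_id, interval=2.0, max_attempts=60):
    """Poll until the job reaches a terminal status or attempts run out.

    fetch_status(job_id) must return the status-endpoint payload as a dict.
    """
    for attempt in range(max_attempts):
        payload = fetch_status(job_id)
        if payload["status"] in TERMINAL_STATUSES:
            return payload
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError(f"crawl {job_id} not finished after {max_attempts} polls")
```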

Response examples

These are the payload shapes returned by the current public Crawl API contract.

Queued response

The first response confirms the job was accepted and tells you what to poll.

json
{
  "request_id": "req_xyz",
  "trace_id": "trace_xyz",
  "job_id": "crawl_123456",
  "status": "queued",
  "crawl_credits_reserved": 25
}

Completed response

After polling, this is the final payload your app reads.

json
{
  "job_id": "crawl_123456",
  "request_id": "req_xyz",
  "trace_id": "trace_xyz",
  "status": "completed",
  "domain": "https://docs.example.com",
  "max_pages": 25,
  "depth": 2,
  "pages_found": 18,
  "pages_completed": 16,
  "pages_failed": 2,
  "credits_reserved": 25,
  "credits_charged": 16,
  "webhook_url": null,
  "pages": [
    {
      "url": "https://docs.example.com/getting-started",
      "status": "completed",
      "title": "Getting started",
      "error_code": null,
      "fetched_at": "2026-03-27T10:40:00.000Z"
    }
  ]
}
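Client code typically splits pages by status once the job finishes. The helper below is an illustrative sketch over the documented per-page fields:

```python
def failed_pages(payload: dict) -> list:
    """Collect (url, error_code) pairs for pages that did not complete."""
    return [(p["url"], p.get("error_code"))
            for p in payload.get("pages", [])
            if p["status"] != "completed"]
```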

Operational notes

  • The current public contract uses domain, include_patterns, and exclude_patterns rather than singular url or pattern fields.
  • Only queued crawl jobs can be cancelled through DELETE /v1/crawl/:jobId.
  • The current deployment bills 1 credit per completed page and reserves credits from the max_pages budget up front.
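The queued-only cancellation rule can be encoded as a tiny guard before issuing the DELETE (an illustrative helper, not part of the API):

```python
def can_cancel(job_status: str) -> bool:
    """DELETE /v1/crawl/:jobId succeeds only while the job is still queued."""
    return job_status == "queued"
```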
