OrbitScraper API reference

OrbitScraper exposes four live API families for search, extraction, research, and crawl workloads. The current backend contract uses queue-backed POST endpoints for job creation, plus poll endpoints for long-running jobs and a cancel endpoint for queued crawl jobs.

Base URL

https://api.orbitscraper.com

All public product requests use the same API host.

Authentication

x-api-key

Send your key in the x-api-key header, for example x-api-key: ORS_live_1234567890. The current backend rejects bearer auth on these routes.

Execution model

POST, then poll

Search, extract, research, and crawl all create jobs through POST endpoints in the current contract; each product exposes a matching GET status endpoint to poll for the final payload.

Quick start

  1. Create an API key in the OrbitScraper dashboard.
  2. Send a POST request to the product endpoint you want to use.
  3. Read the queued response and keep the returned job id.
  4. Poll the status endpoint until the job completes and then consume the final payload.

First request example using the current SERP API contract:

python
import requests
import time

API_BASE = "https://api.orbitscraper.com"
API_KEY = "ORS_live_1234567890"

# Enqueue the search job; the queued response includes the job id to poll.
enqueue = requests.post(
    f"{API_BASE}/v1/search",
    headers={
        "x-api-key": API_KEY,
        "Content-Type": "application/json",
    },
    json={
        "q": "best ai chips 2026",
        "engine": "google",
        "location": "New York",
        "gl": "us",
        "hl": "en",
        "device": "desktop",
        "num": 10,
        "page": 1,
        "safe": "active",
    },
    timeout=30,
)
enqueue.raise_for_status()
job = enqueue.json()

# Poll until the job leaves the queue, then read the final result payload.
while True:
    status = requests.get(
        f"{API_BASE}/v1/search/{job['jobId']}",
        headers={"x-api-key": API_KEY},
        timeout=30,
    )
    status.raise_for_status()
    payload = status.json()
    if payload["status"] == "completed":
        print(payload["result"]["organic_results"][:3])
        break
    if payload["status"] == "failed":
        raise RuntimeError(payload["code"])
    time.sleep(1)

Products overview

  • SERP API: POST /v1/search. 1 credit per successful request. Queue live search jobs across Google, Bing, Brave, and DuckDuckGo and consume normalized results from one API contract.
  • Extract API: POST /v1/extract. 2 credits per successful request. Queue a single-page extraction job and poll for normalized content, metadata, and extracted fields.
  • Research API: POST /v1/research. 12 credits per job. Queue a research job that discovers sources, fetches supporting content, and returns a synthesized summary with source metadata.
  • Crawl API: POST /v1/crawl. 1 credit per completed page. Queue bounded crawl jobs, monitor progress, and retrieve per-page status through one public API contract.

SERP API

SERP API removes parser drift, retry handling, and search engine branching from your application code. Send one search job, poll for completion, and work with normalized search modules instead of raw markup.

Endpoint

POST /v1/search

Poll GET /v1/search/:jobId for the final result payload.

Credits

1 credit per successful request

Pagination and retries that become new successful jobs each bill 1 credit.

Output

Structured JSON + optional markdown

Use markdown=true when you need a prompt-ready rendering of the same SERP.

Request parameters

  • q (string, required): Search query text. Maximum length 512 characters.
  • engine (string, optional): Engine preference for POST /v1/search. Supported values: google, bing, brave, duckduckgo.
  • location (string, optional): Location context string. Defaults to Global.
  • gl (string, optional): Two-letter country code. Defaults to us.
  • hl (string, optional): Language code such as en or de-DE. Defaults to en.
  • device (desktop | mobile, optional): Client device context. Defaults to desktop.
  • num (integer, optional): Results requested per page. Defaults to 10. Range 1-20.
  • page (integer, optional): Results page number. Defaults to 1. Range 1-100.
  • safe (active | off, optional): Safe search mode. Defaults to active.
  • tbm (string, optional): Vertical selector. Supported values: isch, nws, vid, lcl.
  • time_period (string, optional): Freshness filter. Supported values: past_day, past_week, past_month, past_year.
  • markdown (boolean, optional): Return a markdown rendering alongside JSON fields when true.
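
For example, a news-vertical request that combines tbm and time_period with the core parameters might send:

json
{
  "q": "ai chips",
  "engine": "google",
  "tbm": "nws",
  "time_period": "past_week",
  "gl": "us",
  "hl": "en",
  "num": 10
}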

Response fields

  • search_metadata (object): Request metadata including id, status, created_at, processing_time_ms, credits_used, source, and resolved engine.
  • search_parameters (object): Normalized request parameters used to run the search job.
  • search_information (object): Pagination summary including returned_results, page, and num.
  • organic_results (array): Primary organic result list with position, title, link, displayed_link, and snippet.
  • local_results (array): Local result entries when the query triggers a local pack.
  • knowledge_graph (object): Knowledge graph fields when available for the query.
  • people_also_ask (array): Related question modules returned by the search engine.
  • related_searches (array): Related query suggestions from the same SERP.
  • detected_intent (string): Intent label derived from query context, for example product or informational.
  • markdown (string): Prompt-ready markdown representation. Present only when markdown=true.

Code examples

Start with the raw HTTP enqueue and poll flow in cURL below; the Python quick start above runs the same calls, and the flow ports directly to JavaScript, Java, or PHP.

bash
curl -X POST "https://api.orbitscraper.com/v1/search" \
  -H "x-api-key: ORS_live_1234567890" \
  -H "Content-Type: application/json" \
  -d '{
    "q": "best ai chips 2026",
    "engine": "google",
    "location": "New York",
    "gl": "us",
    "hl": "en",
    "device": "desktop",
    "num": 10,
    "page": 1,
    "safe": "active",
    "markdown": true
  }'

curl -X GET "https://api.orbitscraper.com/v1/search/search_123456" \
  -H "x-api-key: ORS_live_1234567890"

Response examples

These are the payload shapes returned by the current public API contract for SERP API.

Queued response

The first response confirms the job was accepted and tells you what to poll.

json
{
  "jobId": "search_123456",
  "status": "queued"
}

Completed response

After polling, this is the final payload your app reads.

json
{
  "jobId": "search_123456",
  "status": "completed",
  "result": {
    "search_metadata": {
      "id": "req_9f2b",
      "status": "Success",
      "created_at": "2026-03-27T10:30:00.000Z",
      "processing_time_ms": 322,
      "credits_used": 1,
      "source": "live",
      "engine": "google"
    },
    "search_parameters": {
      "q": "best ai chips 2026",
      "gl": "us",
      "hl": "en",
      "location": "New York",
      "device": "desktop",
      "num": 10,
      "page": 1,
      "safe": "active",
      "markdown": true
    },
    "organic_results": [
      {
        "position": 1,
        "title": "Best AI chips in 2026",
        "link": "https://example.com/ai-chips",
        "displayed_link": "example.com",
        "snippet": "A breakdown of current inference leaders."
      }
    ],
    "related_searches": ["ai accelerator comparison"],
    "markdown": "# Search Results\n..."
  },
  "code": null
}

Extract API

Extract API turns a single URL into normalized content through an async job flow. It supports rendered fetches, proxy mode selection, and structured output that is easier to store or pass downstream than raw page HTML.

Endpoint

POST /v1/extract

Poll GET /v1/extract/:jobId for the completed extraction result.

Credits

2 credits per successful request

Credits are reserved when the job is queued and finalized when the job completes.

Output

Markdown, JSON, or text

Choose output_format to shape the returned content field.

Request parameters

  • url (string, required): Target URL to fetch and parse.
  • render_js (boolean, optional): Render the page in a browser-backed path when true. Defaults to false.
  • use_proxy (auto | always | never, optional): Proxy strategy for the fetch path. Defaults to auto.
  • output_format (markdown | json | text, optional): Shape of the content field returned in the completed payload. Defaults to markdown.
  • extract_fields (string[], optional): Extraction hints such as title, body, author, or price.

Response fields

  • url (string): Original URL submitted with the job.
  • title (string): Resolved page title for the extracted document.
  • content (string): Formatted extraction output based on output_format.
  • structured (object): Normalized extracted fields. Includes title, body, and any extracted hints.
  • metadata (object): Fetch details including fetched_at, render_js_used, proxy_used, fetch_path, final_url, content_type, and redirects_followed.
  • extract_credits_used (integer): Credits charged for the completed job.

Code examples

Start with the raw HTTP enqueue and poll flow in cURL below, then see the same flow in Python; it ports directly to JavaScript, Java, or PHP.

bash
curl -X POST "https://api.orbitscraper.com/v1/extract" \
  -H "x-api-key: ORS_live_1234567890" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/ai-chip-roundup",
    "render_js": false,
    "use_proxy": "auto",
    "output_format": "markdown",
    "extract_fields": ["title", "body", "author"]
  }'

curl -X GET "https://api.orbitscraper.com/v1/extract/extract_123456" \
  -H "x-api-key: ORS_live_1234567890"

Response examples

These are the payload shapes returned by the current public API contract for Extract API.

Queued response

The first response confirms the job was accepted and tells you what to poll.

json
{
  "request_id": "req_xyz",
  "trace_id": "trace_xyz",
  "job_id": "extract_123456",
  "status": "queued",
  "extract_credits_reserved": 2
}

Completed response

After polling, this is the final payload your app reads.

json
{
  "job_id": "extract_123456",
  "request_id": "req_xyz",
  "trace_id": "trace_xyz",
  "status": "completed",
  "url": "https://example.com/blog/ai-chip-roundup",
  "title": "AI chip roundup",
  "content": "# AI chip roundup\n...",
  "structured": {
    "url": "https://example.com/blog/ai-chip-roundup",
    "title": "AI chip roundup",
    "author": "Orbit Team",
    "body": "Extracted body text..."
  },
  "metadata": {
    "fetched_at": "2026-03-27T10:35:00.000Z",
    "render_js_used": false,
    "proxy_used": false,
    "fetch_path": "direct",
    "final_url": "https://example.com/blog/ai-chip-roundup",
    "content_type": "text/html",
    "redirects_followed": 0
  },
  "extract_credits_used": 2
}

Research API

Research API sits above search and extraction. It discovers source URLs, fetches readable evidence, and returns a synthesized answer plus source metadata so your application gets both the summary and the trail behind it.

Endpoint

POST /v1/research

Poll GET /v1/research/:jobId for completed or partial results.

Credits

12 credits per job

The current backend reserves and charges a flat amount per completed research job.

Output

Summary, detailed, or bullets

Choose output_format to shape the synthesis instruction sent to the LLM layer.

Request parameters

  • query (string, required): Research prompt or question to investigate.
  • depth (integer, optional): Research depth. Defaults to 5. Allowed range 1-10.
  • output_format (summary | detailed | bullets, optional): Controls the synthesis style. Defaults to summary.
  • include_sources (boolean, optional): Include the source list in the final result. Defaults to true.

Response fields

  • query (string): Original research query.
  • summary (string): Final synthesized answer returned by the LLM layer.
  • sources (array): Source entries with url, title, snippet, position, and engine. Empty when include_sources is false.
  • metadata (object): Execution metadata including status, failed_sources, serp_engine_used, and serp_provider_used.
  • provider (string): LLM provider used to generate the answer.
  • model (string): Model identifier used for synthesis.
  • research_credits_used (integer): Credits charged for the job.

Code examples

Start with the raw HTTP enqueue and poll flow in cURL below, then see the same flow in Python; it ports directly to JavaScript, Java, or PHP.

bash
curl -X POST "https://api.orbitscraper.com/v1/research" \
  -H "x-api-key: ORS_live_1234567890" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Which AI chip vendors are gaining share in inference workloads?",
    "depth": 5,
    "output_format": "summary",
    "include_sources": true
  }'

curl -X GET "https://api.orbitscraper.com/v1/research/research_123456" \
  -H "x-api-key: ORS_live_1234567890"

Response examples

These are the payload shapes returned by the current public API contract for Research API.

Queued response

The first response confirms the job was accepted and tells you what to poll.

json
{
  "request_id": "req_xyz",
  "trace_id": "trace_xyz",
  "job_id": "research_123456",
  "status": "queued",
  "research_credits_reserved": 12
}

Completed response

After polling, this is the final payload your app reads.

json
{
  "job_id": "research_123456",
  "request_id": "req_xyz",
  "trace_id": "trace_xyz",
  "status": "completed",
  "query": "Which AI chip vendors are gaining share in inference workloads?",
  "summary": "NVIDIA remains dominant, while AMD and hyperscaler silicon are gaining share in targeted inference workloads.",
  "sources": [
    {
      "url": "https://example.com/ai-chip-landscape",
      "title": "Top AI chip hardware and chip-making companies in 2026",
      "snippet": "AMD and hyperscaler custom silicon continue to gain share...",
      "position": 1,
      "engine": "google"
    }
  ],
  "metadata": {
    "status": "completed",
    "failed_sources": [],
    "serp_engine_used": "google",
    "serp_provider_used": "live"
  },
  "provider": "openai",
  "model": "gpt-5-mini",
  "research_credits_used": 12
}

Crawl API

Crawl API handles same-origin discovery, robots-aware crawling, page caps, and progress tracking through one async contract. It is built for teams that need bounded site walks without owning the queueing and crawl-control layer themselves.

Endpoint

POST /v1/crawl

Poll GET /v1/crawl/:jobId and cancel queued jobs with DELETE /v1/crawl/:jobId.

Credits

1 credit per completed page

Credits are reserved up front based on max_pages and finalized as the crawl completes.

Output

Progress JSON with page status list

Read crawl progress, job status, and per-page status from the status endpoint.

Request parameters

  • domain (string, required): Starting domain or URL. HTTPS is assumed if the scheme is omitted.
  • max_pages (integer, optional): Maximum pages to crawl. Defaults to 50. Range 1-500.
  • depth (integer, optional): Link depth from the seed URL. Defaults to 3. Range 1-5.
  • include_patterns (string[], optional): Glob-style path allowlist.
  • exclude_patterns (string[], optional): Glob-style path denylist.
  • render_js (boolean, optional): Use browser-backed rendering for page fetches when true. Defaults to false.
  • use_proxy (auto | always | never, optional): Proxy strategy for fetches. Defaults to auto.
  • webhook_url (string, optional): Completion webhook target.

Response fields

  • job_id (string): Crawl job identifier.
  • request_id (string): Request identifier for tracing the public API call.
  • trace_id (string): Trace identifier attached to the crawl lifecycle.
  • status (string): Current job status such as queued, running, completed, failed, cancelled, or expired.
  • domain (string): Normalized crawl seed URL.
  • max_pages (integer): Configured crawl page budget.
  • depth (integer): Configured crawl depth.
  • pages_found (integer): Pages discovered so far.
  • pages_completed (integer): Pages completed successfully.
  • pages_failed (integer): Pages that failed or were skipped.
  • credits_reserved (integer): Credits reserved from the configured crawl budget.
  • credits_charged (integer): Credits actually charged so far.
  • webhook_url (string | null): Configured webhook target, if any.
  • error (object | undefined): Present on failed, cancelled, or expired jobs.
  • pages (array): Per-page status objects with url, status, title, error_code, and fetched_at.

Code examples

Start with the raw HTTP enqueue, poll, and cancel flow in cURL below, then see the same flow in Python; it ports directly to JavaScript, Java, or PHP.

bash
curl -X POST "https://api.orbitscraper.com/v1/crawl" \
  -H "x-api-key: ORS_live_1234567890" \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "https://docs.example.com",
    "max_pages": 25,
    "depth": 2,
    "include_patterns": ["/blog/**", "/docs/**"],
    "exclude_patterns": ["/account/**"],
    "render_js": false,
    "use_proxy": "auto"
  }'

curl -X GET "https://api.orbitscraper.com/v1/crawl/crawl_123456" \
  -H "x-api-key: ORS_live_1234567890"

curl -X DELETE "https://api.orbitscraper.com/v1/crawl/crawl_123456" \
  -H "x-api-key: ORS_live_1234567890"

Response examples

These are the payload shapes returned by the current public API contract for Crawl API.

Queued response

The first response confirms the job was accepted and tells you what to poll.

json
{
  "request_id": "req_xyz",
  "trace_id": "trace_xyz",
  "job_id": "crawl_123456",
  "status": "queued",
  "crawl_credits_reserved": 25
}

Completed response

After polling, this is the final payload your app reads.

json
{
  "job_id": "crawl_123456",
  "request_id": "req_xyz",
  "trace_id": "trace_xyz",
  "status": "completed",
  "domain": "https://docs.example.com",
  "max_pages": 25,
  "depth": 2,
  "pages_found": 18,
  "pages_completed": 16,
  "pages_failed": 2,
  "credits_reserved": 25,
  "credits_charged": 16,
  "webhook_url": null,
  "pages": [
    {
      "url": "https://docs.example.com/getting-started",
      "status": "completed",
      "title": "Getting started",
      "error_code": null,
      "fetched_at": "2026-03-27T10:40:00.000Z"
    }
  ]
}

Error reference

  • 400: Bad request or invalid parameter. Validate field names, types, and allowed enum values before retrying.
  • 401: Missing or invalid x-api-key. Send a valid x-api-key header and confirm the key is active.
  • 402: Insufficient credits. Top up credits or reduce request volume before retrying.
  • 404: Job not found for the current tenant or API family. Check the job id, API family, and tenant context before polling again.
  • 409: Conflict on the current job state. Only queued crawl jobs can be cancelled; poll the job instead when it is already running.
  • 429: Rate limit exceeded. Back off, spread requests over time, or use a plan with higher limits.
  • 500: Internal server error. Safe to retry with backoff.
  • 503: Queue or backend dependency temporarily unavailable. Retry after a short backoff; this usually indicates a temporary enqueue issue.
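
In client code, one way to apply this table is to retry only 429, 500, and 503 with backoff and surface everything else immediately. A minimal sketch; the helper name and backoff schedule are illustrative, not part of the API:

python
import requests
import time

RETRYABLE = {429, 500, 503}

def request_with_backoff(method, url, max_attempts=5, **kwargs):
    """Retry transient statuses with exponential backoff; fail fast otherwise."""
    for attempt in range(max_attempts):
        response = requests.request(method, url, **kwargs)
        if response.status_code not in RETRYABLE:
            response.raise_for_status()  # 400/401/402/404/409 are contract errors
            return response
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    response.raise_for_status()  # out of attempts on a retryable status
    return response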

Credits guide

How credits are consumed

  • SERP API: 1 credit per successful request. Pagination and retries that become new successful jobs each bill 1 credit.
  • Extract API: 2 credits per successful request. Credits are reserved when the job is queued and finalized when the job completes.
  • Research API: 12 credits per job. The current backend reserves and charges a flat amount per completed research job.
  • Crawl API: 1 credit per completed page. Credits are reserved up front based on max_pages and finalized as the crawl completes.
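
As a quick sanity check against the rates above, a short sketch that estimates a worst-case reservation for a hypothetical mixed workload (the job counts are illustrative):

python
# Hypothetical workload, priced with the documented rates.
serp_searches = 40      # 1 credit per successful request
extract_jobs = 15       # 2 credits per successful request
research_jobs = 2       # 12 credits per job, flat
crawl_max_pages = 25    # 1 credit per completed page, reserved up front

reserved = serp_searches * 1 + extract_jobs * 2 + research_jobs * 12 + crawl_max_pages * 1
print(f"Worst-case reservation: {reserved} credits")  # 40 + 30 + 24 + 25 = 119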

Ways to reduce spend

  • Set the smallest page, depth, and crawl budgets that still answer the task.
  • Use search engine forcing only when you need a specific engine; otherwise keep one code path around POST /v1/search.
  • Only request markdown or rendered extraction when your downstream workflow actually needs it.
  • For crawl jobs, keep max_pages close to the real site section size so reserved credits stay predictable.
