OrbitScraper API reference
OrbitScraper exposes four live API families for search, extraction, research, and crawl workloads. The current backend contract uses queue-backed POST endpoints for job creation, plus poll endpoints for long-running jobs and a cancel endpoint for queued crawl jobs.
Base URL
https://api.orbitscraper.com
All public product requests use the same API host.
Authentication
x-api-key
Send your API key in the x-api-key header, for example x-api-key: ORS_live_1234567890. The current backend rejects bearer (Authorization header) auth on these routes.
Execution model
POST, then poll
Search, extract, research, and crawl all create jobs through POST endpoints in the current contract.
Quick start
1. Create an API key in the OrbitScraper dashboard.
2. Send a POST request to the product endpoint you want to use.
3. Read the queued response and keep the returned job id.
4. Poll the status endpoint until the job completes, then consume the final payload.
First request example using the current SERP API contract:
import requests
import time

API_BASE = "https://api.orbitscraper.com"
API_KEY = "ORS_live_1234567890"

# 1. Enqueue the search job.
enqueue = requests.post(
    f"{API_BASE}/v1/search",
    headers={
        "x-api-key": API_KEY,
        "Content-Type": "application/json",
    },
    json={
        "q": "best ai chips 2026",
        "engine": "google",
        "location": "New York",
        "gl": "us",
        "hl": "en",
        "device": "desktop",
        "num": 10,
        "page": 1,
        "safe": "active",
    },
    timeout=30,
)
enqueue.raise_for_status()
job = enqueue.json()

# 2. Poll the status endpoint until the job reaches a final state.
while True:
    status = requests.get(
        f"{API_BASE}/v1/search/{job['jobId']}",
        headers={"x-api-key": API_KEY},
        timeout=30,
    )
    status.raise_for_status()
    payload = status.json()
    if payload["status"] == "completed":
        print(payload["result"]["organic_results"][:3])
        break
    if payload["status"] == "failed":
        raise RuntimeError(payload["code"])
    time.sleep(1)

Products overview
| Product | Endpoint | Credits | Description |
|---|---|---|---|
| SERP API | POST /v1/search | 1 credit per successful request | Queue live search jobs across Google, Bing, Brave, and DuckDuckGo and consume normalized results from one API contract. |
| Extract API | POST /v1/extract | 2 credits per successful request | Queue a single-page extraction job and poll for normalized content, metadata, and extracted fields. |
| Research API | POST /v1/research | 12 credits per job | Queue a research job that discovers sources, fetches supporting content, and returns a synthesized summary with source metadata. |
| Crawl API | POST /v1/crawl | 1 credit per completed page | Queue bounded crawl jobs, monitor progress, and retrieve per-page status through one public API contract. |
SERP API
SERP API removes parser drift, retry handling, and search engine branching from your application code. Send one search job, poll for completion, and work with normalized search modules instead of raw markup.
Endpoint
POST /v1/search
Poll GET /v1/search/:jobId for the final result payload.
Credits
1 credit per successful request
Pagination and retries that become new successful jobs each bill 1 credit.
Output
Structured JSON + optional markdown
Use markdown=true when you need a prompt-ready rendering of the same SERP.
Request parameters
| Name | Type | Required | Description |
|---|---|---|---|
| q | string | Yes | Search query text. Maximum length 512 characters. |
| engine | string | No | Optional engine preference when using POST /v1/search. Supported values: google, bing, brave, duckduckgo. |
| location | string | No | Location context string. Defaults to Global. |
| gl | string | No | Two-letter country code. Defaults to us. |
| hl | string | No | Language code such as en or de-DE. Defaults to en. |
| device | desktop \| mobile | No | Client device context. Defaults to desktop. |
| num | integer | No | Results requested per page. Defaults to 10. Range 1-20. |
| page | integer | No | Results page number. Defaults to 1. Range 1-100. |
| safe | active \| off | No | Safe search mode. Defaults to active. |
| tbm | string | No | Optional vertical selector. Supported values: isch, nws, vid, lcl. |
| time_period | string | No | Optional freshness filter. Supported values: past_day, past_week, past_month, past_year. |
| markdown | boolean | No | Return a markdown rendering alongside JSON fields when true. |
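As a sketch of how the optional vertical and freshness parameters combine with the base query (the helper names and example query are illustrative, not part of the API contract):

```python
import requests

API_BASE = "https://api.orbitscraper.com"

def build_news_search(query: str) -> dict:
    """Body for a news-vertical SERP job restricted to the past week."""
    return {
        "q": query,
        "engine": "google",
        "tbm": "nws",                # news vertical
        "time_period": "past_week",  # freshness filter
    }

def enqueue_search(api_key: str, body: dict) -> str:
    """POST the job and return the jobId to poll at GET /v1/search/{jobId}."""
    resp = requests.post(
        f"{API_BASE}/v1/search",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        json=body,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["jobId"]
```

Omitting tbm and time_period runs a standard web search; only q is required.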
Response fields
| Name | Type | Description |
|---|---|---|
| search_metadata | object | Request metadata including id, status, created_at, processing_time_ms, credits_used, source, and resolved engine. |
| search_parameters | object | Normalized request parameters used to run the search job. |
| search_information | object | Pagination summary including returned_results, page, and num. |
| organic_results | array | Primary organic result list with position, title, link, displayed_link, and snippet. |
| local_results | array | Local result entries when the query triggers a local pack. |
| knowledge_graph | object | Knowledge graph fields when available for the query. |
| people_also_ask | array | Related question modules returned by the search engine. |
| related_searches | array | Related query suggestions from the same SERP. |
| detected_intent | string | Intent label derived from query context, for example product or informational. |
| markdown | string | Prompt-ready markdown representation. Present only when markdown=true. |
Code examples
The cURL commands below show the raw HTTP enqueue-and-poll flow; the same SERP API calls translate directly to Python, JavaScript, Java, or PHP.
curl -X POST "https://api.orbitscraper.com/v1/search" \
-H "x-api-key: ORS_live_1234567890" \
-H "Content-Type: application/json" \
-d '{
"q": "best ai chips 2026",
"engine": "google",
"location": "New York",
"gl": "us",
"hl": "en",
"device": "desktop",
"num": 10,
"page": 1,
"safe": "active",
"markdown": true
}'
curl -X GET "https://api.orbitscraper.com/v1/search/search_123456" \
-H "x-api-key: ORS_live_1234567890"Response examples
This is the payload shape you get back from the current public API contract for SERP API.
Queued response
The first response confirms the job was accepted and tells you what to poll.
{
"jobId": "search_123456",
"status": "queued"
}

Completed response
After polling, this is the final payload your app reads.
{
"jobId": "search_123456",
"status": "completed",
"result": {
"search_metadata": {
"id": "req_9f2b",
"status": "Success",
"created_at": "2026-03-27T10:30:00.000Z",
"processing_time_ms": 322,
"credits_used": 1,
"source": "live",
"engine": "google"
},
"search_parameters": {
"q": "best ai chips 2026",
"gl": "us",
"hl": "en",
"location": "New York",
"device": "desktop",
"num": 10,
"page": 1,
"safe": "active",
"markdown": true
},
"organic_results": [
{
"position": 1,
"title": "Best AI chips in 2026",
"link": "https://example.com/ai-chips",
"displayed_link": "example.com",
"snippet": "A breakdown of current inference leaders."
}
],
"related_searches": ["ai accelerator comparison"],
"markdown": "# Search Results\n..."
},
"code": null
}

Extract API
Extract API turns a single URL into normalized content through an async job flow. It supports rendered fetches, proxy mode selection, and structured output that is easier to store or pass downstream than raw page HTML.
Endpoint
POST /v1/extract
Poll GET /v1/extract/:jobId for the completed extraction result.
Credits
2 credits per successful request
Credits are reserved when the job is queued and finalized when the job completes.
Output
Markdown, JSON, or text
Choose output_format to shape the returned content field.
Request parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Target URL to fetch and parse. |
| render_js | boolean | No | Render the page in a browser-backed path when true. Defaults to false. |
| use_proxy | auto \| always \| never | No | Proxy strategy for the fetch path. Defaults to auto. |
| output_format | markdown \| json \| text | No | Shape of the content field returned in the completed payload. Defaults to markdown. |
| extract_fields | string[] | No | Optional extraction hints such as title, body, author, or price. |
Response fields
| Name | Type | Description |
|---|---|---|
| url | string | Original URL submitted with the job. |
| title | string | Resolved page title for the extracted document. |
| content | string | Formatted extraction output based on output_format. |
| structured | object | Normalized extracted fields. Includes title, body, and any extracted hints. |
| metadata | object | Fetch details including fetched_at, render_js_used, proxy_used, fetch_path, final_url, content_type, and redirects_followed. |
| extract_credits_used | integer | Credits charged for the completed job. |
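The enqueue-and-poll contract above can be sketched in Python. The helper names are our own; the field names follow the queued and completed payloads documented in this section:

```python
import time
import requests

API_BASE = "https://api.orbitscraper.com"

def build_extract_body(url: str, **options) -> dict:
    """Request body with the documented defaults made explicit."""
    body = {"url": url, "render_js": False, "use_proxy": "auto",
            "output_format": "markdown"}
    body.update(options)
    return body

def extract_page(api_key: str, url: str, **options) -> dict:
    """Queue an extraction job and poll until it completes or fails."""
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    enqueue = requests.post(f"{API_BASE}/v1/extract", headers=headers,
                            json=build_extract_body(url, **options), timeout=30)
    enqueue.raise_for_status()
    job_id = enqueue.json()["job_id"]  # snake_case here, unlike /v1/search's jobId
    while True:
        poll = requests.get(f"{API_BASE}/v1/extract/{job_id}",
                            headers={"x-api-key": api_key}, timeout=30)
        poll.raise_for_status()
        payload = poll.json()
        if payload["status"] in ("completed", "failed"):
            return payload
        time.sleep(1)
```

Note that Extract job ids are returned as job_id, while the SERP queued response uses jobId.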
Code examples
The cURL commands below show the raw HTTP enqueue-and-poll flow; the same Extract API calls translate directly to Python, JavaScript, Java, or PHP.
curl -X POST "https://api.orbitscraper.com/v1/extract" \
-H "x-api-key: ORS_live_1234567890" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/blog/ai-chip-roundup",
"render_js": false,
"use_proxy": "auto",
"output_format": "markdown",
"extract_fields": ["title", "body", "author"]
}'
curl -X GET "https://api.orbitscraper.com/v1/extract/extract_123456" \
-H "x-api-key: ORS_live_1234567890"Response examples
This is the payload shape you get back from the current public API contract for Extract API.
Queued response
The first response confirms the job was accepted and tells you what to poll.
{
"request_id": "req_xyz",
"trace_id": "trace_xyz",
"job_id": "extract_123456",
"status": "queued",
"extract_credits_reserved": 2
}

Completed response
After polling, this is the final payload your app reads.
{
"job_id": "extract_123456",
"request_id": "req_xyz",
"trace_id": "trace_xyz",
"status": "completed",
"url": "https://example.com/blog/ai-chip-roundup",
"title": "AI chip roundup",
"content": "# AI chip roundup\n...",
"structured": {
"url": "https://example.com/blog/ai-chip-roundup",
"title": "AI chip roundup",
"author": "Orbit Team",
"body": "Extracted body text..."
},
"metadata": {
"fetched_at": "2026-03-27T10:35:00.000Z",
"render_js_used": false,
"proxy_used": false,
"fetch_path": "direct",
"final_url": "https://example.com/blog/ai-chip-roundup",
"content_type": "text/html",
"redirects_followed": 0
},
"extract_credits_used": 2
}

Research API
Research API sits above search and extraction. It discovers source URLs, fetches readable evidence, and returns a synthesized answer plus source metadata so your application gets both the summary and the trail behind it.
Endpoint
POST /v1/research
Poll GET /v1/research/:jobId for completed or partial results.
Credits
12 credits per job
The current backend reserves and charges a flat amount per completed research job.
Output
Summary, detailed, or bullets
Choose output_format to shape the synthesis instruction sent to the LLM layer.
Request parameters
| Name | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Research prompt or question to investigate. |
| depth | integer | No | Research depth. Defaults to 5. Allowed range 1-10. |
| output_format | summary \| detailed \| bullets | No | Controls the synthesis style. Defaults to summary. |
| include_sources | boolean | No | Include the source list in the final result. Defaults to true. |
Response fields
| Name | Type | Description |
|---|---|---|
| query | string | Original research query. |
| summary | string | Final synthesized answer returned by the LLM layer. |
| sources | array | Source entries with url, title, snippet, position, and engine. Empty when include_sources is false. |
| metadata | object | Execution metadata including status, failed_sources, serp_engine_used, and serp_provider_used. |
| provider | string | LLM provider used to generate the answer. |
| model | string | Model identifier used for synthesis. |
| research_credits_used | integer | Credits charged for the job. |
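A minimal Python sketch of the research flow, plus a small formatter for the source list (helper names are illustrative, not part of the API contract):

```python
import time
import requests

API_BASE = "https://api.orbitscraper.com"

def format_sources(sources: list) -> str:
    """Render the source entries as a markdown bullet list."""
    return "\n".join(f"- [{s['title']}]({s['url']})" for s in sources)

def run_research(api_key: str, query: str, depth: int = 5) -> dict:
    """Queue a research job and poll until it reaches a final status."""
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    enqueue = requests.post(
        f"{API_BASE}/v1/research",
        headers=headers,
        json={"query": query, "depth": depth, "output_format": "summary"},
        timeout=30,
    )
    enqueue.raise_for_status()
    job_id = enqueue.json()["job_id"]
    while True:
        poll = requests.get(
            f"{API_BASE}/v1/research/{job_id}",
            headers={"x-api-key": api_key},
            timeout=30,
        )
        poll.raise_for_status()
        payload = poll.json()
        if payload["status"] in ("completed", "failed"):
            return payload
        time.sleep(2)
```

Research jobs run longer than single searches, so this sketch polls at a 2-second interval; tune the interval to your latency and rate-limit budget.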
Code examples
The cURL commands below show the raw HTTP enqueue-and-poll flow; the same Research API calls translate directly to Python, JavaScript, Java, or PHP.
curl -X POST "https://api.orbitscraper.com/v1/research" \
-H "x-api-key: ORS_live_1234567890" \
-H "Content-Type: application/json" \
-d '{
"query": "Which AI chip vendors are gaining share in inference workloads?",
"depth": 5,
"output_format": "summary",
"include_sources": true
}'
curl -X GET "https://api.orbitscraper.com/v1/research/research_123456" \
-H "x-api-key: ORS_live_1234567890"Response examples
This is the payload shape you get back from the current public API contract for Research API.
Queued response
The first response confirms the job was accepted and tells you what to poll.
{
"request_id": "req_xyz",
"trace_id": "trace_xyz",
"job_id": "research_123456",
"status": "queued",
"research_credits_reserved": 12
}

Completed response
After polling, this is the final payload your app reads.
{
"job_id": "research_123456",
"request_id": "req_xyz",
"trace_id": "trace_xyz",
"status": "completed",
"query": "Which AI chip vendors are gaining share in inference workloads?",
"summary": "NVIDIA remains dominant, while AMD and hyperscaler silicon are gaining share in targeted inference workloads.",
"sources": [
{
"url": "https://example.com/ai-chip-landscape",
"title": "Top AI chip hardware and chip-making companies in 2026",
"snippet": "AMD and hyperscaler custom silicon continue to gain share...",
"position": 1,
"engine": "google"
}
],
"metadata": {
"status": "completed",
"failed_sources": [],
"serp_engine_used": "google",
"serp_provider_used": "live"
},
"provider": "openai",
"model": "gpt-5-mini",
"research_credits_used": 12
}

Crawl API
Crawl API handles same-origin discovery, robots-aware crawling, page caps, and progress tracking through one async contract. It is built for teams that need bounded site walks without owning the queueing and crawl-control layer themselves.
Endpoint
POST /v1/crawl
Poll GET /v1/crawl/:jobId and cancel queued jobs with DELETE /v1/crawl/:jobId.
Credits
1 credit per completed page
Credits are reserved up front based on max_pages and finalized as the crawl completes.
Output
Progress JSON with page status list
Read crawl progress, job status, and per-page status from the status endpoint.
Request parameters
| Name | Type | Required | Description |
|---|---|---|---|
| domain | string | Yes | Starting domain or URL. If no scheme is given, https:// is assumed. |
| max_pages | integer | No | Maximum pages to crawl. Defaults to 50. Range 1-500. |
| depth | integer | No | Link depth from the seed URL. Defaults to 3. Range 1-5. |
| include_patterns | string[] | No | Optional glob-style path allowlist. |
| exclude_patterns | string[] | No | Optional glob-style path denylist. |
| render_js | boolean | No | Use browser-backed rendering for page fetches when true. Defaults to false. |
| use_proxy | auto \| always \| never | No | Proxy strategy for fetches. Defaults to auto. |
| webhook_url | string | No | Optional completion webhook target. |
Response fields
| Name | Type | Description |
|---|---|---|
| job_id | string | Crawl job identifier. |
| request_id | string | Request identifier for tracing the public API call. |
| trace_id | string | Trace identifier attached to the crawl lifecycle. |
| status | string | Current job status such as queued, running, completed, failed, cancelled, or expired. |
| domain | string | Normalized crawl seed URL. |
| max_pages | integer | Configured crawl page budget. |
| depth | integer | Configured crawl depth. |
| pages_found | integer | Pages discovered so far. |
| pages_completed | integer | Pages completed successfully. |
| pages_failed | integer | Pages that failed or were skipped. |
| credits_reserved | integer | Credits reserved from the configured crawl budget. |
| credits_charged | integer | Credits actually charged so far. |
| webhook_url | string \| null | Configured webhook target, if any. |
| error | object \| undefined | Present on failed, cancelled, or expired jobs. |
| pages | array | Per-page status objects with url, status, title, error_code, and fetched_at. |
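Client-side, the counters above are enough to drive a simple progress monitor. A sketch, with helper names of our own choosing (for the completed example payload in this section, 16 completed plus 2 failed pages out of a 25-page budget gives 0.72):

```python
def crawl_progress(payload: dict) -> float:
    """Fraction of the configured page budget finished (completed + failed)."""
    budget = payload.get("max_pages") or 0
    if budget == 0:
        return 0.0
    done = payload.get("pages_completed", 0) + payload.get("pages_failed", 0)
    return min(1.0, done / budget)

def is_terminal(status: str) -> bool:
    """Statuses after which polling can stop, per the status field above."""
    return status in ("completed", "failed", "cancelled", "expired")
```

Progress is measured against max_pages rather than pages_found, since pages_found can keep growing while discovery is still running.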
Code examples
The cURL commands below show the raw HTTP enqueue, poll, and cancel flow; the same Crawl API calls translate directly to Python, JavaScript, Java, or PHP.
curl -X POST "https://api.orbitscraper.com/v1/crawl" \
-H "x-api-key: ORS_live_1234567890" \
-H "Content-Type: application/json" \
-d '{
"domain": "https://docs.example.com",
"max_pages": 25,
"depth": 2,
"include_patterns": ["/blog/**", "/docs/**"],
"exclude_patterns": ["/account/**"],
"render_js": false,
"use_proxy": "auto"
}'
curl -X GET "https://api.orbitscraper.com/v1/crawl/crawl_123456" \
-H "x-api-key: ORS_live_1234567890"
curl -X DELETE "https://api.orbitscraper.com/v1/crawl/crawl_123456" \
-H "x-api-key: ORS_live_1234567890"Response examples
This is the payload shape you get back from the current public API contract for Crawl API.
Queued response
The first response confirms the job was accepted and tells you what to poll.
{
"request_id": "req_xyz",
"trace_id": "trace_xyz",
"job_id": "crawl_123456",
"status": "queued",
"crawl_credits_reserved": 25
}

Completed response
After polling, this is the final payload your app reads.
{
"job_id": "crawl_123456",
"request_id": "req_xyz",
"trace_id": "trace_xyz",
"status": "completed",
"domain": "https://docs.example.com",
"max_pages": 25,
"depth": 2,
"pages_found": 18,
"pages_completed": 16,
"pages_failed": 2,
"credits_reserved": 25,
"credits_charged": 16,
"webhook_url": null,
"pages": [
{
"url": "https://docs.example.com/getting-started",
"status": "completed",
"title": "Getting started",
"error_code": null,
"fetched_at": "2026-03-27T10:40:00.000Z"
}
]
}

Error reference
| Code | Meaning | Action |
|---|---|---|
| 400 | Bad request or invalid parameter. | Validate field names, types, and allowed enum values before retrying. |
| 401 | Missing or invalid x-api-key. | Send a valid x-api-key header and confirm the key is active. |
| 402 | Insufficient credits. | Top up credits or reduce request volume before retrying. |
| 404 | Job not found for the current tenant or API family. | Check the job id, API family, and tenant context before polling again. |
| 409 | Conflict on the current job state. | Only queued crawl jobs can be cancelled. Poll the job instead when it is already running. |
| 429 | Rate limit exceeded. | Back off, spread requests over time, or use a plan with higher limits. |
| 500 | Internal server error. | Safe to retry with backoff. |
| 503 | Queue or backend dependency temporarily unavailable. | Retry after a short backoff. This usually indicates a temporary enqueue issue. |
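Per the table above, 429, 500, and 503 are the codes that are safe to retry with backoff. A minimal client-side sketch of that policy (the helper names and backoff schedule are our own, not part of the API contract):

```python
import random

RETRYABLE = {429, 500, 503}  # back off and retry, per the error reference

def is_retryable(status_code: int) -> bool:
    """Whether a failed request is safe to retry with backoff."""
    return status_code in RETRYABLE

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with jitter: ~1s, ~2s, ~4s, ... capped at 30s."""
    return min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)
```

Codes 400, 401, 402, 404, and 409 indicate a request or state problem that retrying will not fix; correct the request instead.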
Credits guide
How credits are consumed
- SERP API: 1 credit per successful request. Pagination and retries that become new successful jobs each bill 1 credit.
- Extract API: 2 credits per successful request. Credits are reserved when the job is queued and finalized when the job completes.
- Research API: 12 credits per job. The current backend reserves and charges a flat amount per completed research job.
- Crawl API: 1 credit per completed page. Credits are reserved up front based on max_pages and finalized as the crawl completes.
Ways to reduce spend
- Set the smallest page, depth, and crawl budgets that still answer the task.
- Use search engine forcing only when you need a specific engine; otherwise keep one code path around POST /v1/search.
- Only request markdown or rendered extraction when your downstream workflow actually needs it.
- For crawl jobs, keep max_pages close to the real site section size so reserved credits stay predictable.