OrbitScraper API reference
OrbitScraper exposes four live API families for search, extraction, research, and crawl workloads. The current backend contract uses queue-backed POST endpoints for job creation, plus poll endpoints for long-running jobs and a cancel endpoint for queued crawl jobs.
Base URL
https://api.orbitscraper.com
All public product requests use the same API host.
Authentication
x-api-key
Send your API key in the x-api-key header, for example x-api-key: ORS_live_1234567890. The current backend rejects bearer (Authorization header) auth on these routes.
Execution model
POST, then poll
Search, extract, research, and crawl all create jobs through POST endpoints in the current contract.
Quick start
1. Create an API key in the OrbitScraper dashboard.
2. Send a POST request to the product endpoint you want to use.
3. Read the queued response and keep the returned job id.
4. Poll the status endpoint until the job completes, then consume the final payload.
First request example using the current SERP API contract:
import requests
import time

API_BASE = "https://api.orbitscraper.com"
API_KEY = "ORS_live_1234567890"

# 1. Enqueue the search job.
enqueue = requests.post(
    f"{API_BASE}/v1/search",
    headers={
        "x-api-key": API_KEY,
        "Content-Type": "application/json",
    },
    json={
        "q": "best ai chips 2026",
        "engine": "google",
        "location": "New York",
        "gl": "us",
        "hl": "en",
        "device": "desktop",
        "num": 10,
        "page": 1,
        "safe": "active",
    },
    timeout=30,
)
enqueue.raise_for_status()
job = enqueue.json()

# 2. Poll the status endpoint until the job reaches a final state.
while True:
    status = requests.get(
        f"{API_BASE}/v1/search/{job['jobId']}",
        headers={"x-api-key": API_KEY},
        timeout=30,
    )
    status.raise_for_status()
    payload = status.json()
    if payload["status"] == "completed":
        print(payload["result"]["organic_results"][:3])
        break
    if payload["status"] == "failed":
        raise RuntimeError(payload["code"])
    time.sleep(1)

Products overview
| Product | Endpoint | Credits | Description |
|---|---|---|---|
| SERP API | POST /v1/search | 1 credit per successful request | Queue live search jobs across Google, Bing, Brave, and DuckDuckGo and consume normalized results from one API contract. |
| Extract API | POST /v1/extract | 2 credits per successful request | Queue a single-page extraction job and poll for normalized content, metadata, and extracted fields. |
| Research API | POST /v1/research | 12 credits per job | Queue a research job that discovers sources, fetches supporting content, and returns a synthesized summary with source metadata. |
| Crawl API | POST /v1/crawl | 1 credit per completed page | Queue bounded crawl jobs, monitor progress, and retrieve per-page status through one public API contract. |
SERP API
SERP API removes parser drift, retry handling, and search engine branching from your application code. Send one search job, poll for completion, and work with normalized search modules instead of raw markup.
Endpoint
POST /v1/search
Poll GET /v1/search/:jobId for the final result payload.
Credits
1 credit per successful request
Pagination and retries that become new successful jobs each bill 1 credit.
Output
Structured JSON + optional markdown
Use markdown=true when you need a prompt-ready rendering of the same SERP.
Request parameters
| Name | Type | Required | Description |
|---|---|---|---|
| q | string | Yes | Search query text. Maximum length 512 characters. |
| engine | string | No | Optional engine preference when using POST /v1/search. Supported values: google, bing, brave, duckduckgo. |
| location | string | No | Location context string. Defaults to Global. |
| gl | string | No | Two-letter country code. Defaults to us. |
| hl | string | No | Language code such as en or de-DE. Defaults to en. |
| device | desktop \| mobile | No | Client device context. Defaults to desktop. |
| num | integer | No | Results requested per page. Defaults to 10. Range 1-20. |
| page | integer | No | Results page number. Defaults to 1. Range 1-100. |
| safe | active \| off | No | Safe search mode. Defaults to active. |
| tbm | string | No | Optional vertical selector. Supported values: isch, nws, vid, lcl. |
| time_period | string | No | Optional freshness filter. Supported values: past_day, past_week, past_month, past_year. |
| markdown | boolean | No | Return a markdown rendering alongside JSON fields when true. |
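As a sketch of how the optional vertical and freshness parameters combine with the base query (the helper names and example query are illustrative, not part of the API contract):

```python
import requests

API_BASE = "https://api.orbitscraper.com"

def build_news_search(query: str) -> dict:
    """Body for a news-vertical SERP job restricted to the past week."""
    return {
        "q": query,
        "engine": "google",
        "tbm": "nws",                # news vertical
        "time_period": "past_week",  # freshness filter
    }

def enqueue_search(api_key: str, body: dict) -> str:
    """POST the job and return the jobId to poll at GET /v1/search/{jobId}."""
    resp = requests.post(
        f"{API_BASE}/v1/search",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        json=body,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["jobId"]
```

Omitting tbm and time_period runs a standard web search; only q is required.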
Response fields
| Name | Type | Description |
|---|---|---|
| search_metadata | object | Request metadata including id, status, created_at, processing_time_ms, credits_used, source, and resolved engine. |
| search_parameters | object | Normalized request parameters used to run the search job. |
| search_information | object | Pagination summary including returned_results, page, and num. |
| organic_results | array | Primary organic result list with position, title, link, displayed_link, and snippet. |
| local_results | array | Local result entries when the query triggers a local pack. |
| knowledge_graph | object | Knowledge graph fields when available for the query. |
| people_also_ask | array | Related question modules returned by the search engine. |
| related_searches | array | Related query suggestions from the same SERP. |
| detected_intent | string | Intent label derived from query context, for example product or informational. |
| markdown | string | Prompt-ready markdown representation. Present only when markdown=true. |
Code examples
The cURL commands below show the raw HTTP enqueue-and-poll flow; the same SERP API calls translate directly to Python, JavaScript, Java, or PHP.
curl -X POST "https://api.orbitscraper.com/v1/search" \
-H "x-api-key: ORS_live_1234567890" \
-H "Content-Type: application/json" \
-d '{
"q": "best ai chips 2026",
"engine": "google",
"location": "New York",
"gl": "us",
"hl": "en",
"device": "desktop",
"num": 10,
"page": 1,
"safe": "active",
"markdown": true
}'
curl -X GET "https://api.orbitscraper.com/v1/search/search_123456" \
-H "x-api-key: ORS_live_1234567890"Response examples
This is the payload shape you get back from the current public API contract for SERP API.
Queued response
The first response confirms the job was accepted and tells you what to poll.
{
"jobId": "search_123456",
"status": "queued"
}

Completed response
After polling, this is the final payload your app reads.
{
"jobId": "search_123456",
"status": "completed",
"result": {
"search_metadata": {
"id": "req_9f2b",
"status": "Success",
"created_at": "2026-03-27T10:30:00.000Z",
"processing_time_ms": 322,
"credits_used": 1,
"source": "live",
"engine": "google"
},
"search_parameters": {
"q": "best ai chips 2026",
"gl": "us",
"hl": "en",
"location": "New York",
"device": "desktop",
"num": 10,
"page": 1,
"safe": "active",
"markdown": true
},
"organic_results": [
{
"position": 1,
"title": "Best AI chips in 2026",
"link": "https://example.com/ai-chips",
"displayed_link": "example.com",
"snippet": "A breakdown of current inference leaders."
}
],
"related_searches": ["ai accelerator comparison"],
"markdown": "# Search Results\n..."
},
"code": null
}

Extract API
Extract API turns a single URL into normalized content through an async job flow. It supports rendered fetches, proxy mode selection, and structured output that is easier to store or pass downstream than raw page HTML.
Endpoint
POST /v1/extract
Poll GET /v1/extract/:jobId for the completed extraction result.
Credits
2 credits per successful request
Credits are reserved when the job is queued and finalized when the job completes.
Output
Markdown, JSON, or text
Choose output_format to shape the returned content field.
Request parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Target URL to fetch and parse. |
| render_js | boolean | No | Render the page in a browser-backed path when true. Defaults to false. |
| use_proxy | auto \| always \| never | No | Proxy strategy for the fetch path. Defaults to auto. |
| output_format | markdown \| json \| text | No | Shape of the content field returned in the completed payload. Defaults to markdown. |
| extract_fields | string[] | No | Optional extraction hints such as title, body, author, or price. |
Response fields
| Name | Type | Description |
|---|---|---|
| url | string | Original URL submitted with the job. |
| title | string | Resolved page title for the extracted document. |
| content | string | Formatted extraction output based on output_format. |
| structured | object | Normalized extracted fields. Includes title, body, and any extracted hints. |
| metadata | object | Fetch details including fetched_at, render_js_used, proxy_used, fetch_path, final_url, content_type, and redirects_followed. |
| extract_credits_used | integer | Credits charged for the completed job. |
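The enqueue-and-poll contract above can be sketched in Python. The helper names are our own; the field names follow the queued and completed payloads documented in this section:

```python
import time
import requests

API_BASE = "https://api.orbitscraper.com"

def build_extract_body(url: str, **options) -> dict:
    """Request body with the documented defaults made explicit."""
    body = {"url": url, "render_js": False, "use_proxy": "auto",
            "output_format": "markdown"}
    body.update(options)
    return body

def extract_page(api_key: str, url: str, **options) -> dict:
    """Queue an extraction job and poll until it completes or fails."""
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    enqueue = requests.post(f"{API_BASE}/v1/extract", headers=headers,
                            json=build_extract_body(url, **options), timeout=30)
    enqueue.raise_for_status()
    job_id = enqueue.json()["job_id"]  # snake_case here, unlike /v1/search's jobId
    while True:
        poll = requests.get(f"{API_BASE}/v1/extract/{job_id}",
                            headers={"x-api-key": api_key}, timeout=30)
        poll.raise_for_status()
        payload = poll.json()
        if payload["status"] in ("completed", "failed"):
            return payload
        time.sleep(1)
```

Note that Extract job ids are returned as job_id, while the SERP queued response uses jobId.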
Code examples
The cURL commands below show the raw HTTP enqueue-and-poll flow; the same Extract API calls translate directly to Python, JavaScript, Java, or PHP.
curl -X POST "https://api.orbitscraper.com/v1/extract" \
-H "x-api-key: ORS_live_1234567890" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/blog/ai-chip-roundup",
"render_js": false,
"use_proxy": "auto",
"output_format": "markdown",
"extract_fields": ["title", "body", "author"]
}'
curl -X GET "https://api.orbitscraper.com/v1/extract/extract_123456" \
-H "x-api-key: ORS_live_1234567890"Response examples
This is the payload shape you get back from the current public API contract for Extract API.
Queued response
The first response confirms the job was accepted and tells you what to poll.
{
"request_id": "req_xyz",
"trace_id": "trace_xyz",
"job_id": "extract_123456",
"status": "queued",
"extract_credits_reserved": 2
}

Completed response
After polling, this is the final payload your app reads.
{
"job_id": "extract_123456",
"request_id": "req_xyz",
"trace_id": "trace_xyz",
"status": "completed",
"url": "https://example.com/blog/ai-chip-roundup",
"title": "AI chip roundup",
"content": "# AI chip roundup\n...",
"structured": {
"url": "https://example.com/blog/ai-chip-roundup",
"title": "AI chip roundup",
"author": "Orbit Team",
"body": "Extracted body text..."
},
"metadata": {
"fetched_at": "2026-03-27T10:35:00.000Z",
"render_js_used": false,
"proxy_used": false,
"fetch_path": "direct",
"final_url": "https://example.com/blog/ai-chip-roundup",
"content_type": "text/html",
"redirects_followed": 0
},
"extract_credits_used": 2
}

Research API
Research API sits above search and extraction. It discovers source URLs, fetches readable evidence, and returns a synthesized answer plus source metadata so your application gets both the summary and the trail behind it.
Endpoint
POST /v1/research
Poll GET /v1/research/:jobId for completed or partial results.
Credits
12 credits per job
The current backend reserves and charges a flat amount per completed research job.
Output
Summary, detailed, or bullets
Choose output_format to shape the synthesis instruction sent to the LLM layer.
Request parameters
| Name | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Research prompt or question to investigate. |
| depth | integer | No | Research depth. Defaults to 5. Allowed range 1-10. |
| output_format | summary \| detailed \| bullets | No | Controls the synthesis style. Defaults to summary. |
| include_sources | boolean | No | Include the source list in the final result. Defaults to true. |
Response fields
| Name | Type | Description |
|---|---|---|
| query | string | Original research query. |
| summary | string | Final synthesized answer returned by the LLM layer. |
| sources | array | Source entries with url, title, snippet, position, and engine. Empty when include_sources is false. |
| metadata | object | Execution metadata including status, failed_sources, serp_engine_used, and serp_provider_used. |
| provider | string | LLM provider used to generate the answer. |
| model | string | Model identifier used for synthesis. |
| research_credits_used | integer | Credits charged for the job. |
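A minimal Python sketch of the research flow, plus a small formatter for the source list (helper names are illustrative, not part of the API contract):

```python
import time
import requests

API_BASE = "https://api.orbitscraper.com"

def format_sources(sources: list) -> str:
    """Render the source entries as a markdown bullet list."""
    return "\n".join(f"- [{s['title']}]({s['url']})" for s in sources)

def run_research(api_key: str, query: str, depth: int = 5) -> dict:
    """Queue a research job and poll until it reaches a final status."""
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    enqueue = requests.post(
        f"{API_BASE}/v1/research",
        headers=headers,
        json={"query": query, "depth": depth, "output_format": "summary"},
        timeout=30,
    )
    enqueue.raise_for_status()
    job_id = enqueue.json()["job_id"]
    while True:
        poll = requests.get(
            f"{API_BASE}/v1/research/{job_id}",
            headers={"x-api-key": api_key},
            timeout=30,
        )
        poll.raise_for_status()
        payload = poll.json()
        if payload["status"] in ("completed", "failed"):
            return payload
        time.sleep(2)
```

Research jobs run longer than single searches, so this sketch polls at a 2-second interval; tune the interval to your latency and rate-limit budget.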
Code examples
The cURL commands below show the raw HTTP enqueue-and-poll flow; the same Research API calls translate directly to Python, JavaScript, Java, or PHP.
curl -X POST "https://api.orbitscraper.com/v1/research" \
-H "x-api-key: ORS_live_1234567890" \
-H "Content-Type: application/json" \
-d '{
"query": "Which AI chip vendors are gaining share in inference workloads?",
"depth": 5,
"output_format": "summary",
"include_sources": true
}'
curl -X GET "https://api.orbitscraper.com/v1/research/research_123456" \
-H "x-api-key: ORS_live_1234567890"Response examples
This is the payload shape you get back from the current public API contract for Research API.
Queued response
The first response confirms the job was accepted and tells you what to poll.
{
"request_id": "req_xyz",
"trace_id": "trace_xyz",
"job_id": "research_123456",
"status": "queued",
"research_credits_reserved": 12
}

Completed response
After polling, this is the final payload your app reads.
{
"job_id": "research_123456",
"request_id": "req_xyz",
"trace_id": "trace_xyz",
"status": "completed",
"query": "Which AI chip vendors are gaining share in inference workloads?",
"summary": "NVIDIA remains dominant, while AMD and hyperscaler silicon are gaining share in targeted inference workloads.",
"sources": [
{
"url": "https://example.com/ai-chip-landscape",
"title": "Top AI chip hardware and chip-making companies in 2026",
"snippet": "AMD and hyperscaler custom silicon continue to gain share...",
"position": 1,
"engine": "google"
}
],
"metadata": {
"status": "completed",
"failed_sources": [],
"serp_engine_used": "google",
"serp_provider_used": "live"
},
"provider": "openai",
"model": "gpt-5-mini",
"research_credits_used": 12
}

Crawl API
Crawl API handles same-origin discovery, robots-aware crawling, page caps, and progress tracking through one async contract. It is built for teams that need bounded site walks without owning the queueing and crawl-control layer themselves.
Endpoint
POST /v1/crawl
Poll GET /v1/crawl/:jobId and cancel queued jobs with DELETE /v1/crawl/:jobId.
Credits
1 credit per completed page
Credits are reserved up front based on max_pages and finalized as the crawl completes.
Output
Progress JSON with page status list
Read crawl progress, job status, and per-page status from the status endpoint.
Request parameters
| Name | Type | Required | Description |
|---|---|---|---|
| domain | string | Yes | Starting domain or URL. If no scheme is given, https:// is assumed. |
| max_pages | integer | No | Maximum pages to crawl. Defaults to 50. Range 1-500. |
| depth | integer | No | Link depth from the seed URL. Defaults to 3. Range 1-5. |
| include_patterns | string[] | No | Optional glob-style path allowlist. |
| exclude_patterns | string[] | No | Optional glob-style path denylist. |
| render_js | boolean | No | Use browser-backed rendering for page fetches when true. Defaults to false. |
| use_proxy | auto \| always \| never | No | Proxy strategy for fetches. Defaults to auto. |
| webhook_url | string | No | Optional completion webhook target. |
Response fields
| Name | Type | Description |
|---|---|---|
| job_id | string | Crawl job identifier. |
| request_id | string | Request identifier for tracing the public API call. |
| trace_id | string | Trace identifier attached to the crawl lifecycle. |
| status | string | Current job status such as queued, running, completed, failed, cancelled, or expired. |
| domain | string | Normalized crawl seed URL. |
| max_pages | integer | Configured crawl page budget. |
| depth | integer | Configured crawl depth. |
| pages_found | integer | Pages discovered so far. |
| pages_completed | integer | Pages completed successfully. |
| pages_failed | integer | Pages that failed or were skipped. |
| credits_reserved | integer | Credits reserved from the configured crawl budget. |
| credits_charged | integer | Credits actually charged so far. |
| webhook_url | string \| null | Configured webhook target, if any. |
| error | object \| undefined | Present on failed, cancelled, or expired jobs. |
| pages | array | Per-page status objects with url, status, title, error_code, and fetched_at. |
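Client-side, the counters above are enough to drive a simple progress monitor. A sketch, with helper names of our own choosing (for the completed example payload in this section, 16 completed plus 2 failed pages out of a 25-page budget gives 0.72):

```python
def crawl_progress(payload: dict) -> float:
    """Fraction of the configured page budget finished (completed + failed)."""
    budget = payload.get("max_pages") or 0
    if budget == 0:
        return 0.0
    done = payload.get("pages_completed", 0) + payload.get("pages_failed", 0)
    return min(1.0, done / budget)

def is_terminal(status: str) -> bool:
    """Statuses after which polling can stop, per the status field above."""
    return status in ("completed", "failed", "cancelled", "expired")
```

Progress is measured against max_pages rather than pages_found, since pages_found can keep growing while discovery is still running.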
Code examples
The cURL commands below show the raw HTTP enqueue, poll, and cancel flow; the same Crawl API calls translate directly to Python, JavaScript, Java, or PHP.
curl -X POST "https://api.orbitscraper.com/v1/crawl" \
-H "x-api-key: ORS_live_1234567890" \
-H "Content-Type: application/json" \
-d '{
"domain": "https://docs.example.com",
"max_pages": 25,
"depth": 2,
"include_patterns": ["/blog/**", "/docs/**"],
"exclude_patterns": ["/account/**"],
"render_js": false,
"use_proxy": "auto"
}'
curl -X GET "https://api.orbitscraper.com/v1/crawl/crawl_123456" \
-H "x-api-key: ORS_live_1234567890"
curl -X DELETE "https://api.orbitscraper.com/v1/crawl/crawl_123456" \
-H "x-api-key: ORS_live_1234567890"Response examples
This is the payload shape you get back from the current public API contract for Crawl API.
Queued response
The first response confirms the job was accepted and tells you what to poll.
{
"request_id": "req_xyz",
"trace_id": "trace_xyz",
"job_id": "crawl_123456",
"status": "queued",
"crawl_credits_reserved": 25
}

Completed response
After polling, this is the final payload your app reads.
{
"job_id": "crawl_123456",
"request_id": "req_xyz",
"trace_id": "trace_xyz",
"status": "completed",
"domain": "https://docs.example.com",
"max_pages": 25,
"depth": 2,
"pages_found": 18,
"pages_completed": 16,
"pages_failed": 2,
"credits_reserved": 25,
"credits_charged": 16,
"webhook_url": null,
"pages": [
{
"url": "https://docs.example.com/getting-started",
"status": "completed",
"title": "Getting started",
"error_code": null,
"fetched_at": "2026-03-27T10:40:00.000Z"
}
]
}

Error reference
| Code | Meaning | Action |
|---|---|---|
| 400 | Bad request or invalid parameter. | Validate field names, types, and allowed enum values before retrying. |
| 401 | Missing or invalid x-api-key. | Send a valid x-api-key header and confirm the key is active. |
| 402 | Insufficient credits. | Top up credits or reduce request volume before retrying. |
| 404 | Job not found for the current tenant or API family. | Check the job id, API family, and tenant context before polling again. |
| 409 | Conflict on the current job state. | Only queued crawl jobs can be cancelled. Poll the job instead when it is already running. |
| 429 | Rate limit exceeded. | Back off, spread requests over time, or use a plan with higher limits. |
| 500 | Internal server error. | Safe to retry with backoff. |
| 503 | Queue or backend dependency temporarily unavailable. | Retry after a short backoff. This usually indicates a temporary enqueue issue. |
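Per the table above, 429, 500, and 503 are the codes that are safe to retry with backoff. A minimal client-side sketch of that policy (the helper names and backoff schedule are our own, not part of the API contract):

```python
import random

RETRYABLE = {429, 500, 503}  # back off and retry, per the error reference

def is_retryable(status_code: int) -> bool:
    """Whether a failed request is safe to retry with backoff."""
    return status_code in RETRYABLE

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with jitter: ~1s, ~2s, ~4s, ... capped at 30s."""
    return min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)
```

Codes 400, 401, 402, 404, and 409 indicate a request or state problem that retrying will not fix; correct the request instead.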
Credits guide
How credits are consumed
- SERP API: 1 credit per successful request. Pagination and retries that become new successful jobs each bill 1 credit.
- Extract API: 2 credits per successful request. Credits are reserved when the job is queued and finalized when the job completes.
- Research API: 12 credits per job. The current backend reserves and charges a flat amount per completed research job.
- Crawl API: 1 credit per completed page. Credits are reserved up front based on max_pages and finalized as the crawl completes.
Ways to reduce spend
- Set the smallest page, depth, and crawl budgets that still answer the task.
- Use search engine forcing only when you need a specific engine; otherwise keep one code path around POST /v1/search.
- Only request markdown or rendered extraction when your downstream workflow actually needs it.
- For crawl jobs, keep max_pages close to the real site section size so reserved credits stay predictable.