LIVE PRODUCT

Extract API

Queue one URL and get structured content back without parsing raw HTML yourself.

Extract API turns a single URL into normalized content through an async job flow. It supports rendered fetches, proxy mode selection, and structured output that is easier to store or pass downstream than raw page HTML.

Endpoint

POST /v1/extract

Poll GET /v1/extract/:jobId for the completed extraction result.

Credits

2 credits per successful request

Credits are reserved when the job is queued and finalized when the job completes.
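
For planning purposes, the reserve-then-finalize arithmetic can be sketched as below. The function name is hypothetical, and the sketch assumes that jobs which do not succeed release their reservation; the text above only states that 2 credits are billed per successful request.

```python
CREDITS_PER_EXTRACT = 2  # stated rate: 2 credits per successful request


def plan_credits(total_jobs: int, successful_jobs: int) -> dict:
    """Estimate credits reserved at queue time and billed after completion."""
    if not 0 <= successful_jobs <= total_jobs:
        raise ValueError("successful_jobs must be between 0 and total_jobs")
    reserved = total_jobs * CREDITS_PER_EXTRACT
    # Assumption: only successful jobs are billed; the rest are released.
    billed = successful_jobs * CREDITS_PER_EXTRACT
    return {"reserved": reserved, "billed": billed, "released": reserved - billed}


print(plan_credits(total_jobs=500, successful_jobs=480))
# {'reserved': 1000, 'billed': 960, 'released': 40}
```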

Output

Markdown, JSON, or text

Choose output_format to shape the returned content field.

What it's for

  • page extraction for articles, docs, and product content
  • price or author field extraction from listing and product pages
  • normalized content for downstream indexing jobs
  • structured page snapshots for LLM or analytics workflows
  • safe fetch pipelines with explicit render and proxy controls

How it works

  1. Submit a URL plus render, proxy, and output settings.
  2. OrbitScraper fetches the page, parses readable content, and enriches structured fields when hints are present.
  3. Poll the job endpoint for the final content, structured data, and fetch metadata.
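
The polling step might look roughly like the sketch below. `wait_for_extract`, its parameters, and the interval and attempt defaults are illustrative assumptions, not part of the API. The HTTP call is passed in as a callable so the loop can be exercised without network access.

```python
import time
from typing import Callable


def wait_for_extract(
    fetch_job: Callable[[], dict],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job until the job reports status "completed"."""
    for attempt in range(max_attempts):
        job = fetch_job()
        if job.get("status") == "completed":
            return job
        # Only "queued" and "completed" appear in the documented examples,
        # so any other status is treated as "not done yet" here.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"job not completed after {max_attempts} polls")


# Usage with canned responses standing in for GET /v1/extract/:jobId.
responses = iter([
    {"status": "queued"},
    {"status": "completed", "title": "AI chip roundup"},
])
result = wait_for_extract(lambda: next(responses), sleep=lambda _: None)
print(result["title"])  # AI chip roundup
```

A fixed interval keeps the sketch short; a production client may prefer backoff.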

Request parameters

These are the fields accepted by the current backend contract for POST /v1/extract.

Name            Type                      Required  Description
url             string                    Yes       Target URL to fetch and parse.
render_js       boolean                   No        Render the page in a browser-backed path when true. Defaults to false.
use_proxy       auto | always | never     No        Proxy strategy for the fetch path. Defaults to auto.
output_format   markdown | json | text    No        Shape of the content field returned in the completed payload. Defaults to markdown.
extract_fields  string[]                  No        Optional extraction hints such as title, body, author, or price.
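
A client-side helper can apply these defaults and reject invalid values before a request is sent. The sketch below is one way to do that; `build_extract_payload` is a hypothetical name, and the server may validate differently.

```python
from typing import Optional, Sequence

PROXY_MODES = {"auto", "always", "never"}
OUTPUT_FORMATS = {"markdown", "json", "text"}


def build_extract_payload(
    url: str,
    render_js: bool = False,
    use_proxy: str = "auto",
    output_format: str = "markdown",
    extract_fields: Optional[Sequence[str]] = None,
) -> dict:
    """Build a POST /v1/extract body using the documented defaults."""
    if not url:
        raise ValueError("url is required")
    if use_proxy not in PROXY_MODES:
        raise ValueError(f"use_proxy must be one of {sorted(PROXY_MODES)}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    payload = {
        "url": url,
        "render_js": render_js,
        "use_proxy": use_proxy,
        "output_format": output_format,
    }
    # extract_fields is optional, so it is omitted when no hints are given.
    if extract_fields:
        payload["extract_fields"] = list(extract_fields)
    return payload


print(build_extract_payload("https://example.com/blog/ai-chip-roundup",
                            extract_fields=["title", "author"]))
```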

Response fields

These fields describe the completed payload you read from the current public API contract.

Name                  Type     Description
url                   string   Original URL submitted with the job.
title                 string   Resolved page title for the extracted document.
content               string   Formatted extraction output based on output_format.
structured            object   Normalized extracted fields. Includes title, body, and any extracted hints.
metadata              object   Fetch details including fetched_at, render_js_used, proxy_used, fetch_path, final_url, content_type, and redirects_followed.
extract_credits_used  integer  Credits charged for the completed job.
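
To make these fields easier to work with downstream, the completed payload can be mapped onto a typed container. The sketch below is illustrative: `ExtractResult` and `was_redirected` are hypothetical names, and the check on `status` relies on the completed response example rather than on this table.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractResult:
    url: str
    title: str
    content: str
    structured: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)
    extract_credits_used: int = 0

    @classmethod
    def from_payload(cls, payload: dict) -> "ExtractResult":
        """Build a result from a completed GET /v1/extract/:jobId payload."""
        if payload.get("status") != "completed":
            raise ValueError(f"job is not completed: {payload.get('status')!r}")
        return cls(
            url=payload["url"],
            title=payload["title"],
            content=payload["content"],
            structured=payload.get("structured", {}),
            metadata=payload.get("metadata", {}),
            extract_credits_used=payload.get("extract_credits_used", 0),
        )

    @property
    def was_redirected(self) -> bool:
        """True when metadata.final_url differs from the submitted URL."""
        return self.metadata.get("final_url", self.url) != self.url
```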

Code examples

Start with the raw HTTP request and poll flow in cURL. The same Extract API flow applies in Python, JavaScript, Java, or PHP.

bash
# 1. Queue the extraction job.
curl -X POST "https://api.orbitscraper.com/v1/extract" \
  -H "x-api-key: ORS_live_1234567890" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/ai-chip-roundup",
    "render_js": false,
    "use_proxy": "auto",
    "output_format": "markdown",
    "extract_fields": ["title", "body", "author"]
  }'

# 2. Poll for the result, using the job_id returned by the POST.
curl -X GET "https://api.orbitscraper.com/v1/extract/extract_123456" \
  -H "x-api-key: ORS_live_1234567890"
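
The same two requests can be sketched in Python with only the standard library. The helper names (`make_submit_request`, `make_poll_request`, `send`) and the timeout default are illustrative assumptions; the URL, paths, and headers mirror the cURL commands above.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.orbitscraper.com"


def make_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST /v1/extract request without sending it."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def make_poll_request(api_key: str, job_id: str) -> urllib.request.Request:
    """Build the GET /v1/extract/:jobId request without sending it."""
    safe_id = urllib.parse.quote(job_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/v1/extract/{safe_id}",
        headers={"x-api-key": api_key},
        method="GET",
    )


def send(request: urllib.request.Request, timeout_s: float = 30.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

Typical use would be `queued = send(make_submit_request(key, payload))` followed by `send(make_poll_request(key, queued["job_id"]))` until the status is completed.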

Response examples

This is the shape you get back from the current public API contract for Extract API.

Queued response

The first response confirms the job was accepted and tells you what to poll next.

json
{
  "request_id": "req_xyz",
  "trace_id": "trace_xyz",
  "job_id": "extract_123456",
  "status": "queued",
  "extract_credits_reserved": 2
}
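
As a small illustration of "what to poll next", the path for the follow-up GET can be derived from the queued response. `poll_path_from_queued` is a hypothetical helper name.

```python
def poll_path_from_queued(queued: dict) -> str:
    """Return the GET path to poll for a queued extract job."""
    job_id = queued.get("job_id")
    if not job_id:
        raise ValueError("queued response has no job_id")
    return f"/v1/extract/{job_id}"


queued = {"job_id": "extract_123456", "status": "queued", "extract_credits_reserved": 2}
print(poll_path_from_queued(queued))  # /v1/extract/extract_123456
```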

Completed response

After polling, this is the final payload shape your app reads.

json
{
  "job_id": "extract_123456",
  "request_id": "req_xyz",
  "trace_id": "trace_xyz",
  "status": "completed",
  "url": "https://example.com/blog/ai-chip-roundup",
  "title": "AI chip roundup",
  "content": "# AI chip roundup\n...",
  "structured": {
    "url": "https://example.com/blog/ai-chip-roundup",
    "title": "AI chip roundup",
    "author": "Orbit Team",
    "body": "Extracted body text..."
  },
  "metadata": {
    "fetched_at": "2026-03-27T10:35:00.000Z",
    "render_js_used": false,
    "proxy_used": false,
    "fetch_path": "direct",
    "final_url": "https://example.com/blog/ai-chip-roundup",
    "content_type": "text/html",
    "redirects_followed": 0
  },
  "extract_credits_used": 2
}

  • The current contract is async: POST queues the job and GET returns the completed extraction payload.
  • API key scope required by the backend: extract:read or search:read.
  • The current deployment bills 2 credits per successful extract request.
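
For auditing, the credits reserved in the queued response can be compared with the credits used in the completed response. This is a sketch with a hypothetical helper name (`credits_delta`); whether the two values can differ in practice is not stated here.

```python
def credits_delta(queued: dict, completed: dict) -> int:
    """Return reserved minus used credits for one job (0 means they match)."""
    if queued["job_id"] != completed["job_id"]:
        raise ValueError("responses belong to different jobs")
    return queued["extract_credits_reserved"] - completed["extract_credits_used"]


queued = {"job_id": "extract_123456", "extract_credits_reserved": 2}
completed = {"job_id": "extract_123456", "extract_credits_used": 2}
print(credits_delta(queued, completed))  # 0
```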

Ready to build on Extract API?

The current backend contract is already live. Use the docs page for request details and the pricing page for credit planning.
