LIVE PRODUCT

Extract API

Extract API turns a single URL into normalized content through an async job flow. It supports rendered fetches, proxy mode selection, and structured output that is easier to store or pass downstream than raw page HTML.

Endpoint

POST /v1/extract

Poll GET /v1/extract/:jobId for the completed extraction result.
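The submit-then-poll flow can be sketched as a small loop. This is a minimal sketch rather than an official client: `wait_for_extract`, `fetch_job`, the polling interval, and the attempt cap are all hypothetical. Only the `queued` and `completed` statuses appear on this page, so the loop keeps polling on anything other than `completed`.

```python
import time
from typing import Any, Callable, Dict


def wait_for_extract(
    job_id: str,
    fetch_job: Callable[[str], Dict[str, Any]],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a job until its status is "completed" or attempts run out.

    fetch_job is a caller-supplied function (hypothetical) that performs
    GET /v1/extract/:jobId and returns the decoded JSON body.
    """
    for attempt in range(1, max_attempts + 1):
        job = fetch_job(job_id)
        if job.get("status") == "completed":
            return job
        # Only sleep when another attempt will follow.
        if attempt < max_attempts:
            sleep(interval_s)
    raise TimeoutError(f"job {job_id} not completed after {max_attempts} polls")
```

Passing the fetch and sleep functions in keeps the loop independent of any HTTP library and makes it easy to exercise without network access.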

Credits

2 credits per successful request

Credits are reserved when the job is queued and finalized when the job completes.
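One way to track the reserve-then-finalize step is to compare the two credit fields from the queued and completed payloads. The helper below is a hypothetical sketch; `settle_credits` is not part of the API, and this page does not say whether the reserved and final amounts can differ.

```python
from typing import Any, Dict


def settle_credits(queued: Dict[str, Any], completed: Dict[str, Any]) -> Dict[str, int]:
    """Compare credits reserved at queue time with credits charged at completion."""
    reserved = int(queued["extract_credits_reserved"])
    used = int(completed["extract_credits_used"])
    # A non-zero difference would mean the final charge did not match the reservation.
    return {"reserved": reserved, "used": used, "difference": reserved - used}
```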

Output

Markdown, JSON, or text

Choose output_format to shape the returned content field.

Request parameters

Name	Type	Required	Description
url	string	Yes	Target URL to fetch and parse.
render_js	boolean	No	Render the page in a browser-backed path when true. Defaults to false.
use_proxy	auto | always | never	No	Proxy strategy for the fetch path. Defaults to auto.
output_format	markdown | json | text	No	Shape of the content field returned in the completed payload. Defaults to markdown.
extract_fields	string[]	No	Optional extraction hints such as title, body, author, or price.
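The parameter table can be mirrored in a small request-body builder that applies the documented defaults and rejects values outside the listed options. This is a sketch under stated assumptions: `build_extract_payload` is a hypothetical helper, and the client-side validation is a convenience, not a description of how the server validates input.

```python
from typing import Any, Dict, Optional, Sequence

USE_PROXY_MODES = ("auto", "always", "never")
OUTPUT_FORMATS = ("markdown", "json", "text")


def build_extract_payload(
    url: str,
    render_js: bool = False,
    use_proxy: str = "auto",
    output_format: str = "markdown",
    extract_fields: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /v1/extract using the documented defaults."""
    if not url:
        raise ValueError("url is required")
    if use_proxy not in USE_PROXY_MODES:
        raise ValueError(f"use_proxy must be one of {USE_PROXY_MODES}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {OUTPUT_FORMATS}")
    payload: Dict[str, Any] = {
        "url": url,
        "render_js": render_js,
        "use_proxy": use_proxy,
        "output_format": output_format,
    }
    # extract_fields is optional, so it is omitted when no hints are given.
    if extract_fields:
        payload["extract_fields"] = list(extract_fields)
    return payload
```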

Response fields

Name	Type	Description
url	string	Original URL submitted with the job.
title	string	Resolved page title for the extracted document.
content	string	Formatted extraction output based on output_format.
structured	object	Normalized extracted fields. Includes title, body, and any fields matched from the extract_fields hints.
metadata	object	Fetch details including fetched_at, render_js_used, proxy_used, fetch_path, final_url, content_type, and redirects_followed.
extract_credits_used	integer	Credits charged for the completed job.
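The response fields above map naturally onto a typed record. The sketch below is one possible shape, not an official SDK type: `ExtractResult` and `parse_extract_result` are hypothetical names, and the parser treats all six listed fields as required, which is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class ExtractResult:
    url: str
    title: str
    content: str
    structured: Dict[str, Any]
    metadata: Dict[str, Any]
    extract_credits_used: int


def parse_extract_result(payload: Mapping[str, Any]) -> ExtractResult:
    """Pick the documented response fields out of a completed payload."""
    names = [f.name for f in fields(ExtractResult)]
    missing = [n for n in names if n not in payload]
    if missing:
        raise KeyError(f"missing response fields: {', '.join(missing)}")
    # Keys outside the documented table (for example job_id) are ignored here.
    return ExtractResult(**{n: payload[n] for n in names})
```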

Code examples

The examples below all follow the current Extract API contract.

Start with the raw HTTP request and poll flow.

bash

curl -X POST "https://api.orbitscraper.com/v1/extract" \
  -H "x-api-key: ORS_live_1234567890" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/ai-chip-roundup",
    "render_js": false,
    "use_proxy": "auto",
    "output_format": "markdown",
    "extract_fields": ["title", "body", "author"]
  }'

curl -X GET "https://api.orbitscraper.com/v1/extract/extract_123456" \
  -H "x-api-key: ORS_live_1234567890"
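The same two calls can be written in Python with only the standard library. This is a rough equivalent of the curl commands above, assuming the same base URL and `x-api-key` header; the function names (`submit_request`, `poll_request`, `send`) and the 30-second timeout are hypothetical choices.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

BASE_URL = "https://api.orbitscraper.com"


def submit_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build the POST /v1/extract request without sending it."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def poll_request(api_key: str, job_id: str) -> urllib.request.Request:
    """Build the GET /v1/extract/:jobId request without sending it."""
    safe_id = urllib.parse.quote(job_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/v1/extract/{safe_id}",
        headers={"x-api-key": api_key},
        method="GET",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A typical call would be `send(submit_request(key, {"url": "https://example.com/blog/ai-chip-roundup"}))`, followed by `send(poll_request(key, job_id))` with the `job_id` from the first response.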

Response examples

These are the payload shapes returned under the current public Extract API contract.

Queued response

The first response confirms the job was accepted and tells you what to poll.

json

{
  "request_id": "req_xyz",
  "trace_id": "trace_xyz",
  "job_id": "extract_123456",
  "status": "queued",
  "extract_credits_reserved": 2
}

Completed response

After polling, this is the final payload your app reads.

json

{
  "job_id": "extract_123456",
  "request_id": "req_xyz",
  "trace_id": "trace_xyz",
  "status": "completed",
  "url": "https://example.com/blog/ai-chip-roundup",
  "title": "AI chip roundup",
  "content": "# AI chip roundup\n...",
  "structured": {
    "url": "https://example.com/blog/ai-chip-roundup",
    "title": "AI chip roundup",
    "author": "Orbit Team",
    "body": "Extracted body text..."
  },
  "metadata": {
    "fetched_at": "2026-03-27T10:35:00.000Z",
    "render_js_used": false,
    "proxy_used": false,
    "fetch_path": "direct",
    "final_url": "https://example.com/blog/ai-chip-roundup",
    "content_type": "text/html",
    "redirects_followed": 0
  },
  "extract_credits_used": 2
}
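The metadata object in the completed payload is useful for quick checks on how a page was fetched. The helper below is an illustrative sketch: `summarize_fetch` is a hypothetical name, and treating a `final_url` that differs from `url` as a redirect is an assumption about how those fields relate.

```python
from typing import Any, Dict, Mapping


def summarize_fetch(result: Mapping[str, Any]) -> Dict[str, Any]:
    """Condense the metadata block of a completed payload into a few flags."""
    meta = result.get("metadata", {})
    # Fall back to the submitted URL when final_url is absent.
    final_url = meta.get("final_url", result.get("url"))
    return {
        "redirected": meta.get("redirects_followed", 0) > 0
        or final_url != result.get("url"),
        "rendered": bool(meta.get("render_js_used", False)),
        "proxied": bool(meta.get("proxy_used", False)),
        "fetch_path": meta.get("fetch_path"),
        "final_url": final_url,
    }
```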

Operational notes

The current contract is async: POST queues the job and GET returns the completed extraction payload.
The backend requires an API key with the extract:read or search:read scope.
The current deployment bills 2 credits per successful extract request.