API URL
POST /health, GET /metrics, GET /api/v1/batch/scrape/:id (poll batch job), GET /api/v1/crawl/:id (poll crawl job)
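The two poll endpoints above take the job ID as a path segment. A minimal sketch of building those URLs follows. The host in `BASE_URL` and the helper name `job_poll_url` are hypothetical, and passing `skip` as a query parameter is an assumption; the source only says that `skip` is a pagination offset for batch and crawl polls.

```python
from urllib.parse import quote, urlencode

# Hypothetical host: the source does not state the API base URL.
BASE_URL = "https://api.example.com"

# Poll paths taken from the endpoint list above.
POLL_PATHS = {
    "batch": "/api/v1/batch/scrape/",
    "crawl": "/api/v1/crawl/",
}


def job_poll_url(kind: str, job_id: str, skip: int = 0, base_url: str = BASE_URL) -> str:
    """Build the GET URL used to poll a batch-scrape or crawl job."""
    if kind not in POLL_PATHS:
        raise ValueError(f"kind must be one of {sorted(POLL_PATHS)}")
    if skip < 0:
        raise ValueError("skip must be >= 0")
    # Percent-encode the job ID so it stays a single path segment.
    url = base_url.rstrip("/") + POLL_PATHS[kind] + quote(job_id, safe="")
    # Assumption: skip travels as a query parameter on poll requests.
    if skip:
        url += "?" + urlencode({"skip": skip})
    return url
```

Calling `job_poll_url("crawl", "abc123")` yields the crawl poll URL for job `abc123`; a non-zero `skip` appends the offset for the next page of results.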
Headers
Content-Type: application/json
Body Parameters
| Param | Required | Type | Description |
| --- | --- | --- | --- |
| url | required | string | Public http/https URL to scrape, crawl, or map (SSRF-guarded) |
| urls | required | array | Array of public URLs for async batch scrape (1-100 items) |
| formats | optional | array | Output formats: markdown, html, rawHtml, links, images, screenshot. Default ["markdown"] |
| format | optional | string | Legacy single-format alias: json, markdown, html. json maps to markdown |
| waitUntil | optional | string | Puppeteer navigation wait event: load, domcontentloaded, networkidle0, networkidle2. Default domcontentloaded |
| waitFor | optional | integer | Extra wait in ms after navigation (0-10000). Default 0 |
| onlyMainContent | optional | boolean | Apply Mozilla Readability to isolate main article content. Default true |
| includeTags | optional | array | CSS selectors to keep exclusively in extracted content |
| excludeTags | optional | array | CSS selectors to remove before extraction |
| headers | optional | object | Custom HTTP headers forwarded with browser request |
| mobile | optional | boolean | Emulate mobile viewport. Default false |
| blockAds | optional | boolean | Block ad-related network requests. Default true |
| maxAge | optional | integer | Redis cache TTL in ms for this result (0 = bypass). Default 0 |
| timeout | optional | integer | Browser operation timeout in ms (1000-300000). Default 35000. Scrape only. |
| actions | optional | array | Ordered browser actions before extraction (scrape only, max 50): wait, click, write, press, scroll, screenshot, execute-javascript |
| limit | optional | integer | Max pages for crawl (1-10000, default 100) or max URLs for map (1-100000, default 5000) |
| maxDepth | optional | integer | Max BFS discovery depth for crawl. Omit for unlimited. |
| includePaths | optional | array | Regex patterns URL paths must match (crawl and map) |
| excludePaths | optional | array | Regex patterns URL paths must not match (crawl and map) |
| allowSubdomains | optional | boolean | Follow subdomain links during crawl. Default false. |
| allowExternalLinks | optional | boolean | Follow external domain links during crawl. Default false. |
| scrapeOptions | optional | object | Per-page scrape settings for crawl: formats, waitUntil, onlyMainContent, blockAds |
| search | optional | string | Keyword to filter and relevance-sort map results by URL, title, or description |
| includeMetadata | optional | boolean | Enrich map link entries with title and description. Default false. |
| skip | optional | integer | Pagination offset for job result arrays (batch and crawl poll). Default 0. |
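The ranges and defaults listed above can be checked on the client before a request is sent. The following is a minimal sketch, assuming hypothetical helper names (`build_scrape_payload`, `build_crawl_payload`); it covers only a few of the documented parameters and does not reproduce the server's own validation.

```python
# Allowed values copied from the parameter descriptions above.
WAIT_UNTIL = {"load", "domcontentloaded", "networkidle0", "networkidle2"}
FORMATS = {"markdown", "html", "rawHtml", "links", "images", "screenshot"}


def _check_range(name, value, low, high):
    """Raise ValueError when value falls outside the documented range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def _check_url(url):
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an http or https URL")


def build_scrape_payload(url, formats=None, wait_until="domcontentloaded",
                         wait_for=0, timeout=35000):
    """Build a scrape request body using the documented defaults."""
    _check_url(url)
    formats = list(formats) if formats else ["markdown"]
    unknown = set(formats) - FORMATS
    if unknown:
        raise ValueError(f"unknown formats: {sorted(unknown)}")
    if wait_until not in WAIT_UNTIL:
        raise ValueError(f"waitUntil must be one of {sorted(WAIT_UNTIL)}")
    _check_range("waitFor", wait_for, 0, 10000)
    _check_range("timeout", timeout, 1000, 300000)
    return {
        "url": url,
        "formats": formats,
        "waitUntil": wait_until,
        "waitFor": wait_for,
        "timeout": timeout,
    }


def build_crawl_payload(url, limit=100, max_depth=None, scrape_options=None):
    """Build a crawl request body; maxDepth is omitted for unlimited depth."""
    _check_url(url)
    _check_range("limit", limit, 1, 10000)
    payload = {"url": url, "limit": limit}
    if max_depth is not None:
        payload["maxDepth"] = max_depth
    if scrape_options:
        payload["scrapeOptions"] = dict(scrape_options)
    return payload
```

Serializing the returned dictionary with `json.dumps` gives a body suitable for the `Content-Type: application/json` header listed earlier.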
Example Request
POST /api/v1/scrape
{"url":"https://example.com/article","formats":["markdown"]}
Successful Response
Status
200 OK
Body
{
  "success": true,
  "url": "https://example.com/article",
  "title": "Article Title",
  "text": "Plain text content...",
  "excerpt": "Summary...",
  "byline": null,
  "siteName": "Example",
  "length": 1200,
  "content": "<article>...</article>",
  "markdown": "# Article Title\nPlain text content...",
  "metadata": {"timingMs": 3500, "cached": false},
  "correlationId": "req_abc123"
}
Field
Type
No output fields documented.
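Because the output fields are not documented, the only guide to the response shape is the example body. A minimal sketch of reading that body follows, assuming the field names seen in the example are stable; the `ScrapeResult` class and `parse_scrape_response` function are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class ScrapeResult:
    url: str
    title: str
    markdown: str
    timing_ms: int
    cached: bool
    correlation_id: str


def parse_scrape_response(body: str) -> ScrapeResult:
    """Parse a scrape response body shaped like the example above."""
    data = json.loads(body)
    if not data.get("success"):
        raise ValueError("response does not report success")
    # Assumption: metadata may be absent, so fall back to neutral values.
    meta = data.get("metadata") or {}
    return ScrapeResult(
        url=data["url"],
        title=data.get("title", ""),
        markdown=data.get("markdown", ""),
        timing_ms=meta.get("timingMs", 0),
        cached=meta.get("cached", False),
        correlation_id=data.get("correlationId", ""),
    )
```

Keeping `correlationId` on the parsed result makes it easier to quote the request ID when reporting a problem.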
Error Response
Example
| Status | Error | Cause | Description |
| --- | --- | --- | --- |
| 400 | ValidationError | Joi validation failure, SSRF-blocked URL, or invalid URL | Request rejected before navigation |
| 401 | AuthenticationError | Missing or invalid API key at gateway layer | Authentication required |
| 404 | NotFound | Unknown endpoint path or async job ID not found | Resource not found |
| 422 | ExtractionError | Page loaded but content not extractable, or scrape failed after navigation | Extraction failed after navigation |
| 429 | RateLimitError | Rate limit exceeded or browser pool saturated (POOL_SATURATED) | Retry after retryAfter seconds |
| 500 | InternalError | Unexpected failure (message in development only) | Internal server error |
| 504 | GatewayTimeout | Route-level timeout exceeded (35s scrape, 60s map) | Operation timed out |

Last Updated: May 4, 2026
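The error codes listed under Error Response suggest a simple client-side retry policy. The sketch below is one possible approach, not documented behaviour: it assumes `retryAfter` arrives as a numeric field in the parsed error body, and it treats 429 and 504 as the only retryable statuses. The function name `retry_delay` is hypothetical.

```python
# Assumption: only rate-limit and timeout responses are worth retrying.
RETRYABLE_STATUSES = {429, 504}


def retry_delay(status, body, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None when retrying is pointless.

    status  -- HTTP status code of the failed response
    body    -- parsed JSON error body (a dict)
    attempt -- zero-based count of retries already made
    """
    if status == 429:
        # Assumption: retryAfter is a number of seconds in the error body.
        retry_after = body.get("retryAfter")
        if isinstance(retry_after, (int, float)) and retry_after >= 0:
            return float(retry_after)
    if status in RETRYABLE_STATUSES:
        # Exponential backoff, capped so waits stay bounded.
        return min(cap, base * 2 ** attempt)
    return None
```

Validation, authentication, not-found, and extraction errors return `None` here because repeating the same request is unlikely to change the outcome.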