Getting started

First, find your API key on the API tab of the Parsera website.
Use it as the value of the `X-API-KEY` header to authenticate your requests.
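The same header works from any HTTP client. Below is a minimal sketch that assumes Python 3 and only the standard library. `parsera_request` is a hypothetical helper, not part of any Parsera SDK. It attaches the key to a request, which you can then send with `urllib.request.urlopen(req)`.

```python
import json
import urllib.request
from typing import Optional


def parsera_request(url: str, api_key: str,
                    payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a request that carries the API key in the X-API-KEY header.

    With a payload, the body is JSON and urllib sends a POST;
    without one, urllib sends a GET.
    """
    headers = {"X-API-KEY": api_key}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers)
```

urllib stores header names in capitalized form (`X-api-key`). HTTP header names are case-insensitive, so the server still receives the same header.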

Agents

Agents generate reusable custom scrapers. Working with an agent takes two main steps:

Call `generate` to build a scraper.
Call `scrape` to run that scraper on a specific URL.
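Here is a sketch of the two-step flow. It assumes you generate a scraper once from a sample page and then reuse it on other pages with a similar layout. `agent_workflow` is a hypothetical helper. It only lists the calls to make and sends nothing.

```python
from typing import Dict, List, Tuple

AGENTS_BASE = "https://agents.parsera.org/v1"


def agent_workflow(name: str, sample_url: str, attributes: Dict[str, str],
                   target_urls: List[str]) -> List[Tuple[str, dict]]:
    """List the (endpoint, body) pairs for one generate call and its scrape calls.

    Nothing is sent; the caller decides how to issue each request.
    """
    # Step 1: build the scraper once from a sample page.
    calls = [(f"{AGENTS_BASE}/generate",
              {"name": name, "url": sample_url, "attributes": attributes})]
    # Step 2: reuse the same scraper, by name, on every target page.
    for url in target_urls:
        calls.append((f"{AGENTS_BASE}/scrape", {"name": name, "url": url}))
    return calls
```

The agent name is the only thing linking the two steps, so keep it stable across calls.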

`generate`

Ask an agent to build a new scraper:

curl https://agents.parsera.org/v1/generate \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "name": "hackernews",
    "url": "https://news.ycombinator.com/",
    "attributes": {
        "title": "News title",
        "points": "Number of points"
    }
}'

Parameters:

Parameter	Type	Default	Description
`name`	`string`	-	Name of the agent
`url`	`string`	-	Website URL
`attributes`	`object`	-	A map of `name` - `description` pairs of data fields to extract from the webpage. Also, you can specify Output Types.
`proxy_country`	`string`	`UnitedStates`	Proxy country, see Proxy Countries
`cookies`	`array`	Empty	Cookies to use during extraction, see Cookies
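The `generate` body can also be built in code. This sketch assumes that omitted optional fields fall back to the values in the Default column. `generate_payload` is a hypothetical helper, and the cookie list is passed through unchanged because its format is described on the Cookies page.

```python
import json
from typing import Dict, List, Optional


def generate_payload(name: str, url: str, attributes: Dict[str, str],
                     proxy_country: Optional[str] = None,
                     cookies: Optional[List[dict]] = None) -> str:
    """Serialize a generate request body, leaving out optional fields not set."""
    body = {"name": name, "url": url, "attributes": attributes}
    if proxy_country is not None:  # server default: UnitedStates
        body["proxy_country"] = proxy_country
    if cookies:  # server default: empty; format described on the Cookies page
        body["cookies"] = cookies
    return json.dumps(body)
```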

`scrape`

Apply an existing scraper to a webpage:

curl https://agents.parsera.org/v1/scrape \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "name": "hackernews",
    "url": "https://news.ycombinator.com/front?day=2024-09-11"
}'

You can access pre-built agents by prefixing the agent name with `public/`. For example, use `public/crunchbase` to access the Crunchbase agent.

Parameters:

Parameter	Type	Default	Description
`name`	`string`	-	Name of the agent
`url`	`string`	-	URL of the webpage to extract data from
`proxy_country`	`string`	`UnitedStates`	Proxy country, see Proxy Countries
`cookies`	`array`	Empty	Cookies to use during extraction, see Cookies
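Building the `scrape` body with a JSON serializer avoids hand-editing mistakes such as trailing commas. This is a minimal sketch. `scrape_payload` and its `public` flag are hypothetical conveniences that add the `public/` prefix for pre-built agents.

```python
import json


def scrape_payload(name: str, url: str, public: bool = False) -> str:
    """Serialize a scrape request body; public=True targets a pre-built agent."""
    agent = f"public/{name}" if public else name
    # json.dumps never emits trailing commas, so the body is always valid JSON.
    return json.dumps({"name": agent, "url": url})
```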

`list`

List all available agents:

curl --location 'https://agents.parsera.org/v1/list' \
--header 'X-API-KEY: <YOUR_API_KEY>'
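The `list` call has no body, so it is a plain GET. This sketch assumes only the Python standard library. It returns the raw response text because this page does not document the response format. `list_agents_request` and `list_agents` are hypothetical names.

```python
import urllib.request

LIST_URL = "https://agents.parsera.org/v1/list"


def list_agents_request(api_key: str) -> urllib.request.Request:
    """Build the list call: no body, so urllib sends a GET like the curl command."""
    return urllib.request.Request(LIST_URL, headers={"X-API-KEY": api_key})


def list_agents(api_key: str) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(list_agents_request(api_key)) as resp:
        return resp.read().decode("utf-8")
```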

`remove`

Remove an existing agent:

curl --location 'https://agents.parsera.org/v1/remove' \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "name": "hackernews"
}'

Parameters:

Parameter	Type	Default	Description
`name`	`string`	-	Name of the agent
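The `remove` call is a POST with a one-field JSON body, matching the curl command above. Below is a standard-library sketch. `remove_agent_request` is a hypothetical helper. Send the result with `urllib.request.urlopen`.

```python
import json
import urllib.request


def remove_agent_request(api_key: str, name: str) -> urllib.request.Request:
    """Build the remove call: a JSON body holding the agent name, sent as POST."""
    return urllib.request.Request(
        "https://agents.parsera.org/v1/remove",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
    )
```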

Extractor

The LLM-powered data extractor is ideal for one-time data extraction and for unstructured data.

`extract`

Put your API key in the `X-API-KEY` header and send a request to the `extract` endpoint:

curl https://api.parsera.org/v1/extract \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "url": "https://news.ycombinator.com/",
    "attributes": {
        "title": "News title",
        "points": "Number of points"
    },
    "proxy_country": "Germany"
}'

If some data is missing, you can retry in precision mode, which looks into data hidden inside HTML tags. For details, see Precision mode.
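The retry advice can be automated so that the more expensive mode runs only when needed. This is a sketch under two assumptions. First, `fetch` is a hypothetical callable you write that calls `extract` with the given mode. Second, it returns the extracted items as a list of dicts keyed by attribute name. Adjust `has_missing` to the actual response shape.

```python
from typing import Callable, Dict, Iterable, List

# Assumed shape: one dict per extracted item, keyed by attribute name.
Records = List[Dict[str, object]]


def has_missing(records: Records, attributes: Iterable[str]) -> bool:
    """True if nothing came back or any item lacks a value for a requested attribute."""
    names = list(attributes)
    if not records:
        return True
    return any(rec.get(name) in (None, "") for rec in records for name in names)


def extract_with_fallback(fetch: Callable[[str], Records],
                          attributes: Iterable[str]) -> Records:
    """Run standard mode first; retry in precision mode only if data is missing."""
    names = list(attributes)
    records = fetch("standard")
    if has_missing(records, names):
        # Precision mode costs 10 Extracts per call instead of 1 (see Credits).
        records = fetch("precision")
    return records
```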

It's best to set the `proxy_country` parameter to a specific country, since a page may be unavailable from some locations.

Parameters:

Parameter	Type	Default	Description
`url`	`string`	-	URL of the webpage to extract data from
`attributes`	`object`	-	A map of `name` - `description` pairs of data fields to extract from the webpage. Also, you can specify Output Types
`mode`	`string`	`standard`	Mode of the extractor, `standard` or `precision`. For details, see Precision mode
`proxy_country`	`string`	`UnitedStates`	Proxy country, see Proxy Countries
`cookies`	`array`	Empty	Cookies to use during extraction, see Cookies
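An `extract` body can be checked locally before it is sent. This sketch assumes the two modes in the table are the only valid values. `extract_payload` is a hypothetical helper that rejects anything else and includes `proxy_country` only when you set it.

```python
import json
from typing import Dict, Optional

MODES = ("standard", "precision")


def extract_payload(url: str, attributes: Dict[str, str], mode: str = "standard",
                    proxy_country: Optional[str] = None) -> bytes:
    """Build an extract request body, rejecting modes not listed in the docs."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    body = {"url": url, "attributes": attributes, "mode": mode}
    if proxy_country is not None:
        body["proxy_country"] = proxy_country
    return json.dumps(body).encode("utf-8")
```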

`parse`

In addition to `extract`, there is a `parse` endpoint for parsing content you already have, instead of fetching it from a URL.
Pass that content in the `content` attribute, which accepts either raw HTML or plain text. The value must be a JSON string, so escape any double quotes inside it:

curl https://api.parsera.org/v1/parse \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "content": "<HTML_OR_TEXT_HERE>",
    "attributes": {
        "title": "News title",
        "points": "Number of points"
    }
}'

Parameters:

Parameter	Type	Default	Description
`content`	`string`	-	Raw HTML or text content to extract data from
`attributes`	`object`	-	A map of `name` - `description` pairs of data fields to extract from the content. Also, you can specify Output Types
`mode`	`string`	`standard`	Mode of the extractor, `standard` or `precision`. For details, see Precision mode
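Raw HTML usually contains double quotes and newlines, which break a hand-written JSON body. A JSON serializer handles the escaping for you. This is a minimal sketch. `parse_payload` is a hypothetical helper, and the HTML snippet is made up for illustration.

```python
import json
from typing import Dict


def parse_payload(content: str, attributes: Dict[str, str], mode: str = "standard") -> str:
    """Serialize a parse request body; content may be raw HTML or plain text."""
    # json.dumps wraps content in quotes and escapes characters such as
    # double quotes and newlines, so any HTML becomes a valid JSON string.
    return json.dumps({"content": content, "attributes": attributes, "mode": mode})


html = '<div class="story">\n  <a href="/item">Example title</a>\n  <span class="score">42 points</span>\n</div>'
body = parse_payload(html, {"title": "News title", "points": "Number of points"})
```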

`extract_markdown`

To get the Markdown version of a webpage, use the `extract_markdown` endpoint:

curl https://api.parsera.org/v1/extract_markdown \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "url": "https://news.ycombinator.com/",
    "proxy_country": "UnitedStates"
}'

Parameters:

Parameter	Type	Default	Description
`url`	`string`	-	URL of the webpage to convert to Markdown
`proxy_country`	`string`	`UnitedStates`	Proxy country, see Proxy Countries
`cookies`	`array`	Empty	Cookies to use during extraction, see Cookies
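The same call in Python mirrors the curl example above. This is a standard-library sketch. `markdown_request` is a hypothetical helper. This page does not describe the response format, so read it with `urllib.request.urlopen(req).read()` and inspect it before relying on its structure.

```python
import json
import urllib.request


def markdown_request(api_key: str, url: str,
                     proxy_country: str = "UnitedStates") -> urllib.request.Request:
    """Build the extract_markdown call with the same body as the curl example."""
    body = {"url": url, "proxy_country": proxy_country}
    return urllib.request.Request(
        "https://api.parsera.org/v1/extract_markdown",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )
```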

Credits

`standard` mode (default): 1 Extract per call
`precision` mode: 10 Extracts per call
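You can estimate a budget from these rates, assuming each call is billed at the rate listed for its mode. `extracts_needed` is a hypothetical helper.

```python
from typing import Dict

# Per-call costs from the Credits list above.
EXTRACTS_PER_CALL = {"standard": 1, "precision": 10}


def extracts_needed(calls_by_mode: Dict[str, int]) -> int:
    """Total Extracts for a mapping of mode -> number of calls."""
    return sum(EXTRACTS_PER_CALL[mode] * count for mode, count in calls_by_mode.items())
```

For example, 100 standard calls plus 5 precision retries give `extracts_needed({"standard": 100, "precision": 5})`, which returns 150 (100 × 1 + 5 × 10).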

Swagger doc

You can also explore the Extractor API in its Swagger documentation: https://api.parsera.org/docs#/

More features

Check out further documentation to explore more features: