Skip to content

Getting started

First, find the API key on the API tab of the Parsera web page.
Use it as the value of X-API-KEY header to authenticate your requests.

Agents

Agent that generates reusable custom scrapers which has 2 main steps:

  1. Call generate to build scraper;
  2. Call scrape to run this scraper on a specific URL.

generate

Request agent to build a new scraper:

curl https://agents.parsera.org/v1/generate \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "name": "hackernews",
    "url": "https://news.ycombinator.com/",
    "attributes": {
        "title": "News title",
        "points": "Number of points"
    }
}'

Parameters:

Parameter Type Default Description
name string - Name of the agent
url string - Website URL
attributes array - A map of name - description pairs of data fields to extract from the webpage
proxy_country string UnitedStates Proxy country, see Proxy Countries
cookies array Empty Cookies to use during extraction, see Cookies

scrape

Use an existing scraper on the webpage:

curl https://agents.parsera.org/v1/scrape \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "name": "hackernews",
    "url": "https://news.ycombinator.com/front?day=2024-09-11",
}'
Parameters:

Parameter Type Default Description
name string - Name of the agent
url string - URL of the webpage to extract data from
proxy_country string UnitedStates Proxy country, see Proxy Countries
cookies array Empty Cookies to use during extraction, see Cookies

list

List all available agents:

curl --location 'https://agents.parsera.org/v1/list' \
--header 'X-API-KEY: <YOUR_API_KEY>'

remove

Remove existing agent:

curl --location --request DELETE 'https://agents.parsera.org/v1/remove' \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "name": "hackernews"
}'
Parameters:

Parameter Type Default Description
name string - Name of the agent

Extractor

LLM-Powered data extractor is ideal for one-time data extraction and unstructured data.

extract

Paste your API key to the X-API-KEY header to send the request to the extract endpoint:

curl https://api.parsera.org/v1/extract \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "url": "https://news.ycombinator.com/",
    "attributes": {
        "title": "News title",
        "points": "Number of points"
    },
    "proxy_country": "Germany"
}'

If some data is missing, you can retry with precision mode, which looks into data hidden in HTML tags. For details, see Precision mode.

It's recommended to set the proxy_country parameter to a specific country since a page could be unavailable from some locations.

Parameters:

Parameter Type Default Description
url string - URL of the webpage to extract data from
attributes array - A map of name - description pairs of data fields to extract from the webpage
mode string standard Mode of the extractor, standard or precision. For details, see Precision mode
proxy_country string UnitedStates Proxy country, see Proxy Countries
cookies array Empty Cookies to use during extraction, see Cookies

parse

In addition to extract, there is a parse endpoint that can be used to parse data generated on your side instead of one from url.
There is a content attribute for passing data, which accepts both raw HTML and string:

curl https://api.parsera.org/v1/parse \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "content": <HTML_OR_TEXT_HERE>,
    "attributes": {
        "title": "News title",
        "points": "Number of points"
    }
}'

Parameters:

Parameter Type Default Description
content string - Raw HTML or text content to extract data from
attributes array - A map of name - description pairs of data fields to extract from the webpage
mode string standard Mode of the extractor, standard or precision. For details, see Precision mode

extract_markdown

You can get a markdown from URL with the extract_markdown endpoint:

curl https://api.parsera.org/v1/extract_markdown \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "url": "https://news.ycombinator.com/",
    "proxy_country": "UnitedStates"
}'

Parameters:

Parameter Type Default Description
url string - URL of the webpage to extract data from
proxy_country string UnitedStates Proxy country, see Proxy Countries
cookies array Empty Cookies to use during extraction, see Cookies

Credits

  • standard mode (Default) - 1 Extract per call
  • precision mode - 10 Extracts per call

Swagger doc

You can also explore the Swagger doc of the Extractor API: https://api.parsera.org/docs#/.

More features

Check out further documentation to explore more features: