Getting started

First, find your API key on the API tab of the Parsera web page.
Send it as the value of the X-API-KEY header to authenticate your requests.
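
As a minimal sketch of the authentication step, the helper below builds the request headers in Python. The function name auth_headers and the PARSERA_API_KEY environment variable are assumptions made for illustration; they are not part of the Parsera API.

```python
import os


def auth_headers(api_key=None):
    """Build the headers sent with every Parsera request."""
    # PARSERA_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("PARSERA_API_KEY")
    if not key:
        raise ValueError("Set PARSERA_API_KEY or pass api_key explicitly")
    return {"Content-Type": "application/json", "X-API-KEY": key}
```

Reading the key from the environment keeps it out of source files and shell history.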

Agents

An agent generates a reusable custom scraper. Using one takes two main steps:

  1. Call generate to build the scraper;
  2. Call scrape to run that scraper on a specific URL.
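
The two steps can be sketched in Python with the standard library. This is an illustration under assumptions: the helper names are hypothetical, the responses are assumed to be JSON, and the scraper is assumed to be ready once the generate call returns.

```python
import json
import urllib.request

AGENTS_BASE = "https://agents.parsera.org/v1"


def agent_request(endpoint, payload, api_key):
    """Build a POST request for an Agents endpoint without sending it."""
    return urllib.request.Request(
        f"{AGENTS_BASE}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


def send_json(request):
    """Send a request and decode the body (assumes the API replies with JSON)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def generate_then_scrape(api_key, name, sample_url, attributes, target_url,
                         send=send_json):
    # Step 1: build the scraper from a sample page.
    send(agent_request(
        "generate",
        {"name": name, "url": sample_url, "attributes": attributes},
        api_key,
    ))
    # Step 2: run the stored scraper on the page of interest.
    return send(agent_request("scrape", {"name": name, "url": target_url}, api_key))
```

The send argument is injectable so the flow can be exercised without network access.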

generate

Ask the agent to build a new scraper:

curl https://agents.parsera.org/v1/generate \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "name": "hackernews",
    "url": "https://news.ycombinator.com/",
    "attributes": {
        "title": "News title",
        "points": "Number of points"
    }
}'

Parameters:

Parameter     | Type   | Default      | Description
name          | string | -            | Name of the agent
url           | string | -            | Website URL
attributes    | object | -            | A map of name - description pairs of data fields to extract from the webpage. Also, you can specify Output Types.
proxy_country | string | UnitedStates | Proxy country, see Proxy Countries
cookies       | array  | Empty        | Cookies to use during extraction, see Cookies
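
A small sketch of how the generate body might be assembled in Python, leaving out optional parameters that were not set so the documented defaults apply. The function name generate_payload is hypothetical, and the shape of each cookies entry is not covered in this section, so the list is passed through unchanged.

```python
import json


def generate_payload(name, url, attributes, proxy_country=None, cookies=None):
    """Assemble the JSON body for the generate endpoint."""
    payload = {"name": name, "url": url, "attributes": attributes}
    # Optional fields are added only when provided.
    if proxy_country is not None:
        payload["proxy_country"] = proxy_country
    if cookies:
        payload["cookies"] = cookies
    return payload


body = json.dumps(
    generate_payload(
        "hackernews",
        "https://news.ycombinator.com/",
        {"title": "News title", "points": "Number of points"},
    )
)
```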

scrape

Run an existing scraper on a webpage:

curl https://agents.parsera.org/v1/scrape \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "name": "hackernews",
    "url": "https://news.ycombinator.com/front?day=2024-09-11"
}'

Parameters:

Parameter     | Type   | Default      | Description
name          | string | -            | Name of the agent
url           | string | -            | URL of the webpage to extract data from
proxy_country | string | UnitedStates | Proxy country, see Proxy Countries
cookies       | array  | Empty        | Cookies to use during extraction, see Cookies
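
Because a generated scraper is reusable, one agent can be pointed at several pages. The sketch below prepares one scrape request per URL; the function name scrape_requests is hypothetical, and the requests are only built here, not sent.

```python
import json
import urllib.request

SCRAPE_URL = "https://agents.parsera.org/v1/scrape"


def scrape_requests(api_key, name, urls):
    """Build one POST request per target URL for the same agent."""
    requests = []
    for url in urls:
        body = json.dumps({"name": name, "url": url}).encode("utf-8")
        requests.append(
            urllib.request.Request(
                SCRAPE_URL,
                data=body,
                headers={"Content-Type": "application/json", "X-API-KEY": api_key},
                method="POST",
            )
        )
    return requests


pages = [
    "https://news.ycombinator.com/front?day=2024-09-11",
    "https://news.ycombinator.com/front?day=2024-09-12",
]
prepared = scrape_requests("<YOUR_API_KEY>", "hackernews", pages)
```

Each prepared request can then be passed to urllib.request.urlopen.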

list

List all available agents:

curl --location 'https://agents.parsera.org/v1/list' \
--header 'X-API-KEY: <YOUR_API_KEY>'

remove

Remove an existing agent:

curl --location --request DELETE 'https://agents.parsera.org/v1/remove' \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "name": "hackernews"
}'

Parameters:

Parameter | Type   | Default | Description
name      | string | -       | Name of the agent
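
For reference, a sketch of the list and remove calls in Python. The HTTP methods mirror the curl commands (GET for list, DELETE with a JSON body for remove); the function names are hypothetical.

```python
import json
import urllib.request

AGENTS_BASE = "https://agents.parsera.org/v1"


def list_request(api_key):
    # No body is attached, so urllib issues a GET, matching the curl call.
    return urllib.request.Request(
        f"{AGENTS_BASE}/list", headers={"X-API-KEY": api_key}
    )


def remove_request(api_key, name):
    # DELETE carrying a JSON body that names the agent to remove.
    return urllib.request.Request(
        f"{AGENTS_BASE}/remove",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="DELETE",
    )
```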

Extractor

The LLM-powered data extractor is ideal for one-time data extraction and unstructured data.

extract

Put your API key in the X-API-KEY header and send a request to the extract endpoint:

curl https://api.parsera.org/v1/extract \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "url": "https://news.ycombinator.com/",
    "attributes": {
        "title": "News title",
        "points": "Number of points"
    },
    "proxy_country": "Germany"
}'

If some data is missing, you can retry with precision mode, which looks into data hidden in HTML tags. For details, see Precision mode.

It's recommended to set the proxy_country parameter to a specific country since a page could be unavailable from some locations.

Parameters:

Parameter     | Type   | Default      | Description
url           | string | -            | URL of the webpage to extract data from
attributes    | object | -            | A map of name - description pairs of data fields to extract from the webpage. Also, you can specify Output Types
mode          | string | standard     | Mode of the extractor, standard or precision. For details, see Precision mode
proxy_country | string | UnitedStates | Proxy country, see Proxy Countries
cookies       | array  | Empty        | Cookies to use during extraction, see Cookies
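
The retry advice above (start in standard mode, switch to precision if data is missing) could be sketched as follows. The response format is not described in this section, so the check for missing data is left to a caller-supplied is_incomplete function; that function, send, and the name extract_with_fallback are all hypothetical.

```python
def extract_with_fallback(send, url, attributes, is_incomplete,
                          proxy_country="UnitedStates"):
    """Try standard mode first and retry once in precision mode if needed.

    send: callable that posts a payload dict to the extract endpoint
          and returns the decoded response.
    is_incomplete: callable that decides whether a response lacks data.
    """
    payload = {
        "url": url,
        "attributes": attributes,
        "mode": "standard",
        "proxy_country": proxy_country,
    }
    result = send(payload)
    if is_incomplete(result):
        # Same request, but asking for the precision mode.
        result = send({**payload, "mode": "precision"})
    return result
```

Starting with standard mode keeps the cheaper call as the default and uses precision mode only when it is needed.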

parse

In addition to extract, there is a parse endpoint for data you already have on your side, so no URL is fetched.
Pass the data in the content attribute, which accepts either raw HTML or plain text:

curl https://api.parsera.org/v1/parse \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "content": "<HTML_OR_TEXT_HERE>",
    "attributes": {
        "title": "News title",
        "points": "Number of points"
    }
}'

Parameters:

Parameter Type Default Description
content string - Raw HTML or text content to extract data from
attributes object - A map of name - description pairs of data fields to extract from the webpage. Also, you can specify Output Types
mode string standard Mode of the extractor, standard or precision. For details, see Precision mode

extract_markdown

You can convert a webpage to Markdown with the extract_markdown endpoint:

curl https://api.parsera.org/v1/extract_markdown \
--header 'Content-Type: application/json' \
--header 'X-API-KEY: <YOUR_API_KEY>' \
--data '{
    "url": "https://news.ycombinator.com/",
    "proxy_country": "UnitedStates"
}'

Parameters:

Parameter     | Type   | Default      | Description
url           | string | -            | URL of the webpage to extract data from
proxy_country | string | UnitedStates | Proxy country, see Proxy Countries
cookies       | array  | Empty        | Cookies to use during extraction, see Cookies
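
A Python counterpart of the curl call, as a sketch. Note that the Extractor endpoints live on api.parsera.org rather than agents.parsera.org. The function name markdown_request is hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

EXTRACTOR_BASE = "https://api.parsera.org/v1"


def markdown_request(api_key, url, proxy_country="UnitedStates"):
    """Build a POST request for the extract_markdown endpoint."""
    body = json.dumps({"url": url, "proxy_country": proxy_country})
    return urllib.request.Request(
        f"{EXTRACTOR_BASE}/extract_markdown",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )
```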

Credits

  • standard mode (Default) - 1 Extract per call
  • precision mode - 10 Extracts per call
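
To estimate usage from the rates above, the arithmetic can be written out as a short Python sketch. The function name and the call counts are made up for illustration.

```python
EXTRACTS_PER_CALL = {"standard": 1, "precision": 10}


def total_extracts(calls_by_mode):
    """Sum the Extracts consumed by a mix of standard and precision calls."""
    return sum(EXTRACTS_PER_CALL[mode] * count for mode, count in calls_by_mode.items())


# 30 standard calls and 5 precision calls: 30 * 1 + 5 * 10 = 80 Extracts.
print(total_extracts({"standard": 30, "precision": 5}))
```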

Swagger doc

You can also explore the Swagger doc of the Extractor API: https://api.parsera.org/docs#/.

More features

Check out further documentation to explore more features: