Welcome to Parsera
Parsera is a lightweight Python library for scraping websites with LLMs.
You can clone it and run it locally, or use the API, which offers a more scalable setup and some extra features, such as a built-in proxy.
Installation
Basic usage
If you want to use OpenAI, remember to set the `OPENAI_API_KEY` environment variable. You can also do this from Python by assigning it in `os.environ` before creating the scraper.
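A minimal sketch using only the standard library. The key string below is a placeholder, not a real value.

```python
import os

# Placeholder value: replace it with your own OpenAI API key,
# and set it before creating the Parsera scraper.
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY_HERE>"
```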
Next, you can run a basic version that uses gpt-4o-mini:
```python
from parsera import Parsera

url = "https://news.ycombinator.com/"
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}

scraper = Parsera()
result = scraper.run(url=url, elements=elements)
```
The `result` variable will contain JSON with a list of records:
```json
[
    {
        "Title": "Hacking the largest airline and hotel rewards platform (2023)",
        "Points": "104",
        "Comments": "24"
    },
    ...
]
```
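The values in each record come back as strings, as in the sample above, so you may want to convert numeric fields before sorting or aggregating. This is a minimal sketch that assumes `result` is a list of dicts shaped like that sample. The `records` literal and the `to_numeric` helper are hypothetical and exist only for illustration.

```python
# Sample records in the shape shown above (all values are strings).
records = [
    {
        "Title": "Hacking the largest airline and hotel rewards platform (2023)",
        "Points": "104",
        "Comments": "24",
    },
]


def to_numeric(records, fields=("Points", "Comments")):
    """Return copies of the records with the given fields cast to int.

    Fields that are missing or not plain decimal numbers are set to None.
    The input records are not modified.
    """
    converted = []
    for record in records:
        row = dict(record)  # shallow copy so the original record stays intact
        for field in fields:
            value = row.get(field)
            if isinstance(value, str) and value.strip().isdecimal():
                row[field] = int(value)
            else:
                row[field] = None
        converted.append(row)
    return converted


# Sort by points, treating unparseable values as 0.
top = sorted(to_numeric(records), key=lambda r: r["Points"] or 0, reverse=True)
for row in top:
    print(row["Points"], row["Title"])
```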
An async counterpart, `arun`, is also available.
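One common use of an async method is scraping several pages concurrently. The sketch below makes two assumptions: that `arun` accepts the same `url` and `elements` keyword arguments as `run`, and that a separate scraper per URL is safer than sharing one instance, since concurrency safety is not documented here. `scrape_all` is a hypothetical helper, not part of Parsera.

```python
import asyncio


async def scrape_all(make_scraper, urls, elements):
    """Scrape every URL concurrently and return the results in URL order.

    make_scraper is any zero-argument callable returning an object with an
    async arun(url=..., elements=...) method (assumed to mirror run).
    A fresh scraper is created per URL to avoid sharing state.
    """
    tasks = [make_scraper().arun(url=url, elements=elements) for url in urls]
    # gather preserves the order of the input tasks.
    return await asyncio.gather(*tasks)


# Usage (requires parsera and an OpenAI API key):
# from parsera import Parsera
# results = asyncio.run(
#     scrape_all(Parsera, ["https://news.ycombinator.com/"], {"Title": "News title"})
# )
```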
Running with CLI
Before you run Parsera as a command-line tool, don't forget to add your `OPENAI_API_KEY` to your environment variables or to a `.env` file.
Usage
You can configure the elements to parse using either a JSON string or a `FILE`.
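The CLI elements use the same name-to-description mapping as the Python example. A minimal sketch that serializes that mapping with the standard `json` module, both as a string and as a file. `elements.json` is a hypothetical filename, and the CLI flags that accept these inputs are not shown here.

```python
import json
from pathlib import Path

# The same elements mapping as in the Python example.
elements = {
    "Title": "News title",
    "Points": "Number of points",
    "Comments": "Number of comments",
}

# Option 1: a JSON string to pass on the command line.
elements_json = json.dumps(elements)
print(elements_json)

# Option 2: a JSON file (hypothetical filename) to point the CLI at.
path = Path("elements.json")
path.write_text(json.dumps(elements, indent=2), encoding="utf-8")
```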
Optionally, you can provide a `FILE` to write the output to.
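If you want to process the CLI output further in Python, a loader like the one below may help. This is a sketch under an assumption: that the output file holds a JSON list of records like the `result` shown for the Python API. The output format is not specified here, so adjust the loader if yours differs. `load_records` and `output.json` are hypothetical names.

```python
import json
from pathlib import Path


def load_records(path):
    """Load scraped records from an output file.

    Assumption: the file contains a JSON list of objects, like the
    result of scraper.run; a ValueError is raised otherwise.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON list of records, got {type(data).__name__}")
    return data


# Usage (output.json is a hypothetical filename):
# for record in load_records("output.json"):
#     print(record.get("Title"))
```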
More features
Check out the rest of the documentation to explore more features.