We are launching Crawlee for Python, an open-source library for web scraping and browser automation.
Quickly scrape data, store it, and avoid getting blocked, with headless browsers and smart proxy rotation.
Hello Hunters and Makers,
I am Saurav, Developer Community Manager at Apify, the company building Crawlee.
I am happy to hunt Crawlee for Python today. We launched Crawlee in August 2022 and received an amazing response from the community, as well as steady demand for a Python version.
Finally, after a lot of hard work from our team, we are launching Crawlee for Python today.
It has all of these features:
- Unified interface for HTTP & headless browser crawling.
- Automatic parallel crawling based on available system resources.
- Written in Python with type hints - enhances DX (IDE autocompletion) and reduces bugs (static type checking).
- Automatic retries on errors or when you’re getting blocked.
- Integrated proxy rotation and session management.
- Configurable request routing - direct URLs to the appropriate handlers.
- Persistent queue for URLs to crawl.
- Pluggable storage of both tabular data and files.
- Robust error handling.
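To make three of the features above more concrete (request routing, automatic retries, and parallel crawling on asyncio), here is a minimal sketch that uses only the Python standard library. It is not Crawlee's API: `Router`, `crawl`, `fake_fetch`, and `demo` are hypothetical names made up for this illustration, and the fetch function only simulates network failures instead of making real requests.

```python
import asyncio
from typing import Awaitable, Callable

# A handler receives the URL and the fetched body (hypothetical signature).
Handler = Callable[[str, str], Awaitable[None]]
Fetcher = Callable[[str], Awaitable[str]]


class Router:
    """Maps a request label to the coroutine that should handle it."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}

    def handler(self, label: str) -> Callable[[Handler], Handler]:
        # Decorator that registers a coroutine under the given label.
        def register(func: Handler) -> Handler:
            self._handlers[label] = func
            return func
        return register

    def resolve(self, label: str) -> Handler:
        return self._handlers[label]


async def crawl(
    requests: list[tuple[str, str]],
    fetch: Fetcher,
    router: Router,
    max_concurrency: int = 4,
    max_retries: int = 3,
) -> list[str]:
    """Process (url, label) pairs concurrently; return URLs that never succeeded."""
    semaphore = asyncio.Semaphore(max_concurrency)  # caps parallel requests
    failed: list[str] = []

    async def process(url: str, label: str) -> None:
        async with semaphore:
            for attempt in range(1, max_retries + 1):
                try:
                    body = await fetch(url)
                    await router.resolve(label)(url, body)
                    return
                except Exception:
                    # Retry until the attempt budget is used up.
                    if attempt == max_retries:
                        failed.append(url)

    await asyncio.gather(*(process(url, label) for url, label in requests))
    return failed


def demo() -> tuple[list[str], list[tuple[str, str]], dict[str, int]]:
    """Run the sketch against simulated pages: one healthy, one flaky, one down."""
    handled: list[tuple[str, str]] = []
    attempts: dict[str, int] = {}
    router = Router()

    @router.handler("list")
    async def handle_list(url: str, body: str) -> None:
        handled.append(("list", url))

    @router.handler("detail")
    async def handle_detail(url: str, body: str) -> None:
        handled.append(("detail", url))

    async def fake_fetch(url: str) -> str:
        attempts[url] = attempts.get(url, 0) + 1
        if url.endswith("/flaky") and attempts[url] < 2:
            raise ConnectionError("simulated block on first attempt")
        if url.endswith("/down"):
            raise ConnectionError("simulated permanent outage")
        return f"<html>{url}</html>"

    requests = [
        ("https://example.com/items", "list"),
        ("https://example.com/flaky", "detail"),
        ("https://example.com/down", "detail"),
    ]
    failed = asyncio.run(crawl(requests, fake_fetch, router))
    return failed, handled, attempts
```

Calling `demo()` routes each URL to the handler registered for its label, retries the flaky URL once, and reports the permanently failing URL after three attempts. A real crawler would also need a persistent queue, proxy handling, and storage, which this sketch leaves out.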
Why use Crawlee rather than Scrapy?
- Crawlee has out-of-the-box support for headless browser crawling (Playwright).
- Crawlee has a minimalistic & elegant interface - set up your scraper with fewer than 10 lines of code.
- Complete type hint coverage.
- Based on standard asyncio.
Please pass on your feedback and thoughts in the comments below!
Congratulations on the launch! I love the seamless integration of headless browser crawling with Playwright. This is fantastic for anyone looking to scrape dynamic content without the hassle of constantly adjusting for JavaScript rendering.
This looks like a powerful tool for web scraping and browser automation. How does Crawlee's proxy rotation and session management compare to other tools on the market? Any plans to add more integrations?
Congrats on the launch, Saurav!
Hey @sauain
Excited to announce Crawlee for Python! This open-source library simplifies web scraping, browser automation, and data storage. Scrape efficiently, avoid blocks, leverage headless browsers, and enjoy smart proxy rotation.