Crawlee
Crawlee helps you build reliable crawlers. Fast.
Saurav Jain
Crawlee for Python — Build reliable scrapers in Python
We are launching Crawlee for Python, an open-source library for web scraping and browser automation. Quickly scrape data, store it, and avoid getting blocked, with support for headless browsers and smart proxy rotation.
Replies
Saurav Jain
Hello Hunters and Makers,

I am Saurav, Developer Community Manager of Apify, the company building Crawlee. I am happy to hunt Crawlee for Python today.

We launched Crawlee in August 2022 and received an amazing response from the community, as well as continuous demand for building it in Python. Finally, after a lot of hard work from our team, we are launching Crawlee for Python today.

It has all of these features:
- Unified interface for HTTP & headless browser crawling.
- Automatic parallel crawling based on available system resources.
- Written in Python with type hints: enhances DX (IDE autocompletion) and reduces bugs (static type checking).
- Automatic retries on errors or when you’re getting blocked.
- Integrated proxy rotation and session management.
- Configurable request routing: direct URLs to the appropriate handlers.
- Persistent queue for URLs to crawl.
- Pluggable storage of both tabular data and files.
- Robust error handling.

Why use Crawlee rather than Scrapy?
- Crawlee has out-of-the-box support for headless browser crawling (Playwright).
- Crawlee has a minimalistic & elegant interface: set up your scraper with fewer than 10 lines of code.
- Complete type hint coverage.
- Based on standard Asyncio.

Please pass on your feedback and thoughts in the comments below!
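For readers who want a feel for the request routing and automatic retries mentioned in the launch post, here is a minimal standard-library sketch of the idea. It is not Crawlee's API: `MiniRouter`, `Request`, `crawl`, and `run_demo` are hypothetical names made up for illustration, the handlers only record what they were given instead of fetching pages, and the in-memory queue stands in for a persistent one.

```python
import asyncio
from collections import deque
from dataclasses import dataclass
from typing import Awaitable, Callable, Deque, Dict, List, Tuple


@dataclass
class Request:
    """A URL to process plus the label that selects its handler."""
    url: str
    label: str = "default"
    retries: int = 0


Handler = Callable[[Request], Awaitable[None]]


class MiniRouter:
    """Hypothetical label-based router (illustration only, not Crawlee's API)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def handler(self, label: str = "default") -> Callable[[Handler], Handler]:
        # Decorator that registers a coroutine function under a label.
        def register(func: Handler) -> Handler:
            self._handlers[label] = func
            return func
        return register

    async def dispatch(self, request: Request) -> None:
        # Look up the handler by label and run it.
        await self._handlers[request.label](request)


async def crawl(router: MiniRouter, start: List[Request], max_retries: int = 2) -> List[str]:
    """Process requests from a queue, re-enqueueing failures up to max_retries."""
    queue: Deque[Request] = deque(start)
    failed: List[str] = []
    while queue:
        request = queue.popleft()
        try:
            await router.dispatch(request)
        except Exception:
            if request.retries < max_retries:
                request.retries += 1
                queue.append(request)  # try again later
            else:
                failed.append(request.url)  # give up on this URL
    return failed


def run_demo() -> Tuple[List[str], List[str]]:
    """Route two requests; the detail handler fails once to simulate a block."""
    router = MiniRouter()
    seen: List[str] = []
    attempts: Dict[str, int] = {}

    @router.handler("default")
    async def handle_listing(request: Request) -> None:
        seen.append(f"listing:{request.url}")

    @router.handler("detail")
    async def handle_detail(request: Request) -> None:
        attempts[request.url] = attempts.get(request.url, 0) + 1
        if attempts[request.url] < 2:
            raise RuntimeError("simulated block on first attempt")
        seen.append(f"detail:{request.url}")

    start = [
        Request("https://example.com/"),
        Request("https://example.com/item/1", label="detail"),
    ]
    failed = asyncio.run(crawl(router, start))
    return seen, failed


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the detail request fails on its first attempt, goes back on the queue, and succeeds on the retry, so nothing ends up in the failed list. A real crawler would add concurrency, persistence, and actual HTTP or browser fetching on top of this.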
Saurav Jain
@csaba_kissi thanks for the support, well you never know ;)
Andreas Sohns
Congratulations on the launch! I love the seamless integration of headless browser crawling with Playwright. This is fantastic for anyone looking to scrape dynamic content without the hassle of constantly adjusting for JavaScript rendering.
Saurav Jain
@andreas_sohns exactly, thanks for your support :D
Khyati Agarwal
Congratulations on the launch 🎉 Amazing work 👏 Scraping in a headless browser had so many gaps!
Saurav Jain
@khyati_tmw thanks for the support, Khyati!
Kyrylo Silin
This looks like a powerful tool for web scraping and browser automation. How does Crawlee's proxy rotation and session management compare to other tools on the market? Any plans to add more integrations? Congrats on the launch, Saurav!
Saurav Jain
hey @kyrylosilin! we use our [Session Pool](https://crawlee.dev/python/api/c...) system to rotate the sessions, and similar to Crawlee TS/JS we are going to use [Tiered Proxies](https://crawlee.dev/blog/proxy-m...) in Crawlee for Python as well.
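To make the idea of rotating sessions more concrete, here is a small sketch of a pool that hands out sessions in turn and stops using ones that have hit too many errors. It is only an assumption about how such a pool could work, not a description of Crawlee's Session Pool: `SimpleSessionPool`, `Session`, `mark_bad`, and the proxy URLs are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """One identity (here: just an id and a proxy URL) with an error counter."""
    session_id: str
    proxy_url: str
    max_errors: int = 3
    error_count: int = 0

    @property
    def usable(self) -> bool:
        # A session is retired once it reaches its error budget.
        return self.error_count < self.max_errors


class SimpleSessionPool:
    """Hypothetical round-robin session pool (illustration only)."""

    def __init__(self, proxy_urls: List[str], max_errors: int = 3) -> None:
        self._sessions = [
            Session(f"session-{i}", url, max_errors=max_errors)
            for i, url in enumerate(proxy_urls)
        ]
        self._cursor = 0

    def get_session(self) -> Session:
        # Walk the pool at most once, skipping retired sessions.
        for _ in range(len(self._sessions)):
            session = self._sessions[self._cursor]
            self._cursor = (self._cursor + 1) % len(self._sessions)
            if session.usable:
                return session
        raise RuntimeError("no usable sessions left")

    def mark_bad(self, session: Session) -> None:
        # Call this when a request made with the session looks blocked.
        session.error_count += 1


if __name__ == "__main__":
    pool = SimpleSessionPool(
        ["http://proxy-a.example:8000", "http://proxy-b.example:8000"],
        max_errors=1,
    )
    first = pool.get_session()
    pool.mark_bad(first)  # pretend the first session got blocked
    print(pool.get_session().session_id)
```

The caller decides what counts as a block (for example an HTTP 403 or a captcha page) and reports it with `mark_bad`; the pool then steers later requests to the remaining sessions.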
Toshit Garg
Congrats on the launch of Crawlee for Python!
Sharon Workman
I'm keen to learn about its performance benchmarks and its speed compared to other scraping solutions.
Jayesh Gohel
Hey @sauain Excited to announce Crawlee for Python! This open-source library simplifies web scraping, browser automation, and data storage. Scrape efficiently, avoid blocks, leverage headless browsers, and enjoy smart proxy rotation
Saurav Jain
@jpgohil93 thanks for the support!
Zubair Collier
I'm interested in its compatibility with popular tools used in web development and data analysis.
Tim David
I'd like to know more about its security features and how it protects against vulnerabilities and data breaches.
Sophia Gartner
I'm curious about its roadmap for future development and community contributions.
Pratham
Congrats on the launch, team! I love to see core technical products making their way in the era of AI wrappers.
Saurav Jain
@prathkum thanks for the support! :)
Zaheer Khan
I'm interested in hearing from early adopters about their experiences using it and any tips they have for maximizing its effectiveness.