Crawlee for Python : p/crawlee | Product Hunt


p/crawlee Crawlee helps you build reliable crawlers. Fast.


Crawlee for Python - Build reliable scrapers in Python

by

Crawlee

•

8mo ago

We are launching Crawlee for Python, an open-source library for web scraping and browser automation. Quickly scrape data, store it, and avoid getting blocked, with support for headless browsers and smart proxy rotation.

Replies


Crawlee

Maker

📌

Hello Hunters and Makers,

I am Saurav, Developer Community Manager at Apify, the company building Crawlee. I am happy to hunt Crawlee for Python today.

We launched Crawlee in August 2022 and received an amazing response from the community, as well as continuous demand for a Python version. Finally, after a lot of hard work from our team, we are launching Crawlee for Python today.

It has all of these features:
- Unified interface for HTTP and headless browser crawling.
- Automatic parallel crawling based on available system resources.
- Written in Python with type hints, which improves DX (IDE autocompletion) and reduces bugs (static type checking).
- Automatic retries on errors or when you're getting blocked.
- Integrated proxy rotation and session management.
- Configurable request routing: direct URLs to the appropriate handlers.
- Persistent queue for URLs to crawl.
- Pluggable storage of both tabular data and files.
- Robust error handling.

Why use Crawlee rather than Scrapy?
- Crawlee has out-of-the-box support for headless browser crawling (Playwright).
- Crawlee has a minimalistic and elegant interface: set up your scraper with fewer than 10 lines of code.
- Complete type hint coverage.
- Based on standard asyncio.

Please pass on your feedback and thoughts in the comments below!
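For readers who want a feel for two of the features listed above (request routing and automatic retries), here is a minimal standard-library sketch of the general idea. It is not Crawlee's API: the `Router` class, the `crawl` function, the `DETAIL` label, and the example URLs are all hypothetical names chosen for illustration, and the handlers simulate work instead of fetching pages.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

# A handler is any coroutine function that receives the URL to process.
Handler = Callable[[str], Awaitable[None]]


class Router:
    """Hypothetical router: maps a request label to the handler for it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._default: Optional[Handler] = None

    def default_handler(self, func: Handler) -> Handler:
        # Used for requests that carry no label or an unregistered label.
        self._default = func
        return func

    def handler(self, label: str) -> Callable[[Handler], Handler]:
        def register(func: Handler) -> Handler:
            self._handlers[label] = func
            return func
        return register

    def resolve(self, label: Optional[str]) -> Handler:
        found = self._handlers.get(label) if label is not None else None
        found = found or self._default
        if found is None:
            raise LookupError(f"no handler registered for label {label!r}")
        return found


async def crawl(
    router: Router,
    requests: List[Tuple[str, Optional[str]]],
    max_retries: int = 2,
) -> List[str]:
    """Process (url, label) pairs; re-queue failures up to max_retries times.

    Returns the URLs that still failed after all retries.
    """
    queue = deque((url, label, 0) for url, label in requests)
    failed: List[str] = []
    while queue:
        url, label, attempt = queue.popleft()
        try:
            await router.resolve(label)(url)
        except Exception:
            if attempt < max_retries:
                queue.append((url, label, attempt + 1))  # retry later
            else:
                failed.append(url)
    return failed


router = Router()
seen: List[Tuple[str, str]] = []
attempts: Dict[str, int] = {}


@router.default_handler
async def handle_listing(url: str) -> None:
    seen.append(("listing", url))


@router.handler("DETAIL")
async def handle_detail(url: str) -> None:
    # Fail on the first attempt to simulate a transient error or a block.
    attempts[url] = attempts.get(url, 0) + 1
    if attempts[url] == 1:
        raise ConnectionError("simulated transient failure")
    seen.append(("detail", url))
```

Calling `asyncio.run(crawl(router, [("https://example.com/", None), ("https://example.com/item/1", "DETAIL")]))` sends the unlabeled URL to the default handler and the labeled one to `handle_detail`, which succeeds on its second attempt. A real crawler would also run requests concurrently and persist the queue; both are left out here to keep the sketch short.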

8mo ago

Crawlee

Maker

@csaba_kissi thanks for the support, well you never know ;)

8mo ago

Congratulations on the launch 🎉 Amazing work 👏 Scraping in headless browsers had so many gaps!

8mo ago

Crawlee

Maker

@khyati_tmw thanks for the support, Khyati!

8mo ago


This looks like a powerful tool for web scraping and browser automation. How does Crawlee's proxy rotation and session management compare to other tools on the market? Any plans to add more integrations? Congrats on the launch, Saurav!

8mo ago

Crawlee

Maker

Hey @kyrylosilin! We use our [Session Pool](https://crawlee.dev/python/api/c...) system to rotate sessions and, as in Crawlee for TS/JS, we are going to use [Tiered Proxies](https://crawlee.dev/blog/proxy-m...) in Crawlee for Python as well.
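To make the two ideas in this reply concrete, here is a rough standard-library sketch of how a session pool and tiered proxies can work in general. It is an illustration under stated assumptions, not Crawlee's implementation: the `Session`, `SessionPool`, and `TieredProxies` classes below, the error-score threshold, and the proxy URLs are all hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Session:
    """Hypothetical session: an identity that is retired after repeated blocks."""

    id: int
    error_score: int = 0
    max_error_score: int = 3  # assumed threshold, chosen for illustration

    @property
    def usable(self) -> bool:
        return self.error_score < self.max_error_score

    def mark_bad(self) -> None:
        self.error_score += 1

    def mark_good(self) -> None:
        self.error_score = max(0, self.error_score - 1)


class SessionPool:
    """Hands out sessions round-robin and replaces retired ones with fresh ones."""

    def __init__(self, size: int) -> None:
        self._ids = itertools.count()
        self._size = size
        self._sessions = [Session(next(self._ids)) for _ in range(size)]
        self._cursor = 0

    def get_session(self) -> Session:
        # Swap out any session whose error score reached the threshold.
        self._sessions = [
            s if s.usable else Session(next(self._ids)) for s in self._sessions
        ]
        session = self._sessions[self._cursor % self._size]
        self._cursor += 1
        return session


class TieredProxies:
    """Starts each domain on the first (cheapest) tier and escalates on blocks."""

    def __init__(self, tiers: List[List[str]]) -> None:
        self._tiers = tiers
        self._level: Dict[str, int] = {}   # current tier index per domain
        self._used: Dict[str, int] = {}    # requests issued per domain

    def proxy_for(self, domain: str) -> str:
        tier = self._tiers[self._level.get(domain, 0)]
        count = self._used.get(domain, 0)
        self._used[domain] = count + 1
        return tier[count % len(tier)]  # rotate within the current tier

    def report_blocked(self, domain: str) -> None:
        # Move the domain one tier up, capped at the last tier.
        current = self._level.get(domain, 0)
        self._level[domain] = min(current + 1, len(self._tiers) - 1)
```

In this sketch a crawler would call `mark_bad()` on a session whenever a response looks like a block and `report_blocked(domain)` to move that domain to a stronger proxy tier, so cheap proxies are used wherever they still work.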

8mo ago


Congrats on the launch team! I love to see core technical products making their way in the era of AI wrappers.

8mo ago

Crawlee

Maker

@prathkum thanks for the support! :)

8mo ago

Congratulations on the launch! I love the seamless integration of headless browser crawling with Playwright. This is fantastic for anyone looking to scrape dynamic content without the hassle of constantly adjusting for JavaScript rendering.

8mo ago

Crawlee

Maker

@andreas_sohns exactly, thanks for your support :D

8mo ago

Hey @sauain, excited to see Crawlee for Python announced! This open-source library simplifies web scraping, browser automation, and data storage. Scrape efficiently, avoid blocks, leverage headless browsers, and enjoy smart proxy rotation.

8mo ago

Crawlee

Maker

@jpgohil93 thanks for the support!

8mo ago

Congrats on the launch of Crawlee for Python!

8mo ago

I'm interested in hearing from early adopters about their experiences using it and any tips they have for maximizing its effectiveness.

8mo ago

I'm curious about its roadmap for future development and community contributions.

8mo ago

I'm keen to learn about its performance benchmarks and its speed compared to other scraping solutions.

8mo ago

I'd like to know more about its security features and how it protects against vulnerabilities and data breaches.

8mo ago

I'm interested in its compatibility with popular tools used in web development and data analysis.

8mo ago