Product Hunt – The best new products in tech.

p/scraping-hub

Turn web content into useful data

Portia — Scrape websites visually

Charlie Irish

Featured

9yr ago

Replies

Kumar Thangudu

Crypto Buyer's Guide

@samir_doshi ;)

9yr ago

Dre Durr💡

@datarade sounds like you should make a collection of Scrapers.

9yr ago

Sahil Chaturvedi

Ader

@datarade Jeez that's a lot! Are they all similar, or just different use cases?

9yr ago

Kumar Thangudu

Crypto Buyer's Guide

@giannidalerta follow me and check out my blog. My plan is to continue this type of comment. ;)

9yr ago

Nick Kwan

Pakible

@datarade I've seen a lot of your epic scraper comments on PH, so it looks like you are the scraper king haha. What are your go-to scrapers nowadays for social networks like LinkedIn + Twitter? Looking for something web based / usable on Mac. Data-miner.io is good, but a bit complex for us non-technicals. Many sites like Portia, Import, etc. don't work on these sites.

8yr ago

Gabriel Puliatti

I've worked for Scrapinghub for the past two years, happy to answer any questions about Portia, its big brother Scrapy, or any part of our platform... or even web scraping in general! We and our users are currently crawling 3.5 billion pages per month, or around 80,000 pages per minute. So we know a little bit about scraping. :)
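A quick sanity check on those two figures, as a rough sketch. The 30-day month is an assumption made here for the arithmetic, not something stated above.

```python
# Convert a monthly page count into pages per minute.
# Assumption: a 30-day month; the result shifts slightly for 28- or 31-day months.
PAGES_PER_MONTH = 3_500_000_000
DAYS_PER_MONTH = 30  # assumed, not from the thread

minutes_per_month = DAYS_PER_MONTH * 24 * 60
pages_per_minute = PAGES_PER_MONTH / minutes_per_month

print(f"{pages_per_minute:,.0f} pages per minute")
```

Under that assumption the result lands a little above 80,000, which is consistent with the "around 80,000 pages per minute" quoted above.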

9yr ago

Jake Miller

@gpuliatti I'm curious to hear your perspective on Palantir's acquisition and sudden shutdown of Kimono Labs and how they handled that?

9yr ago

Gabriel Puliatti

@jpmillions We're open source guys (and gals)… so we're definitely saddened to see users of a closed platform treated this way. OTOH, we've seen a lot of customers coming to try Portia from Kimono. :) We're actively working on ways to help people port their Kimono crawlers, so we're keen on hearing from anyone in this boat! Email me directly (gabriel@scrapinghub.com) or sign up to the mailing list at the bottom of this post (https://goo.gl/CGxsFl). Both Portia and Scrapy are fully open source, and any crawlers (created or running) in our platform are fully exportable and interoperable with open technologies. While we are focused on the long term and so doubt our platform will be shut down any time soon, if that ever happened, all of our users would be able to export their crawlers and use them on their own infrastructure. We've done this ourselves for some of our Professional Services clients who want us to build scrapers but also run things on their own infrastructure.

9yr ago

Evan Lodge

Higherme

@gpuliatti I tried scraping a list of 27,000 urls... the browser crashed. Is there any easier way of adding URLs to the scrape?
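One general way to avoid pasting a very large list into a browser form is to prepare the URLs in a script and hand them over in smaller batches. This is only a sketch under stated assumptions: the function names, the batch size, and the idea of a one-URL-per-line input are all hypothetical, and nothing here is a Portia feature.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_urls(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-empty URLs, skipping duplicates, from an iterable of lines."""
    seen = set()
    for line in lines:
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            yield url


def batched(urls: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group URLs into lists of at most `size` items."""
    iterator = iter(urls)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical input; in practice the lines could come from an open text file.
sample_lines = [
    "http://example.com/a\n",
    "\n",
    "http://example.com/b\n",
    "http://example.com/a\n",
    "http://example.com/c\n",
]
batches = list(batched(read_urls(sample_lines), size=2))
print(batches)
```

Each batch could then be submitted as a separate, smaller job, so no single step has to hold all 27,000 URLs in a browser tab.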

9yr ago

tomkelshaw

ScrapingHub crew have been doing this a long time, and deliver good service. Since the untimely demise/acquihire of KimonoLabs, I'll be giving this a try.

9yr ago

oty

Awesome!!! Curious to know the limits of the data processing for Big Data use cases.

9yr ago

Nick Kwan

Pakible

@pablohoffman @gpuliatti doesn't look like Portia can currently handle scraping Twitter. LinkedIn is even more secure than Twitter... Am I doing something wrong or is this how it is? Any suggestions for another scraper? Looking for something web based / usable on Mac. Data-miner.io is good, but a bit complex for us non-technicals.

8yr ago

dataflowkit

@pablohoffman @gpuliatti @nwkwan I'm sorry for the late response. We've released a new service for data scraping https://www.producthunt.com/post... . It is able to extract long, infinitely scrolling pages. I would really appreciate your expert review of our DFK service.

6yr ago

Saijo George

tl;dr Marketing

Nice product. How do you guys compare to https://www.producthunt.com/tech... ? That is my go-to scraper these days.

9yr ago

Christien Louviere

SellPersonal - Dave Williams on Tech Companies and Entrepreneurship

Cool and freaky logo!

9yr ago

Yiğitcan Kutay Güler

Looks great! That's all I can say since I haven't been able to actually open the dashboard.. Is there a problem with the site? @gpuliatti

9yr ago

Gabriel Puliatti

@ykguler We got hit by the Product Hunt bump and our dashboard was having some issues… looks like things are working again.

9yr ago

Elia Morling

Swap Ideas

Looks cool. I am curious to learn what the top use cases for Portia are. I understand that people scrape, but what interesting things do they do with the data?

9yr ago

Gabriel Puliatti

@tribaling A few use-cases from our past client projects:
- Scrape eCommerce sites that sell your products, to check for price violations and review data.
- Build a broad crawler covering thousands of sites to automatically discover contact and profile information for a specific industry.
- Parse all shop locations for a number of big brands to provide a locator for users looking for a specific type of shop.
- Build a database of interesting candidates to hire, by matching various sources of internet profiles with a series of filters which you or the HR team are interested in.
I know people building boutique businesses on basic web scraping… like someone who uses our platform to offer a service that allows people to monitor Amazon Kindle Books pricing, and get alerted when the price drops or the book goes on sale. In effect, bringing Amazon's data "back to the people" to allow them to make better choices.
But of course, most of the $$$ value comes from being a Fortune 500 company and being able to understand a lot more about the world, your industry and your competition. We help both large and small increase their reach and get access to the best technology. :)
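The Kindle price-alert idea mentioned above could be sketched roughly like this: compare the prices from two scraping runs and report the titles that got cheaper. The function name, the dictionary layout, and the sample titles and prices are all made up for illustration; this is not how that service is actually built.

```python
from typing import Dict, List, Tuple


def find_price_drops(
    previous: Dict[str, float], current: Dict[str, float]
) -> List[Tuple[str, float, float]]:
    """Return (title, old_price, new_price) for every title that got cheaper."""
    drops = []
    for title, new_price in current.items():
        old_price = previous.get(title)
        # Titles seen for the first time have no old price, so they are skipped.
        if old_price is not None and new_price < old_price:
            drops.append((title, old_price, new_price))
    return drops


# Hypothetical data from two consecutive scraping runs.
yesterday = {"Book A": 9.99, "Book B": 4.99}
today = {"Book A": 7.99, "Book B": 4.99, "Book C": 2.99}

for title, old, new in find_price_drops(yesterday, today):
    print(f"{title}: {old:.2f} -> {new:.2f}")
```

In a real setup the two dictionaries would come from stored crawl results, and the print would be replaced by whatever alert channel the service uses.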

9yr ago

John Alexander

do you have a free version? having a hard time figuring out these "plans"

9yr ago

Gabriel Puliatti

@johnalxndr Everything should be back up now! We do have a free plan, it's the big box on http://scrapinghub.com/pricing. You can get shared resources and 1 concurrent crawl for $0 a month.

9yr ago

Sam Dickie

Softr

Cheers Gabriel! I have been looking at developing an app similar to the likes of Flipboard for a while, and I have recently been looking into how they currently scrape for content. Is this a complex system?

9yr ago

Gabriel Puliatti

@thisdickie For something that needs to grab data from any generic news site like Flipboard, Portia may be a bit limited. I would recommend using Scrapy, and using one of the many content parsing libraries available for Python. Moz has a great write-up of the available ones on their blog: https://moz.com/devblog/benchmar...
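To give a feel for what "content parsing" means here, below is a deliberately naive sketch that pulls paragraph text out of a page using only Python's standard library. It is a stand-in for illustration, not Scrapy and not one of the libraries from the write-up; the class name and the sample HTML are invented, and real article extractors do far more (boilerplate detection, scoring, and so on).

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> tags and ignore everything else."""

    def __init__(self):
        super().__init__()
        self._depth = 0        # how many <p> tags are currently open
        self._current = []     # text fragments of the paragraph being read
        self.paragraphs = []   # finished paragraphs

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse runs of whitespace.
                text = " ".join("".join(self._current).split())
                if text:
                    self.paragraphs.append(text)
                self._current = []

    def handle_data(self, data):
        if self._depth > 0:
            self._current.append(data)


# Invented sample page: navigation text should be dropped, paragraphs kept.
html = (
    "<html><body><nav>Home | About</nav>"
    "<p>First <b>paragraph</b>.</p><p>Second one.</p></body></html>"
)
extractor = ParagraphExtractor()
extractor.feed(html)
print(extractor.paragraphs)
```

A generic news reader needs this to work across thousands of differently structured sites, which is why a dedicated content parsing library is the more realistic choice.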

9yr ago

Dru Wynings

Sensible Instruct

@thisdickie Hey Sam, you might also want to check out http://www.diffbot.com/ as we power more than a few article reading apps

9yr ago

Kevin Simper

I just tried signing up for Portia. They had a very cool introduction called Ben that wanted me to invite my team members. Funny because I don't think of scraping as a team sport. Next I tried creating a spider, but there was no help here at all. I had to go to the docs and I still did not understand what I was doing. I clicked some elements on a page that I wanted scraped, but only afterwards discovered that I had to define the fields I wanted scraped and then annotate the page. On top of that it seems like they are out of capacity because I suddenly got a fatal error from the crawler when pressing "Test". Looks good, but still a long way to go! 👍

9yr ago

Heather Redman

Really useful--web data for everyone on an on-demand accessible basis. Would be interested in whether Portia also provides data structuring?

9yr ago

Ken Kaczmarek

Nice to see another option for scraping! Other than the open-source angle, how would you compare Portia to what Import.io offers?

9yr ago

Gabriel Puliatti

@wanderslth Hi Ken, we think Import.io is great… I've personally used their browser tools quite a few times to do something quick. I'd say the biggest benefit you get with Portia is actually our platform… every project has a great API (http://doc.scrapinghub.com/api/o...) which you can use to schedule and run your crawls, as well as review and download data. You can use Crawlera's proxies, send data to various destinations like Amazon S3 (plus images and files separately if you want) or even machine learning services for extra text analysis by enabling add-ons with a couple of clicks. Portia crawlers behave quite similarly to Scrapy crawlers, which means joint projects are possible… Portia tackling the low-hanging fruit without needing engineers, and Scrapy doing the heavy lifting on the sites that need it. Having an easier option helps lower overall project costs, and Scrapy allows engineers to piece all parts together and crawl the tougher sites. This has allowed us to handle much bigger projects than we could with only one or the other.

9yr ago

Ken Kaczmarek

@gpuliatti ah; cool -- I took a poke around the larger site and looks... comprehensive. :) I'll definitely bookmark for a deeper dive here soon. Thanks.

9yr ago

Iverson Dantas

What do you think about tools to monitor social media updates on Twitter, Facebook, YouTube and Instagram and also extract data from them?

6yr ago

Samir Doshi

Relayo

This is a huge improvement on some of those other wysiwyg macro scrapers out there that just don't work.

9yr ago