
Crawlera by Scrapinghub

I came across this tool a few weeks ago and wanted to share it with you. I have not tested it myself yet, but the concept is simple: safely download web pages without the fear of overloading websites or getting banned. You write a crawler script using Scrapinghub, and they run it through their IP proxies and take care of the technical problems of crawling.

What is Crawlera?

Crawlera is a smart HTTP/HTTPS downloader designed specifically for web crawling and scraping. It routes requests through a pool of IPs, throttling access by introducing delays and discarding IPs from the pool when they get banned from certain domains, or have other problems. As a scraping user, you no longer have to worry about tinkering with download delays, concurrent requests, user agents, cookies or referrers to avoid getting banned; you just use Crawlera to download pages instead. Some plans provide a standard HTTP proxy API, so you can configure it in your crawler of choice and start crawling.

http://crawlera.com/
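
To make the description above a little more concrete, here is a toy sketch in Python of the general idea: rotate through a pool of IPs, wait between requests, and stop using an IP on a domain once it gets banned there. This is not Crawlera's code or API. The `ProxyPool` class, its method names, and the addresses are all made up for illustration, and the real service is certainly more sophisticated.

```python
import time


class ProxyPool:
    """Toy illustration only: rotate proxies, space out requests,
    and drop proxies per domain once they are banned there.
    All names here are hypothetical, not part of any real service."""

    def __init__(self, proxies, delay_seconds=1.0,
                 sleep=time.sleep, clock=time.monotonic):
        self.proxies = list(proxies)
        self.delay_seconds = delay_seconds
        self.banned = {}     # domain -> set of proxies banned on that domain
        self.last_used = {}  # (domain, proxy) -> time of the last request
        self._sleep = sleep
        self._clock = clock
        self._counter = 0

    def acquire(self, domain):
        """Pick the next usable proxy for a domain, waiting if it was used too recently."""
        banned_here = self.banned.get(domain, set())
        candidates = [p for p in self.proxies if p not in banned_here]
        if not candidates:
            raise RuntimeError("no usable proxies left for " + domain)

        # Simple round-robin over the proxies that are still allowed.
        proxy = candidates[self._counter % len(candidates)]
        self._counter += 1

        # Throttle: keep at least delay_seconds between uses of the
        # same proxy against the same domain.
        last = self.last_used.get((domain, proxy))
        if last is not None:
            wait = self.delay_seconds - (self._clock() - last)
            if wait > 0:
                self._sleep(wait)

        self.last_used[(domain, proxy)] = self._clock()
        return proxy

    def mark_banned(self, domain, proxy):
        """Stop using this proxy for this domain (it stays usable elsewhere)."""
        self.banned.setdefault(domain, set()).add(proxy)
```

A crawler would call `acquire("example.com")` before each request and `mark_banned("example.com", proxy)` when a response looks like a ban. The point of a service like this is that you never have to write or tune this kind of logic yourself.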

Crawlera should mitigate most of the problems and overhead associated with crawling websites, which is the beauty of this tool for Scrapinghub users.

FEATURES

  • No need to think about number of IPs or delay, just fetch clean pages, as fast as possible
  • Automatic retrying and throttling to crawl politely and prevent bans
  • Add your own proxies if you need more bandwidth
  • HTTPS support
  • POST support
  • HTTP proxy API for seamless integration
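
Since I have not used the service, I cannot show a tested setup, but pointing a crawler at a standard HTTP proxy generally looks something like the Python sketch below. It uses only the standard library. The host `proxy.example.com`, the port `8010`, and the username/password credentials are placeholders I made up; check the Crawlera documentation for the real endpoint and how it expects you to authenticate.

```python
import urllib.parse
import urllib.request


def build_proxy_url(user, password, host, port):
    """Build an http://user:password@host:port proxy URL.
    Credentials are percent-encoded so special characters survive."""
    user = urllib.parse.quote(user, safe="")
    password = urllib.parse.quote(password, safe="")
    return "http://{}:{}@{}:{}".format(user, password, host, port)


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(opener, url, timeout=30):
    """Download a page through the proxy and return the raw bytes."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder values: replace with the details from your own account.
    proxy_url = build_proxy_url("my-user", "my-password", "proxy.example.com", 8010)
    opener = build_proxy_opener(proxy_url)
    print(len(fetch(opener, "http://example.com/")))
```

Most crawling frameworks and HTTP libraries accept a proxy URL in a similar form, which is presumably what "seamless integration" refers to.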

The service is also available via Mashape.com as an API.

As I've said, I have not used this service myself. If you want to use it, there is a pricing tool on their site that you can use to estimate costs.


http://scrapinghub.com/pricing

If you have used this tool, please leave a comment about your experience.