ScrapeShield – a limited-feature anti-content-duplication tool

Here we come to the next anti-scrape tool, called ScrapeShield.

ScrapeShield

The ScrapeShield app has been developed by CloudFlare to guard a site’s content. Its features are limited in number, but it’s still an interesting tool to look at for anyone interested in web scraping.

In a nutshell, the ScrapeShield app includes anti-scrape measures such as:

  • content tracking
  • Pinterest blocking
  • email obfuscation
  • hotlink protection

more…
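To make the email-obfuscation idea concrete, here is a minimal sketch in Python of one common scheme (an illustration of the general technique, not ScrapeShield’s documented internals): the address is stored as a hex string whose first byte is an XOR key applied to every remaining byte, so the plain address never appears in the page source.

```python
def encode_obfuscated_email(email: str, key: int = 0x42) -> str:
    """Encode an email as hex: the first byte is the XOR key,
    each following byte is a character XORed with that key."""
    return format(key, "02x") + "".join(
        format(ord(c) ^ key, "02x") for c in email
    )

def decode_obfuscated_email(hex_string: str) -> str:
    """Reverse the encoding: read the key from the first byte,
    then XOR it back out of the remaining bytes."""
    data = bytes.fromhex(hex_string)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")
```

A script on the page would run the decode step in the browser, which keeps the address readable for humans while defeating naive regex-based harvesters.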

BotDefender Analysis

Here I’d like you to get familiar with an online scraping protection service called BotDefender. It’s interesting both to know how to use it (in case you want to protect your data) and to understand how it works in case you ever come across it while collecting data.

BotDefender

BotDefender is a service that protects online e-commerce stores from the automated retrieval of their items’ prices – so their competitors can’t use those prices to undercut them. This anti-scraping tool is crucial if you want to hide your inventory prices from e-store competitors. It’s also interesting to look at from a scraping point of view. more…

Scraping with import.io Magic – The Future?

Over the last one or two years there has been a lot of maturing in the area of visual web scrapers. New companies like ParseHub, ScrapingHub and Kimono are bringing new tools to the market, while industry veterans like OutWit Hub, Visual Web Ripper and Mozenda continue to update their great tooling to annotate/train scrapers and extract web data.

Interestingly, something has changed now. Import.io has created a new tool which is a little bit different on the surface, and having spoken to them, a LOT different under the hood.
more…

What is Crawlera?


Crawlera by Scraping Hub

I came across this tool a few weeks ago and wanted to share it with you. So far I have not tested it myself, but the concept is simple: safely download web pages without the fear of overloading websites or getting banned. You write a crawler script using Scrapinghub, and they will run it through their IP proxies and take care of the technical problems of crawling. more…
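The core idea, routing each request through a rotating pool of proxy IPs, can be sketched generically in Python. The proxy hosts and credentials below are placeholders for illustration, not Crawlera’s actual endpoints or API:

```python
import random
import urllib.request

# Hypothetical proxy pool (placeholder hosts, not real Crawlera endpoints)
PROXIES = [
    "http://user:pass@proxy1.example.com:8010",
    "http://user:pass@proxy2.example.com:8010",
    "http://user:pass@proxy3.example.com:8010",
]

def choose_proxy(pool):
    """Pick a proxy at random so repeated requests are spread
    across several source IPs."""
    return random.choice(pool)

def fetch_via_proxy(url: str) -> str:
    """Download a page through a randomly chosen proxy."""
    proxy = choose_proxy(PROXIES)
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    with opener.open(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A service like Crawlera adds the parts this sketch omits: maintaining the proxy pool, retiring banned IPs, throttling per-domain request rates, and retrying failures.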

Review: import.io’s New Scraping Process and Features


Web scraping data platform import.io announced last week that it has secured $3M in funding from investors that include the founders of Yahoo! and MySQL.

They also released a new beta version of the tool that is essentially a better version of their extraction tool, with some new features and a much cleaner and faster user experience. more…

A Simple Email Crawler in Python

Email Crawling I often receive requests asking about email crawling. Clearly this topic is quite interesting for those who want to scrape contact information from the web (like direct marketers), and we have previously mentioned GSA Email Spider as an off-the-shelf solution for email crawling. In this article I want to demonstrate how easy it is to build a simple email crawler in Python. The crawler is simple, but you can learn many things from this example (especially if you’re new to scraping in Python). more…
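The approach can be sketched in a few lines of Python (a minimal illustration of the idea, not the article’s exact code): breadth-first crawl from a start URL, pull email addresses out of each page with a regex, and follow the links found on the page.

```python
import re
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen

# Simple patterns: good enough for a demo, not fully RFC-compliant
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
LINK_RE = re.compile(r'href=["\'](.*?)["\']')

def crawl_emails(start_url: str, max_pages: int = 10) -> set:
    """Breadth-first crawl from start_url, collecting email addresses
    from up to max_pages pages."""
    queue = deque([start_url])
    seen_pages, emails = set(), set()
    while queue and len(seen_pages) < max_pages:
        url = queue.popleft()
        if url in seen_pages:
            continue
        seen_pages.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip unreachable or non-text pages
        emails.update(EMAIL_RE.findall(html))
        for link in LINK_RE.findall(html):
            queue.append(urljoin(url, link))  # resolve relative links
    return emails
```

A real crawler would also respect robots.txt, rate-limit its requests, and restrict itself to the target domain; those concerns are omitted here to keep the core loop visible.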

Import.io Enters the Enterprise DaaS Market

Import.io Enterprise
Recently, import.io (a free online scraping tool) announced that they are adding another way to get data from the web: they’ll build it for you. This new “Data as a Service” program is targeted at businesses and organizations that need data but don’t have the time or resources to devote to using the import.io tool to build it themselves. For these clients, import.io will curate custom datasets based on their specific requirements as well as develop custom data implementation solutions based on the organization’s in-house software. more…

My Experience in Choosing a Web Scraping Service

Choosing a Web Scraping Service Recently I decided to outsource a web scraping project to another company. I typed “web scraping service” into Google, chose six services from the first two search result pages and sent the project specifications to all of them to get quotes. Eventually I decided to go another way and did not order from any of the services, but my experience may be useful for others who want to entrust web scraping jobs to third-party services. more…

Back to top