Scraping JavaScript-protected content

Here we reach a new milestone: scraping JavaScript-driven websites.

Recently a friend of mine got stumped while trying to fetch a website's content with the PHP simplehtmldom library. After repeated failures, he found that the site was saturated with JavaScript. The anti-scraping JavaScript performs a tricky check to see whether the page is being requested and processed by a real browser, and only if it is does it render the rest of the page's HTML. more…
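A parser like simplehtmldom only sees the raw HTML the server sends, so a JavaScript-gated page typically arrives as a handful of scripts and almost no readable text. The sketch below is a hypothetical heuristic (not the protected site's actual check) for noticing this from the scraper's side; the function name `looks_js_gated` and the 200-character threshold `min_text_chars` are illustrative assumptions.

```python
from html.parser import HTMLParser


class _TextAndScriptCounter(HTMLParser):
    """Counts <script> tags and visible text characters in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.script_count = 0
        self.visible_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that a browser would actually display.
        if self._skip_depth == 0:
            self.visible_chars += len(data.strip())


def looks_js_gated(html: str, min_text_chars: int = 200) -> bool:
    """Hypothetical heuristic: the page ships scripts but almost no visible text."""
    counter = _TextAndScriptCounter()
    counter.feed(html)
    counter.close()
    return counter.script_count > 0 and counter.visible_chars < min_text_chars
```

If this returns `True` for a page you expected to contain data, a static HTML parser is probably the wrong tool, and a real (possibly headless) browser that executes the JavaScript is the likelier route.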

Writing next generation scraping scripts with Web Robots IDE

Most scraping solutions fall into two categories: visual scraping platforms targeted at non-programmers (Import.io, Content Grabber, CloudScrape, etc.) and scraping code libraries like Scrapy or PhantomJS, which require at least some coding knowledge.

Web Robots builds a scraping IDE that fills the gap between the two. Code is not hidden; instead, it is made simple to create, run and debug. more…

ScrapeShield – a limited-feature anti-content-duplication tool

Next up is another anti-scraping tool: ScrapeShield.


The ScrapeShield app was developed by CloudFlare to guard a site’s content. Its feature set is limited, but it’s still an interesting tool for anyone interested in web scraping.

In a nutshell, the ScrapeShield app includes anti-scraping measures such as:

  • content tracking
  • Pinterest blocking
  • email obfuscation
  • hotlink protection
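To make the email obfuscation item concrete: CloudFlare's obfuscation is commonly observed to replace an address in the page with a hex string (for example, in a `data-cfemail` attribute) whose first byte is an XOR key for the remaining bytes. The sketch below assumes that format and ASCII-only addresses; it is an illustration, not CloudFlare's published code, and `decode_cfemail` / `encode_cfemail` are hypothetical helper names.

```python
def decode_cfemail(encoded: str) -> str:
    """Decode an obfuscated email hex string.

    Assumed format: the first byte (two hex digits) is the XOR key;
    each following byte XORed with that key yields one ASCII character.
    """
    key = int(encoded[:2], 16)
    chars = [
        chr(int(encoded[i:i + 2], 16) ^ key)
        for i in range(2, len(encoded), 2)
    ]
    return "".join(chars)


def encode_cfemail(email: str, key: int = 0x42) -> str:
    """Inverse of decode_cfemail under the same assumed format (ASCII only)."""
    return f"{key:02x}" + "".join(f"{ord(c) ^ key:02x}" for c in email)
```

For example, with key `0x42` the letter `a` (`0x61`) becomes `0x23`, so `decode_cfemail("4223")` returns `"a"`. The point for scrapers is that this kind of obfuscation only hides the address from naive pattern matching; it is not encryption.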

more…

BotDefender Analysis

Here I’d like to introduce you to an online scraping protection service called BotDefender. It’s worth knowing both how to use it (in case you want to protect your own data) and how it works (in case you come across it while collecting data).


BotDefender is a service that protects online e-commerce stores from automated retrieval of their item prices, so competitors can’t use those prices to undercut them. This anti-scraping tool is valuable if you want to hide your prices from competing e-stores, and it’s also interesting to look at from a scraping point of view. more…

Scraping with import.io Magic – The Future?

Over the last year or two, visual web scrapers have matured considerably. New companies like ParseHub, ScrapingHub and Kimono are bringing new tools to the market, while industry veterans like OutWit Hub, Visual Web Ripper and Mozenda continue to update their tooling for annotating/training scrapers and extracting web data.

Interestingly, something has now changed. Import.io has created a new tool that is a little different on the surface and, having spoken to them, I can say it is a LOT different under the hood.
more…
