Recently I’ve got a note with the question on search engine queries through the web scraping software.
Other posts not belonging to any specific category
Recently Import.io introduced a new extraction technique called Magic. The Magic scraping method works be attempting to scrape all the information off the page automatically and in one shot. We covered it in another post early last year. When we covered it back then, we noted a few issues:
- The scraper only works on pages with more than one row of data like a search results page, category pages and etc.
But now Import.io has released a second version of Magic which seems to have dealt with those obstacles. Not only that, but they have released an API for Magic that lets you see what’s going on behind the scenes. more…
As anyone who’s spent any time on the scraping field will know, there are plenty of anti-scraping techniques on the market. And since I regularly get asked what the best way to prevent someone from scraping a site, I thought I’d do a post rounding up some of the most popular methods. If you think I’ve missed any out, please let me know in the comments below!
I recently came across this question in the Q&A section of a forum I belong to:
Sure, if all you want to do is something as lightweight as monitoring a set of target pages for changes, then using a ready monitoring tool is probably way more than you need. You need to keep it simple. So, here’s a quick solution with Google spreadsheet. more…
Recently I received a question in my mail box about scraping data aggregate sites (aka yellow pages) or business directories.
I replied to him directly, but our conversation on business directories was an interesting one that I thought you guys would find useful. more…
In this post we’d like to share an interview with a young service called ScrapeHero. We’ve interviewed Tony Paul (marketing head) and this is what he had to say. more…
Here we come to the next anti-scrape tool, called ScrapeShield.
The ScrapeShield app has been developed by CloudFlare to guard a site’s content. Its features are limited number, but it’s still an interesting tool to look at for anyone interested in web scraping.
In a nutshell, ScrapeShield’s app includes anti scrape measures such as:
- content tracking
- pinterest blocking
- email obfuscation
- hotlink protection
Here I’d like you to get familiar with an online scraping protection service called BotDefender. It’s interesting both to know how to use it (in case you want to protect your data) and to understand how it works in case you ever come across it while collecting data.
BotDefender is a service that protects online e-commerce stores from the automated retrieval of their items’ prices – so their competitors can’t use those prices to outprice them. This anti-scraping tool is crucial if you want to hide your inventory prices from e-store competitors. It’s also interesting to look at from a scraping point of view. more…
Over the last one or two years there has been a lot of maturing in the area of visual Web Scrapers. New companies like ParseHub, ScrapingHub and Kimono are bringing new tools to the market, while industry veterans like Outwithub, visual web ripper and Mozenda continue to update their great tooling to annotate/train scrapers and extract web data.
Interestingly, something has changed now. Import.io has created a new tool which is a little bit different on the surface, and having spoken to them, a LOT different under the hood.