Miscellaneous

Other posts not belonging to any specific category

My site is being scraped, how can I prevent being scraped?

As anyone who’s spent any time in the scraping field will know, there are plenty of anti-scraping techniques on the market. And since I regularly get asked what the best way to prevent someone from scraping a site is, I thought I’d do a post rounding up some of the most popular methods. If you think I’ve missed any, please let me know in the comments below!

If you are interested in how to find out whether your site is being scraped, see this post: How to detect your site is being scraped?

more…

Simple way HTML change monitoring

I recently came across this question in the Q&A section of a forum I belong to:

“I want to run once a day a script that will check whether the specific part of code has been changed, and if it did, we would get some return message (ideally directly to my email). What would be the easiest, simplest way to do that? I’ve read about web crawlers, web scrappers, but they seem to be doing far more than we need.”

Sure, if all you want to do is something as lightweight as monitoring a set of target pages for changes, then a ready-made monitoring tool is probably way more than you need. You need to keep it simple. So, here’s a quick solution with a Google spreadsheet. more…
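The once-a-day check described in the question can also be sketched in a few lines of Python. This is not the Google spreadsheet solution from the post, just a minimal illustration of the same idea: cut out the part of the page you care about, fingerprint it, and compare with the fingerprint saved on the previous run. The function names and the marker-based extraction are assumptions made for this example.

```python
import hashlib
import re
from typing import Optional, Tuple


def extract_fragment(html: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Return the text between two markers, or None if either marker is missing."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end]


def fingerprint(fragment: str) -> str:
    """Hash the fragment after collapsing whitespace, so pure reformatting is ignored."""
    normalized = re.sub(r"\s+", " ", fragment).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def check_page(
    html: str, start_marker: str, end_marker: str, previous: Optional[str]
) -> Tuple[bool, Optional[str]]:
    """Return (changed, new_fingerprint) for the monitored part of the page.

    `previous` is the fingerprint stored by the last run (None on the first run).
    A missing fragment is reported as a change, since the page layout has moved.
    """
    fragment = extract_fragment(html, start_marker, end_marker)
    if fragment is None:
        return True, None
    current = fingerprint(fragment)
    changed = previous is not None and current != previous
    return changed, current


if __name__ == "__main__":
    # Hypothetical page snapshots; a real script would download the HTML instead.
    yesterday = '<div id="price">10 USD</div>'
    today = '<div id="price">12 USD</div>'
    _, saved = check_page(yesterday, '<div id="price">', "</div>", None)
    changed, _ = check_page(today, '<div id="price">', "</div>", saved)
    print("changed" if changed else "unchanged")
```

In a real setup you would fetch the page (for example with `urllib.request`), keep the last fingerprint in a small file, run the script from a daily scheduler such as cron, and send an email when `changed` is true. Those parts are left out here to keep the sketch self-contained.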

Scraping JavaScript protected content

Here we come to a new milestone: scraping JavaScript-driven web sites.

Recently a friend of mine got stumped while trying to get the content of a website using the PHP simplehtmldom library. He kept failing and finally found out that the site was saturated with JavaScript code. The anti-scrape JavaScript insertions do a tricky check to see whether the page is requested and processed by a real browser, and only if that is true do they render the rest of the page’s HTML code. more…

ScrapeShield – a limited feature anti-content-duplicate tool

Here we come to the next anti-scrape tool, called ScrapeShield.


The ScrapeShield app was developed by CloudFlare to guard a site’s content. Its features are limited in number, but it’s still an interesting tool to look at for anyone interested in web scraping.

In a nutshell, the ScrapeShield app includes anti-scrape measures such as:

  • content tracking
  • Pinterest blocking
  • email obfuscation
  • hotlink protection

more…

BotDefender Analysis

Here I’d like you to get familiar with an online scraping protection service called BotDefender. It’s interesting both to know how to use it (in case you want to protect your data) and to understand how it works in case you ever come across it while collecting data.


BotDefender is a service that protects online e-commerce stores from the automated retrieval of their items’ prices – so their competitors can’t use those prices to outprice them. This anti-scraping tool is crucial if you want to hide your inventory prices from e-store competitors. It’s also interesting to look at from a scraping point of view. more…

Scraping with import.io Magic – The Future?

Over the last year or two, visual web scrapers have matured a lot. New companies like ParseHub, ScrapingHub and Kimono are bringing new tools to the market, while industry veterans like OutWit Hub, Visual Web Ripper and Mozenda continue to update their great tooling to annotate/train scrapers and extract web data.

Interestingly, something has changed now. Import.io has created a new tool which is a little bit different on the surface, and having spoken to them, a LOT different under the hood.
more…

7 Ways to Protect Website from Scraping and How to Bypass this Protection

In this article I’d like to review a few well-known methods of protecting website content from automatic scraping. Each one has its advantages and disadvantages, so you need to make your choice based on the particular situation. None of these methods is bulletproof, and each one has its own workarounds, which I will mention along the way.
more…

An Independent Test of 7 Hosting Providers

Choosing a provider is not an easy task. You always want to find something “cheap and cheerful”. However, it is often hard to find a golden mean, and you have to choose between computing power, speed, and cost, not to mention additional features such as DNS servers, a control panel, etc. In this article, I will present test results for several providers of various sizes, and I hope they will guide you in choosing a hosting provider. more…
