After almost 3 years of running this scraping blog and reviewing dozens of products, in this small post I’d like to categorise the tools/means of web scraping available to the end user. Here are typical examples of scrapers in each category.
Web scraping tools/means landscape table
| Web scraping tool/means | Description | Example |
|---|---|---|
| Desktop scraping software | Download and scrape it yourself. Often more advanced tech knowledge is required. | Visual Web Ripper and many others. See this post. |
| Cloud scraping services (sometimes coupled with a desktop app) | Self-service: no real download needed, but no human interaction from a service provider. | Import.io, Mozenda, Kimonolab |
| Data as a service (DaaS) | A service does all scrape-related jobs, often offering the data on subscription. | ScrapeHero, Import.io for Enterprise, webRobots.io |
| Browser plugin software | Self-service data gathering and downloading. | OutwitHub |
| Small scraping companies / freelancers | Companies that consult and do small to medium-size scrape projects. | Plenty of those |
As you can see, the landscape of scraping tools offers a myriad of options: pure scraping software, cloud-based scraping tools, free browser plugins and, of course, the data as a service (DaaS) alternative.
When comparing cloud scraping services with non-desktop browser plugin software, it’s important to remember that with the former the extracted data is stored on cloud servers, making it accessible on demand, while with the latter the data can only be buffered in the browser, up to 5M in size. Another benefit of cloud web scraping services is that you can often apply data mining techniques directly to the harvested data.
The popularity over time worldwide
The web data scraping field is increasingly in demand (based on Google search queries).
To give you an idea of the popularity of scraping, I used Google Trends to look at the amount of search traffic for each of the terms above over time. From the corresponding graph, you can see that the frequency of ‘data scraping’ search queries is on the rise. Interestingly, the term ‘data as service’ has not appeared in the Google Trends results, but I believe it soon will, making it an interesting field for venture capital investment.
These trends show the growing need for web-based scraping tools in general and the growing popularity of scraping services (in blue), while the popularity of scraping software (in yellow), though still greater, is decreasing.
The cloud era is also influencing web scraping. Why mess with local servers and data storage if a cloud tool, properly set up, can both request data and store it for further processing? Perhaps we are not there yet, but we are not far from cloud tools being usable for the majority of web scraping tasks.
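For readers who have never looked under the hood, here is a rough idea of what "request data and store it for further processing" boils down to. This is only a minimal sketch in Python using the standard library, not how any of the products above actually work. The `LinkExtractor` class, the `scrape_and_store` function, the `links` table and the sample HTML are all made up for illustration, and an in-memory SQLite database stands in for whatever storage a cloud service would use.

```python
import sqlite3
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_and_store(html, conn):
    """Extract links from an HTML string and store them in a 'links' table."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    conn.execute("CREATE TABLE IF NOT EXISTS links (href TEXT, title TEXT)")
    conn.executemany("INSERT INTO links VALUES (?, ?)", parser.links)
    conn.commit()
    return len(parser.links)


# Hypothetical page content. A real job would download the page first,
# for example with urllib.request.urlopen(url).read().decode().
SAMPLE_HTML = """
<ul>
  <li><a href="/tools/desktop">Desktop software</a></li>
  <li><a href="/tools/cloud">Cloud services</a></li>
</ul>
"""

conn = sqlite3.connect(":memory:")  # stand-in for a cloud-side data store
stored = scrape_and_store(SAMPLE_HTML, conn)
rows = conn.execute("SELECT href, title FROM links ORDER BY href").fetchall()
```

Once the data sits in a table like this, it can be queried on demand, which is the advantage the cloud services have over a browser buffer.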
In the following posts, we will take a more thorough look at the web scraping tools mentioned above.