As we worked with different web scrapers, a problem emerged: while a custom scraper can be tailored to your specific needs, off-the-shelf web scrapers are often quite generic and mostly designed for common, simple tasks. In other words, they may not be as flexible and universal as you’d expect. Of course, all web scraper developers try to make their products handle all kinds of web pages, but we realized that some are better suited to one type of task and others to another.
Which web scraper should you buy so that it serves you for a long time across different future tasks? This question spurred us to start the “Web Scraper Test Drive” project. Here are some points about it:
- Ten companies kindly provided us with their web scrapers to test
- A special testing ground was created with several difficult cases drawn from our own web harvesting practice (some of them are not available on that page yet, but we’ll open them as our testing goes on)
- We try to scrape each of those cases with each of the web scrapers and post the results of each trial
- If we couldn’t do it ourselves, we contacted the scraper developers to see how they could help us
We’ve done all the hard work, so you can just examine the results and decide which scraper best fits your needs. Keep in touch!
Ready… Set… GO!
Those who have left the Scraper Test Drive
The Test Drive is open to any web scraping software, but some participants have left the contest. As we continue the Test Drive, we want to share why those scrapers have left the race.
In the very first test with Web Data Extractor, we ran into a problem grouping the scraped cells of data into a result table. Support said: “Information in any case will be given as list of items.” This led us to conclude that this scraper is solely for mass gathering of emails, URLs, text and so on, although it has since gained a visual user interface. We therefore excluded it from the Test Drive.
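To illustrate what grouping cells into a result table means, here is a minimal Python sketch. It is not how Web Data Extractor or any other tested scraper works; the function name `group_cells` and the sample data are made up for illustration, and it assumes the flat list holds the cells in row order with a known number of columns.

```python
from typing import List, Sequence


def group_cells(cells: Sequence[str], columns: int) -> List[List[str]]:
    """Chunk a flat, row-ordered list of scraped cells into table rows.

    Hypothetical helper: assumes every row has exactly `columns` cells.
    """
    if columns <= 0:
        raise ValueError("columns must be positive")
    if len(cells) % columns != 0:
        raise ValueError("cell count is not a multiple of the column count")
    # Take consecutive slices of `columns` cells; each slice is one row.
    return [list(cells[i:i + columns]) for i in range(0, len(cells), columns)]


# Made-up sample: what a "list of items" output might look like.
flat = ["Apple", "1.20", "Pear", "0.95", "Plum", "2.10"]

# The same data grouped into (name, price) rows.
table = group_cells(flat, columns=2)
for row in table:
    print(row)
```

A tool that only returns the flat list leaves this regrouping to the user, and the regrouping only works when no cell is missing, which is rarely guaranteed on real pages.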
After the first stage (Table Report scrape), the WebHarvy developers were not happy with the scraper’s performance. We asked them to state their reason for leaving the contest. They said: “We are leaving the Test Drive since WebHarvy is not a general purpose web scraper and not suited to scrape data as per the requirements in many of the test cases. WebHarvy has been built to enable users to scrape data from ‘well formatted’ paginated lists, with minimum amount of interaction from the user’s part.”
If you have any questions or suggestions about the Test Drive, feel free to comment below.