Web Scraping Software
Reviews, Tests, Tutorials, Ratings and Comparisons of Web Scraping Programs and Services
I want to extract the hotel name and the current room price of some hotels daily from https://www.expedia.ca/Hotel-
I am a small hotel owner and need this information quite often, so I hope to collect it automatically with code. As an expert in this field, what would you say is the easiest way to get it? Can you give me some example code? more…
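As a starting point for this kind of request, here is a minimal Python sketch that pulls hotel names and prices out of HTML using only the standard library. The class names `hotel-name` and `room-price` and the sample markup are hypothetical, made up for illustration; a real hotel page will use different markup that has to be inspected first, and the page HTML still has to be fetched separately.

```python
from html.parser import HTMLParser

# Hypothetical class names: real hotel pages use different markup,
# so adjust these two constants after inspecting the actual page.
NAME_CLASS = "hotel-name"
PRICE_CLASS = "room-price"

# Made-up sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<div class="listing">
  <h3 class="hotel-name">Harbour View Inn</h3>
  <span class="room-price">CA $129</span>
</div>
<div class="listing">
  <h3 class="hotel-name">Maple Leaf Lodge</h3>
  <span class="room-price">CA $98</span>
</div>
"""


class HotelParser(HTMLParser):
    """Collects (name, price) pairs from elements with the assumed classes."""

    def __init__(self):
        super().__init__()
        self.hotels = []     # finished (name, price) tuples
        self._field = None   # field the parser is currently inside, if any
        self._name = None    # last hotel name seen, waiting for its price

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if NAME_CLASS in classes:
            self._field = "name"
        elif PRICE_CLASS in classes:
            self._field = "price"

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._name = text
        elif self._field == "price" and self._name is not None:
            self.hotels.append((self._name, text))
            self._name = None
        self._field = None

    def handle_endtag(self, tag):
        # Leaving any element ends the current field, even if it was empty.
        self._field = None


def extract_hotels(html):
    """Return a list of (hotel name, price text) tuples found in the HTML."""
    parser = HotelParser()
    parser.feed(html)
    return parser.hotels


if __name__ == "__main__":
    for name, price in extract_hotels(SAMPLE_HTML):
        print(f"{name}: {price}")
```

To run it daily, the script could be scheduled with cron or Windows Task Scheduler. Keep in mind that many travel sites fill in prices with JavaScript, so the raw HTML may not contain them; in that case a browser-based tool like the ones reviewed on this blog is likely the easier route.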
Some may argue that extracting 3 records per minute is not fast enough for an automated scraper (see my last post on Dexi multi-threaded jobs). However, you should realize that Dexi extractor robots behave like a full-blown modern browser and fetch all the resources that crawled pages load (CSS, JS, fonts, etc.).
Octoparse is a modern visual web data extraction tool. Its developers aim to provide a professional data scraping service and to make it one of the most popular web scraping tools.
Version 6.4.1 was released in March 2017, with new features and a much faster and better user experience. more…
UiPath, one of the big providers of robotic process automation software, has some very interesting positioning. Unlike the other players on the market, they provide a free and fully featured community edition of their product for anybody to test and develop. The tool automates any application and is packed with all the web scraping and screen scraping capabilities for both desktop and web. The platform also has a lively community forum featuring jobs, automation contests and knowledge-sharing between UiPath users: www.forum.uipath.com. more…
Mozenda is a cloud web scraping service (SaaS), and we’ve already reviewed it. Since our last review, Mozenda has provided more useful utility features for data extraction. Besides multi-threaded extraction & smart data aggregation, Mozenda allows users to publish extracted data to cloud storage such as Dropbox, Amazon, and Microsoft Azure. In this post we will try to explain the new Mozenda extraction and integration capabilities. more…
Today I want to share my experience with Dexi Pipes. Pipes is a new kind of robot introduced by Dexi.io to integrate web data extraction and web data processing into a single seamless workflow. The main focus of the testing is to show how Dexi might leverage multi-threaded jobs for extraction of data from a retail website.
We have already written some posts on CloudScrape, a web scraping service startup based in Copenhagen, Denmark. The service now has a new look and new features for data extraction and business intelligence, launched under a new name: Dexi.io.
Pipes for aggregation and post-processing
Dexi.io has relaunched and rebranded from its early-stage name, CloudScrape. The company has also released a new product, Pipes, which adds intelligent data transformation to complement the point-and-click data extraction service. In a nutshell, Pipes is a data integration and post-processing engine inside Dexi.io that can aggregate and sanitize extracted data, and a lot more. We’ll share more on it in the following posts.
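To make "aggregate and sanitize" a bit more concrete, here is a small Python sketch of that kind of post-processing. It does not use Dexi.io or Pipes in any way; the rows, field names and the "keep the lowest price" rule are assumptions chosen purely for illustration.

```python
# Hypothetical raw rows, shaped the way an extractor might return them.
RAW_ROWS = [
    {"product": "  Desk Lamp ", "price": "$24.99"},
    {"product": "desk lamp", "price": "$22.50"},
    {"product": "Office Chair", "price": "$1,105.00"},
    {"product": "Office Chair", "price": "n/a"},
]


def sanitize(row):
    """Return a cleaned (name, price) pair, or None if the price is unusable."""
    # Collapse stray whitespace and normalize capitalization.
    name = " ".join(row["product"].split()).title()
    raw_price = row["price"].replace("$", "").replace(",", "")
    try:
        return name, float(raw_price)
    except ValueError:
        return None


def aggregate(rows):
    """Group cleaned rows by product and keep the lowest price seen."""
    lowest = {}
    for row in rows:
        cleaned = sanitize(row)
        if cleaned is None:
            continue  # skip rows whose price could not be parsed
        name, price = cleaned
        lowest[name] = min(price, lowest.get(name, price))
    return lowest


if __name__ == "__main__":
    for name, price in aggregate(RAW_ROWS).items():
        print(f"{name}: {price:.2f}")
```

The two lamp rows collapse into one entry once the names are normalized, and the row with an unparseable price is dropped instead of breaking the run.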
Driving Innovation is the Key to Success
Stefan Avivson, CEO of Dexi.io explains: “Although Robotic Process Automation (RPA) is not a new concept, service providers have been using so called Robotic Processing for a decade now, but the amount of available data and the technology to process it has evolved tremendously over the past two years. There is basically no real limitation to the use of Big Data and there is no real effort in convincing people that RPA is the future. It’s more a question of knowing how! Utilizing the resources of our innovation team for our clients and partners has without doubt been one of the key drivers to our success!”
We look forward to seeing more from this cutting-edge cloud scraping service.
Recently I was notified that the Kimono service is shutting down because the Kimono team is joining another project. Many data hunters who were using this prominent free API service are now searching for a good alternative. more…