Today I want to share my experience with Dexi Pipes. Pipes is a new kind of robot introduced by Dexi.io to integrate web data extraction and web data processing into a single seamless workflow. The main focus of the testing is to show how Dexi might leverage multi-threaded jobs for extraction of data from a retail website.
Web Scraping Software
Reviews, Tests, Tutorials, Ratings and Comparisons of Web Scraping Programs and Services
We have already written some posts on CloudScrape, a Copenhagen, Denmark-based web scraping service startup. The service now has a new look and new features for data extraction and business intelligence – with the launch of new name: Dexi.io.
Pipes for aggregation and post-processing
Dexi.io has relaunched and rebranded from its early-stage name CloudScrape. The company has also released a new product, Pipes. Pipes adds intelligent data transformation to complement the point-&-click data extraction service. In a nutshell, Pipes is a data integration and post-processing engine inside of Dexi.io, that is able to aggregate, sanitize extracted data and a lot more. We’ll share more on it in the following posts.
Driving Innovation is the Key to Success
Stefan Avivson, CEO of Dexi.io explains: “Although Robotic Process Automation (RPA) is not a new concept, service providers have been using so called Robotic Processing for a decade now, but the amount of available data and the technology to process it has evolved tremendously over the past two years. There is basically no real limitation to the use of Big Data and there is no real effort in convincing people that RPA is the future. It’s more a question of knowing how! Utilizing the resources of our innovation team for our clients and partners has without doubt been one of the key drivers to our success!”
Let’s see more in the future of this cutting-edge cloud scrape service.
Recently I got notified of Kimono service finishing its work due to kimono team being joining another project. So many data hunters who were using this prominent free API service are now in search for a good alternative. more…
Today I got a question from one of my readers asking if there is a good out-of-the-box solution for crawling multiple websites for contact information. more…
Professional data extraction requires adequate proxying to keep anonymity of scraping robots. When attempting to extract large data sets (over 1M records, ex. business directories) reliable and fast proxy service is needed.
Sequentum has released the Nohodo proxy service integration for Content Grabber. Nohodo provides a free account for Content Grabber users (up to 5000 requests monthly for free). The feature is available for both trial users and regular customers. Here’s how it works… more…
Dexi.io is a powerful scraping suite. This cloud scraping service provides development, hosting and scheduling tools. The suite might be compared with Mozenda for making web scraping projects and runnig them in clouds for user convenience. Yet it includes the API, each scraper being a json definition similar to other services like import.io, kimono lab and parseHub. more…
Recently Import.io introduced a new extraction technique called Magic. The Magic scraping method works be attempting to scrape all the information off the page automatically and in one shot. We covered it in another post early last year. When we covered it back then, we noted a few issues:
- The scraper only works on pages with more than one row of data like a search results page, category pages and etc.
But now Import.io has released a second version of Magic which seems to have dealt with those obstacles. Not only that, but they have released an API for Magic that lets you see what’s going on behind the scenes. more…
Anyone should be able to pull data from the web and access it in the format they want. If a website does not have an API available, scraping is one of the only options to get the data you need. But figuring out how to scrape data in the complicated HTML is a pain.