Dexi Pipes: multi-threaded web scraping of site aggregators

Today I want to share my experience with Dexi Pipes. Pipes is a new kind of robot introduced by Dexi.io to integrate web data extraction and web data processing into a single seamless workflow. The main focus of this test is to show how Dexi might leverage multi-threaded jobs to extract data from a retail website.
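Dexi configures such jobs visually, but to make the idea concrete, here is a minimal plain-Python sketch of multi-threaded extraction. The retailer URLs and CSS selectors are invented placeholders, not part of Dexi Pipes.

```python
# Minimal sketch of multi-threaded page extraction (illustrative only;
# the URLs and selectors below are hypothetical, not Dexi's API).
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests
from bs4 import BeautifulSoup

PRODUCT_URLS = [
    "https://example-retailer.com/product/1",
    "https://example-retailer.com/product/2",
    "https://example-retailer.com/product/3",
]

def extract_product(url):
    """Fetch one product page and pull out a couple of fields."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    title_tag = soup.select_one("h1")
    return {
        "url": url,
        "title": title_tag.get_text(strip=True) if title_tag else None,
    }

def scrape_all(urls, max_workers=5):
    """Run extractions concurrently, one thread per in-flight request."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract_product, u): u for u in urls}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except requests.RequestException as exc:
                print(f"Failed to fetch {futures[future]}: {exc}")
    return results

if __name__ == "__main__":
    for item in scrape_all(PRODUCT_URLS):
        print(item)
```

Each worker thread handles one page at a time, which is roughly how a multi-threaded job spreads requests across concurrent sessions.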


Create and Manage WP Blog With CodeLobster IDE

If you plan to create and maintain large-scale projects using WordPress, you may be challenged with performance and security problems. How do you maintain a large project in WP? Much depends on the programmer’s skills. To avoid problems with scaling and support, you should be able to write and maintain well-documented code. An IDE that accommodates WP code might be a good choice. more…

Reliable rotating proxies for scraping business directories

We’ve already written about suitable proxy servers for web scraping. Now we want to focus on proxies for scraping data records in huge quantities, particularly from business directories. When scraping business directories, their web servers can identify repetitive requests and put you on hold by looking at the IP address used for frequent HTTP requests. A proxy rotation web service is a means of repeatedly changing the IP address, so the target web server only sees random IP addresses from the rotating proxy pool at each request. more…
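To illustrate the rotation idea independently of any particular provider, here is a minimal sketch in Python with the requests library; the proxy endpoints and the directory URL are placeholders.

```python
# Minimal sketch of proxy rotation with the requests library.
# The proxy addresses are placeholders; a real rotating-proxy service
# typically provides a gateway endpoint or a pool like this one.
import itertools

import requests

PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]
proxy_cycle = itertools.cycle(PROXY_POOL)

def fetch(url):
    """Send each request through the next proxy in the pool,
    so the target server sees a different IP address every time."""
    proxy = next(proxy_cycle)
    return requests.get(
        url,
        proxies={"http": proxy, "https": proxy},
        timeout=15,
    )

for page in range(1, 4):
    resp = fetch(f"https://directory.example.com/listings?page={page}")
    print(page, resp.status_code)
```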

Edit and resend HTTP POST in a browser

Recently I’ve encountered a challenge: making a series of HTTP POST requests with different parameters. This forced me to look for existing tooling; the features I was looking for were capturing a POST request, editing it and resending it. What I’ve found useful for this is the Firefox browser plus its dev tools – really quick and usable for the purpose. All other methods are either not full-featured (resend only, without editing) or require installing extra software. more…
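For comparison, the same series of POSTs can also be replayed from a short script once the original request has been captured; the endpoint and form fields below are hypothetical placeholders.

```python
# Scripted alternative to editing/resending POSTs in the browser:
# replay the captured POST with different parameters.
# The endpoint and field names are hypothetical placeholders.
import requests

ENDPOINT = "https://example.com/search"
BASE_FORM = {"query": "", "page": "1", "sort": "relevance"}

def post_with(**overrides):
    """Copy the captured form data, override selected fields, resend."""
    form = {**BASE_FORM, **{k: str(v) for k, v in overrides.items()}}
    return requests.post(ENDPOINT, data=form, timeout=10)

for page in range(1, 4):
    resp = post_with(query="scraping tools", page=page)
    print(page, resp.status_code, len(resp.text))
```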

Scrapinghub review


Scrapinghub is a developer-focused web scraping platform that provides tools and services to extract structured data from online sources. The platform is built around four major tools – Scrapy Cloud, Portia, Crawlera, and Splash. We’ve decided to try the service; in this post we’ll review its main functionality and share our experience with Scrapinghub. more…
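For context, Scrapy Cloud runs ordinary Scrapy spiders, so a minimal spider of the kind you would deploy there might look like the sketch below; the start URL and CSS selectors are placeholders, not taken from the review.

```python
# A minimal Scrapy spider of the kind deployed to Scrapy Cloud.
# The start URL and CSS selectors are hypothetical placeholders.
import scrapy

class CatalogSpider(scrapy.Spider):
    name = "catalog"
    start_urls = ["https://example.com/catalog"]

    def parse(self, response):
        # Yield one item per listing on the page.
        for listing in response.css("div.listing"):
            yield {
                "title": listing.css("h2::text").get(),
                "price": listing.css("span.price::text").get(),
            }
        # Follow pagination if a "next" link exists.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```

Locally, such a spider can be run with `scrapy runspider spider.py -o items.json` before being deployed to Scrapy Cloud with Scrapinghub’s shub command-line tool.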

CloudScrape to transform into Dexi.io

We have already written some posts on CloudScrape, a Copenhagen, Denmark-based web scraping startup. The service now has a new look and new features for data extraction and business intelligence – along with the launch of a new name: Dexi.io.

Pipes for aggregation and post-processing

Dexi.io has relaunched and rebranded from its early-stage name, CloudScrape. The company has also released a new product, Pipes, which adds intelligent data transformation to complement the point-and-click data extraction service. In a nutshell, Pipes is a data integration and post-processing engine inside of Dexi.io that is able to aggregate and sanitize extracted data, and a lot more. We’ll share more on it in the following posts.
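While Pipes itself is configured inside the platform, a rough plain-Python sketch can give a flavor of what “aggregate and sanitize” post-processing means in practice; the sample records and field names below are invented, and this is not the Dexi Pipes API.

```python
# Generic illustration of post-processing scraped records
# (cleaning and aggregation); NOT the Dexi Pipes API, just a
# plain-Python sketch of the same idea.
import re
from collections import defaultdict

raw_records = [
    {"name": "  Acme Ltd. ", "phone": "(555) 010-1234", "city": "Boston"},
    {"name": "Acme Ltd.",    "phone": "555 010 1234",   "city": "Boston"},
    {"name": "Globex Corp",  "phone": "555-020-5678",   "city": "Denver"},
]

def sanitize(record):
    """Trim whitespace and normalize phone numbers to digits only."""
    return {
        "name": record["name"].strip(),
        "phone": re.sub(r"\D", "", record["phone"]),
        "city": record["city"].strip(),
    }

def aggregate(records):
    """Deduplicate by (name, phone) and count records per city."""
    unique = {(r["name"], r["phone"]): r for r in records}
    per_city = defaultdict(int)
    for r in unique.values():
        per_city[r["city"]] += 1
    return list(unique.values()), dict(per_city)

cleaned, city_counts = aggregate([sanitize(r) for r in raw_records])
print(cleaned)
print(city_counts)
```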

Driving Innovation is the Key to Success

Stefan Avivson, CEO of Dexi.io, explains: “Although Robotic Process Automation (RPA) is not a new concept, and service providers have been using so-called Robotic Processing for a decade now, the amount of available data and the technology to process it have evolved tremendously over the past two years. There is basically no real limitation to the use of Big Data, and there is no real effort in convincing people that RPA is the future. It’s more a question of knowing how! Utilizing the resources of our innovation team for our clients and partners has without doubt been one of the key drivers of our success!”

Let’s see what the future holds for this cutting-edge cloud scraping service.
