Various topics related to Web Scraper, Web Crawler and Data Processing development

Dexi Pipes: multi-threaded web scraping of site aggregators

Today I want to share my experience with Dexi Pipes. Pipes is a new kind of robot introduced by to integrate web data extraction and web data processing into a single seamless workflow. The main focus of the testing is to show how Dexi might leverage multi-threaded jobs for extraction of data from a retail website.


Reliable rotating proxies for business directories scrape

logo_rotating_proxiesWe’ve already written about suitable proxy servers for web scraping. Now we want to focus our readers on those for the huge/mass quantities data records scrape, particulary from the business directories. When scraping busines directories, their web servers can identify repetative requesting and put you on hold by looking at the IP address that is used for frequent http requests. Proxy rotation web service is the means for repeatedly changing IP address. Thus, target web server can only see the random IP addresses from rotatign proxies pool at each request. more…

Back to top