web scraping

Mozenda web scraping and publishing of data to cloud storage

mozenda-cloud-scraperMozenda is a cloud web scraping service (SaaS), and we’ve already reviewed it. Since our last review, Mozenda has provided more useful utility features for data extraction. Besides multi-threaded extraction & smart data aggregation, Mozenda allows users to publish extracted data to cloud storage such as Dropbox, Amazon, and Microsoft Azure. In this post we will try to explain the new Mozenda extraction and integration capabilities. more…

Dexi Pipes: multi-threaded web scraping of site aggregators

Today I want to share my experience with Dexi Pipes. Pipes is a new kind of robot introduced by Dexi.io to integrate web data extraction and web data processing into a single seamless workflow. The main focus of the testing is to show how Dexi might leverage multi-threaded jobs for extraction of data from a retail website.

more…

Reliable rotating proxies for business directories scrape

logo_rotating_proxiesWe’ve already written about suitable proxy servers for web scraping. Now we want to focus our readers on those for the huge/mass quantities data records scrape, particulary from the business directories. When scraping busines directories, their web servers can identify repetative requesting and put you on hold by looking at the IP address that is used for frequent http requests. Proxy rotation web service is the means for repeatedly changing IP address. Thus, target web server can only see the random IP addresses from rotatign proxies pool at each request. more…

Content Grabber self-contained (standalone) agent

self_contained_agentAs web scraping is becoming easier to use, more and more people are able to leverage the world’s web resources. As this trend grows, structured data from the web empower businesses and enable a wave of new business ideas to become a reality. Now there is a new technology on the market called: “self-contained agents” that might just make this a tsunami! more…

Content Grabber with free proxy account integration for business directories scrape

content grabber free nohodoProfessional data extraction requires adequate proxying to keep anonymity of scraping robots. When attempting to extract large data sets (over 1M records, ex. business directories) reliable and fast proxy service is needed.

Sequentum has released the Nohodo proxy service integration for Content Grabber. Nohodo provides a free account for Content Grabber users (up to 5000 requests monthly for free). The feature is available for both trial users and regular customers. Here’s how it works… more…

Back to top