web scraping

Content Grabber self-contained (standalone) agent

self_contained_agentAs web scraping is becoming easier to use, more and more people are able to leverage the world’s web resources. As this trend grows, structured data from the web empower businesses and enable a wave of new business ideas to become a reality. Now there is a new technology on the market called: “self-contained agents” that might just make this a tsunami! more…

Content Grabber with free proxy account integration for business directories scrape

content grabber free nohodoProfessional data extraction requires adequate proxying to keep anonymity of scraping robots. When attempting to extract large data sets (over 1M records, ex. business directories) reliable and fast proxy service is needed.

Sequentum has released the Nohodo proxy service integration for Content Grabber. Nohodo provides a free account for Content Grabber users (up to 5000 requests monthly for free). The feature is available for both trial users and regular customers. Here’s how it works… more…

Dexi.io Review

Dexi.io is a powerful scraping suite. This cloud scraping service provides development, hosting and scheduling tools. The suite might be compared with Mozenda for making web scraping projects and runnig them in clouds for user convenience. Yet it includes the API, each scraper being a json definition similar to other services like import.io, kimono lab and parseHub. more…

No CAPTCHA reCaptcha challenge

No CAPTCHA reCAPTCHA solvingSooner or later a new generation of spam protection methods will emerge to block all unwanted site visitors. The recently launched Google No CAPTHCA reCaptcha could just be such a method. This new “behaviour analysis” tool is getting more and more attention both from the site owners and from scraping engines who are trying to break it. Since Google does not reveal any secrets of its operation, we want to share with you the techniques used in this new smart analysis CAPTCHA that determines between bot and human. Let’s look inside. more…

Turn any interactive website into an API with ParseHub

parsehub
Anyone should be able to pull data from the web and access it in the format they want. If a website does not have an API available, scraping is one of the only options to get the data you need. But figuring out how to scrape data in the complicated HTML is a pain.

ParseHub is a new web browser extension that you can use to turn any dynamic and poorly structured website into an API, without writing code. ParseHub is a scraping tool that is designed to work on websites with JavaScript and Ajax; it is similar to web scraping tools such as Import.io and Kimono Labs.
more…

Back to top