Content Grabber is a powerful, feature-rich web scraping solution with web automation capabilities. It was developed by the folks who brought you Visual Web Ripper, and it includes all the VWR features and more. In fact, Content Grabber has truly raised the bar.
The software is targeted first at companies with a critical reliance on web scraping, and second at those who want to build, package and sell their own web scraping offerings (independent agents). It is an enterprise-grade solution that has been built from the ground up with a focus on performance, scalability and usability.
Here I would like to highlight four of its characteristics.
1. Visual Scraper
Content Grabber is a genuinely new visual web scraper on the market. It has a simple point-and-click UI where users browse the website and click on the data elements in the order they should be collected.
2. Stand-alone agents
The thing that strikes me most is the ability to compose scraping agents and compile them into stand-alone Windows applications that run without outside help. These self-contained agents include the actual Content Grabber engine, so developers can run them independently of the licensed Content Grabber software, royalty free. With the Premium license, they can also white-label the agents as their own.
3. Incorporating agents into web apps
The Premium edition allows users to run agents and display extracted data in their own web applications. Content Grabber can even manage a separate instance of the same agent for each web user. See the Programming Interface section of the help file for more information.
4. Support for scraping dynamic websites
Feel free to watch an overview video that shows how to compose an agent and get data in 60 seconds.
A Content Grabber web scraping agent is a collection of commands that are executed in sequence until completed. These commands are recorded in order of execution and displayed in the Agent Explorer area of the Content Grabber screen.
Content Grabber offers simple macro-style automation for agent creation, or you can take direct control over how each command within your agent is handled. This gives you both simplicity and developer-level control as needed.
If you want to make other adjustments or gain more control of your commands, you can make changes in the Configure Agent Command panel.
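To make the "agent as a recorded sequence of commands" idea concrete, here is a minimal Python sketch. It is only a conceptual model, not Content Grabber's actual engine or API: the `Command` and `Agent` classes, the `record` and `run` methods, and the example URL are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Command:
    """One recorded step (hypothetical; not a Content Grabber type)."""
    name: str
    action: Callable[[Dict], None]


@dataclass
class Agent:
    """A hypothetical agent: an ordered list of commands run one after another."""
    commands: List[Command] = field(default_factory=list)

    def record(self, name: str, action: Callable[[Dict], None]) -> None:
        # Commands are stored in the order they are recorded.
        self.commands.append(Command(name, action))

    def run(self) -> Dict:
        # Execute every command in sequence, sharing one context dictionary.
        context: Dict = {}
        for command in self.commands:
            command.action(context)
        return context


def build_demo_agent() -> Agent:
    agent = Agent()
    # Each lambda stands in for a real step such as "navigate" or "capture text".
    agent.record("navigate", lambda ctx: ctx.update(url="https://example.com/products"))
    agent.record("capture_title", lambda ctx: ctx.update(title="Example product"))
    agent.record("log", lambda ctx: ctx.setdefault("log", []).append(f"visited {ctx['url']}"))
    return agent


result = build_demo_agent().run()
print(result)
```

Because later commands read what earlier ones wrote into the shared context, the order of recording matters, which is the reason a sequential command list is a natural fit for point-and-click recording.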
Management Tools for Developers
Content Grabber includes enterprise-level debugging, logging, error handling and error recovery features, which are important for ensuring the reliability of web scraping agents. Also included are centralized management tools for scheduling, database connections, proxies, notifications and script libraries on a per-server basis.
Other Advanced Features
- From the Application Settings menu you can change from the default menus (called Simple Layout) to Expert Layout. Expert Layout provides a large number of additional controls for developers to manage libraries, proxies, connections, input parameters, scripts, the debugger speed, etc. As a Content Grabber beginner, I like it :-).
- Content Grabber’s web API can also be used to add web automation capabilities to your own desktop applications, royalty free.
- You can run agents from the command line using Content Grabber’s command-line program, and specify command-line parameters that your agents can easily use as input data.
- Content Grabber provides agent and command templates for easy reusability, with many included agent templates for popular websites, and command templates such as a fully-fledged website crawler. What I liked are the premade agent templates for extracting data from some popular sites.
- All Content Grabber agents run multi-threaded by default and you can control the number of web browsers used to extract data. A Content Grabber agent can use a mix of web browsers that can process dynamic pages and ultra-fast HTML parsers for simple web pages.
These features and more are covered in the features overview online. Especially useful for developers are the detailed online software manual and the rich training videos for every level of difficulty.
Professional and Premium Editions
There are two main editions of the software (Professional and Premium) and a third Server Edition ($495), which will save you money if you need a production-only license.
Most of the functionality is included in the base license (Professional). The main differences concern the use of the API and the ability to white-label the self-contained web scraping agents as your own. The Premium Edition also includes integration with Visual Studio 2013 for extra-powerful script editing, debugging and unit testing. See the feature comparison table for more detail.
The only possible con I can see is the price (starting at $995).
In my opinion, Content Grabber is the most feature-rich visual scraper and is worth its price. It is easy to use, catering well to the beginner, yet it has an extensive feature list and provides great control for experienced users and developers. For difficult projects, users can leverage XPath, Regex and programming scripts. Content Grabber also has an extensive, well-documented API that includes a royalty-free runtime, so users can add scraping functionality to their desktop applications. Users can also produce stand-alone, royalty-free web scraping agents.
Organizations that are serious about web scraping will find this tool a must-have. It has set a new benchmark in web scraping and, at the time of writing this article, truly does stand on its own.
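For readers unfamiliar with how XPath and Regex combine in a scraping task, here is a small generic sketch using only Python's standard library. It does not use Content Grabber's scripting interface; the sample markup, the `extract_products` function and the price pattern are assumptions made up for illustration. Note that `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so real-world pages usually call for a proper HTML parser.

```python
import re
import xml.etree.ElementTree as ET

# Made-up, well-formed sample markup standing in for a downloaded page.
SAMPLE = """
<html><body>
  <div class="product"><h2>Widget</h2><span class="price">Price: $19.99</span></div>
  <div class="product"><h2>Gadget</h2><span class="price">Price: $5.00</span></div>
</body></html>
"""

# Regex that pulls the numeric amount out of text like "Price: $19.99".
PRICE_RE = re.compile(r"\$(\d+\.\d{2})")


def extract_products(markup):
    """Return (name, price) pairs: XPath locates the nodes, regex cleans the text."""
    root = ET.fromstring(markup.strip())
    products = []
    # XPath step: select every <div class="product"> anywhere in the tree.
    for node in root.findall(".//div[@class='product']"):
        name = node.findtext("h2")
        price_text = node.findtext("span[@class='price']", default="")
        # Regex step: keep only the number, or None if no price is found.
        match = PRICE_RE.search(price_text)
        products.append((name, float(match.group(1)) if match else None))
    return products


for name, price in extract_products(SAMPLE):
    print(name, price)
```

The division of labor is the usual one: XPath picks out which elements to read, and the regular expression tidies the text inside them.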