FMiner is another data extraction tool, one that has already been on the market for five years. Let’s see which features have allowed it to survive in the highly competitive web scraping world.

Overview

The main distinguishing features of this software are that it displays the scraping process visually as a diagram and lets you record macros by navigating the web in its built-in browser.

In addition to basic web scraping functionality, it offers some advanced features such as AJAX and JavaScript processing, CAPTCHA solving, support for custom Python code, and a task scheduler with email reports.

Because it is written in Python, FMiner runs on both Windows and Mac OS, and since it scrapes through its internal browser, it can also be used as a macro player for simulating user activity on the web.

Characteristics

Usability: 4 stars
Functionality: 5 stars
Easy to learn: 3.5 stars
Customer support: forum, email; custom project development
Price: $168 – $248
Trial period/Free version: 15-day trial
OS: Windows, Mac
Data export formats: Excel, CSV, XML, SQLite and other databases (Oracle, MS SQL, MySQL, PostgreSQL, Access, ODBC)
Multi-threading: yes (runs several web browsers simultaneously)
API: supports custom Python code
Scheduling: yes

Workflow

Let’s see how to scrape with FMiner. The main screen of the application is divided into four sections:

  1. Macro Designer
  2. Web Browser
  3. Action Attributes
  4. Logs, Data, Selections and Variables

FMiner main window

The macro designer section displays the project as a flow chart in which each action is represented by its own element. You can build the flow chart manually or by recording your actions in the browser on the right. You can also rearrange and remove elements as needed.

I should note, though, that the editor is quite basic. It lets you neither select a group of elements nor copy and paste them. Also, the Delete key doesn’t delete elements, which was the most frustrating thing when I was editing my flow chart.

Here is the flow chart of a project that scrapes IP and cookie information from our testing ground page, adds it to a table named “output”, then clears the cookies and scrapes the page again, adding the scraped values as a second row in the same table:

FMiner flow chart

Here are the attributes of the IP element of the output table:

FMiner attributes

As you can see in this screenshot, the data element is defined by an XPath expression, which you can edit by hand or with helper functions such as Select target (on the page), Relative selection, or Expand/Shrink selection. You can also work with groups of similar page elements (see the last button under the Target select heading).
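To make the idea of an XPath target more concrete, here is a minimal sketch in plain Python using the standard library's `xml.etree.ElementTree`. It is not FMiner's API or its internal engine; it only illustrates what a single target, a relative selection and a group of similar elements look like as XPath expressions. The page markup, the `info` table id and the `value` class are hypothetical, and ElementTree supports only a subset of XPath and needs well-formed markup, so real-world HTML would need a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page standing in for the testing ground page.
# ElementTree requires well-formed markup, so this is XHTML-like.
PAGE = """
<html>
  <body>
    <table id="info">
      <tr><td class="label">IP</td><td class="value">203.0.113.7</td></tr>
      <tr><td class="label">Cookie</td><td class="value">session=abc123</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Single target: one expression, evaluated from the document root,
# pointing at the value cell of the first table row.
ip_cell = root.find(".//table[@id='info']/tr[1]/td[@class='value']")

# Relative selection: start from an element that is already selected
# (the table) and navigate from there instead of from the root.
table = root.find(".//table[@id='info']")
cookie_cell = table.find("./tr[2]/td[@class='value']")

# Group of similar elements: one expression matching every value cell.
all_values = [td.text for td in table.findall("./tr/td[@class='value']")]

print(ip_cell.text)
print(cookie_cell.text)
print(all_values)
```

Relative expressions like the one used for `cookie_cell` tend to survive layout changes elsewhere on the page better than long absolute paths, which is presumably why a visual tool offers both.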

In the Extract type section you define what kind of data to extract from the page. FMiner supports the following types: text, HTML, DOM attribute, page attribute, download, regular expression and static data.
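The difference between several of these extract types can be shown on a single element. The sketch below again uses plain Python (`xml.etree.ElementTree` and `re`), not FMiner itself, and its element, `title` attribute, IP pattern and static label are all hypothetical. Page attribute and download are left out because they depend on a live browser session.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical element, as if already located by an XPath target.
cell = ET.fromstring(
    '<td class="value" title="client address">IP: <b>203.0.113.7</b></td>'
)

# "text": all text inside the element, with the tags stripped.
text_value = "".join(cell.itertext())

# "html": the element's markup serialized back into a string.
html_value = ET.tostring(cell, encoding="unicode")

# "dom attribute": the value of a single attribute of the element.
attr_value = cell.get("title")

# "regular expression": a pattern applied to the extracted text,
# here a simple (hypothetical) dotted-quad IPv4 pattern.
match = re.search(r"\d{1,3}(?:\.\d{1,3}){3}", text_value)
regex_value = match.group(0) if match else None

# "static data": a constant stored in every row, e.g. a source label.
static_value = "testing ground"

print(text_value, attr_value, regex_value, static_value, sep=" | ")
```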

Be careful when switching between data types! When I switched from text to page attribute and then back again, I lost my XPath expression and could not recover it. By the way, FMiner has no Undo/Redo functionality, and I didn’t even see a Save button. I suspect it simply autosaves the project all the time, which means you can easily lose work if you make a mistake while editing your flow chart. It does, however, offer a kind of backup feature that lets you save the current state of the project to another file (see File > backup the project to…).

When you click the Run button on the toolbar, FMiner starts executing your flow chart in the browser (on the right) and logs its actions in the log window (right below the browser):

FMiner log

When it finishes, you can see the resulting table in the data window:

FMiner data

Note that the program doesn’t delete old data from the tables, so you can accumulate data extracted over several scraping sessions in the same table. You can also edit or insert data rows manually, or even import external data into the table from an XLS or CSV file. Once you’re satisfied with the result, you can export it to an external file or database (see the Export button on the toolbar).
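The table behavior described above (rows accumulating across sessions, importing external rows, then exporting) can be modeled in a few lines of Python with the standard `csv` module. This is only a sketch of the concept, not how FMiner stores its tables; `output_table`, `run_session`, `import_rows` and `export_csv` are hypothetical names, and the IP and cookie values are made up.

```python
import csv
import io

# Hypothetical "output" table: a list of rows that is never cleared
# automatically, so each run appends to what is already there.
output_table = []

def run_session(ip, cookie):
    """Hypothetical stand-in for one scraping run that appends one row."""
    output_table.append({"IP": ip, "Cookie": cookie})

def import_rows(csv_text):
    """Append rows from external CSV data to the same table."""
    output_table.extend(csv.DictReader(io.StringIO(csv_text)))

def export_csv(rows):
    """Serialize the table to CSV text, header first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["IP", "Cookie"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Two runs: the second one after cookies were cleared.
run_session("203.0.113.7", "session=abc123")
run_session("203.0.113.7", "")

# External data merged into the same table.
import_rows("IP,Cookie\n198.51.100.4,session=xyz789\n")

print(export_csv(output_table))
```

Because nothing removes old rows, running the sessions again would simply grow the table, which matches the behavior the review describes; clearing it is an explicit step.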

Conclusion

FMiner undoubtedly has its place, and some customers will certainly find it very useful for their web scraping tasks. In such a brief review I couldn’t cover all of this scraper’s features, but after five years of development it seems to have been polished enough to handle many sophisticated web scraping tasks. I didn’t cover regex support, post-extraction data adjustment, CAPTCHA solving and other goodies hidden in this scraper.

I definitely recommend trying it, but unfortunately you have only 15 days to make up your mind. I would suggest that the developer consider switching to a freemium model, which gives users more freedom, making them feel more comfortable and, in turn, more loyal.

As always, you’re welcome to ask questions or share your experience with FMiner in the comments below.

Cheers!