We have already shown you an example of using WebDriver with C#. In this post we will see how to extract web data using Selenium WebDriver with Java, the native language of Selenium WebDriver.

Selenium is an open-source tool for web automation. It provides APIs through which we can perform user events programmatically. For more information about Selenium, read here.

To extract data using Selenium, you will need to install the following tools and libraries on your computer. If you already have them, you can simply skip the following two sections.

1. Download Selenium WebDriver

From http://www.seleniumhq.org/download/ you need to download the following libraries and extract them somewhere on your computer:

The provided links might not be valid if a new version has been released, so just download the latest version of each of those files.

2. Install Eclipse Java IDE and configure Selenium WebDriver

Create the Java project and configure Selenium:

  • Open the Eclipse IDE from its shortcut and create a new Java project by navigating to File > New > Java Project
  • Provide the project name and click the “Finish” button, as shown in the image below:

New Java Project Window

Then you need to provide paths to the libraries you recently downloaded:

  • Right-click the project we created and go to Build Path > Configure Build Path
  • Open the Libraries tab, click “Add External JARs...” and select the JARs mentioned above:

Java Build Path Window

  • Now, if you have done everything correctly, you will see those two JAR files under Referenced Libraries, as in the image:

Selenium Libraries in Package Explorer

Now we are ready to start writing the program.

3. Write the Scraper

I’ll show you a program that does the following:

  1. Opens Firefox
  2. Goes to http://testing-ground.scraping.pro/login
  3. Submits the form using a username and password
  4. Extracts the message text and saves it to the status.txt file
  5. Takes a screenshot of the website and saves it to the screenshot.png file
  6. Closes Firefox

Here I provide the complete code of the program, and below it you can find an explanation of how it works.
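A minimal sketch of such a program could look like the one below. It assumes that Firefox and a working driver setup are available on your machine (recent Selenium versions drive Firefox through geckodriver). The class name WebScraper, the field ids usr and pwd, the XPath used for the message, and the credentials admin / 12345 are assumptions made for illustration, so check them against the page source before relying on them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class WebScraper {
    public static final String TEST_URL = "http://testing-ground.scraping.pro/login";

    // Creating the driver launches a new Firefox window.
    private final WebDriver driver = new FirefoxDriver();

    /** Navigates the browser to the test website. */
    public void openTestSite() {
        driver.get(TEST_URL);
    }

    /** Fills in the login form and submits it. */
    public void login(String username, String password) {
        // Assumption: the form fields have the ids "usr" and "pwd".
        WebElement userField = driver.findElement(By.id("usr"));
        WebElement passwordField = driver.findElement(By.id("pwd"));
        userField.sendKeys(username);
        passwordField.sendKeys(password);
        passwordField.submit(); // submits the form that contains this field
    }

    /** Reads the message shown after login and writes it to status.txt. */
    public void getText() throws IOException {
        // Assumption: the message is an <h3> inside a div with id "case_login".
        String text = driver.findElement(By.xpath("//div[@id='case_login']/h3")).getText();
        Files.write(Paths.get("status.txt"), text.getBytes(StandardCharsets.UTF_8));
    }

    /** Captures the current page as PNG bytes and writes them to screenshot.png. */
    public void saveScreenshot() throws IOException {
        byte[] png = ((TakesScreenshot) driver).getScreenshotAs(OutputType.BYTES);
        Files.write(Paths.get("screenshot.png"), png);
    }

    /** Closes all browser windows and ends the WebDriver session. */
    public void closeBrowser() {
        driver.quit();
    }

    public static void main(String[] args) throws IOException {
        WebScraper scraper = new WebScraper();
        try {
            scraper.openTestSite();
            scraper.login("admin", "12345"); // assumed test credentials
            scraper.getText();
            scraper.saveScreenshot();
        } finally {
            scraper.closeBrowser(); // runs even if one of the steps above fails
        }
    }
}
```

The try/finally in main is a design choice of this sketch: it makes sure the browser is closed even when an element is not found.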

4. How it works

In this section I will explain the purpose of each function in the code.

openTestSite() launches the Firefox browser and opens the test website:

login() enters the provided username and password into the corresponding fields and submits the form. It does this by locating the form field elements on the page by their HTML ids and sending characters to those elements:
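As a rough sketch of this lookup-and-type step, the helper below takes an already opened WebDriver. The class name LoginStep and the ids usr and pwd are assumptions for illustration. Note that findElement throws a NoSuchElementException when no element matches, so a wrong id fails fast.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class LoginStep {
    // Assumed element ids; verify them in the page source.
    public static final String USER_FIELD_ID = "usr";
    public static final String PASSWORD_FIELD_ID = "pwd";

    /** Types the credentials into the two fields and submits the form. */
    public static void login(WebDriver driver, String username, String password) {
        WebElement userField = driver.findElement(By.id(USER_FIELD_ID));
        WebElement passwordField = driver.findElement(By.id(PASSWORD_FIELD_ID));
        userField.sendKeys(username);     // sends the characters as key presses
        passwordField.sendKeys(password);
        passwordField.submit();           // submits the form that contains this field
    }
}
```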

getText() grabs the message that appears after login and saves it to the status.txt file:
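The file-writing half of this step needs nothing from Selenium, so it can be sketched with the standard library alone. StatusWriter and saveStatus are hypothetical names, and the literal string stands in for the value that WebElement.getText() would return in the scraper.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StatusWriter {
    /** Writes the message to the target file as UTF-8, replacing any existing content. */
    public static void saveStatus(String message, Path target) throws IOException {
        Files.write(target, message.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder text; in the scraper this comes from the page.
        String message = "example status message";
        saveStatus(message, Paths.get("status.txt"));
    }
}
```

Writing explicit UTF-8 bytes keeps the output independent of the platform's default encoding.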

saveScreenshot() takes a screenshot of the web page and saves it to the screenshot.png file:
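One way to sketch this step is through Selenium's TakesScreenshot interface, which the Firefox driver implements. ScreenshotStep is a hypothetical class name, and asking for raw bytes with OutputType.BYTES is a choice of this sketch (OutputType.FILE would hand back a temporary file instead).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;

public class ScreenshotStep {
    /** Captures the current page as PNG bytes and writes them to the target file. */
    public static void saveScreenshot(TakesScreenshot camera, Path target) throws IOException {
        byte[] png = camera.getScreenshotAs(OutputType.BYTES);
        Files.write(target, png); // creates the file or replaces its content
    }
}
```

With a driver in hand, the call would look like ScreenshotStep.saveScreenshot((TakesScreenshot) driver, Paths.get("screenshot.png")).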

closeBrowser() closes the Firefox browser:

5. Run the program

To run the program, click on your project and then select Run > Run As > Java Application. The program will open the Firefox browser, and once the browser is closed the program execution is finished. To check the screenshot and text file, right-click on the project and click “Refresh”. Inside the project folder you will see one text file and one PNG file.

That’s it. If you have any comments or questions, feel free to ask! For a real-world example of scraping with WebDriver in Java, look at this article.