Almost every developer has faced a data-parsing task at some point. The needs vary, from parsing a product catalog to tracking stock prices. Parsing is a popular area of back-end development, and there are specialists who build high-quality parsers and scrapers. It is also an interesting topic that appeals to anyone who enjoys working with the web. Today we review PHP tools for parsing web content.


Goutte is a convenient screen-scraping library for PHP that is built on Symfony components. It provides an API for crawling websites and extracting data from different types of responses. Versatile, reliable, and easy to use: that is what makes Goutte the best scraping library.

The usage is quite simple:
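As a rough illustration, a minimal Goutte script might look like the sketch below. It assumes Goutte was installed with Composer (package fabpot/goutte); the URL and the h1 selector are placeholders chosen for the example, not part of any particular project.

```php
<?php
// Assumption: Goutte was installed with Composer, so vendor/autoload.php exists.
require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

$client = new Client();

// Placeholder URL: replace it with the page you want to scrape.
$crawler = $client->request('GET', 'https://example.com/');

// Select every <h1> element with a CSS selector and collect its trimmed text.
$headings = $crawler->filter('h1')->each(function ($node) {
    return trim($node->text());
});

print_r($headings);
```

The request() call returns a crawler object, and filter() narrows it down with a CSS selector, so the same pattern works for any element you can describe with a selector.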

Congratulations, you have extracted data from a web page! Other examples of what the library can do are:
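Two further capabilities are following links and submitting forms. The sketch below is only an illustration: the URL, the link text "Sign in", the button label "Log in", and the field names username and password are all hypothetical and have to match the page you are working with. Each step is skipped if the element is not found.

```php
<?php
// Assumption: Goutte was installed with Composer, so vendor/autoload.php exists.
require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'https://example.com/'); // placeholder URL

// Follow a link identified by its visible text (hypothetical label).
$links = $crawler->selectLink('Sign in');
if ($links->count() > 0) {
    $crawler = $client->click($links->link());
}

// Fill in and submit a form identified by its button label (hypothetical label).
$button = $crawler->selectButton('Log in');
if ($button->count() > 0) {
    $form = $button->form();
    // Hypothetical field names: they must exist in the real form.
    $crawler = $client->submit($form, [
        'username' => 'demo',
        'password' => 'secret',
    ]);
}

// Print the title of whichever page we ended up on, if it has one.
$title = $crawler->filter('title');
echo $title->count() > 0 ? $title->text() : '(no title)', PHP_EOL;
```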

To use the library, you add it as a dependency in the composer.json file. That is why Goutte fits best in projects built on frameworks such as Laravel or Yii, rather than in simple, single-page sites.


htmlSQL is an intriguing library that lets you access HTML elements with an SQL-like syntax. If you love SQL, this experimental library is the right choice.

In some cases, SQL is more convenient than CSS selectors. The library is fast but limited in functionality, which makes it a good match for trivial tasks and for parsing a single web page quickly.
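To give an idea of the syntax, here is a small sketch based on the examples shipped with htmlSQL. It assumes the project's htmlsql.class.php and snoopy.class.php files sit next to the script; the markup and the class name "external" are made up for the illustration. The library was written for older PHP versions, so it may need adjustments on a current one.

```php
<?php
// Assumption: both files from the htmlSQL distribution are in this directory.
include_once __DIR__ . '/snoopy.class.php';
include_once __DIR__ . '/htmlsql.class.php';

// Made-up markup so the example does not depend on a live site.
$markup = '<a class="external" href="http://example.com/">Example</a>'
        . '<a class="internal" href="/about">About</a>';

$wsql = new htmlsql();

// Besides 'string', the examples also connect with 'url' and 'file'.
if (!$wsql->connect('string', $markup)) {
    exit('Error while connecting: ' . $wsql->error . PHP_EOL);
}

// Attributes are referenced as $variables in the WHERE clause, so the
// query must be in single quotes to keep PHP from interpolating them.
if (!$wsql->query('SELECT * FROM a WHERE $class == "external"')) {
    exit('Query error: ' . $wsql->error . PHP_EOL);
}

// Each row is an array describing one matched element.
$rows = $wsql->fetch_array();
foreach ($rows as $row) {
    print_r($row);
}
```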

Unfortunately, the project was abandoned by its creators in 2006, but htmlSQL is still a reliable helper in parsing and scraping.


Simple HTML DOM is a PHP library that parses HTML by selectors. It handles invalid HTML and lets you query a page with jQuery-like selectors. Let’s look at an example:
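A minimal sketch, assuming the simple_html_dom.php file from the Simple HTML DOM project is in the same directory as the script. The markup is invented for the example and deliberately leaves the li elements unclosed to show the parser coping with sloppy HTML.

```php
<?php
// Assumption: simple_html_dom.php was downloaded and placed next to this script.
require_once __DIR__ . '/simple_html_dom.php';

// Invented markup with unclosed <li> tags.
$markup = '<ul id="menu"><li><a href="/home">Home</a><li><a href="/blog">Blog</a></ul>';

// Parse the string; file_get_html() does the same for a URL or file path.
$html = str_get_html($markup);

// Collect link text => href for every anchor inside the element with id "menu".
$links = [];
foreach ($html->find('#menu a') as $anchor) {
    $links[$anchor->plaintext] = $anchor->href;
}

print_r($links);

// Release the memory held by the parsed document.
$html->clear();
```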

Simple HTML DOM is not as fast as other libraries.


cURL is one of the most popular tools for scraping web pages, and it is available in PHP as a built-in component. Because it is a standard part of PHP, there is no need to include third-party files and classes, although that does not make cURL any more convenient than Goutte. cURL only downloads the page, so it is usually paired with a parser: the tandem of cURL and the Simple HTML DOM library described above is the most popular approach for small parsing jobs.

Let’s look at typical usage of the library:
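A minimal sketch of a page download with the cURL extension. The function name fetch_page, the user-agent string, and the URL are our own placeholders; only the curl_* functions and CURLOPT_* constants come from PHP itself.

```php
<?php
/**
 * Download a page with PHP's cURL extension.
 * Returns the response body, or null if the transfer fails.
 * (fetch_page is a hypothetical helper name used for this example.)
 */
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,                 // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,                 // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,      // give up after this many seconds
        CURLOPT_USERAGENT      => 'ExampleScraper/1.0', // placeholder agent string
    ]);

    // curl_exec() returns false on failure because RETURNTRANSFER is set.
    $body = curl_exec($ch);

    // The handle is released automatically when $ch goes out of scope.
    return $body === false ? null : $body;
}

$page = fetch_page('https://example.com/'); // placeholder URL
echo $page === null ? "Request failed\n" : strlen($page) . " bytes received\n";
```

The returned string can then be handed to a parser, for example the str_get_html() function from Simple HTML DOM.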

You can also set a cookie with cURL.


And this is how it can be used:
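A minimal sketch of sending cookies with a cURL request through the CURLOPT_COOKIE option. The helper names build_cookie_header and fetch_with_cookies, the cookie names, and the URL are hypothetical and only serve the illustration.

```php
<?php
/**
 * Turn a name => value map into a "name=value; name2=value2" string,
 * which is the format CURLOPT_COOKIE expects. (Hypothetical helper.)
 */
function build_cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . rawurlencode((string) $value);
    }
    return implode('; ', $pairs);
}

/**
 * Fetch a URL and send the given cookies along with the request.
 * Returns the response body, or null if the transfer fails. (Hypothetical helper.)
 */
function fetch_with_cookies(string $url, array $cookies): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,                           // return the body as a string
        CURLOPT_TIMEOUT        => 10,                             // give up after 10 seconds
        CURLOPT_COOKIE         => build_cookie_header($cookies),  // cookies sent with the request
    ]);
    $body = curl_exec($ch);

    return $body === false ? null : $body;
}

// Usage with made-up cookie names and a placeholder URL.
$html = fetch_with_cookies('https://example.com/', [
    'session' => 'abc123',
    'lang'    => 'en',
]);
echo $html === null ? "Request failed\n" : "Page fetched with cookies\n";
```

To keep cookies that the server sets between requests, cURL also offers the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options, which store them in and read them from a file.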


To sum up, the table below will help you choose the right library for your needs. For a speed evaluation we used .

Library | Average speed (seconds) | Convenience in use (usability) | Special aspects
Goutte | 14.3 | high | Works well with big projects, object-oriented, medium parsing speed.
htmlSQL | 11.3 | medium | Fast parsing, but limited functionality.
cURL + Simple HTML DOM | 18.6 | medium | Handles invalid HTML, but parsing is slow.

So, for large sites it is better to use Goutte or htmlSQL. For an easy task there is no need to bring in heavier third-party libraries, so it is better to parse with cURL along with Simple HTML DOM.