SQL (Structured Query Language) is a powerful language for working with relational databases, but many people are unaware of its dark side: SQL injection. Anyone who knows the language well enough can extract data from your site through SQL, unless the developers have built defenses against SQL injection. Let’s discuss how such attacks pull data out and how to secure your web resource against these kinds of data leaks! more…
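As a rough illustration of the attack and one common defense, the sketch below compares a query built by string concatenation with a PDO prepared statement. It assumes the `pdo_sqlite` extension is available; the `users` table and the helper names `findUnsafe` and `findSafe` are made up for this example and do not come from the post.

```php
<?php
// In-memory SQLite database with a hypothetical "users" table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, secret TEXT)');
$pdo->exec("INSERT INTO users (name, secret) VALUES ('alice', 'a-secret')");
$pdo->exec("INSERT INTO users (name, secret) VALUES ('bob', 'b-secret')");

// Vulnerable (hypothetical helper): user input is pasted into the SQL text.
function findUnsafe(PDO $pdo, string $name): array
{
    $sql = "SELECT name, secret FROM users WHERE name = '$name'";
    return $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}

// Safer (hypothetical helper): the input is bound as a parameter,
// so it is treated as a value and never as SQL syntax.
function findSafe(PDO $pdo, string $name): array
{
    $stmt = $pdo->prepare('SELECT name, secret FROM users WHERE name = :name');
    $stmt->execute([':name' => $name]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Classic injection payload: turns the WHERE clause into an always-true condition.
$payload = "' OR '1'='1";

$leaked  = findUnsafe($pdo, $payload); // every row in the table
$blocked = findSafe($pdo, $payload);   // no rows: nobody has that literal name
```

With the concatenated query the payload rewrites the `WHERE` clause and every row comes back; with the bound parameter the same payload is just an odd user name that matches nothing.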
Almost every developer has faced a data parsing task. The needs vary, from building a product catalog to parsing stock prices. Parsing is a very popular area of back-end development, and there are specialists who build quality parsers and scrapers. The topic is also interesting in its own right and appeals to anyone who enjoys the web. Today we review the PHP tools used for parsing web content. more…
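The excerpt does not say which tools the post covers, so the following is only a minimal sketch using PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`). The HTML fragment, the class names in it, and the helper `parseProducts` are invented for illustration.

```php
<?php
// Hypothetical catalog fragment; a real page would come from a download step.
$html = '<ul>'
      . '<li class="product"><span class="name">Lamp</span><span class="price">12.50</span></li>'
      . '<li class="product"><span class="name">Desk</span><span class="price">89.00</span></li>'
      . '</ul>';

// Hypothetical helper: returns a list of ['name' => string, 'price' => float].
function parseProducts(string $html): array
{
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $products = [];
    foreach ($xpath->query('//li[@class="product"]') as $item) {
        // Relative XPath expressions are evaluated against the current <li>.
        $name  = $xpath->evaluate('string(span[@class="name"])', $item);
        $price = $xpath->evaluate('string(span[@class="price"])', $item);
        $products[] = ['name' => trim($name), 'price' => (float) $price];
    }
    return $products;
}

$products = parseProducts($html);
```

A DOM parser tends to cope with nested or slightly broken markup better than hand-written string matching, which is why it is a common starting point for catalog-style scraping.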
Sometimes, when you are developing a project, you need to parse XLS documents. For example, you may be synchronizing XLS worksheets with a website database and need to convert the XLS data to MySQL completely automatically.
If you work on Windows, this is simple enough: you just use COM objects. It is another matter if you work with PHP and need it to run on UNIX systems. Fortunately, there are many classes and libraries for this purpose. One of them is PHPExcel. This library is completely cross-platform, so you will not have portability problems. more…
Here is how to download a file from a server with cURL. The comments in the code explain each step.
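// open the local file that will receive the download
$fp = fopen('image.png', 'w+') or die('Unable to write a file');
// file to download
$ch = curl_init('http://scraping.pro/ewd64.png');
// skip SSL certificate verification (insecure; use only with hosts you trust)
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// output to file descriptor
curl_setopt($ch, CURLOPT_FILE, $fp);
// follow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// set large timeout to allow curl to run for a longer time
curl_setopt($ch, CURLOPT_TIMEOUT, 1000);
// send a User-Agent header; some servers reject requests without one
curl_setopt($ch, CURLOPT_USERAGENT, 'any');
// Enable debug output
curl_setopt($ch, CURLOPT_VERBOSE, true);
// run the transfer, then release the cURL handle and close the file
curl_exec($ch);
curl_close($ch);
fclose($fp);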
Recently I was challenged to write a script that would authenticate through a bot-proof login form and redirect to a logged-in page. more…
Suppose we want to set a single exception handler function for all exceptions in a scraper program. The handler should keep working in a multi-level program, where exceptions are thrown inside nested calls. Here is how it works in PHP. more…
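One way to sketch this idea is with PHP's built-in `set_exception_handler()`, which receives any exception that nothing else catches, however deep in the call stack it was thrown. The code assumes PHP 7 or later (for `Throwable`). The function names `formatException`, `handleScraperException`, `parsePrice` and `scrapeItem` are hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper: turns any exception into a single log line.
function formatException(Throwable $e): string
{
    return sprintf('[%s] %s', get_class($e), $e->getMessage());
}

// Hypothetical global handler: called for every exception nobody caught.
function handleScraperException(Throwable $e): void
{
    error_log(formatException($e));
}

// Register the handler once, at the top level of the program.
set_exception_handler('handleScraperException');

// Hypothetical lower level of the scraper: validates one raw value.
function parsePrice(string $raw): float
{
    if (!is_numeric($raw)) {
        throw new InvalidArgumentException("Not a price: $raw");
    }
    return (float) $raw;
}

// Hypothetical upper level: calls the lower level without any try/catch,
// so a failure there propagates upward on its own.
function scrapeItem(array $row): float
{
    return parsePrice($row['price']);
}
```

Calling `scrapeItem(['price' => 'n/a'])` without a surrounding `try`/`catch` would hand the `InvalidArgumentException` to `handleScraperException`, after which PHP stops the script. Errors that the program should survive still need a local `try`/`catch`.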
In this post, I’ll explain how to do a simple web page extraction in PHP using cURL, the ‘Client URL library’.
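A minimal sketch of such an extraction might look like the code below. It assumes the `curl` extension is installed. The helper names `fetchPage` and `extractTitle` are hypothetical, and pulling out the `<title>` element is just one example of what could be extracted from a page.

```php
<?php
// Hypothetical helper: downloads a page and returns its body as a string.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    // return the response from curl_exec() instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // follow redirects
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // give up after 30 seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical helper: returns the text of the first <title> element, or null.
function extractTitle(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1]));
    }
    return null;
}
```

A typical call would be `extractTitle(fetchPage($url))` for some page URL, wrapped in a `try`/`catch` if a failed download should not stop the program.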
If you want to use regular expressions in your PHP program, the best way is to use the so-called preg functions (they wrap the Perl-Compatible Regular Expressions library, so they are sometimes called PCRE functions). Of course, there are other function sets, such as ereg and mb_ereg, but they are quite outdated, and in this article we’ll focus on preg functions only.
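To give a feel for the preg functions, here is a small sketch that uses `preg_match`, `preg_match_all` and `preg_replace`. The sample strings and patterns are made up for this example.

```php
<?php
// Made-up sample input.
$text = 'Widget: $19.99, Gadget: $5.00';

// preg_match: find the first match; named groups show up as array keys.
preg_match('/(?<name>\w+): \$(?<price>\d+\.\d{2})/', $text, $first);

// preg_match_all: collect every match; index 1 holds the first capture group.
preg_match_all('/\$(\d+\.\d{2})/', $text, $matches);
$prices = $matches[1];

// preg_replace: substitute every match of the pattern.
$masked = preg_replace('/\d/', '#', 'Order 42');
```

Single-quoted strings are used for the patterns so that PHP leaves the backslashes and dollar signs untouched before the regex engine sees them.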