Development

Various topics related to Web Scraper, Web Crawler and Data Processing development

Crawling web pages with Netpeak Spider in conjunction with NetNut and GeoSurf proxies

It is hard to overestimate the importance of information: “Master of information, master of situation”. Today we have the tools needed to become a “master of situation”, such as spiders and parsers that can scrape various data from websites. In this post we look at scraping Amazon with a web spider that works through proxy services. more…

Creating REST API with Spring

Today we are going to discuss a large and engaging topic, REST API, and build our own web application with one of the most popular Java frameworks, Spring. To start with, we will explain the two main concepts of this article: REST API and Spring. Note that both concepts are quite complex and we can’t describe them in full, but the article includes links that will help you through the points where you might get stuck. more…

Web Scraping with Node.js

Web scraping has been steadily growing in popularity for years. Freelance sites are crowded with orders for this controversial data extraction process. Today we will combine two modern directions in web development. So, let’s consider an elegant way to scrape data from websites with Node.js! more…

JavaScript rendering library for scraping JavaScript sites

Can you imagine how many scraping tools are at our service? Scraping has a long history, and it has finally become a simple approach that is available in many programming languages. Unfortunately, some non-trivial tasks still can’t be solved in a snap.

One of these tasks is scraping JavaScript sites, that is, sites that output their data using JavaScript. Faced with such a site, classic scrapers (though not all of them) ignore the JS-generated data and carry on with their own life cycle. When this small defect turned into a big problem, developers all over the world took measures, and they succeeded. Today we consider one of the most impressive tools for scraping JS-generated data: Splash. more…

Design patterns for hierarchical data storage and effective processing

Storing hierarchical data is a non-trivial task in a relational database. For example, your online shop has goods in different categories and subcategories that form a tree five levels deep. How should they be stored in a database?

Luckily, there are several approaches (design patterns) that help the developer design the database structure without redundant tables or code. As a result, the site works faster, and changes, even at the database layer, won’t cause trouble. We will study these approaches below. more…
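As a rough illustration of the shop example, one commonly used pattern is the adjacency list, where each row stores the id of its parent. The sketch below is only an assumption about how such a table might look: the `category` table, its columns, the sample rows and the `subtree` helper are all hypothetical. It uses Python's built-in `sqlite3` module and a recursive query to read a whole branch of the tree.

```python
import sqlite3

# Hypothetical schema: every category points at its parent (adjacency list).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES category(id),"
    " name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [
        (1, None, "Electronics"),
        (2, 1, "Computers"),
        (3, 2, "Laptops"),
        (4, 1, "Phones"),
        (5, None, "Books"),
    ],
)


def subtree(conn, root_id):
    """Return (name, depth) for the root category and all its descendants."""
    return conn.execute(
        """
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 0 FROM category WHERE id = ?
            UNION ALL
            SELECT c.id, c.name, t.depth + 1
            FROM category c JOIN tree t ON c.parent_id = t.id
        )
        SELECT name, depth FROM tree ORDER BY depth, name
        """,
        (root_id,),
    ).fetchall()


for name, depth in subtree(conn, 1):
    print("  " * depth + name)
```

The adjacency list keeps inserts and moves cheap, because only one `parent_id` changes, at the cost of a recursive query when a full branch is needed. Other patterns trade these costs differently.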

SQL-injection: how to use them and how to defend against them

SQL (Structured Query Language) is a powerful language for working with relational databases, but many people are unaware of its dark side, which is called SQL injection. Anyone who knows the language well enough can extract data from your site by means of SQL, unless the developers have built defenses against SQL injection. Let’s discuss how such attacks work and how to secure your web resource against this kind of data leak! more…
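To make the idea concrete, here is a minimal sketch of an injection and of one common defense, parameterized queries. It uses Python's built-in `sqlite3` module rather than any stack from the article itself, and the `users` table, the sample rows and the `find_unsafe` and `find_safe` helpers are hypothetical names invented for illustration.

```python
import sqlite3

# Hypothetical table with data that should not leak.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_unsafe(conn, name):
    # Vulnerable: user input is pasted directly into the SQL text.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_safe(conn, name):
    # Safer: the driver passes the value separately from the SQL text,
    # so quotes in the input cannot change the structure of the query.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


# The closing quote in the input turns the filter into "... OR '1'='1'".
payload = "nobody' OR '1'='1"

print(len(find_unsafe(conn, payload)))  # every row matches the injected condition
print(len(find_safe(conn, payload)))    # no user has this literal name
```

Parameter binding is only one layer of defense. Input validation and restricted database permissions are usually applied alongside it.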

Web parsing PHP tools

Almost all developers have faced a data parsing task. The needs vary, from parsing a product catalog to parsing stock prices. Parsing is a very popular direction in back-end development, and there are specialists who build quality parsers and scrapers. Besides, the topic is interesting in itself and appeals to everyone who enjoys the web. Today we review PHP tools used for parsing web content. more…

PHPExcel for importing and exporting data in working with Excel

Sometimes when you are developing a project, you need to parse xls documents. For example, you synchronize xls worksheets with a website database, so you need to convert the xls data to MySQL and want to do it fully automatically.

If you work with Windows, it is simple enough: you just need to use COM objects. It is another matter if you work with PHP and need to make it run on UNIX systems. Fortunately, there are many classes and libraries for this purpose. One of them is the PHPExcel class. This library is completely cross-platform, so you will not have problems with portability. more…

Web Scraping with Java and HtmlUnit

Web scraping or crawling is the act of fetching data from a third-party website by downloading and parsing the HTML code to extract the data you want. It can be done manually, but the term generally refers to the automated process of downloading the HTML content of a page, parsing it to extract the data, and saving the result into a database for further analysis or use. more…
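The parse-and-extract step described above can be sketched in a few lines. Note that this is not HtmlUnit and not Java: it is a small stand-in written with Python's standard `html.parser` module, and the `LinkExtractor` class, the `extract_links` helper and the `SAMPLE_HTML` string are hypothetical. The download step is left out so the example runs offline. In a real scraper the HTML would first be fetched over HTTP.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Parse an HTML string and return the links in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Stand-in for a downloaded page.
SAMPLE_HTML = (
    "<html><body>"
    '<a href="/page/1">One</a>'
    '<a name="anchor-only">No href</a>'
    '<a href="https://example.com/2">Two</a>'
    "</body></html>"
)

print(extract_links(SAMPLE_HTML))
```

From here the extracted values could be written to a database, which is the last step of the process the teaser describes.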
