Development

Various topics related to Web Scraper, Web Crawler and Data Processing development

Import.Io Magic Method API

Recently Import.io introduced a new extraction technique called Magic. The Magic scraping method works be attempting to scrape all the information off the page automatically and in one shot. We covered it in another post early last year. When we covered it back then, we noted a few issues:

  • The scraper only works on pages with more than one row of data like a search results page, category pages and etc.
  • It seems to have trouble with some javascript pages.

But now Import.io has released a second version of Magic which seems to have dealt with those obstacles. Not only that, but they have released an API for Magic that lets you see what’s going on behind the scenes. more…

Scrape with Google App Script

In this post I want to let you how I’ve managed to complete the challenge of scraping a site with Google Apps Script (GAS).

The Challenge

The challenge was to scrape arbitrary sites and save all the site’s pure text (stripping all the html markup) into a single file. Originally I was going to use python and PHP solutions, but then I thought I’d try using Google App Script instead. And it turned out pretty well. more…

A Simple Email Crawler in Python

Email Crawling I often receive requests asking about email crawling. It is evident that this topic is quite interesting for those who want to scrape contact information from the web (like direct marketers), and previously we have already mentioned GSA Email Spider as an off-the-shelf solution for email crawling. In this article I want to demonstrate how easy it is to build a simple email crawler in Python. This crawler is simple, but you can learn many things from this example (especially if you’re new to scraping in Python). more…

Where is NoSQL practically used?

For over four decades now, Relational Database Management Systems (RDMS) have dominated the enterprise market. However, the trend seems to change with the introduction of NoSQL databases. In this article, we are going to highlight practical examples where NoSQL systems have been deployed. We will also go further and point out other applications where implementation of such systems might be necessary. more…

Back to top