Development

Various topics related to Web Scraper, Web Crawler and Data Processing development

Scrape with Google Apps Script

In this post I want to show you how I managed to complete the challenge of scraping a site with Google Apps Script (GAS).

The Challenge

The challenge was to scrape arbitrary sites and save each site’s plain text (with all the HTML markup stripped) into a single file. Originally I was going to use Python or PHP, but then I thought I’d try Google Apps Script instead. And it turned out pretty well. more…
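To make the "strip all the markup" step concrete, here is a minimal sketch in Python rather than GAS (the full post covers the GAS version). It uses only the standard library's `html.parser`. The names `TextExtractor` and `html_to_text` are hypothetical, and skipping `<script>` and `<style>` contents is an assumption about what counts as plain text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}  # assumption: these never hold readable text

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Fetching the pages and writing the combined output to a single file are left out here; the sketch covers only the markup-stripping part of the challenge.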

A Simple Email Crawler in Python

I often receive requests asking about email crawling. This topic is clearly of interest to those who want to scrape contact information from the web (such as direct marketers), and we have previously mentioned GSA Email Spider as an off-the-shelf solution for email crawling. In this article I want to demonstrate how easy it is to build a simple email crawler in Python. The crawler is simple, but you can learn a lot from this example (especially if you’re new to scraping in Python). more…
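As a rough illustration of the core idea, the extraction step of such a crawler might look like the sketch below. This is not the code from the article: the regular expression is a deliberately simplified pattern (it does not cover every valid address), and `EMAIL_RE` and `extract_emails` are hypothetical names.

```python
import re

# Simplified pattern, an assumption for illustration only:
# local part, "@", domain labels, and a top-level domain of 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the unique addresses found in text, lowercased, in first-seen order."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        addr = match.lower()
        if addr not in seen:
            seen.add(addr)
            result.append(addr)
    return result
```

A complete crawler would also download each page and follow its links; this fragment only shows how addresses could be pulled out of text that has already been fetched.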

Where is NoSQL practically used?

For over four decades now, Relational Database Management Systems (RDBMS) have dominated the enterprise market. However, the trend seems to be changing with the introduction of NoSQL databases. In this article, we highlight practical examples where NoSQL systems have been deployed. We also point out other applications where implementing such systems might be necessary. more…

What is MongoDB?

MongoDB, an open-source document database written in C++, is classified as a NoSQL database. Because it avoids the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), it facilitates quick and easy data integration in various applications. more…
