Development

Various topics related to Web Scraper, Web Crawler and Data Processing development

Headless browser python scraper at pythonanywhere

Recently I decided to work with pythonanywhere.com for running python scripts on JS stuffed websites.

Originally I tried to leverage the dryscrape library, but I failed to do it, and a nice support explained to me: “…unfortunately dryscrape depends on WebKit, and WebKit doesn’t work with our virtualisation system.”

A headless browser is by definition a web browser without a graphical user interface (GUI).
more…

Dexi Pipes: multi-threaded web scraping of site aggregators

Today I want to share my experience with Dexi Pipes. Pipes is a new kind of robot introduced by Dexi.io to integrate web data extraction and web data processing into a single seamless workflow. The main focus of the testing is to show how Dexi might leverage multi-threaded jobs for extraction of data from a retail website.

more…

Back to top