
ScraperWiki is a unique, state-of-the-art cloud scraping service that gives developers an extensive toolset and meets users' data needs on demand. It is a free, social tool: a professionally designed playground where amateurs can launch custom scrapers, and where businesses can request data, with certain services charged. Other web scraping and web harvesting software and services are overviewed here.

ScraperWiki categories

  • Views
  • Editor
  • Libraries
  • Guides
  • Request Data
  • Scheduling
  • Manage Data Scraped
  • User Profile
  • Types of Scrapers
  • Programmers’ Interaction in Development


User-Friendly Profile

ScraperWiki provides user-friendly profile management, including an Emailer that sends service updates to notify you of your scrapers' latest runs and the data gathered.

Create and Run a Custom Scraper in the Editor

The simple yet fully functional Editor is a good playground for new developers. ScraperWiki allows you to create scrapers in one of the following languages: Python, Ruby, or PHP. The Run and Save options, the data and sources views, and the console output make this online service very attractive.
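
To get a feel for the Editor, a first scraper can be as small as fetching a page and printing something to the console output. Below is a minimal sketch in Python, assuming the classic scraperwiki library's scrape() call; the target URL is a placeholder:

    # Minimal first scraper: fetch a page and print its title to the console.
    import scraperwiki
    import lxml.html

    html = scraperwiki.scrape("http://example.com/")   # placeholder URL
    root = lxml.html.fromstring(html)
    print(root.findtext(".//title"))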

The Editor's shortcoming is scaling: when the editor area is scaled up, the scroll bar isn't visible, so the console/data output window becomes inaccessible. Navigation within the code pane is also difficult, again for lack of scrolling.

Public, Protected and Private Scrapers

ScraperWiki manages scrapers as Private, Protected or Public, the latter being open to editing by any registered user. By simply copying the code, you can copy an existing public scraper to your dashboard to save it, change it, and use it.

Programmers’ Collaboration on Scraper Development

Programmers can collaborate on scraper development through the “Chat” tab in the Editor, which allows several developers to work on the same scraper simultaneously. Each scraper's ‘History’ option is another useful feature of the service, letting you backtrack through a scraper's code stages, runs, and saves; the scraper license is also managed there. No doubt you will benefit from the multitude of public scrapers, which appear more often than one per hour; just follow Browse scrapers.

Libraries Plug-in

ScraperWiki supports some useful scraping libraries (urllib2, Mechanize), as well as its own library, scraperwiki.
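
As a sketch of how these libraries can combine, the snippet below fetches a page with Mechanize, parses it with lxml (assumed available alongside the listed libraries), and stores records in the scraper's built-in SQLite datastore. The URL, selector, and field names are illustrative placeholders:

    # Sketch: fetch with Mechanize, store with the scraperwiki library.
    import mechanize
    import lxml.html
    import scraperwiki

    br = mechanize.Browser()
    br.set_handle_robots(False)        # Mechanize obeys robots.txt by default
    html = br.open("http://example.com/listings").read()    # placeholder URL

    root = lxml.html.fromstring(html)
    for item in root.cssselect("div.item"):                 # placeholder selector
        record = {
            "name": item.findtext(".//h2"),
            "price": item.findtext(".//span"),
        }
        # unique_keys deduplicates rows across repeated runs of the scraper
        scraperwiki.sqlite.save(unique_keys=["name"], data=record)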

Manage Data Scraped

The easiest way to get scraped data is to download it as a CSV file. Other options are to use the API to integrate data into a custom application, or to download the SQLite database. In fact, if you are a programmer, you can do all sorts of data management with SQLite, such as SQL queries; read about it here. Moreover, through the API a user may access all kinds of service information related to scrapers, views, the user profile and so on.
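
For illustration, here is a hedged sketch of pulling scraped data through the API with urllib2. It assumes the classic api.scraperwiki.com datastore endpoint; the scraper short name is a placeholder, and swdata is the default table name:

    # Sketch: query a scraper's SQLite datastore through the ScraperWiki API.
    import json
    import urllib
    import urllib2

    params = urllib.urlencode({
        "format": "jsondict",
        "name": "my_scraper",                     # placeholder scraper name
        "query": "select * from swdata limit 10",
    })
    url = "https://api.scraperwiki.com/api/1.0/datastore/sqlite?" + params
    rows = json.load(urllib2.urlopen(url))
    for row in rows:
        print(row)

Swapping the format parameter to csv should return the same query result as a CSV download instead.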

Views

Views show how the scraped data will be presented, and several views may be built for the same dataset. You can build a view out of any public or protected scraper; the view is attached to the scraper through the internal built-in API. To create a view, just click Visualize or choose New View and complete the form. View layouts can be generated in Python, Ruby, PHP, or HTML.
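
As a rough sketch of what a simple Python view might look like: the snippet below reads rows from a scraper's datastore and prints an HTML table. It assumes the classic attach()/select() view API; the scraper name and column names are placeholders:

    # Sketch of a Python view: attach a scraper's datastore, query it,
    # and print an HTML table.
    import scraperwiki

    scraperwiki.sqlite.attach("my_scraper")       # placeholder scraper name
    rows = scraperwiki.sqlite.select("* from my_scraper.swdata limit 10")

    print("<table>")
    for row in rows:
        print("<tr><td>%s</td><td>%s</td></tr>" % (row["name"], row["price"]))
    print("</table>")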


Scheduling

Scheduling is available on a daily basis for free accounts; the shortcoming is not being able to specify the running time. For paid accounts (starting with the Business plan at $29 per month), scheduling may be hourly.

Guides

The online Guides take you through an introduction, composing your first scraper and your first view (about 30 minutes each for a non-programmer), and some other tutorials and documentation. I strongly recommend working through them when you first start with ScraperWiki.

Request Data

For business use, a customer may request data through the following steps:

  • Explain the data you require.
  • The data request service will investigate your request. You will be supplied with a report outlining the way to get the data, how the data will be delivered to you, and the cost involved. This investigation takes time and carries a nominal charge of $150.
  • Code will be written to get your data.

For non-profit or low-budget entities, you may post your data request on ScraperWiki's community email list, which is actually a ScraperWiki Google group. Mention why you want the data, and include URLs or source documents and any other relevant information, in the hope that the support team will respond to help you.