Octoparse – a scraping tool designed for non-programmers

Octoparse is an easy-to-use yet powerful visual web scraper that lets anyone, even those without much programming background, collect and extract data from the web. It is designed to help users handle complex website structures, such as JavaScript-driven pages, and is comparable to other web scraping tools such as Import.io and Mozenda.

PHP cURL download file

This post shows how to download a file from a server with cURL in PHP. The comments in the code serve as explanations.

 

Hotel: scrape prices, Q&A

Question

I want to extract the hotel name and the current room price of some hotels daily from https://www.expedia.ca/Hotel-Search?#&destination=Quebec,%20Quebec,%20Canada&startDate=06/11/2016&endDate=07/11/2016&regionId=&adults=2

I am a small hotel owner and need this information quite often, and I hope I can collect it automatically with code in some way. You are an expert in this field; what is the easiest way to get this information? Can you give me some example code?
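One possible starting point for the question above, as a minimal sketch only. It assumes the result page's HTML has already been obtained (for a JavaScript-driven search page, that probably means a browser-based tool), and it parses hotel names and prices with Python's standard html.parser module. The class names hotel-name and hotel-price and the sample markup are hypothetical; the site's real markup is not shown here and would need to be inspected first.

```python
from html.parser import HTMLParser


class HotelPriceParser(HTMLParser):
    """Collects (name, price) pairs from markup that uses the
    hypothetical classes 'hotel-name' and 'hotel-price'."""

    def __init__(self):
        super().__init__()
        self.results = []    # list of (name, price) tuples
        self._field = None   # class of the element whose text we expect next
        self._name = None    # last hotel name seen, waiting for its price

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if css_class in ("hotel-name", "hotel-price"):
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "hotel-name":
            self._name = text
        elif self._field == "hotel-price" and self._name is not None:
            self.results.append((self._name, text))
            self._name = None
        self._field = None


def extract_hotels(html):
    """Return a list of (hotel name, price text) tuples found in html."""
    parser = HotelPriceParser()
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical sample markup, standing in for a saved result page.
SAMPLE = """
<div class="hotel"><h3 class="hotel-name">Hotel Alpha</h3><span class="hotel-price">CA$129</span></div>
<div class="hotel"><h3 class="hotel-name">Hotel Beta</h3><span class="hotel-price">CA$98</span></div>
"""

if __name__ == "__main__":
    for name, price in extract_hotels(SAMPLE):
        print(name, price)
```

Running such a script once a day (for example from a scheduler) and appending the pairs to a file would give the daily price log the question asks about.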

Dexi.io – how to improve performance

Intro

Some may argue that extracting 3 records per minute is not fast enough for an automated scraper (see my last post on Dexi multi-threaded jobs). However, you should realize that Dexi extractor robots behave like a full-blown modern browser and fetch all the resources that crawled pages load (CSS, JS, fonts, etc.).
In terms of performance, an extractor robot might not be as fast as a pure HTTP scraping script, but its advantage is the ability to extract data from dynamic websites that must run JavaScript code to generate user-facing content. It will also be harder for anti-bot mechanisms to detect and block it.

Python – example of parameterized storing into a DB to prevent SQL injection

test.py

db_config.py (place it in the same directory as the test.py file)
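The contents of test.py and db_config.py are not reproduced here. As a minimal sketch of the technique the title names, the example below uses Python's built-in sqlite3 module with an in-memory database, which is an assumption and may differ from the database used in the original post. The table items and the function store_record are hypothetical names chosen for illustration.

```python
import sqlite3


def store_record(conn, name, price):
    """Insert one row using '?' placeholders.

    The values are passed separately from the SQL text, so the driver
    treats them as data and never as part of the statement.
    """
    conn.execute(
        "INSERT INTO items (name, price) VALUES (?, ?)",
        (name, price),
    )
    conn.commit()


def main():
    # In-memory database, so the sketch leaves no files behind.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT, price REAL)")

    # A value that would break a query built by string concatenation.
    hostile = "x'); DROP TABLE items; --"
    store_record(conn, hostile, 9.99)

    # The hostile string is stored verbatim and the table still exists.
    print(conn.execute("SELECT name, price FROM items").fetchall())
    conn.close()


if __name__ == "__main__":
    main()
```

Other Python database drivers follow the same pattern but may use a different placeholder style (for example %s instead of ?), so the driver's documentation should be checked.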

Is this a legal method of acquiring insurance leads?

Recently I received a question on insurance leads:

Is this a legal method of acquiring insurance leads [from the web]? Are there any agent testimonials on the efficiency of this type of service?

Legality issue in web scraping

On the legality of web scraping, the approach should be clear: it depends on the website and its privacy policy. There are at least two cases:

  1. Public info (prices, inventory info, public offers), that is, anything not protected by copyright and therefore available for scraping.
  2. Copyright-protected info, where the website's Terms of Use or Terms of Service restrictions prohibit copying, and therefore web scraping.

So far I have no insurance agent testimonials on the efficiency of any insurance lead scraping service. The websites I searched [for insurance leads] gave me the impression that the customer info they gather is highly secured (not viewable). I doubt that any site is going to expose insurance leads. On most of them the leads are available only through paid subscription plans.

If there are websites such as insurance lead directories (public insurance quotes), we might develop a scraper that regularly grabs fresh or new info for further analysis. That would save the agent’s time spent on repeated searching, revisiting pages and so on. One scraper might work across multiple directory pages.

You might find it interesting to read about web page change tracking if you only need to see updates (without storing any data).
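The idea of change tracking can be sketched with a content fingerprint: instead of keeping the scraped data, keep only a hash of the page text and compare it on the next visit. This is an illustrative sketch, not the method of any particular change-tracking service; the function names are hypothetical, and fetching the page text is left out.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 hex digest of the page text.

    Whitespace is normalized first so that pure layout changes
    (extra spaces, line breaks) do not count as updates.
    """
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, page_text):
    """Compare the current page text against the previous fingerprint.

    Returns a tuple (changed, current_fingerprint) so the caller can
    keep the new fingerprint for the next check.
    """
    current = fingerprint(page_text)
    return current != previous_fingerprint, current
```

A caller would store only the returned fingerprint between visits, so no page data is kept, and would look at the page again only when has_changed reports True.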

Death By Captcha Updated API clients

Death By Captcha is a reputable CAPTCHA-solving service with more than 7 years in the business. They have recently updated all their API clients, so users can experience maximum efficiency and faster solving times.

They recommend that users and software developers visit the API page and update their DBC API implementation in order to get the most out of it (the API and docs are available to registered users only). Free credits are provided for users to test or implement the new client API!

If you tell them you saw this info on the scraping.pro blog, they’ll give you an additional credit of 1K free CAPTCHAs!
For further info, you may contact them directly.
