In this post I’d like to share my experience with the residential proxies of the Luminati proxy provider.

Residential proxies’ advantages

The disadvantage of traditional proxies is that they are provided by data centers. Web services can easily recognize that those IPs originate from a data center and block them as web robot (not regular user) visits. Even with a decent proxy, websites can cloak or modify their data when they detect a bot visit.

A residential IP is an IP address assigned to a home user by an Internet Service Provider (ISP). Users of web apps give their consent to share their residential IPs while their device is idle, connected to the internet and has enough power (e.g. luminati.io/sdk). Luminati does not collect any user data; it is interested only in the IPs. To date, Luminati connects to over 30 million residential IPs located across the world.

Luminati Proxy Manager

What does the proxy manager do? It is open-source software for managing multiple proxies seamlessly via an API and an admin UI.
To see all the ways to install it, go to Tools->Proxy Manager in the left-side panel of the Luminati Dashboard.

The advantages of using the proxy manager:

  • One entry point
  • Concurrent connections
  • Auto retry rules
  • Real time statistics

Besides, the proxy manager can be used with a phone for scraping.

Note that residential proxies require special approval, so it took me 3 working days until I could get that zone working.
My personal Luminati manager (assistant) was helpful in getting me acquainted with LPM, creating zones and getting it all to work.

Test

We decided to test the Luminati service, particularly its residential proxies (residential zone, limited to 7 days only).

Setup

First of all, we set up the Luminati Proxy Manager (LPM) and the residential proxies zone. Zones are the service’s custom configurations of parameters for proxied requests (see the zones board inside the Luminati account).

We’ve set up 4 zones inside of LPM:

  • data-center proxies zone, port 24000 (port number is assigned automatically or set manually)
  • residential proxies zone, port 24001
  • city asn proxies zone, port 24002
  • gip proxies zone, port 24003

In the test code we used port 24001, which corresponds to the residential zone of my LPM, running at the address http://127.0.0.1:22999/. Basically the process looks like this:
[Image: luminati-proxy-process]

Below you can see the proxy ports utilizing different proxy zones.

[Image: luminati-proxy-manager]
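
To route a request through one of these zones, an HTTP client only needs to be pointed at the matching LPM port. Here is a minimal sketch using the Python standard library. It assumes the LPM accepts proxy connections on 127.0.0.1 at port 24001 (the residential zone listed above); the helper names `lpm_proxies` and `fetch_via_lpm` are made up for illustration and are not taken from the original test code.

```python
import urllib.request

LPM_HOST = "127.0.0.1"      # assumption: LPM runs on the local machine
RESIDENTIAL_PORT = 24001    # residential zone port from the list above


def lpm_proxies(port, host=LPM_HOST):
    """Return a proxies mapping that sends both schemes to one LPM port."""
    proxy = f"http://{host}:{port}"
    return {"http": proxy, "https": proxy}


def fetch_via_lpm(url, port=RESIDENTIAL_PORT, timeout=30):
    """Fetch a URL through the given LPM port and return the body as text."""
    handler = urllib.request.ProxyHandler(lpm_proxies(port))
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Switching to another zone is then just a matter of passing a different port, for example `fetch_via_lpm(url, port=24000)` for the data-center zone.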

Note: There is also a mobile IPs option (a mobile proxies zone). Mobile IPs are almost unblockable. Mobile proxy use cases include (1) website performance and (2) retail/travel: price fetching, app promotions (ad verification).

So, we started to test YellowPages.com, gathering links through a simple GET request with the hotel keyword and a two-letter state abbreviation (NY, CA, etc.):

https://www.yellowpages.com/search?search_terms={0}&geo_location_terms={1}&page={2}
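
The positional placeholders in this template can be filled in with Python's `str.format`. The sketch below is an illustration only; the function name `build_search_url` and the URL-encoding of the arguments are my assumptions, not necessarily what the test code did.

```python
from urllib.parse import quote_plus

SEARCH_URL = (
    "https://www.yellowpages.com/search"
    "?search_terms={0}&geo_location_terms={1}&page={2}"
)


def build_search_url(keyword, state, page):
    """Fill the template with a keyword, a state abbreviation and a page number."""
    # quote_plus makes multi-word keywords safe to place in a query string
    return SEARCH_URL.format(quote_plus(keyword), quote_plus(state), page)


print(build_search_url("hotel", "NY", 1))
# https://www.yellowpages.com/search?search_terms=hotel&geo_location_terms=NY&page=1
```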

We performed consecutive requests to the YP site and extracted all US hotels, rotating through the array of 50 US state abbreviations (the scraper moved on to the next state once it stopped getting new hotel items for the current one).
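
The rotation logic can be sketched roughly as follows. This is a simplified illustration, not the original test code: `fetch_page_links` stands for a hypothetical callable that requests one result page through the proxy and returns the hotel links found on it, and `max_pages` is a safety cap I added.

```python
def collect_links(states, fetch_page_links, max_pages=100):
    """Walk the result pages of every state until a page brings nothing new.

    Returns the set of unique links and the number of requests made.
    """
    seen = set()
    requests_made = 0
    for state in states:
        for page in range(1, max_pages + 1):
            links = fetch_page_links(state, page)
            requests_made += 1
            new_links = set(links) - seen
            if not new_links:
                break  # no new hotel items for this state, go to the next one
            seen |= new_links
    return seen, requests_made
```

Keeping the links in a set also means that a hotel served twice is counted only once in the final total.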

Test code

Results

All hotel links/items amount: 7147
Total requests: 263
Process time (seconds): 1267.1

We counted the total links to make sure that the aggregator was not serving the same hotel links over and over in order to fool the scraping bot. The test result shows that each request to a web page returned an average of 7147/263 ≈ 27 items (a full page holds 32 items). The average time per proxied request was 1267/263 ≈ 5 seconds.
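
For reference, the same averages can be recomputed from the figures in the results. The variable names are mine; the numbers are the ones reported above.

```python
total_links = 7147
total_requests = 263
process_time_s = 1267.1
items_on_full_page = 32

items_per_request = total_links / total_requests
seconds_per_request = process_time_s / total_requests
page_fill = items_per_request / items_on_full_page

print(f"{items_per_request:.1f} items per request")      # 27.2 items per request
print(f"{seconds_per_request:.1f} seconds per request")  # 4.8 seconds per request
print(f"{page_fill:.0%} of a full page on average")      # 85% of a full page on average
```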

Other service figures

Luminati has a 4-second timeout for DNS lookup.

Network uptime is 99.9%; the live status can be viewed at https://luminati.io/cp/status

 

Conclusion

The Luminati proxy provider proved reliable for scraping a challenging aggregator site. Its residential proxies delivered high output, and the scraper ran seamlessly with them.