Recently I was challenged to write a script that authenticates through a bot-proof login form and redirects to a logged-in page.

Main concept

The form is here. Using a web sniffer, I found that the form is sent as a POST XHR (AJAX) request with a security parameter, hash, that is generated on the fly. See the figure below.

[Figure: POST payload]

My script's authentication concept consists of the following steps:

  1. Load the form into a browser (PHP cURL with CURLOPT_RETURNTRANSFER => true).
  2. Fill the form out with the login and password (on-page JavaScript).
  3. Generate the security hash from the form's serialized inputs and append a hash input field (on-page JavaScript).
  4. Run on-page JavaScript to submit the extended form automatically. The form is submitted to the same script as a POST request.
  5. On this second run, the script gathers the POST payload and submits it as a POST XHR to the target server for authentication.
  6. Then, with PHP cURL, we request the logged.php file to get the result (success or failure).

Let’s go through each of the steps with some code, or you might want to jump to the whole code.

Load form into browser

Fill the form out

We inject JavaScript code that fills the form with the login and password passed to our script as GET parameters.
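
As a minimal sketch of this step, the injected script could read the credentials from the query string and write them into the form. The parameter and field names (login, password) and both helper names are assumptions made for illustration; the real form's input names may differ.

```javascript
// Hypothetical helper: pull credentials out of a query string such as
// "?login=bob&password=s3cret". The parameter names are assumptions.
function credentialsFromQuery(search) {
  const params = new URLSearchParams(search);
  return {
    login: params.get("login") ?? "",
    password: params.get("password") ?? "",
  };
}

// Hypothetical helper: copy values into the form's inputs by name.
// Inputs that do not exist on the form are skipped.
function fillForm(form, values) {
  for (const [name, value] of Object.entries(values)) {
    const input = form.elements[name];
    if (input) {
      input.value = value;
    }
  }
}

// In a browser, fill the first form on the page (assumed to be the login form).
// Outside a browser this guard makes the block a no-op.
if (typeof document !== "undefined" && document.forms.length > 0) {
  fillForm(document.forms[0], credentialsFromQuery(window.location.search));
}
```

Keeping the parsing separate from the DOM write makes it easy to check the parsing on its own before injecting the script into the page.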

Generate security md5 hash

The form’s JavaScript file (md5.js) produces a hash value from the serialized form when the Login button is clicked.

To load md5.js in our script, we include it in the page’s head section. Then we create the hash value and append a new hash input field.

Now the form is extended with a hash input.
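
The hash step might look roughly like the sketch below. Since the function that md5.js exposes is not shown here, the hash function is passed in as a parameter (hashFn). The serialization format and the helper names are assumptions too: the real page's script may serialize differently (for example, encoding spaces as + or skipping buttons), and the hash only matches if the serialization matches exactly.

```javascript
// Hypothetical helper: serialize name/value pairs as "a=1&b=2".
// Assumption: the site's script uses a comparable URL-encoded format.
function serializeFields(fields) {
  return fields
    .map(({ name, value }) => `${encodeURIComponent(name)}=${encodeURIComponent(value)}`)
    .join("&");
}

// Hypothetical helper: hash the serialized form and append the result
// as a hidden input named "hash". hashFn stands in for the md5.js function.
function appendHashInput(doc, form, hashFn) {
  const fields = Array.from(form.elements)
    .filter((el) => el.name) // unnamed controls are not submitted
    .map((el) => ({ name: el.name, value: el.value }));

  const input = doc.createElement("input");
  input.type = "hidden";
  input.name = "hash";
  input.value = hashFn(serializeFields(fields));
  form.appendChild(input);
  return input;
}
```

On the page this would be called as appendHashInput(document, form, md5Function), where md5Function is a placeholder for whatever md5.js actually provides.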

Auto-submit the extended form

To send the form to our own auth script, we change the form’s action attribute to #. Then we can trigger the form submission.
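
A minimal sketch of this step, assuming the form is submitted programmatically rather than by simulating a click; the helper name submitToSelf is made up for illustration.

```javascript
// Hypothetical helper: point the form back at the current page and submit it.
function submitToSelf(form) {
  form.setAttribute("action", "#"); // "#" resolves to the current page's URL
  form.setAttribute("method", "post"); // make sure the payload arrives as POST
  // A programmatic submit() does not fire the form's submit event,
  // so submit handlers attached by the page are not triggered here.
  form.submit();
}
```

On the page this would run right after the hash input is appended, for example submitToSelf(document.forms[0]), assuming the login form is the first form in the document.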

Submit as POST XHR to the target server and fetch auth result

We simply submit the gathered payload to auth.gripon.ru for authentication. To make the request look like XHR, as the original form's request does, we add this:
curl_setopt($ch, CURLOPT_HTTPHEADER, array(...));
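
The header list is elided above, so the sketch below rests on an assumption: X-Requested-With: XMLHttpRequest is the header that jQuery-style AJAX calls conventionally attach, and servers often use it to tell XHR from ordinary form posts. Whether this particular server checks it is not confirmed here. For comparison, this is what such a request looks like when described in JavaScript; the helper name is hypothetical.

```javascript
// Hypothetical helper: describe a form-encoded POST carrying the
// conventional AJAX marker header. Nothing is sent by this function.
function buildXhrStyleRequest(payload) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      // Assumption: the conventional AJAX header, not confirmed by the source.
      "X-Requested-With": "XMLHttpRequest",
    },
    // Encode the gathered fields as "a=1&b=2".
    body: new URLSearchParams(payload).toString(),
  };
}
```

The returned object has the shape that fetch(url, options) accepts as its second argument. In the PHP script the headers are instead supplied to CURLOPT_HTTPHEADER as an array of "Name: value" strings.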

The whole code

Disclaimer

I admit this might not be the best way to authenticate through this bot-proof form, yet I think it may give some ideas on how to handle tough login cases in web scraping. You are welcome to suggest better solutions.