Many developers get stuck on cookie handling in web scraping. It is a tricky thing, and it was once my stumbling block too. So here, mainly for new scraping engineers, I’d like to share how to handle cookies in web scraping with PHP. We’ve already done a post on scraping with cURL in PHP, so here we’ll focus only on the cookie side.

A cookie is a small piece of data sent from a website and stored in the user’s web browser while the user is browsing that website. When the browser requests a page and the server returns a cookie along with the web content, the browser does all the dirty work: it stores the cookie and sends it back to the same server in subsequent requests.

Cookie Jar

With server-side web scraping, the script itself must handle all of the above. The cURL library handles cookies in requests by means of a “cookie jar”. A cookie jar is a simple text file stored on the scraping server, used to save cookies and supply them in HTTP requests. With such a cookie handler, scraping jobs run seamlessly. After setting the other cURL options, we designate a file as the cookie jar that will hold the cookies.
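The original snippet is not shown here, so what follows is only a minimal sketch of how the cookie jar might be set. The helper name `cookieJarOptions` and the file location are my own assumptions; `CURLOPT_COOKIEFILE` and `CURLOPT_COOKIEJAR` are the standard PHP cURL options for reading and writing cookies.

```php
<?php
// Hypothetical helper: returns the two cookie-related cURL options for a jar file.
function cookieJarOptions(string $cookieFile): array
{
    return [
        CURLOPT_COOKIEFILE => $cookieFile, // read stored cookies from this file
        CURLOPT_COOKIEJAR  => $cookieFile, // write cookies back when the handle is released
    ];
}

// Assumed location; any path the script can write to should work.
$cookieFile = __DIR__ . '/cookie.txt';

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);        // return the body instead of printing it
curl_setopt_array($ch, cookieJarOptions($cookieFile)); // attach the cookie jar
```

Pointing both options at the same file is what lets one request pick up the cookies that an earlier request saved.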

The library will now create the file automatically and handle all cookies through it. The cookie file itself is plain text in the Netscape cookie file format, with one cookie per line.
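To give an idea of what such a file holds, here is a made-up sample (the domain, names and values are invented, not taken from a real site) together with a small hypothetical parser, `parseCookieJar`. Each cookie line has seven tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry as a Unix timestamp, name and value. Lines starting with `#HttpOnly_` are HttpOnly cookies; other `#` lines are comments.

```php
<?php
// Made-up sample of a cookie file; fields are joined with real tab characters.
$sample = implode("\n", [
    '# Netscape HTTP Cookie File',
    implode("\t", ['.example.com', 'TRUE', '/', 'FALSE', '1893456000', 'session_id', 'abc123']),
    implode("\t", ['#HttpOnly_example.com', 'FALSE', '/', 'TRUE', '0', 'auth', 'xyz789']),
]);

// Hypothetical helper: turn cookie file contents into an array of cookies.
function parseCookieJar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        // HttpOnly cookies carry a "#HttpOnly_" prefix in front of the domain.
        $httpOnly = strncmp($line, '#HttpOnly_', 10) === 0;
        if ($httpOnly) {
            $line = substr($line, 10);
        } elseif ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        $f = explode("\t", $line);
        if (count($f) !== 7) {
            continue; // not a well-formed cookie line
        }
        $cookies[] = [
            'domain'             => $f[0],
            'include_subdomains' => $f[1] === 'TRUE',
            'path'               => $f[2],
            'secure'             => $f[3] === 'TRUE',
            'expires'            => (int) $f[4], // 0 means a session cookie
            'name'               => $f[5],
            'value'              => $f[6],
            'http_only'          => $httpOnly,
        ];
    }
    return $cookies;
}
```

You normally never need to parse this file yourself, since cURL reads and writes it for you, but a reader like this can help when debugging which cookies a site has set.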

Code

Putting it all together, the whole job fits in a single function that uses the cURL library with cookie handling.
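The original function is not shown here, so the following is a sketch of how it could look rather than the exact code. The function name `fetchWithCookies`, its parameters, the timeout and the error handling are all my assumptions; the cURL calls themselves are the standard PHP ones.

```php
<?php
/**
 * Fetch a URL with cURL, keeping cookies in a jar file between calls.
 * Name, parameters and defaults are illustrative assumptions.
 */
function fetchWithCookies(string $url, string $cookieFile, ?array $postFields = null): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('Could not initialise cURL');
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects (common after a login)
        CURLOPT_TIMEOUT        => 30,          // assumed timeout in seconds
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies stored by earlier calls
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is released
    ]);

    if ($postFields !== null) {
        // Send a form-encoded POST, e.g. for a login form.
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));
    }

    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        unset($ch);
        throw new RuntimeException("cURL request failed: $error");
    }

    unset($ch); // releasing the handle lets cURL flush the cookies to the jar file
    return $body;
}
```

A login flow could then look like `fetchWithCookies($loginUrl, 'cookie.txt', ['user' => $user, 'pass' => $pass])` followed by `fetchWithCookies($accountUrl, 'cookie.txt')`, where the URLs and form field names are placeholders that depend on the target site.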

Update

For the second and subsequent calls, you just execute the same function. It refers to the same cookie.txt file, passing the cookies to the server and saving any new values from the server in the same cookie file. That’s the convenience of the cURL cookie jar.

That’s it. Your comments and questions are welcome.