My partner and I kept asking the same questions: what factors influence website traffic, and how does one find correlations in business intelligence data related to organic searches? This post grew out of my attempt to combine traffic data from the business blog (taken from Google Analytics) with real organic queries made in Google, to see which search terms my traffic correlates with and, specifically, how the things people search for might have influenced that traffic.

Google Correlate, a Google Labs project, is not as well known to most website owners as Google Analytics or Google AdWords.

Google Correlate trial idea

Google has not only indexed the web but, as the leading search engine, has also accumulated years of organic search queries. The popularity of these search terms varies over time. Why not take advantage of this to find correlations between my site’s organic traffic and those queries? I can export the site’s time series from Google Analytics and paste it into Google Correlate to find out what the “chemistry” is. The result is the set of organic queries whose popularity over time most closely matches my site’s incoming traffic. This might give me more insight into what has influenced the traffic, and what I might do to improve the content strategy or the direction of the AdWords campaign.

Of course some of the correlated search terms may be weird, but that does not negate the overall relationship between what people search for online and organic website traffic. In addition, Google Correlate can take any arbitrary search query and find correlated queries using the same Pearson linear correlation coefficient.

1. Prepare data

The Google Correlate tutorial is here. The first thing to do is export the site’s web traffic data from Google Analytics to a spreadsheet (you can of course export data from another analytics tool if you prefer). If you want to evaluate only search traffic for correlation, go to Traffic Sources -> Sources -> Search -> Overview in your Google Analytics dashboard. Then, in the menu above the chart, click Export -> Google Spreadsheets.

The raw data appear in Google Docs in this way:

Remove the header lines and the empty lines preceding them. Then manually enter the dates for the first and second weeks, e.g. 10/6/2012 and 10/13/2012, select both cells, and auto-fill them down. Don’t forget to delete the sum at the bottom of the visitors column. Now your data should look like this:

Your data/time series is ready for evaluation.
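
For longer exports, the manual date entry can be scripted. The following is a minimal Python sketch, not part of the original workflow: it assumes you already have the weekly visit counts as a plain list (the numbers below are made up) and that the first week is dated 10/6/2012 as in the example above. The function names build_weekly_series and to_tsv_text are hypothetical, and the tab-separated output simply mimics what a spreadsheet copy produces; check what Google Correlate accepts before relying on it.

```python
import csv
import io
from datetime import date, timedelta


def build_weekly_series(visits, first_week):
    """Pair each weekly visit count with its week date.

    visits: weekly visit counts, oldest first (hypothetical input).
    first_week: datetime.date of the first week in the export.
    """
    return [(first_week + timedelta(weeks=i), count)
            for i, count in enumerate(visits)]


def to_tsv_text(series):
    """Render (date, value) rows as tab-separated text.

    Tab separation is an assumption that mirrors a spreadsheet copy.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for week, count in series:
        # M/D/YYYY, matching the dates typed by hand in the post.
        writer.writerow([f"{week.month}/{week.day}/{week.year}", count])
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up visit counts, for illustration only.
    sample_visits = [120, 135, 128, 150]
    rows = build_weekly_series(sample_visits, date(2012, 10, 6))
    print(to_tsv_text(rows), end="")
```

Dropping the trailing sum row is still up to you: pass only the weekly counts, not the total.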

2. Upload data

Now press Ctrl+A (Cmd+A on a Mac) and then Ctrl+C (Cmd+C) to copy the entire data set, and open Google Correlate in a new tab. At the top of the Google Correlate page, click the “Enter your own data” link to upload the time series.

Now choose the right time interval, either Monthly Time Series or Weekly Time Series, and paste the copied data set into the Edit area (at the bottom). Unfortunately, Google Correlate does not yet provide a direct way to upload data from an Analytics account. Your data are now in the correlation engine, which is ready to start.

3. Correlation computing

Google Correlate computes the Pearson correlation coefficient between the time series you provide and the time series of every query in its database. The queries the Correlate engine shows you are the ones with the highest correlation coefficient (i.e. closest to r=1.0), and these are the results most likely to give useful insight. In this way the engine associates your data with organic search queries from Google’s database. The result might not look very encouraging, yet it still gives food for brainstorming about how to relate to public searches, how to improve content, how to develop the strategy and so on. In my case I got some information about seasonal traffic changes and new keywords to explore.
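
To make the ranking step concrete, here is a small Python sketch of Pearson's r and a top-N ranking over candidate series. It is an illustration only, not how Google's engine is implemented: the function names pearson and rank_queries, the query names and all numbers are made up, and a real database of queries would need something far more efficient than this brute-force loop.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def rank_queries(traffic, query_series, top=3):
    """Return up to `top` (query, r) pairs, highest r first."""
    scored = [(pearson(traffic, series), name)
              for name, series in query_series.items()]
    scored.sort(reverse=True)
    return [(name, r) for r, name in scored[:top]]


if __name__ == "__main__":
    # Hypothetical weekly site visits and query popularity series.
    traffic = [10, 20, 30, 40]
    queries = {
        "query a": [1, 2, 3, 4],
        "query b": [4, 3, 2, 1],
        "query c": [1, 3, 2, 4],
    }
    for name, r in rank_queries(traffic, queries):
        print(f"{name}: r = {r:.2f}")
```

A series that rises and falls in step with the traffic scores near 1.0, one that moves in the opposite direction scores near -1.0, and only the top of the list is worth a closer look.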

4. Correlation by State

Google Correlate also provides search term statistics for US States. Here the popularity of the search terms correlates with a data set related to a certain US area. From the Google tutorial page: “Search terms are often popular in some states and less popular in others. To find terms whose pattern of activity across the United States reflects your own US states dataset, enter your data using the link above”. In this case you need to choose ‘US States’ and enter or upload custom data to get the information about which search queries are popular in which states. Please read the tutorial to see how to do it.

In the next post I’ll show how to use the same Google Correlate to find, for any given search term, the other search terms in the database whose frequency correlates with it.

Conclusion

This experimental Google Labs tool has been a handy means of finding relationships in time series, and even in data broken down by area (limited to US states so far). It is by no means the most powerful web traffic tool, but it might give SEO specialists and business bloggers some hints about other factors that influence web traffic, in addition to traffic analytics data and search statistics from Google Webmaster Tools.