How to Server-side Google Analytics

Many users surf the web without a JavaScript-enabled client - particularly users of the mobile web, and some portal users. The question arises: how do you use Google Analytics (hereafter "GA") to track visitors who don't have a JavaScript-enabled client? I recently took a stab at this problem by reverse-engineering urchin.js and rewriting it in Java. But it didn't work. This is my attempt to rectify the situation.

A bit of background: GA is normally installed on a site via a piece of JavaScript that executes onload. The script (which used to be called urchin.js and is now ga.js) examines the page and generates an HTTP request for a GIF from GA with lots of URL parameters. GA receives all of its data from those parameters (and probably the request headers). So our problem is in correctly constructing that URL.

But figuring out how to correctly construct that URL is tricky, and it's mostly undocumented. Google has taken pains to wrap the API in object-oriented JavaScript - but this is little help if you need to call that API from another language. Plus, the script is "minified" - and therefore unreadable.

After a little research I found a troubleshooting page that describes the HTTP request URL in more detail.

This is very helpful, but not sufficient for creating a server-side version of the ga.js script. In particular, the documentation raises the following questions:
  1. Most of the fields have to do with e-commerce, and can be ignored for basic analytics. Are they really optional?
  2. The fields that describe client capability are more problematic - can they be omitted?
  3. What use does GA make of the request headers, especially when the information is duplicated in URL parameters?
  4. Why are encoded cookies sent at all? Can they be omitted?
  5. What is the "X10 data parameter"?
To answer these questions I will be constructing URLs with a text editor, sending them with curl, and seeing what impact it has on the GA reports. (Hopefully there won't be much latency or testing will be slow!)
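
As a rough illustration of the hand-built URLs described above, here is a minimal Python sketch. The endpoint and the parameter names (utmwv, utmn, utmhn, utmdt, utmr, utmp, utmac) follow the documented __utm.gif request format. The choice of this particular subset, the utmwv value, and the account ID are placeholders and assumptions; whether GA accepts a request this sparse is exactly what the experiments are meant to find out.

```python
import random
from urllib.parse import urlencode

# Endpoint that ga.js requests the tracking GIF from.
GA_GIF_ENDPOINT = "http://www.google-analytics.com/__utm.gif"


def build_ga_url(account, hostname, page, title="", referrer="-", rng=random):
    """Build a tracking-GIF URL with a deliberately minimal parameter set.

    Hypothetical helper: which parameters GA actually requires is an
    open question, so this subset is an assumption, not a known minimum.
    """
    params = {
        "utmwv": "1",        # tracking code version (placeholder value)
        "utmn": str(rng.randint(0, 2**31 - 1)),  # random number to defeat caching
        "utmhn": hostname,   # host name of the tracked site
        "utmdt": title,      # page title
        "utmr": referrer,    # referrer, "-" when there is none
        "utmp": page,        # path of the page being viewed
        "utmac": account,    # GA account ID, e.g. "UA-XXXXX-X"
    }
    # urlencode percent-escapes the values, so paths and titles are safe to pass as-is.
    return GA_GIF_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Print a URL that can be pasted into a curl command.
    print(build_ga_url("UA-XXXXX-X", "example.com", "/index.html", title="Home page"))
```

Dropping or adding one entry in params at a time and watching the reports is one way to check each of the questions above in isolation.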

[It occurred to me while writing this that other mobile web developers must run into the same issue, so I searched for "mobile analytics". Some interesting hits. But remarkably it seems like other folks concentrate on doing things like log file analysis. I personally think that's a bad idea. It also occurred to me that we could use another analytics package, like the open source Piwik, that has a more open tracking API. However, switching analytics providers is only a last resort.]

[Google is probably wise not to provide server-side access to their analytics - they'd have to support a few more languages, and integration would be a lot more difficult, making the support load higher.]

4 comments:

Andy Bovingdon said...

It will be interesting to see what you can achieve with Google Analytics, Josh. At Bango we immediately ruled out JavaScript from our mobile web analytics product given that such a small percentage of mobile browsers fully support it.

Server side log file analysis can work to a degree but does not capture all the information relevant to mobile browsing. In fact there is a lot of stuff you would like to know about mobile browsers that is not consistently passed in headers today. Also the use of cookies is unreliable on mobile and IP addresses typically refer to carrier gateway machines rather than the client device.

Bango Analytics uses two methods for recording traffic. Firstly we use the familiar 1x1 image tag for tracking page views and secondly we use URL redirection for the most accurate tracking of ad clicks.

In both these cases a Bango server is involved - either for the redirect or the image serving. This allows us to use a mobile fingerprint algorithm which gives us a persistent unique user ID, which resolves the cookie issues. It means we can count unique visitors with some accuracy and identify not only the MNO but MVNOs (Sprint has 27 now I believe). This fingerprint algorithm uses everything from device settings, carrier and country data to other information provided securely to Bango by partners, which include carriers as well as browser and handset manufacturers.

Tracking the very latest mobile devices is quite easy, but tracking the other 98%, the real mass market mobile browsing, is very difficult as you are discovering. This is why none of the major desktop web suppliers currently provide more than basic handset and carrier detection.

I'd recommend trying our free product to see how this works - visit bango.com/analytics. Our more advanced products also let you extract the data into your favourite analysis tools.

Dean Collins said...

Hi, you're a little incorrect in your comments.

The majority of mobile analytics vendors (8 out of 9) use page tags: Admob, Bango, Mobilytics, etc.

The company I work for, www.Amethon.com, uses wireless capture (packet sniffing), which offers significant advantages over both log file analysis and page tagging.

Feel free to check out the pdf download on our site for more information.

Or alternatively check out http://www.brysonmeunier.com/the-mobile-seo-s-guide-to-mobile-analytics for a review on the major providers.

Cheers,
Dean

josh said...

@andy: Thanks for explaining some of the intricacies involved - I can see that I will indeed be losing potentially significant information. However, WRT session and user tracking there is an (admittedly invasive) solution: URL rewriting. That said, I'd be happy to check out bango.

@dean: Not sure what you mean by "page tags" or "wireless packet sniffing". Perhaps you'd be kind enough to explain what you mean by that?

Eugene John Park said...

Any luck answering these questions? Do the GA cookies matter at all? Are the rest of these parameters optional?