The old challenge: Single Page App JavaScript frameworks
We’ll get this out of the way first. Pageview tracking with an SPA JavaScript framework is a well-known and well-documented challenge, and for many typical PWAs it’s the most immediately obvious adaptation required.
In short, if your PWA uses a framework like React or Angular to handle navigation events, then standard pageview tracking will only record the initial page load. In contrast to traditional websites which do a round-trip to the server every time a page is loaded, SPAs load content and manipulate the URL with the History API. In other words, everything is done client-side and you need to account for that.
To get around this, you’ll need to hook your tracking code into your app’s routing to ensure ‘virtual’ pageviews are fired at the appropriate time. The Google Developers site has some basic documentation, but regardless of whether you use gtag.js or Google Tag Manager to load analytics.js, care and thorough testing are required to avoid the most common issues:
- Subsequent pageviews not being tracked
- Mismatch between page paths / titles and actual application state
- Mis-attribution of sessions (the HTTP referrer vs gclid problem)
- Pageviews split across multiple URLs (lack of canonical paths)
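As a rough sketch of what hooking into your routing can look like, assuming analytics.js is already loaded and exposes the standard ga command queue. The helper names trackVirtualPageview and createRouteTracker are hypothetical, and the ga function is passed in as an argument so the logic can be exercised outside a browser:

```javascript
// Hypothetical helper: call this once a route change has finished rendering.
function trackVirtualPageview(ga, path, title) {
  // Update the tracker first so that later hits (e.g. events) inherit the new page.
  ga('set', 'page', path)
  if (title) {
    ga('set', 'title', title)
  }
  // Then send the virtual pageview itself.
  ga('send', 'pageview')
}

// Hypothetical wrapper that remembers the last tracked path,
// so a repeated navigation to the same path is not counted twice.
function createRouteTracker(ga) {
  let lastPath = null
  return function onRouteChange(path, title) {
    if (path === lastPath) return false
    lastPath = path
    trackVirtualPageview(ga, path, title)
    return true
  }
}
```

In a real app you would create the tracker once with window.ga and call the returned function from whatever "navigation complete" hook your framework's router provides. Setting the page on the tracker before sending helps keep page paths and titles in line with the actual application state.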
Things get even more complex when you’re working with advanced implementations, such as sites which use ecommerce tracking. Tag sequencing and data layer merges will frequently pose obstacles, while proper measurement planning and data layer governance become more important than ever.
This is a hefty topic in itself, and I’ll be writing up a full guide for the next post in this series on PWAs & analytics. Watch this space!
The new challenge(s): push notifications, offline usage, and more
Adapting your analytics implementation to function with your JavaScript framework is essential for data integrity, but what about data relevance? PWAs bring a whole host of exciting new capabilities to the web, including push notifications, background sync, home screen installation and offline usage. How can we capture data about these events in Google Analytics?
Key Distinction: Window vs Service Worker
Actions which can be initiated by the page – such as subscribing to push notifications – are easily tracked using conventional methods. Let’s say your app’s main script includes a function which initiates push subscriptions and your analytics implementation uses GTM; in this case, you could simply add a data layer event to your code like so:
window.dataLayer = window.dataLayer || []
window.dataLayer.push({'event': 'subscribed'})
You can then listen for this in GTM with a Custom Event trigger and use it to fire tags to your heart’s content. This is easy. But what about things that are initiated by your PWA’s service worker, like notifications?
The key thing to remember here is that your service worker – the special script which sits in-between the browser and network – runs outside the main thread and does not have access to the Window object, meaning that it cannot access the data layer or the ga command queue to create trackers. In short, the actions of a service worker cannot be tracked using the normal on-page, JavaScript-based tracking snippets. What we can do, however, is configure our service worker to send HTTP hits directly to GA.
Ultimately, all that analytics.js does is translate on-site behaviour into HTTP requests. The interactions and events we care about are encoded into key-value pairs which can be sent to Google as query parameters on a URL (e.g. https://www.google-analytics.com/collect?t=pageview) or in the body of a POST request. That’s all analytics hits really are. We can sidestep analytics.js and execute the translation process ourselves by building and sending the requests with our service worker using the Measurement Protocol API.
How to send hits from the Service Worker
Google’s PWA training course includes a practical example of how this can be done. A helper script, analytics-helper.js, defines a sendAnalyticsEvent function which takes two parameters, event action and event category, while your tracking ID is set as a constant. The script then uses these three values to assemble a hit and send it directly to the GA endpoint.
I’d suggest reading the (well-commented and ES6-y) script on GitHub to get an idea of how this could work, but for now I’ve pulled out a few code snippets to illustrate the essential functionality. After checking that the client is subscribed to push, the function creates a payload object using the provided parameters:
const payloadData = {
  v: 1, // Version Number
  cid: subscription.endpoint, // Client ID
  tid: trackingId, // Tracking ID
  t: 'event', // Hit Type
  ec: eventCategory, // Event Category
  ea: eventAction, // Event Action
  el: 'serviceworker' // Event Label
}
These values are then formatted into a URL-encoded payload string and sent in the body of a POST request to the Measurement Protocol API endpoint:
// Format hit data into a URL-encoded payload string,
// skipping any parameters with empty values
const payloadString = Object.keys(payloadData)
  .filter(analyticsKey => payloadData[analyticsKey])
  .map(analyticsKey => analyticsKey + '=' + encodeURIComponent(payloadData[analyticsKey]))
  .join('&')
// Post to Google Analytics endpoint
return fetch('https://www.google-analytics.com/collect', {
  method: 'post',
  body: payloadString
})
Configuring your worker to send hits using this function is as simple as importing the complete helper script into your service worker file (sw.js or whatever) and then calling the function with the correct parameters:
// Import helper script at top of our service worker:
importScripts('js/analytics-helper.js')
// Call this function inside a relevant event listener:
sendAnalyticsEvent('click', 'notification')
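To make "a relevant event listener" a little more concrete, here is a minimal sketch of a notificationclick listener. The handler name handleNotificationClick is hypothetical, and it assumes sendAnalyticsEvent (from the helper script above) has been loaded with importScripts and returns a promise:

```javascript
// Hypothetical handler. The send function is passed in as an argument
// so the logic can be exercised outside a service worker.
function handleNotificationClick(event, sendEvent) {
  event.notification.close()
  // waitUntil asks the browser to keep the service worker alive
  // until the hit has been sent. Promise.resolve guards against
  // a helper that does not return a promise (an assumption).
  event.waitUntil(Promise.resolve(sendEvent('click', 'notification')))
}

// Only register the listener when actually running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('notificationclick', event => {
    handleNotificationClick(event, sendAnalyticsEvent)
  })
}
```

Wrapping the call in event.waitUntil matters here: without it, the browser is free to terminate the worker before the request completes.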
Test your code thoroughly, and remember that the Measurement Protocol returns a 2xx status code even when a hit is malformed, so a successful response does not mean the hit was valid. To test your hits, change the endpoint from ‘/collect’ to ‘/debug/collect’ and review the JSON responses (complete guide is available here). Alternatively, try the handy Hit Builder.
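If you’d rather check hits programmatically, something along these lines could work. This is a sketch only: the function name validateHit is hypothetical, and it assumes the debug endpoint’s JSON response contains a hitParsingResult array whose entries carry a valid flag and a list of parserMessage objects with a description field.

```javascript
// Hypothetical helper: posts an already-encoded payload string to the
// debug endpoint and summarises the validation result.
// fetchImpl is injectable so the logic can be tested without network access.
async function validateHit(payloadString, fetchImpl = fetch) {
  const response = await fetchImpl('https://www.google-analytics.com/debug/collect', {
    method: 'post',
    body: payloadString
  })
  const result = await response.json()
  // Assumed response shape: { hitParsingResult: [{ valid, parserMessage: [...] }] }
  const parsed = (result.hitParsingResult || [])[0] || {}
  return {
    valid: parsed.valid === true,
    messages: (parsed.parserMessage || []).map(message => message.description)
  }
}
```

Hits sent to the debug endpoint are validated but not recorded in your reports, so remember to switch back to ‘/collect’ afterwards.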
The final challenge: what if the user is offline?
Offline functionality is one of the biggest draws for PWAs, and undoubtedly one of the most powerful features of service workers. Your site’s core assets (commonly an app shell) can be aggressively cached upon installation, thereby ensuring basic functionality will continue when network connectivity is poor or non-existent. Smaller static PWAs may even be able to comfortably preload the entire website on the first page load, all required assets being stored for offline use in the browser cache.
But where does this leave analytics? If a user is able to continue interacting with our PWA when the network has been temporarily disconnected, any hits generated by our tracking code won’t reach the Analytics endpoint. They’ll simply fail, and it will be as if those interactions never took place.
Thanks to the service worker’s fetch event, we have the ability to listen for – and dynamically respond to – requests made by our pages. In this instance, we’ll listen for unsuccessful HTTP requests to the Measurement Protocol, store them in IndexedDB, then resend them later. The Queue Time value (the ‘qt’ parameter, measured in milliseconds) can be used to adjust the timestamp based on how long the hit was cached for (up to a max of 4 hours), thus ensuring that the events in our reports reflect the timing of actual user behaviour.
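To illustrate the store-and-resend idea, here is a simplified sketch. It uses an in-memory array as a stand-in for IndexedDB, and every name in it (createHitQueue, enqueue, replay, MAX_QUEUE_TIME_MS) is hypothetical. Dropping hits older than four hours is a design choice made for this example, based on the limit mentioned above.

```javascript
// Four-hour limit on queue time, in milliseconds.
const MAX_QUEUE_TIME_MS = 4 * 60 * 60 * 1000

function createHitQueue() {
  // In-memory stand-in for an IndexedDB store (illustration only).
  const queued = []
  return {
    // Store the encoded hit body along with the time the request failed.
    enqueue(body, failedAt = Date.now()) {
      queued.push({ body, failedAt })
    },
    // Resend every stored hit, adding qt = time spent in the queue (ms).
    // Returns the number of hits successfully sent.
    async replay(send, now = Date.now()) {
      const pending = queued.splice(0, queued.length)
      let sent = 0
      for (const hit of pending) {
        const queueTime = now - hit.failedAt
        // Skip hits that have waited longer than the limit.
        if (queueTime > MAX_QUEUE_TIME_MS) continue
        const params = new URLSearchParams(hit.body)
        params.set('qt', String(queueTime))
        try {
          await send(params.toString())
          sent++
        } catch (err) {
          // Still offline: put the hit back for the next attempt.
          queued.push(hit)
        }
      }
      return sent
    },
    size: () => queued.length
  }
}
```

In a service worker, the send callback would typically be a function which POSTs the body to the ‘/collect’ endpoint with fetch, and enqueue would be called when a request to that endpoint fails.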
The best thing of all: Workbox has a module which does all of this in a single line of code.
If you haven’t used Workbox before, it’s a powerful library from Google which – to borrow their description – “bakes in a set of best practices and removes the boilerplate every developer writes when working with service workers.” You can watch the cool introduction video from Chrome Dev Summit, or just review the docs, but in a nutshell it provides a set of command line tools which make it easy to generate precache lists, define strategies for runtime caching, carry out debugging, and lots more.
Note: As cool as it is, I wouldn’t recommend playing with Workbox until you’ve got a firm grasp of how to build service workers manually (perhaps using Google’s PWA training materials). Workbox includes a really powerful set of tools, but they’ll make more sense and you’ll use them far more effectively if you’re confident with the underlying mechanisms.
Enabling offline analytics in Workbox is as simple as calling the initialize() method:
// Import the Workbox library at the top of your service worker
importScripts('https://storage.googleapis.com/workbox-cdn/releases/4.2.0/workbox-sw.js')
// Enable the Analytics module
workbox.googleAnalytics.initialize()
That’s all that’s required! Note that this system won’t work for hits sent from within the service worker, such as the push notifications example we discussed in the previous section; this is for ‘normal’ hits only. This isn’t typically a problem, since hits sent from the service worker generally require connectivity in the first place.
There’s one piece of the puzzle remaining: how can we differentiate between offline and online interactions? And how can we find out how long hits were queued for?
To do this, we can use custom dimensions and metrics. All that’s required is a few quick edits to our tracking code, analytics property, and Workbox config:
- First, create a new dimension in Google Analytics with a scope of ‘hit’. We’ll call this dimension ‘Network’.
- Next, create a new metric with a scope of ‘hit’ and formatting of ‘time’. We’ll call it ‘Queue Time’.
- Modify your normal on-page tracking code to set your Network dimension to a default value of ‘online’. For GTM-based implementations you can simply edit your GA settings variable, while for on-page (i.e. traditional snippet-based) analytics you can set it on the tracker like so: ga('set', 'dimension1', 'online'). Be sure to use the correct dimension index number.
- Modify your Workbox analytics code. We’ll set our new dimension to a value of ‘offline’ for all hits it catches and stores, and set our new metric to be the value of the ‘qt’ parameter divided by 1000 (i.e. convert from milliseconds to seconds). Again, be sure to use the correct index numbers for your dimensions and metrics:
// Enable the Analytics module
workbox.googleAnalytics.initialize({
  // Set custom dimension in slot index 1 to a value of 'offline'
  parameterOverrides: {
    cd1: 'offline'
  },
  // Set custom metric in slot index 1 to qt/1000 (i.e. queued seconds)
  hitFilter: params => {
    const queueTimeInSeconds = Math.round(params.get('qt') / 1000)
    params.set('cm1', queueTimeInSeconds)
  }
})
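If you want to sanity-check the queue time calculation outside of Workbox, the same logic can be run against a sample set of hit parameters. This is a sketch under a couple of assumptions: the hit filter receives a URLSearchParams object, and the names addQueueTimeMetric and sampleHit (along with the sample values) are made up for illustration.

```javascript
// Same logic as the hitFilter above, extracted so it can be run on its own.
const addQueueTimeMetric = params => {
  // params.get returns a string, so convert it explicitly before dividing.
  const queueTimeInSeconds = Math.round(Number(params.get('qt')) / 1000)
  params.set('cm1', queueTimeInSeconds)
}

// A made-up replayed hit which spent 12.4 seconds in the queue.
const sampleHit = new URLSearchParams('v=1&t=pageview&dp=%2Foffline-page&qt=12400')
addQueueTimeMetric(sampleHit)

console.log(sampleHit.get('cm1')) // prints "12"
```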
Both of these techniques (which don’t necessarily have to be used together) are covered in full in the Workbox documentation.
To test whether offline analytics functionality is working as expected, you’ll need to familiarise yourself with Workbox Background Sync. This involves checking that the relevant requests are being queued in your browser’s IndexedDB, and that they are being appropriately modified and sent when connectivity is restored. For further guidance on doing this, check out this tutorial.
Wrapping up
Exploring the possibilities of PWA analytics is exciting because it touches on both integrity and relevance, the two cornerstones of good analysis. Progressive web apps are one of the most powerful new additions to the web platform in its history; if we, as analysts, are going to continue to deliver the meaningful insight that clients expect, it’s vital to start experimenting with the opportunities and challenges that are posed by this new set of technologies.
If you’re thinking of adopting any of these technologies – whether that’s a full-blown PWA or a subset of service worker-enabled features such as offline functionality – you can get in touch about building a brief. We can help you create a measurement plan and implement the necessary tracking to ensure your setup delivers the insights you need.
Thanks for reading! If you have any questions or comments, you can reach me on Twitter @tomcbennet.