“Game Analytics – Maximising the Value of Player Data” – book review

Executive summary

Oh, go on.  Do it.  If you’re working in this area, or nearby, you need to check this book out.  Find a bit of budget, or a well-stocked library, and do it.

Why?

Although this topic has generated many mega-tonnes of slideware over the past several years, very little has been written about it in actual, honest-to-goodness sentences.  Game Analytics – Maximising the Value of Player Data is, to my knowledge, the first book on this topic ever, in the known universe.  So if you’re in the field, how could you possibly not be curious?

What will I be able to do once I’ve got this book that I can’t do now?

I don’t know.     Perhaps you know it all already.  But can you afford to be complacent about that?   Probably not.  So there you are:  you need to read it, if only to reassure yourself that you don’t need it.

Nitty gritty details

In the UK, the book costs £90 in hardback and £72 as an e-book.  The work has qualities that suit each format, and qualities that make each format awkward in its own special way – but unfortunately, unlike with O’Reilly, there isn’t a dual-format purchase option.  (Springer-Verlag: it would be great if you could sort that out.)

The book weighs in at 800 pages and has, I think,  52 authors, give or take a few.  Some of the book’s most frustrating and most useful aspects flow directly from its form factor.  I bet you can guess what they are without even looking at the book.

On the upside, the book has wide expanses of rich leafy ground to truffle around in.  There is a spread of industry contributors, industry interviews, and academic contributions.  Topic coverage ranges from basics like metrics terminology, and practical issues to do with sampling, through to less ubiquitous techniques such as physiological measurement.  I think it’s unlikely you could have a really good rummage and not come up with something that you’d want to earmark for action – or at least contemplation.  If your mileage differs – let me know in the comments.

On the frustrating side, the diversity of topics, and the at times unexpected variation in how surprising the content is, make it difficult to predict exactly where in the book you might find something of high potential value.  I have an electronic review copy and I am finding it difficult to interact with the text in the way it calls out for.  I find electronic copies brilliant for structured or keyword-based retrieval, for portability, and for sharing.  But when I want to get to grips with something that spans 800 pages, and isn’t amenable to keyword searching, I want a hard copy.  I think this is a book to be flicked at and dipped into on a rainy Tuesday, sitting in a comfy chair, with a bunch of coloured sticky labels and markers to hand for when things get good, and an ample supply of tea and biscuits.

Juicy crunchy bits

Interestingly, many of my favourite chapters are authored or co-authored by the editors.  I enjoyed the industry contributions, but my “desert island” picks – the ones I would take with me to a desert island – would be the core content hidden in the middle, in Part Three, Game Data Analysis, and Part Four, Game Metrics Visualisation.  These are the ones that give you the tools to forge your own path.  Of the five chapters in Part Three, my top picks are the one on Game Data Mining, which gives an overview of data mining techniques as applied to game data, the chapter on Meaning in Gameplay: Filtering Variables, Defining Metrics, Extracting Features, which addresses the ever-so-key question of what to look at, and the chapter on Creating Models for Gameplay Analysis.  These are well complemented by a chapter containing an interview with Digital Chocolate, and two chapters of case studies.  In Part Four there is interesting work on Spatial Game Analytics, Visual Game Analytics, and Visual Analytics tools for analysing temporal progression and behaviour.  I am a sucker for a nice visualisation.  If it comes with a biscuit, so much the better.

But there are also gems which catch the light in unexpected ways, such as BioWare’s benefits from adopting developer-facing telemetry.  And it’s certainly interesting to hear such a variety of industry voices – from Sony to Zynga.  Supplementing practitioner-authored chapters with interviews works well as a way of capturing insights from practitioners who might not otherwise contribute, either because they are too busy or too pencil-shy.

Surprises

Mercifully – for them and for us – the editors take a pragmatic approach to their subject, and seem to have almost entirely avoided the horror of getting tangled up in academic theories about the nature of games, or play.   This is a literature that has yet to deliver any delight to me – so for me it was a happy surprise that they mostly didn’t go there.

There are also things missing that I find surprising.  One is controversy.  Perhaps the expert issues in the field are not yet well enough defined for any clear battle lines to be drawn.  But this situation on paper contrasts pretty sharply with what I see on the ground (see e.g. my notes on a recent Bafta games event).  Some of these issues are covered in a chapter on stakeholders, but this is a relatively passionless, structural treatment.

I also wouldn’t know from reading (most of!) the book which techniques are routine and which are hugely innovative, or whether measurement and analytical techniques map strongly onto particular genres and the questions being asked of them.

Wish list

I think the book would benefit from more high-level conceptual organisation: a kind of graphical map that positions the individual contributions and shows the directions the topic is trending in.  While individual topics (e.g. the chapter on metrics terminology) are often well structured, there isn’t anything that lets me see at a glance what’s going on with the whole book.  Something like O’Reilly’s recent “Analyzing the Analyzers” analysis would be interesting to see.

It would also be valuable to see  more focus on the range of actions which can be taken as the outcomes from analysis.   The book illustrates the traditional knowledge cycle of questions, answers, and new questions.    But there is not very much treatment of the role of multivariate testing in game analytics.    The focus is very much on description of phenomena, rather than analytics tightly coupled to design intervention.

Also, there is surprisingly little focus on the nuts and bolts of using analytics to support freemium (or paymium) game models.  In a way this is refreshing.  But, like it or not – and there are loud voices on both sides of the house – the drive to incorporate analytics into every new product launch is largely powered by this business model.  And there is very little in the book in the way of practical, business-focussed case studies looking at how analytics can be applied to product management.  The book has its heart in games user research, rather than business intelligence or CRM automation as a component of design.


From an applied, commercial point of view, the fact that there are commercial third-party offerings for advanced game analytics, such as those from Playnomics and Games Analytics, is an aspect of the topic that deserves more than a passing reference.  The risk, of course, is that such content is likely to date very quickly, as vendors evolve their offerings and their presentation at the same ultra-rapid pace as the underlying games market itself.

Next steps

I am pretty sure the authors are, or will shortly be, at work on some kind of sequel, after they have mopped their brows.  From the comfort of a safe distance, I think that one potential follow-up is a really short, highly focussed, yet challenging introductory book.  That would be a great place to put the already-done work on definitions, and basic material on sampling, stakeholder analysis, analytical workflows, and architectures.  The material would also suit online exercises and examples – not quite such a low-effort offering in terms of content development costs, but a great way of showing, hands on, what can be done.

The big question about the viability and usefulness of this option for future work is, I think, whether analytics is a game discipline that is going to make its way onto game training curricula, and if it does, whether that training is going to be relevant to actual praxis.  At the moment I see a trend to hire quant jocks, data scientists, strategy consultants, and MBAs for data-intensive roles.  The assumption is that they will pick up – or create – relevant industry trends while swimming happily around in the deep end, buoyed up by commercial and analytical experience won elsewhere.

The best way to surf this demand curve, content-wise, is probably not via an undergraduate-friendly curriculum offering.  A hard-core, high-pressure boot camp pitting teams of MBAs and ML specialists against each other could be a better fit to the current market zeitgeist.  Whether or not there is potential for creating a community of practice in a market where everyone is devoted to stirring their own secret sauce is debatable.  But the same holds for algorithmic trading, and they seem to manage it.

Understanding matching estimators using jelly beans

Why understand matching estimators?

What piqued my interest in matching estimators was a talk by Christophe Safferling of Ubisoft at a recent Games Industry Analytics Forum event.  He talked about using matching estimators as an alternative to A/B testing, and showed how Ubisoft’s insights into the impact of a new, additional payment option were transformed by looking at the problem in the right way.  This made me realise I should probably understand what he was saying better than I did, which is the reason I’m writing this post.  Trying to explain something is a great way of trying to understand it.

Christophe motivated the discussion by explaining that although A/B testing is usually the best way to assess the impact of a design change, one of the problems with it is that it isn’t always possible to do.  How a design moves forward, in practice, is often via the addition of a shiny new feature that is presented to the user as an optional extra.  Once the addition is implemented, we then need to understand whether or not it is an improvement.  To do that, we observe what becomes of the people who engage with it, compared to the people who do not.

The difficulty with this approach lies in what to make of the result.  Which is where matching estimators come in.  (And jellybeans.)

Why use jellybeans?

(Photo: jellybeans – http://www.flickr.com/photos/nifmus/426478050/)

Personally, I find jellybeans a useful formalism.   But the risk of beans going missing during a calculation can be reduced by using different types of beans.

Random assignment

One way to understand the pitfalls involved in interpreting outcomes from self-selected groupings is to contrast them with the interpretation of outcomes from A/B testing.

A/B and multivariate testing typically employ random assignment of people to experimental treatments (e.g. varying something about the game design) and to a control group.  You then observe the outcomes in the different groups.  When the treatment and control groups have different outcomes, it’s relatively straightforward to figure out what that means.

Seeing differences does not necessarily mean they are due to the different treatments the groups received.  It is possible that the observed differences are due to inherent population variation or measurement error.  There are well-understood ways of assessing this.  You should give your favourite statistician a jellybean and observe the result.  This flow is shown in Figure 1.

Figure 1

This is classic stuff.  It’s what Google, Amazon, Facebook, and the other big platform providers do a lot of – every day.
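
If it helps to see the mechanics rather than just read about them, here is a minimal sketch in Python, with entirely made-up retention numbers, of the flow in Figure 1: assign players to control or treatment at random, observe an outcome for each, and then ask the statistician’s question of whether a difference that size could plausibly be down to chance.

```python
import random
from scipy import stats

random.seed(42)

# Hypothetical example: 10,000 players assigned to control or treatment at random.
control_outcomes, treatment_outcomes = [], []
for _ in range(10_000):
    if random.random() < 0.5:
        # Made-up baseline: 30% day-7 retention in the control group.
        control_outcomes.append(1 if random.random() < 0.30 else 0)
    else:
        # Made-up treatment effect: the design change nudges retention up to 33%.
        treatment_outcomes.append(1 if random.random() < 0.33 else 0)

# Could a difference this size arise from chance alone?  A chi-squared test on
# the 2x2 table of retained / not-retained counts is one standard way to ask.
table = [
    [sum(control_outcomes), len(control_outcomes) - sum(control_outcomes)],
    [sum(treatment_outcomes), len(treatment_outcomes) - sum(treatment_outcomes)],
]
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"control retention:   {sum(control_outcomes) / len(control_outcomes):.3f}")
print(f"treatment retention: {sum(treatment_outcomes) / len(treatment_outcomes):.3f}")
print(f"p-value: {p_value:.4f}")
```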

But it’s not necessarily an appropriate way of analysing outcomes when people have self-selected into different groups.

Self-selecting groups

What happens with self-selecting groups is similar in some ways to what happens in a randomised trial: the groups get different experiences, and the outcome is observed.  But the interpretation of the result isn’t as straightforward.  The complication is that the groups might differ because of the characteristics of the people who make one choice rather than the other, rather than as a consequence of the choice itself.

In the example in Figure 2, we can see that the people who chose to “sparkle” are different from those who didn’t.  So it’s not obvious whether what’s responsible for the outcome is the attribute, or the effect of the treatment (or, indeed, a combination of the two).

Figure 2
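
To make the complication concrete, here is a small simulation in Python (hypothetical numbers, nothing from the book or the talk).  A hidden “engagement” trait drives both the choice to sparkle and the retention outcome, so a naive comparison of the self-selected groups shows a big difference even though the feature itself does nothing at all.

```python
import random

random.seed(7)

players = []
for _ in range(10_000):
    # Hidden trait: highly engaged players are both more likely to opt in to
    # the "sparkle" feature and more likely to keep playing anyway.
    engaged = random.random() < 0.3
    chose_sparkle = random.random() < (0.7 if engaged else 0.1)
    retained = random.random() < (0.6 if engaged else 0.2)  # sparkling has NO effect here
    players.append((chose_sparkle, retained))

def retention(group):
    return sum(1 for _, retained in group if retained) / len(group)

sparklers = [p for p in players if p[0]]
others = [p for p in players if not p[0]]

# The naive comparison suggests "sparkle" boosts retention, purely via self-selection.
print(f"retention among those who chose to sparkle: {retention(sparklers):.3f}")
print(f"retention among those who did not:          {retention(others):.3f}")
```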

Matching estimators

The core mechanic of the matching estimator method for evaluating the effect of a self-selected treatment condition is to match each individual in the treatment group with a best-matching individual (or composite individual) in the  control group, and then compare these matched pairs using established statistical methods for matched samples.  There are a huge number of variations and refinements on this theme but they all share this basic idea of finding an appropriate match for each individual in the treatment group.

This method can reveal differences that are due to the difference in experience, rather than to self-selection, because it tries to select, post hoc, a control group which very closely matches the characteristics of the self-selected group.  In the case of the jellybeans, a matching estimator could be constructed by comparing the yellow beans in each group.
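
A minimal sketch of that yellow-bean idea, assuming a single attribute (colour) drives both the self-selection and the outcome: pool the control beans by colour, then estimate the effect of the treatment on the treated by comparing each treated bean only with its same-coloured controls.

```python
import random
from collections import defaultdict

random.seed(11)

# Hypothetical data: (colour, treated?, outcome).  Colour stands in for whatever
# attribute drives both the choice and the outcome; the true treatment effect is 0.1.
beans = []
for _ in range(2_000):
    colour = random.choice(["yellow", "red", "green"])
    base = {"yellow": 0.6, "red": 0.3, "green": 0.4}[colour]
    treated = random.random() < base                      # self-selection follows colour
    outcome = base + (0.1 if treated else 0.0) + random.gauss(0, 0.05)
    beans.append((colour, treated, outcome))

# Pool control outcomes by colour so each treated bean has an exact match.
controls_by_colour = defaultdict(list)
for colour, treated, outcome in beans:
    if not treated:
        controls_by_colour[colour].append(outcome)

# Matching estimate: compare each treated bean with the mean of its matched controls.
diffs = []
for colour, treated, outcome in beans:
    if treated and controls_by_colour[colour]:
        matched_mean = sum(controls_by_colour[colour]) / len(controls_by_colour[colour])
        diffs.append(outcome - matched_mean)

print(f"matched estimate of the treatment effect: {sum(diffs) / len(diffs):.3f}")  # ~0.10
```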

Figure 3


The devil in the detail of using matching estimators lies in choosing which of many possible attributes to consider, and which similarity functions to use, when assessing similarity and finding a match.  Even a relatively simple entity such as a jellybean has a lot of attributes which could be a potential basis for a match, as you can see in Figure 3.  And people are much more complicated than jellybeans.

This is where matching algorithms such as the Mahalanobis matching mentioned by Christophe come into play: Mahalanobis distance is one of many similarity metrics and similarity-revealing analytical methods that can be used as a basis for finding the most useful match.
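
In rough outline, and as a toy sketch of the textbook idea rather than anything Ubisoft-specific, Mahalanobis matching computes a covariance-scaled distance between each treated individual and every control on the chosen attributes, and takes the nearest control as the match:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical attribute matrices: one row per individual, one column per matching
# attribute (say, sessions per week, spend to date, account age in days).
treated = rng.normal(loc=[5.0, 2.0, 90.0], scale=[1.0, 0.5, 20.0], size=(50, 3))
control = rng.normal(loc=[4.0, 1.5, 80.0], scale=[1.5, 0.7, 25.0], size=(500, 3))

# Mahalanobis distance scales by the inverse covariance of the pooled attributes,
# so attributes on big scales (account age) don't swamp the others.
pooled = np.vstack([treated, control])
cov_inv = np.linalg.inv(np.cov(pooled, rowvar=False))

def mahalanobis(x, y):
    d = x - y
    return float(np.sqrt(d @ cov_inv @ d))

# For each treated individual, the index of its nearest control by this distance.
matches = [int(np.argmin([mahalanobis(t, c) for c in control])) for t in treated]
print(matches[:10])
```

Once the matches are in hand, the outcome comparison proceeds just as in the jellybean example above.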

As to the question of selecting which attributes to match on, there is conflicting guidance about whether the attributes used for matching should be strong covariates of the treatment assignment or of the outcome measure.  But there is agreement that the attributes used should not themselves be affected by the treatment, should not be perfectly predictive of the treatment,   and should have a similar or at least overlapping distribution in both the treatment and the control group.
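
One sanity check that follows from that guidance is a balance diagnostic: after matching, the distributions of the matching attributes should look similar in the two groups.  Here is a sketch of the standardised mean difference, a widely used measure of this (my illustration, not something from the book):

```python
import numpy as np

def standardised_mean_difference(treated_col, control_col):
    """Difference in means, scaled by the pooled standard deviation.

    Values near zero indicate good balance on this attribute; a common rule
    of thumb treats an absolute value above 0.1 as a sign of imbalance.
    """
    pooled_sd = np.sqrt((np.var(treated_col, ddof=1) + np.var(control_col, ddof=1)) / 2)
    return (np.mean(treated_col) - np.mean(control_col)) / pooled_sd

# Hypothetical usage, with attribute matrices like the ones in the sketch above:
# for j, name in enumerate(["sessions", "spend", "account_age"]):
#     print(name, standardised_mean_difference(treated[:, j], matched_control[:, j]))
```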

Horses for courses

Matching estimators are a useful, established technique used extensively in programme evaluation and econometrics.  There is a big literature around the topic.  If you want to know a bit more, I’d recommend starting out, as I did, by looking at the chapter by Professor Petra Todd in the New Palgrave Dictionary of Economics.

Sometimes what is most interesting about the effect of an extra option is the way the option acts to filter people into groups with different characteristics.   In this case, abstracting away from differences between groups is precisely not what you want.   Matching the investigative technique to what’s interesting is an important form of matching, too.   When doing analytical work it is important to stop and smell the flowers down the garden path.

More information

If you want to know even more, here are some online lecture notes I’ve found useful for exploring this topic:

  • Guido Imbens, Stanford, talking at UCL
  • Michael Roberts, Wharton, from his course on Empirical Methods in Corporate Finance
  • Scott Rozelle, Stanford, from his course at the LICOS Centre for Institutions and Economic Performance
  • Elizabeth Stuart, Johns Hopkins, lecture for the Society for Prevention Research
  • Jeff Wooldridge, Michigan, from his International Institute of Labour course in Microeconomics

If I need to know more, I think Paul Rosenbaum’s book on the Design of Observational Studies will be my next stop.

When there is no there there: going large with A/B testing and MVT

Gertrude Stein was right.   There is no there there.   There is no Facebook.  There is no Google.  There is no Amazon.  There is no such thing as a Zynga game.   There isn’t even a  Bing.

I’m not talking about what it’s like living off-grid, by choice or necessity.

I’m talking about the fact that when we interact online with any of these major services, we interact in one of the local reality zones of their multiverses.  The dominant large-scale consumer internet apps and platforms do not exist in a single version.  They all simultaneously deploy multiple variant versions of themselves to different people, and pit them against each other to see which ones work better.  The results of these tests are used to improve the design.  Or so goes the theory.  (It should be borne in mind that such a process was probably responsible for the “design” of the human backbone.)

This test-driven approach to design development and refinement has been promoted and democratised as a “must-have” for all software-based startups.  Eric Ries, of Lean Startup fame, is probably its most famous promoter.  (Andrew Chen is worth checking out, too, for a pragmatic view from the trenches.)

How do the big platform providers do it? Lashings of secret sauce are probably involved.    But there is a lot of published commentary lying around from which the main ingredients of the sauce can be discerned –  even if the exact formulation isn’t printed on the label.   Here are some resources I’ve found useful:

  • Greg Linden, the inventor of Amazon’s first recommendation engine, has a nice collection of fireside tales about his early work at Amazon in his blog, including how he got shopping cart recommendations deployed (spoiler:  by disobeying an order – and testing it in the wild)
  • Josh Wills, ex-Google,  now Director of Data Science at Cloudera,  talks about  Experimenting at Scale at the 2012 Workshop on Algorithms for Modern Massive Data Sets, and provides some analytical and experimental techniques for meeting the challenges involved
  • Ron Kohavi, ex-Google, now Microsoft, discusses puzzling results from experimentation, and how his team solved them, in his 2012 ACM RecSys keynote and his 2012 KDD paper

There are some commonalities of approach from the ex-Googlers.  Assignment of people to experiments, and to experimental treatments, is done via a system of independent layers, so that an individual user can be in multiple experimental treatments at once.  Kohavi talks about how this can go wrong, and some ways of designing around it using a modified, localised layer structure.
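
The layering idea itself is easy to sketch.  In my own toy version – not any particular platform’s system – each layer hashes the user id together with a layer-specific name, so a user’s assignment is stable within a layer while assignments across layers look independent:

```python
import hashlib

def assign(user_id: str, layer: str, variants: list[str]) -> str:
    """Deterministically assign a user to a variant within one experiment layer.

    Hashing (layer, user_id) means the same user always sees the same variant
    in a given layer, while different layers bucket users independently.
    """
    digest = hashlib.sha256(f"{layer}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Hypothetical example: one user, three independent layers, so three experiments at once.
user = "player-12345"
print(assign(user, "ui-colour-layer", ["control", "blue", "green"]))
print(assign(user, "pricing-layer", ["control", "bundle-offer"]))
print(assign(user, "tutorial-layer", ["control", "short", "long"]))
```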

Another efficiency-boosting practice is the use of Bayesian bandit algorithms to decide on the size of experimental groups and the length of the experiment.  This practice is most familiar in clinical trials, where adaptive experimentation is used to ensure that as soon as a robust effect has been found, the trial is halted, enabling the ethically desirable outcome that beneficial treatments are not withheld from those who would benefit, and injurious treatments are stopped as soon as they are identified as such.  It’s so much flavour of the month that there is now a SaaS provider, Conductrics, which will enable you to use it as a plugin.  They also have a great blog, so check it out if you’re interested in this topic.  Google Analytics Content Experiments also provides support for this, in a more constrained way.
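
For the curious, the core loop of the simplest Bayesian bandit – Thompson sampling over two Bernoulli conversion rates – fits in a few lines.  A toy sketch, not anyone’s production implementation:

```python
import random

random.seed(3)

# Two variants with conversion rates the algorithm doesn't know.
true_rates = {"A": 0.05, "B": 0.07}
# Beta(1, 1) priors: conversions and non-conversions observed so far per variant.
successes = {"A": 0, "B": 0}
failures = {"A": 0, "B": 0}

for _ in range(10_000):
    # Thompson sampling: draw a plausible conversion rate for each variant from
    # its posterior, and show the next visitor the variant with the highest draw.
    draws = {v: random.betavariate(successes[v] + 1, failures[v] + 1) for v in true_rates}
    chosen = max(draws, key=draws.get)

    # Observe the (simulated) outcome and update that variant's posterior.
    if random.random() < true_rates[chosen]:
        successes[chosen] += 1
    else:
        failures[chosen] += 1

# Traffic drifts towards the better variant as the evidence accumulates.
print(successes, failures)
```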

So there are lots of hints and tips about optimising the mechanics of running a test.  But there is much less said about what to test, and how to organise a series of tests.  Which is, for most people, the $64 million question.  This is something I’ve been thinking about, talking about, and advising on.  I’m still working it through, though – and if you are too, and you know of any interesting resources I’ve missed, do share them with us.