Understanding matching estimators using jelly beans

Why understand matching estimators?

What piqued my interest in matching estimators was a talk by Christophe Safferling from Ubisoft at a recent Games Industry Analytics Forum event. He talked about using matching estimators as an alternative to A/B testing, and showed how Ubisoft’s insights into the impact of a new, additional payment option were transformed by looking at the problem in the right way. This made me realise I should probably understand what he was saying better than I did, and that is the reason I’m writing this post. Trying to explain something is a great way of trying to understand it.

Christophe motivated the discussion by explaining that although A/B testing is usually the best way to assess the impact of a design change, it isn’t always possible to do it. In practice, a design often moves forward via the addition of a shiny new feature that is presented to the user as an optional extra. Once the addition is implemented, we then need to understand whether or not it is an improvement. To do that, we observe what becomes of the people who engage with it, compared to the people who do not.

The difficulty with this approach lies in what to make of the result.  Which is where matching estimators come in.  (And jellybeans.)

Why use jellybeans?

http://www.flickr.com/photos/nifmus/426478050/

Personally, I find jellybeans a useful formalism.   But the risk of beans going missing during a calculation can be reduced by using different types of beans.

Random assignment

One way to understand the pitfalls involved in interpretation of outcomes from self-selected groupings is to contrast it with the interpretation of outcomes from A/B testing.

A/B and multivariate testing typically employ random assignment of people either to experimental treatments (e.g. varying something about the game design) or to a control group. You then observe the outcomes in the different groups. When you see that the treatment and control groups have different outcomes, it’s relatively straightforward to figure out what that means.

If you see differences, it does not necessarily mean that the differences  you see  are due to the different treatments the groups received.  It is possible that the observed differences are due to inherent population variation or measurement error.   There are well understood ways of assessing this.  You should give your favourite statistician a jellybean and observe the result.   This flow is shown in Figure 1.

Figure 1
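If you don’t have a statistician (or a jellybean) to hand, one of those well understood ways can be sketched in a few lines of Python. This is a permutation test: it asks how often a random relabelling of the pooled outcomes would produce a gap between the groups at least as big as the one observed. The function name and the numbers are invented for illustration; none of this comes from Christophe’s talk.

```python
import random


def permutation_p_value(treatment, control, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the share of random relabellings whose difference in means is
    at least as large as the observed one.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(treatment) - mean(control))
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treat]) - mean(pooled[n_treat:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Made-up outcomes (say, sessions per week) for two randomly assigned groups.
treatment = [5, 7, 6, 8, 7, 9, 6, 8]
control = [4, 5, 6, 5, 4, 6, 5, 5]
p = permutation_p_value(treatment, control)
print(f"p-value: {p:.4f}")
```

A small p-value says the gap would be unusual if the labels made no difference. Because assignment was random, that is most of what you need to know.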

This is classic stuff.  It’s what Google, Amazon, Facebook, and the other big platform providers do a lot of – every day.

But it’s not necessarily an appropriate way of analysing outcomes when people have self-selected into different groups.

Self-selecting groups

What happens with self-selecting groups is similar in some ways to what happens in a randomised trial: the groups get different experiences, and the outcome is observed.  But the interpretation of the result isn’t as straightforward.   The complication is that the groups might be different because of characteristics of people who make one choice rather than the other, rather than because of the consequence of the choice itself.

In the example in Figure 2, we can see that the people who chose to “sparkle” are different from those who didn’t. So it’s not obvious whether what’s responsible for the outcome is the attribute, or the effect of the treatment (or, indeed, a combination of the two).

Figure 2
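To see how badly this can mislead, here is a small sketch with invented numbers. It assumes yellow beans do well regardless of treatment and are also much more likely to choose to sparkle. The colours, counts and outcome values are all hypothetical.

```python
# Invented numbers: yellow beans tend to do well whether or not they sparkle,
# and they are also far more likely to choose to sparkle.
TRUE_EFFECT = 1.0                           # what sparkling really adds
BASELINE = {"yellow": 10.0, "red": 2.0}     # outcome without sparkling

# (colour, chose to sparkle?, number of beans)
population = [
    ("yellow", True, 80),
    ("yellow", False, 20),
    ("red", True, 20),
    ("red", False, 80),
]


def group_mean(chose_sparkle):
    """Average outcome across every bean that made the given choice."""
    total = 0.0
    count = 0
    for colour, sparkle, n in population:
        if sparkle == chose_sparkle:
            outcome = BASELINE[colour] + (TRUE_EFFECT if sparkle else 0.0)
            total += outcome * n
            count += n
    return total / count


naive_difference = group_mean(True) - group_mean(False)
print(f"true effect: {TRUE_EFFECT:.1f}")
print(f"naive difference between groups: {naive_difference:.1f}")
```

With these numbers the naive comparison comes out at 5.8 against a true effect of 1.0. The extra 4.8 is purely down to who chose to sparkle.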

Matching estimators

The core mechanic of the matching estimator method for evaluating the effect of a self-selected treatment condition is to match each individual in the treatment group with a best-matching individual (or composite individual) in the  control group, and then compare these matched pairs using established statistical methods for matched samples.  There are a huge number of variations and refinements on this theme but they all share this basic idea of finding an appropriate match for each individual in the treatment group.

This method can reveal differences that are due to the difference in experience, rather than self-selection, because it tries to select, post-hoc, a control group which very closely matches the characteristics of the self-selected group. In the case of the jellybeans, a matching estimator could be constructed by comparing the yellow beans in each group.
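A minimal sketch of that idea, assuming colour is the only attribute that matters and that exact matches exist: each sparkling bean is compared with the average non-sparkling bean of the same colour (a “composite individual”), and the differences are averaged. The data and the function name `matched_effect` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical beans: (colour, chose to sparkle?, outcome). Outcomes are invented.
beans = [
    ("yellow", True, 11.0), ("yellow", True, 12.0), ("yellow", True, 10.0),
    ("yellow", False, 10.0), ("yellow", False, 9.0),
    ("red", True, 3.0),
    ("red", False, 2.0), ("red", False, 1.0), ("red", False, 3.0),
]


def mean(values):
    return sum(values) / len(values)


def matched_effect(beans):
    """Average effect on the treated beans, matching exactly on colour."""
    control_by_colour = defaultdict(list)
    for colour, sparkle, outcome in beans:
        if not sparkle:
            control_by_colour[colour].append(outcome)

    differences = []
    for colour, sparkle, outcome in beans:
        # Treated beans with no same-coloured control are left unmatched.
        if sparkle and control_by_colour[colour]:
            differences.append(outcome - mean(control_by_colour[colour]))
    return mean(differences)


treated = [outcome for _, sparkle, outcome in beans if sparkle]
control = [outcome for _, sparkle, outcome in beans if not sparkle]
naive = mean(treated) - mean(control)
matched = matched_effect(beans)
print(f"naive: {naive:.3f}, matched: {matched:.3f}")
```

On this toy data the naive difference is 4.0, while the matched estimate is 1.375. Most of the naive gap came from the sparkle group being mostly yellow.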

Figure 3


The devil in the detail of using matching estimators lies in choosing which of many possible attributes to consider, and which similarity functions to use, when assessing similarity and finding a match. Even if you consider a relatively simple entity such as a jellybean, it has a lot of attributes which could be a potential basis for a match, as you can see in Figure 3. And people are much more complicated than jellybeans.

This is where matching algorithms, such as the Mahalanobis matching mentioned by Christophe, come into play. Mahalanobis distance is one of the many similarity metrics and similarity-revealing analytical methods that can be used as a basis for finding the most useful match.
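As a rough illustration of why the choice of metric matters, here is a sketch of nearest-neighbour matching on two attributes measured on very different scales. It compares plain Euclidean distance with Mahalanobis distance, which rescales each attribute by its spread (and, more generally, accounts for correlation between attributes). The attributes, the data, the helper names and the choice to estimate the covariance from the control group alone are all my assumptions.

```python
import math


def covariance_2d(points):
    """Sample covariance of (x, y) pairs, returned as (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis(a, b, cov):
    """Mahalanobis distance between two (x, y) points for a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = a[0] - b[0], a[1] - b[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, with the 2x2 inverse written out.
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(squared)


def nearest(target, candidates, distance):
    """Index of the candidate closest to target under the given distance."""
    return min(range(len(candidates)),
               key=lambda i: distance(target, candidates[i]))


# Invented attributes: (hours played per day, days since install).
controls = [(0.0, 100.0), (2.0, 100.0), (1.0, 0.0), (1.0, 200.0)]
treated = (0.1, 40.0)

cov = covariance_2d(controls)
by_euclid = nearest(treated, controls, math.dist)
by_mahalanobis = nearest(treated, controls,
                         lambda a, b: mahalanobis(a, b, cov))
print("Euclidean match:  ", controls[by_euclid])
print("Mahalanobis match:", controls[by_mahalanobis])
```

Euclidean distance is dominated by whichever attribute has the biggest numbers (here, days since install), so the two metrics pick different matches for the same player.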

As to the question of selecting which attributes to match on, there is conflicting guidance about whether the attributes used for matching should be strong covariates of the treatment assignment or of the outcome measure.  But there is agreement that the attributes used should not themselves be affected by the treatment, should not be perfectly predictive of the treatment,   and should have a similar or at least overlapping distribution in both the treatment and the control group.
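The overlap requirement is the easiest of these to check mechanically. A crude first pass, sketched below with invented data and hypothetical helper names, is to find the range of an attribute that both groups cover and set aside anyone outside it. Real analyses look at the whole distribution, not just the minimum and maximum.

```python
def common_support(treated_values, control_values):
    """Range of an attribute covered by both groups, or None if they don't overlap."""
    low = max(min(treated_values), min(control_values))
    high = min(max(treated_values), max(control_values))
    return (low, high) if low <= high else None


def trim_to_support(values, support):
    """Keep only the values that fall inside the shared range."""
    low, high = support
    return [v for v in values if low <= v <= high]


# Invented attribute values (say, days since install) for each group.
treated_days = [30, 45, 60, 90, 200]
control_days = [5, 10, 40, 55, 80, 100]

support = common_support(treated_days, control_days)
matchable = trim_to_support(treated_days, support)
print("shared range:", support)
print("treated players with plausible matches:", matchable)
```

Here the player at 200 days has nobody comparable in the control group, so any “match” for them would be a stretch.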

Horses for courses

Matching estimators are a useful, established technique which is used extensively in programme evaluation and econometrics. There is a big literature around the topic. If you want to know a bit more, I’d recommend starting out, as I did, by looking at the chapter by Professor Petra Todd in Palgrave’s Dictionary of Economics.

Sometimes what is most interesting about the effect of an extra option is the way the option acts to filter people into groups with different characteristics.   In this case, abstracting away from differences between groups is precisely not what you want.   Matching the investigative technique to what’s interesting is an important form of matching, too.   When doing analytical work it is important to stop and smell the flowers down the garden path.

More information

If you want to know even more, here are some online lecture notes I’ve found useful for exploring this topic:

  • Guido Imbens, Stanford, talking at UCL
  • Michael Roberts, Wharton, from his course on Empirical Methods in Corporate Finance
  • Scott Rozelle, Stanford, from his course at the LICOS Centre for Institutions and Economic Performance
  • Elizabeth Stuart, Johns Hopkins, lecture for the Society for Prevention Research
  • Jeff Wooldridge, Michigan, from his course in Microeconomics for the International Institute of Labour

If I need to know more, I think Paul Rosenbaum’s book on the Design of Observational Studies will be my next stop.

Fab Gamecamp input – ideas about location-based games

In my  last post I summarised where I’d got to so far with my search for the active weather fronts of location-based games – and yesterday’s pickings at Gamecamp were so rich that I had to put them into their own post.

By the way – if you are within day trip range of London and interested in a mellow, friendly, interesting games unconference, check out Gamecamp. From what I can figure, participants are a mix of indie inventors, academics, and enthusiasts. The content is whatever people want to say on the day. There are about 10 parallel sessions of about a half hour each – so always lots going on! The lunch, from Princi, is in itself worth the (beguilingly modest) price of admission. But I think the best thing about it is the positive vibe in the sessions. People offer their expertise and opinion in a constructive and supportive way. They are genuinely nice to each other. They work to make the session work.

And you will find yourself playing games as well as talking about them.   Last year – my first year attending – I went to a session on playground games expecting a lecture and I ended up leaping around a classroom like a 6 year old.  Well, not exactly.  Best effort though.   This year I went for something that required less agility and tried out a card game prototype called Drunken Prophets.  (Reader, I won.  No comment.)

I had been thinking about whether to do a session and almost didn’t, as I’d had a pretty busy week and I felt a temptation to chill in low revs. But one of the organisers nudged me and I’m glad she did, because so many fab people came to my session, and added a lot to my growing, as yet unorganised, warehouse of Useful Stuff. Thanks everyone!

Here’s a rundown of ideas and suggestions (n.b. if I got something wrong please sing out, either via the comments form or the contact form on the About tab, on twitter or wherever):

  • we had at least 4 Ingress players in the session – one super-expert (level 8!!), the others dabblers – people  agreed there was an easy enough onramp for starting play, but to really get the most out of it you had to be committed, and battery life was a huge issue with a full charge only giving about an hour of play.  pro tip: our expert player said that people who wanted to play for more than an hour took chargers and chargepacks with them.
  • it was noted with interest that Ingress populates Google’s Field Trip app with geolocated content – a nice (and probably entirely intentional) side effect for Google
  • we had a geocacher who updated me that geocaching has moved on to mobile – he said it was a great way to make transient, fun social connections via a shared goal
  • we had several Zombies, Run! users. There was debate about whether it is really a location-based game (answer = probably not). Many commented on its simplicity and lack of features, but despite that it had got several people back into running and was very atmospheric (in a scary way) – running alone on the moors or in a forest to the soundtrack of zombie pursuit was frightening
  • one player played a game at the British Museum, then realised he was ignoring the richness around him, which
  • the use of maps as a platform for games was seen by some as reductive and limiting – because of the cultural preconceptions about what abstractions maps should support, and because of their inability to capture some important features (think “Edinburgh”) – though clearly also a huge enabler

People also recommended I check out:

  • Arcade Fire’s use of Google Maps and HTML 5 for their customised video soundtrack
  • MotiRoti
  • Haunted Planet
  • Blast Theory’s Fixing Point
  • Amblr’s work on delivering “geo-located audio-visual experiences through mobile devices” (which is a quote from their graphic designer)

More than a handful of people were making their own location-based games, of various styles and flavours, including one based on Google StreetView. Nobody mentioned insurmountable technological hassles, but someone said it would be nice to have a WordPress for location-based games. Tool-wise, people mentioned ConductRR’s transmedia storytelling app as a facilitator for cross-media content.

So, enough to be getting on with for the moment.    Thanks everyone!!

Location-based games: where are we now and where are we going?

Lucky me, I’m speaking about opportunities in location-based games @developconf coming up rather, er,  rapidly in early July.  Since I like my content to be fresh I haven’t worked out exactly what I’m going to say.    But I like figuring stuff out so bring on the fun.

I’m interested in learning as much as I can reasonably expect to about (a) nifty examples (b) pain points for building (c) evolution of enabling technology & services (d) the changing shape of the design space (e) stuff that hasn’t worked (f) commercial models.

Here are some things people tell me I should look at:

  • Ingress
  • Zombies Run, Zombies Run 2
  • Scramboo’s games
  • Geosophic
  • http://geoguessr.com/
  • Spywars – (currently on Kickstarter)
  • Gowalla – acq-hired by FB, and shut down
  • survivalthegame.com
  • your idea here…

Newsflash – more great suggestions May 18th from my fellow un-conferencers at Gamecamp.   Deserves a separate post and gets it here.

I’m onto it! And Google’s shiny Google I/O conference keynotes gave me some extra things to think about too.

If you know stuff I should know about – tell me.  Your name in lights for my DevelopConf credits roll.  And here:

Thanks, in alphabetical order, AJ Glaser, Anthony Applebee, Byron Atkinson, David Nattriss, Daniel Doherty, James Turner, Owen Blacker, Pablo Valcarel, Scramboo, Tanausú Cerdeña, Todd Chaffee, YOU – YOUR NAME HERE.

Happy to chat.  Happy to share what I’ve learned.   Drop me a comment or msg me @HAStark on twitter or LI or FB.