Bandit algorithms and you

When people ask, ‘So, how does Myna work?’, we can normally answer the question in a few seconds. But sometimes they go on to ask, ‘How does Myna really work? How does it actually optimise my site, and why is it a better solution than traditional A/B testing?’

This page is for those people. We’ll show a little of how A/B testing works, and then demonstrate how Myna is even better.

A/B testing

A/B testing works by taking two or more variants, showing each an equal number of times, and recording the number of conversions for each in order to identify the best variant. Once each variant has been shown a set number of times, or for a set period of time, you can compare the conversion rates and decide which variant to use going forward.
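In code, a traditional A/B test looks something like the sketch below. The conversion rates and view counts are invented purely for illustration:

    import random

    # A minimal sketch of a classic A/B test. The true conversion
    # rates (5% and 7%) and the number of views are made up.
    TRUE_RATES = {"A": 0.05, "B": 0.07}
    VIEWS_PER_VARIANT = 10_000  # fixed in advance

    conversions = {"A": 0, "B": 0}
    for variant, rate in TRUE_RATES.items():
        for _ in range(VIEWS_PER_VARIANT):
            if random.random() < rate:
                conversions[variant] += 1

    # Only once every view is spent do we compare conversion rates
    # and pick a winner.
    for variant in TRUE_RATES:
        rate = conversions[variant] / VIEWS_PER_VARIANT
        print(variant, conversions[variant], round(rate, 3))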

This graph plots conversion rate over time for an A/B test, with the shaded areas indicating total conversions. The yellow shaded area shows total conversions for the baseline — that is, the conversions we'd achieve without running the test. The green shaded area indicates the extra conversions achieved by running the test. Note how the experiment's conversion rate jumps after we make a decision and show the better variant exclusively. Finally, the lightly shaded area indicates conversions we miss out on by not switching immediately to the better variant.

This has two major drawbacks:

  • you have to decide in advance how many times to show each variant in order to obtain a statistically significant result

  • if one variant is a dramatic improvement on the other, then every time you show the weaker variant you’re showing a sub-standard page to your users, and losing out on what could be a large number of conversions (the sketch after this list puts numbers on the cost)
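To put hypothetical numbers on the second drawback: suppose one variant converts at 5% and the other at 10%, and the test runs for 10,000 views split evenly between them.

    views = 10_000
    rate_worse, rate_better = 0.05, 0.10

    # A 50/50 split for the duration of the test...
    test_conversions = (views / 2) * rate_worse + (views / 2) * rate_better
    # ...versus showing the better variant throughout.
    best_conversions = views * rate_better

    print(test_conversions)                     # 750.0
    print(best_conversions)                     # 1000.0
    print(best_conversions - test_conversions)  # 250.0

A quarter of the conversions you could have had over that period are spent on the test itself.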

Myna’s improvements

What Myna does is continually reweight the variants according to their success; so if variant 1 starts to be clicked more frequently than variant 2, the algorithm will show variant 1 more frequently.

This graph plots conversion rate over time for a bandit test, with the shaded areas indicating total conversions. The yellow shaded area shows total conversions for the baseline — that is, the conversions we'd achieve without running the test. The green shaded area indicates the extra conversions achieved by running the test. Note how the experiment's conversion rate rises rapidly as soon as we start running the test. Finally, the lightly shaded area indicates conversions we miss out on by not switching immediately to the better variant.

‘Well, hang on there,’ I hear you saying, ‘doesn’t that bias the results? Of course variant 1 is going to be clicked more often if it’s shown more often.’

You would think so, but Myna is a smart little bird. Er… algorithm.

Myna doesn’t work with the absolute number of conversions, but rather the conversion rate. Myna is constantly balancing its estimate of each variant’s conversion rate against its certainty in that estimate. If Myna has shown a variant once and had a single conversion, it doesn’t immediately believe the conversion rate is 100%. On the other hand, after a thousand views that all converted, there is strong evidence that the rate really is very high. Myna suggests variants in proportion to its estimate of their conversion rates, using all the historical information available.
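How does that balancing act look in practice? The sketch below uses Thompson sampling over Beta distributions, one well-known way to weigh an estimate against the evidence behind it. It is illustrative only, and not necessarily Myna’s exact algorithm:

    import random

    # [conversions, misses] observed for each variant so far.
    counts = {"variant 1": [0, 0], "variant 2": [0, 0]}

    def choose_variant():
        # Draw a plausible conversion rate for each variant from its
        # Beta(conversions + 1, misses + 1) posterior, and show the
        # variant with the highest draw. A variant with little data
        # has a wide posterior, so it still gets shown fairly often.
        draws = {
            name: random.betavariate(conversions + 1, misses + 1)
            for name, (conversions, misses) in counts.items()
        }
        return max(draws, key=draws.get)

    def record_result(name, converted):
        counts[name][0 if converted else 1] += 1

After one view and one conversion the posterior is Beta(2, 1), which is spread across the whole range, so the other variants still get plenty of showings; after a thousand conversions in a thousand views it is Beta(1001, 1), pinned up against 100%.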

So in the example above, if Myna chooses to show variant 1 more often, but variant 1 does not perform well, variant 2 will get more of a look in, rebalancing the results.

After a while Myna will be confident enough in its estimates that you can choose a single variant as the winner, if you desire. On the other hand you can keep adding and removing variants indefinitely, and let Myna optimise continuously.

In contrast, the parameters of an A/B test are set up in advance: the variants, the reward, and the number of views for which it will run. But how do you know you’ll get a usable result after the test has expired? How do you know whether it will take 5,000 views to establish a statistically significant and confidence-inspiring result, or 50,000? In a sense you need to know the results of your test before you’ve even started it.
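That circularity can be made concrete with a standard power calculation (the usual normal approximation; the conversion rates below are hypothetical):

    # Views needed per variant to detect a lift from p1 to p2 at
    # 95% confidence and 80% power (z values 1.96 and 0.84).
    def views_needed(p1, p2, z_alpha=1.96, z_beta=0.84):
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

    print(round(views_needed(0.05, 0.06)))  # about 8,100 views per variant
    print(round(views_needed(0.05, 0.10)))  # about 430 views per variant

The number of views you need depends on the size of the lift you are trying to detect, which is precisely what you don’t know until the test has run.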

Multi-armed bandit algorithms

The power behind all this is what is known as a multi-armed bandit algorithm.

The multi-armed bandit problem provides a mathematical model of A/B testing. Imagine a slot machine (a one-armed bandit) with many arms. The player wants to discover the best arm to pull to get the most money. This is in essence the multi-armed bandit problem. If we replace arms with variants and money with conversions, we have the A/B testing problem.

Bandit algorithms have been studied since the 1950s, but only in the last few years have efficient and general-purpose algorithms been created. Remarkably, optimal (up to constant factors) algorithms are now known for the basic bandit problem. Myna employs one of these algorithms, along with our own improvements specific to the A/B testing domain for which Myna is built.
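To give a flavour of the field, here is a sketch of UCB1, one of the best-known bandit algorithms. It is shown for illustration only; it is not necessarily the algorithm Myna uses:

    from math import log, sqrt

    # plays[i] is how many times arm (variant) i has been pulled;
    # rewards[i] is the total reward (e.g. conversions) from arm i.
    def ucb1_choose(plays, rewards):
        total = sum(plays)
        # First, try any arm that has never been played.
        for arm, n in enumerate(plays):
            if n == 0:
                return arm
        # Otherwise pick the arm with the best optimistic estimate:
        # observed mean plus a bonus that shrinks with more plays.
        def score(arm):
            return rewards[arm] / plays[arm] + sqrt(2 * log(total) / plays[arm])
        return max(range(len(plays)), key=score)

Arms that look good get played often; arms that are merely under-sampled keep receiving occasional plays until their shrinking uncertainty bonus no longer justifies them.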