07/02/2023 • Jon Carrick
A/B testing in Optimize is a cornerstone of conversion rate optimisation (CRO), but how exactly does it determine the probability of a winner in an A/B test? It was not until I ran my first test and the results started coming in that I began to appreciate Optimize’s sophisticated approach to data modelling and what exactly the results were telling us.
Bayesian inference, modelled conversion rate, 95% performance range - just some of the jargon that crops up when you look into the report details of your A/B tests. But what exactly do they mean and how are they used?
When testing an original webpage against its variant(s), simply counting conversions on each and choosing the one with the most is unlikely to tell you which is better for your business. A winner on one day may be a loser on the next. Thankfully, there is a statistical method that will identify the most likely winner for us! Note that we say only the most likely winner, since this approach, Bayesian inference, takes into account the inherent randomness of such a test and calculates a probability of winning rather than a certainty.
In the simplest terms, Bayesian inference calculates the probability of a hypothesis (i.e. the original or the variant being the winner) based on the data available (i.e. the observed conversions). It is part of a wider field of statistics, first developed way back in the 18th century by Thomas Bayes, in which the degree of belief in a specific occurrence can be quantified.
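To make the idea concrete, here is a minimal sketch of how a winning probability could be calculated. It is not Optimize's actual model, which is not public. It assumes each page's conversion rate starts from a uniform Beta(1, 1) prior and is updated with the observed conversions, and the function name and the conversion counts are hypothetical.

```python
import random


def prob_variant_beats_original(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(variant rate > original rate).

    Assumption: a uniform Beta(1, 1) prior on each conversion rate,
    which gives a Beta(1 + conversions, 1 + non-conversions) posterior.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Draw one plausible conversion rate for each page from its posterior
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical counts: 100 conversions from 2,000 sessions on the original,
# 125 conversions from 2,000 sessions on the variant.
p = prob_variant_beats_original(100, 2000, 125, 2000)
print(f"Probability the variant beats the original: {p:.1%}")
```

The result is a probability rather than a verdict, which is exactly why the report talks about the most likely winner.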
Running a test over more days gives Optimize more data with which to model each conversion rate, so its estimates become more reliable. Large day-to-day variations in the number of conversions eventually smooth out as more data becomes available. As a rule of thumb, two weeks should be the absolute minimum test duration (unless you are accruing traffic in volumes similar to Amazon's…).
Ultimately, a test should run until the output gives an unambiguous result: would you place a bet on the given probabilities? 60% vs 40%... maybe not. 80% vs 20%... much more favourable. This, however, requires some individual judgement and a weighing up of the potential impacts of your final choice.
Modelling conversions and TIE Fighter diagrams
The Optimize test output is a numerical description of the probability distributions for both the original webpage and its variant. Each distribution is summarised by its median and by its 50% and 95% performance ranges.
The median is the value that the conversion rate is equally likely to be above or below. The 50% and 95% performance ranges are the intervals that contain the conversion rate with 50% and 95% probability respectively. Using this information, Optimize calculates the probability of both the original and the variant being the best. Such a distribution is presented in the aptly named ‘TIE Fighter’ diagram, with an example shown below in the ‘Modeled Conversion Rate’ column.
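As a rough illustration of where such numbers could come from, the sketch below draws samples from an assumed Beta posterior and reads off the median and the two ranges. The uniform Beta(1, 1) prior, the function name and the counts are all assumptions for the example, not a description of Optimize's internals.

```python
import random


def summarise_conversion_rate(conversions, sessions, draws=50_000, seed=0):
    """Median and 50% / 95% ranges of an assumed Beta posterior.

    Assumption: uniform Beta(1, 1) prior on the conversion rate.
    """
    rng = random.Random(seed)
    samples = sorted(
        rng.betavariate(1 + conversions, 1 + sessions - conversions)
        for _ in range(draws)
    )

    def quantile(p):
        # Empirical quantile taken from the sorted posterior samples
        return samples[min(int(p * draws), draws - 1)]

    return {
        "median": quantile(0.5),
        "range_50": (quantile(0.25), quantile(0.75)),
        "range_95": (quantile(0.025), quantile(0.975)),
    }


# Hypothetical page: 100 conversions from 2,000 sessions
summary = summarise_conversion_rate(100, 2000)
for name, value in summary.items():
    print(name, value)
```

In TIE Fighter terms, the 95% range corresponds to the full width of the diagram, the 50% range to the box in the middle, and the median to the line through it.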
The TIE Fighter diagram is the most recent snapshot in time of the modelled conversion rate, which has been developed over all previous days. An example of a modelled conversion rate across time is shown below. Notice how the probability distributions narrow over more and more days. This indicates that the uncertainties in original and variant conversion rates are decreasing, as the models converge towards more accurate conversion rates.
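The narrowing can be reproduced with a small simulation. This sketch assumes a fixed true conversion rate, a constant number of sessions per day and a uniform Beta(1, 1) prior. All of the names and figures are hypothetical, chosen only to show the trend.

```python
import random


def range_95_width(conversions, sessions, rng, draws=20_000):
    """Width of the 95% range of an assumed Beta(1, 1)-prior posterior."""
    samples = sorted(
        rng.betavariate(1 + conversions, 1 + sessions - conversions)
        for _ in range(draws)
    )
    return samples[int(0.975 * draws)] - samples[int(0.025 * draws)]


def simulate_widths(true_rate=0.05, sessions_per_day=200, days=14, seed=0):
    """Track the 95% range width as daily data accumulates."""
    rng = random.Random(seed)
    conversions = sessions = 0
    widths = []
    for _ in range(days):
        # Each session converts independently with probability true_rate
        conversions += sum(rng.random() < true_rate for _ in range(sessions_per_day))
        sessions += sessions_per_day
        widths.append(range_95_width(conversions, sessions, rng))
    return widths


widths = simulate_widths()
for day, width in enumerate(widths, start=1):
    print(f"Day {day:2d}: 95% range width = {width:.4f}")
```

With more sessions behind it, the posterior tightens around the underlying rate, which is the same shrinking of uncertainty seen in the report over time.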
While Optimize conveniently spits out its results for us to draw our own conclusions, it is useful to understand where these results come from. Unfortunately, we do not have the means to look under the bonnet at the exact calculations performed to produce these probability distributions. However, the statistical principles behind our results are neatly demonstrated over the duration of our A/B tests in Optimize.