Imagine you have two variables in your mobile app that you think may be important for retaining new customers: emails vs. push notifications, and red vs. blue signup screen background color. Let’s say that currently, you use red backgrounds and emails, but you think you might be able to do better, so first you A/B test the messaging method, and you find out that push notifications actually retain users better by 5%. So you switch everyone to those.
Next, you A/B test for color, and you find out that red is still the best, so you make no change. Now you have the best combination of color and messaging method, right? Well no, not necessarily. Because there are four possible combinations, and you only ever tested three of them:
Red + push: first test
Red + email: first (and maybe second) test
Blue + push: second test
Blue + email: ?
It might be that blue backgrounds and emails actually give you the best retention of all, but you wouldn’t know, because you didn’t check (as a side note, if you were not being careful, notice you may have also wastefully tested one combination twice, which you should avoid as well).
If color and messaging type actually have nothing to do with one another and would not influence one another’s effectiveness, then this is not a problem, and that last combination will not be the best. If you strongly believe this is true for two variables, don’t worry about those remaining combinations.
But if you suspect that two variables might possibly have anything to do with one another, that they might alter each others’ effects on customers in any way, then this is a problem, and you can solve it by simply running one of each of the possible combinations once, from the beginning. Now you can be confident you have THE best combination of the two, whether or not the variables interact with one another.
Why not do this for every variable you think might matter at all, not just two? The number of groups exponentially increases, making this a limited strategy. If you had five variables with A/B options instead of two, for example, you would need to test 2^5 = 32 whole groups of test subjects in order to guarantee you find the best combination despite interactions. If you have 7 variables with A/B/C testing, you need 3^7 = 2,187 groups of test subjects. This obviously becomes impractical quickly. Even if you can run A/B tests with a few lines of code, you will still run out of customers to add to experiment groups before you find your answers.
If you limit this strategy to variables that you have some actual suspicion of being related to one another, however, it can pay meaningful dividends for not too much investment in extra testing. Maybe out of five A/B variables, the first three might logically be related, but the last two are probably independent of the others. If so, you can test a limited subset of combinations: 2^3 + 2 + 2 = 10 groups, instead of 32, and you will still catch any of the plausible best combinations.
Next time you are considering A/B testing your product, think ahead, ask yourself which testing variables might logically influence one another, and consider running all of those variables’ combinations up front. You may find a golden mixture that could otherwise have slipped past!