4 Reasons Your A/B Split Test Results Aren’t Statistically Significant—And How to Fix Them
You designed a split test. You launched it. You waited patiently (or maybe not so patiently). You crunched the numbers... and came up empty. No statistical significance. No learning. Just... a shrug from your data.
Sound familiar?
If so, you’re not alone. You may not be “doing it wrong,” but you may not be doing it optimally.
A/B split testing has been a core element of my consulting work for the past 20+ years. I often meet new clients, well-meaning email marketers, who have tried and failed to get statistically significant results time and again.
Let’s break down the four key reasons your test results might be coming up within the margin of error – and what you can do to get the insights you’re looking for in the future.
1. You’re Testing the Wrong Thing
I’ll just say it: Not everything is test-worthy. If the difference between your A and B versions doesn’t meaningfully impact user behavior, you won’t see a meaningful result—no matter how perfect your math is.
This is where the scientific method comes in handy. Before you build a test, ask yourself: What am I trying to prove or disprove?
That’s your hypothesis. And it should be specific.
For example, “I believe that adding a testimonial to our email will increase conversion rate by at least 10%, because social proof builds trust.”
That’s testable. That’s measurable. And as a bonus, it gives you a foundation to evaluate your results. If you’re just swapping the shade of blue on a CTA button without any rationale? You’re probably wasting time (and test cells).
Pro Tip: If your test plan doesn’t pass the “why would this change behavior?” gut check, pause and refine it.
2. You’re Using the Wrong KPI
Here’s where I get a little feisty. Because if I see one more case study claiming a “significant improvement in open rate” as a reason to declare a subject line test the winner, I might just scream into the void.
Open rate is not a reliable KPI.
Neither is click-through rate, in many cases. I’ve written about this here and here on my own blog, and I’ll say it again:
The most reliable KPI for A/B split testing is conversion rate; if it’s a revenue-based conversion, you might also look at revenue-per-email-sent (RPE).
Why? Because if conversions or revenue are the goal, it doesn’t make sense to measure success with a different metric. Your KPI should reflect the thing that actually moves your business forward, whether that’s sales, sign-ups, downloads, or donations.
Pro Tip: If you’re testing to optimize for your bottom line, conversion rate and/or RPE is your best bet. Period.
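If you want to see the arithmetic spelled out, here’s a minimal sketch in Python. The campaign numbers are made up, and I’m assuming conversion rate is conversions divided by emails delivered (some teams divide by emails sent instead; either way, be consistent) and RPE is total attributed revenue divided by emails sent.

```python
# Illustrative numbers only; swap in your own campaign data.
emails_sent = 40_000
emails_delivered = 39_200
conversions = 784
revenue = 23_520.00  # total revenue attributed to this send (hypothetical)

# Conversion rate: conversions divided by delivered emails.
conversion_rate = conversions / emails_delivered

# Revenue per email sent (RPE): total revenue divided by emails sent.
rpe = revenue / emails_sent

print(f"Conversion rate: {conversion_rate:.2%}")  # 2.00%
print(f"RPE: ${rpe:.2f}")                         # $0.59
```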
3. You’re Misunderstanding (or Miscalculating) Lift
Let’s say your control group converts at a rate of 2%. Your test group converts at a 3% rate.
What’s your lift?
If you said “1%,” close your spreadsheet and back away slowly.
That’s a 1 percentage point increase, but it’s actually a 50% lift in conversion rate. That’s a big difference.
The correct formula to use for lift/loss is:
(Test – Control) ÷ Control = Lift %
In this case:
(3% – 2%) ÷ 2% = 0.50 = 50% lift
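If you’d rather let code do the math, here’s a one-function sketch of that same formula in Python, using the illustrative 2% and 3% rates from above.

```python
def lift(test_rate: float, control_rate: float) -> float:
    """Relative lift: (Test - Control) / Control."""
    return (test_rate - control_rate) / control_rate

control_rate = 0.02  # 2% conversion rate in the control cell
test_rate = 0.03     # 3% conversion rate in the test cell

print(f"{lift(test_rate, control_rate):.0%} lift")  # 50% lift
```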
Too often I see marketers downplaying strong performance because they’re looking at the wrong math—or overhyping weak results because they haven’t checked if the observed lift exceeds the margin of error.
Bottom line: Know your formulas. Use a proper testing calculator (I like these from Zettasphere). And be honest with yourself about whether your data is actionable or just interesting.
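And if you’re curious what a calculator like that is doing under the hood, here’s a rough sketch of one common approach, a pooled two-proportion z-test. Your calculator may use different assumptions, and the counts below are purely illustrative.

```python
from math import sqrt
from scipy.stats import norm  # SciPy is assumed to be installed

# Illustrative counts: 20,000 recipients per cell, 2% vs. 3% conversion.
control_conv, control_n = 400, 20_000
test_conv, test_n = 600, 20_000

p_control = control_conv / control_n
p_test = test_conv / test_n

# Pooled two-proportion z-test (one common method; calculators vary).
p_pooled = (control_conv + test_conv) / (control_n + test_n)
se = sqrt(p_pooled * (1 - p_pooled) * (1 / control_n + 1 / test_n))
z = (p_test - p_control) / se
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided p-value

print(f"Lift: {(p_test - p_control) / p_control:.0%}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Significant at the 95% level?", p_value < 0.05)
```

If the p-value comes back above 0.05, the observed lift is still inside the margin of error: interesting, but not yet actionable.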
4. Your Sample Size Is Too Small
Here’s another reason your A/B test might be yielding statistically insignificant results: your sample size is just too small.
I get it. Sometimes the list you’re sending to isn’t massive. But that doesn’t mean you should settle for murky insights.
My rule of thumb? A minimum of 20,000 recipients per cell. That’s 20,000 in your control and 20,000 in your test, for a total of 40,000.
Why so many? Because smaller sample sizes increase your margin of error and make it much harder to detect real differences between versions. You may think you saw a bump, but with only a few hundred or thousand people in each group, it’s hard to say whether that lift was luck or legitimate.
If you’re working with a smaller audience, don’t abandon your test idea. Just take a phased approach. Test the same element across multiple sends to accumulate a large enough total sample to draw meaningful conclusions.
Bottom line: If your sample size is too small to reach significance, the test isn’t invalid—it’s just incomplete. Commit to reaching that critical mass of recipients so you can learn something that actually informs future decisions.
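If you want a feel for where numbers like that come from, here’s a hedged sketch of a standard sample-size approximation for comparing two proportions at 95% confidence and 80% power. The function name, baseline rate, and target lift are placeholders I picked for illustration, not figures from a real program.

```python
from math import ceil, sqrt
from scipy.stats import norm  # SciPy is assumed to be installed

def recipients_per_cell(baseline_rate: float, relative_lift: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate recipients needed per cell for a two-sided two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)   # about 1.96 for 95% confidence
    z_beta = norm.ppf(power)            # about 0.84 for 80% power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Example: 2% baseline conversion rate, trying to detect a 10% relative lift.
print(recipients_per_cell(0.02, 0.10))  # roughly 80,000 per cell for these inputs
```

The subtler the lift you’re hoping to detect, the more recipients each cell needs. Large lifts can clear the bar with smaller cells, but at typical email conversion rates the numbers add up fast, which is exactly why tiny tests so often come back inconclusive.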
Final Thoughts (and a Friendly Rant)
Split testing is one of the most powerful tools in an email marketer’s toolkit. But like any tool, it’s only effective if you know how—and when—to use it.
If you’re feeling frustrated with your results, it might not be you. It might be the test design, the KPI, the math—or the sample size.
The good news? These are all fixable.
The even better news? Once you start testing smarter, you’ll not only get better results—you’ll build a culture of learning, iteration, and continuous improvement.
And that, friends, is where the real magic happens.
Got questions about test design? Want a second set of eyes on your hypothesis or your math? I’m here for it. Drop me a note or catch up with me at one of the Thursday OI-Members-Only Zoom Discussions. I also teach a workshop on A/B Split Testing; register to join me online (various dates) or in person after the Email Innovations World Event in Phoenix, AZ, on June 5, 2025.
Let’s make testing meaningful again.
Until next time,
Jeanne
Photo by Nichika Sakurai on Unsplash