
A/B Testing Pitfalls: Power, MDE, and Peeking

You’re testing two versions of your website. Version A is the original. Version B has something new. Maybe a different button color. Or a new headline. You want to see which one works better. Sounds simple, right?

Welcome to the world of A/B testing. It’s powerful. But it can be tricky. If you’re not careful, you can end up with completely wrong results. Today, we’re going to talk about three common A/B testing pitfalls:

  1. Power: running a test too small to detect a real difference
  2. MDE: never deciding what size of change is worth detecting
  3. Peeking: checking results early and stopping too soon

Don’t worry—we’ll keep it fun and simple. Let’s start with the first pitfall.

Power: Not Just a Gym Word

In A/B testing, power means your test’s ability to find a real difference if there is one. It’s usually set to 80% or 90%. This means that if B truly is better than A, you have an 80–90% chance of detecting that.

But here’s the problem. If your test has low power, it might say there’s no difference even when B truly is a rockstar. It’s like trying to find a needle in a haystack… with a blindfold on.

Power is tied to three things:

  1. Sample size: more visitors give your test more power.
  2. Effect size: big differences are easier to detect than small ones.
  3. Significance level: the stricter you are about false positives, the harder it is to reach significance.

Low power is like whispering in a concert. Nobody’s going to hear your signal. To fix that, you might need to test for longer or accept only big changes.
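To make power concrete, here’s a minimal sketch in Python using the statsmodels library. The numbers are hypothetical: a 10% baseline conversion rate and a hoped-for lift to 12%.

```python
# Minimal power sketch with statsmodels (hypothetical numbers:
# 10% baseline conversion rate, hoped-for lift to 12%).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.10   # conversion rate of version A
target = 0.12     # rate we hope version B reaches

# Cohen's h: a standardized effect size for comparing two proportions
effect_size = proportion_effectsize(target, baseline)

analysis = NormalIndPower()
for n_per_group in (500, 2000, 8000):
    power = analysis.power(effect_size=effect_size, nobs1=n_per_group,
                           alpha=0.05, ratio=1.0)
    print(f"{n_per_group:>5} visitors per group -> power = {power:.0%}")
```

With these made-up numbers, power climbs from under 20% at 500 visitors per group to nearly 100% at 8,000. Same effect, same test; only the sample size changed.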

MDE: The Change Worth Noticing

MDE stands for Minimum Detectable Effect. It’s the smallest change you’re trying to detect in your test. Think of it as: “What size of improvement would justify this change?”

Let’s say your current conversion rate is 10% and you set an MDE of 2 percentage points. This means your test is designed to detect a shift up to at least 12%, or down to 8%.

If the real effect is smaller, your test might miss it completely. Your test is just not sensitive enough. It’s like trying to weigh a feather using a rusty scale from 1920.

People go wrong with MDE when they expect to detect tiny changes without increasing their sample size. You can’t have it both ways.

Here’s the trade-off:

  1. A small MDE catches subtle improvements, but demands a much bigger sample.
  2. A large MDE gets by with far fewer visitors, but small improvements will slip past undetected.

What’s the fix? Before starting your test, decide what level of change matters for your business. If a 0.5% improvement isn’t worth the effort to redesign a page, don’t try to detect that small of an effect.
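Here’s a sketch of that trade-off in Python with statsmodels, using a hypothetical 10% baseline. For each MDE (as an absolute lift), it solves for the number of visitors each variant needs at 80% power:

```python
# Required sample size per group for different MDEs
# (hypothetical 10% baseline, 80% power, 5% significance).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.10
analysis = NormalIndPower()

for mde in (0.005, 0.01, 0.02, 0.04):   # absolute lifts from 0.5% to 4%
    effect_size = proportion_effectsize(baseline + mde, baseline)
    n = analysis.solve_power(effect_size=effect_size,
                             alpha=0.05, power=0.80)
    print(f"MDE {mde:.1%} -> about {n:,.0f} visitors per group")
```

Notice the pattern: each time you halve the MDE, the required sample size roughly quadruples.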

Peeking: The Silent Test Killer

You set up your test. It’s been running two days. You’re curious. You check the results.

Oh wow! Version B is winning by a lot! Should you stop the test now?

No. Nope. Never.

That’s called peeking, and it’s one of the easiest ways to ruin your test. Every time you peek at your data, you increase the chance of a false positive. That means thinking there’s a difference when there isn’t one.

Imagine flipping a coin 10 times. Sometimes, you might get 6 heads and 4 tails. Is that coin broken? Probably not. You just haven’t flipped it enough.
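Don’t take that on faith; a few lines of Python will confirm it:

```python
# Chance that a perfectly fair coin shows 6 or more heads in 10 flips
from math import comb

total = 2 ** 10                                     # all possible flip sequences
at_least_six = sum(comb(10, k) for k in range(6, 11))
print(f"P(6+ heads) = {at_least_six / total:.0%}")  # about 38%
```

A fair coin lands on 6 or more heads almost 38% of the time. Small samples are just noisy.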

When you peek at your A/B test too early, you’re making decisions based on not enough data. Your test might yell “Significant!” when it’s just random noise.

Here’s what you should do instead:

  1. Calculate the sample size you need before the test starts.
  2. Let the test run until it hits that sample size or its planned end date.
  3. Look at the results once, at the end, and make your call.
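If you want to see the damage peeking does, here’s a sketch of a simulation in Python (using numpy and scipy, with made-up traffic numbers). It runs thousands of A/A tests, where both variants convert at exactly 10%, so every “significant” result is a false positive, and compares one look at the end against peeking ten times:

```python
# Simulated A/A tests: both variants convert at 10%, so any
# "significant" result is a false positive by construction.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n_tests, n_visitors, peeks = 2000, 10_000, 10
alpha = 0.05

def p_value(conv_a, conv_b, n):
    """Two-proportion z-test for conversion counts out of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = np.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0
    z = (conv_b - conv_a) / (n * se)
    return 2 * norm.sf(abs(z))

checkpoints = [n_visitors * (i + 1) // peeks for i in range(peeks)]
hits_end = hits_peek = 0
for _ in range(n_tests):
    a = rng.random(n_visitors) < 0.10   # version A conversions
    b = rng.random(n_visitors) < 0.10   # version B conversions
    pvals = [p_value(a[:n].sum(), b[:n].sum(), n) for n in checkpoints]
    hits_end += pvals[-1] < alpha                  # one look at the end
    hits_peek += any(p < alpha for p in pvals)     # stop at first "win"

print(f"one look at the end: {hits_end / n_tests:.1%} false positives")
print(f"peeking {peeks} times: {hits_peek / n_tests:.1%} false positives")
```

In runs like this, the single look stays near the promised 5% false-positive rate, while peeking ten times and stopping at the first “win” typically pushes it toward 20%. Same data, same test; only the peeking changed.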

Let’s Put It All Together

Here’s what a healthy A/B test setup looks like:

  1. Decide on an MDE. What’s the smallest change that matters?
  2. Do a power calculation. Figure out how many users you need.
  3. Set a fixed duration for your test. Don’t peek early!

Let’s break it down with a story.

Emma wants to test a new sign-up button. Her current conversion rate is 8%. She wants to know if a new purple button will deliver at least a 1.5 percentage point improvement, taking her from 8% to 9.5%. That’s her MDE.

She plugs the numbers into an online sample size calculator. With 80% power and a 5% significance level, she learns that she needs 20,000 visitors.
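If you’d rather script the calculation than trust a web form, here’s a sketch with statsmodels, assuming Emma’s 1.5% is an absolute lift (8% to 9.5%):

```python
# Sketch of Emma's sample size calculation, assuming her 1.5% MDE
# is an absolute lift from 8% to 9.5%.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect_size = proportion_effectsize(0.095, 0.08)
n_per_group = NormalIndPower().solve_power(effect_size=effect_size,
                                           alpha=0.05, power=0.80)
print(f"about {n_per_group:,.0f} visitors per group, "
      f"{2 * n_per_group:,.0f} in total")
```

Exact figures vary from one calculator to the next, since each uses its own approximations and safety margins, so treat any single number as a floor rather than a precise target.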

Emma sets her test to run for two weeks. She does not peek. It’s hard. She’s tempted every single day.

After two weeks, she checks the results. The new purple button improved conversions by 1.7%. It’s statistically significant. Time to paint the internet purple.

Quick Tip Checklist

Keep this list handy the next time you’re setting up an A/B test:

  1. Pick an MDE that actually matters to your business.
  2. Run a power calculation (aim for at least 80% power).
  3. Lock in your sample size and test duration before you start.
  4. Don’t peek at the results mid-test.
  5. Only call a winner once the test has finished.

Following these rules helps you make decisions based on truth—not chance. And when your decisions are based on truth, your results actually mean something.

Final Thoughts

A/B testing is amazing. But if you ignore power, skip the MDE, or peek too early, you’re gambling, not testing. The good news? These mistakes are easy to avoid once you know what to watch for.

Start every test with a clear plan. Respect your data. And let patience be your secret weapon.

Because that button color might matter, but only if your test was done right.
