A/B testing lets teams compare two versions objectively. Master the core concepts that keep experiments reliable.
In a standard two‑tailed A/B test, what p‑value threshold is most commonly used to claim statistical significance?
0.05
0.5
0.95
0.005
Random assignment of users into control and variant primarily prevents what threat to validity?
Maturation
History effects
Selection bias
Instrumentation bias
Which metric indicates the minimum effect size an experiment can reliably detect?
Lift
R‑squared
MDE (minimum detectable effect)
p‑value
If a test ends early because it looks significant, the biggest statistical risk is ______.
missing data
reduced sample variance
an inflated Type I error
confounding
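To see why peeking inflates Type I error, here is a minimal simulation sketch (function names like `run_experiment` are illustrative, not from any library): it runs A/A tests with identical 5% conversion on both sides, so every "significant" result is a false positive, and compares a single final look against checking every 100 users and stopping at the first p < 0.05.

```python
import math
import random

def z_test_p(wins_a, n_a, wins_b, n_b):
    """Two-tailed p-value from a two-proportion z-test."""
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (wins_a / n_a - wins_b / n_b) / se
    # Normal-CDF tail probability via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def run_experiment(peek_every=None, n=2000, p=0.05, alpha=0.05):
    """Simulate one A/A test; return True if it (falsely) hits significance."""
    a = b = 0
    for i in range(1, n + 1):
        a += random.random() < p
        b += random.random() < p
        if peek_every and i % peek_every == 0 and z_test_p(a, i, b, i) < alpha:
            return True  # stopped early on a 'significant' peek
    return z_test_p(a, n, b, n) < alpha

random.seed(42)
trials = 500
no_peek = sum(run_experiment() for _ in range(trials)) / trials
peeking = sum(run_experiment(peek_every=100) for _ in range(trials)) / trials
print(f"False positive rate, single look: {no_peek:.3f}")
print(f"False positive rate, with peeking: {peeking:.3f}")
```

The single-look rate lands near the nominal 5%, while the peeking rate is substantially higher, which is exactly the inflated Type I error the question is about.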
A/A tests are mainly run to check ______ before launching true experiments.
the experimentation platform’s randomness
segment lift
seasonality
creative performance
In most SaaS apps, which user level is the safest unit of randomisation?
Session ID cookie
Click event
Account or user ID
Pageview
Power increases when you ______ sample size, all else equal.
halve
decrease
ignore
increase
If the baseline conversion is 2 % and you want 80 % power to detect a 10 % relative lift, you should focus on adjusting ______ in a calculator.
sample size per variant
confidence interval
color palette
traffic source mix
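The calculation behind that question can be sketched with the standard normal-approximation formula for a two-proportion test; the z-scores 1.96 (two-tailed 5% significance) and 0.8416 (80% power) are the usual defaults, and `sample_size_per_variant` is an illustrative name, not a library function.

```python
import math

def sample_size_per_variant(baseline, relative_lift,
                            z_alpha=1.96, z_beta=0.8416):
    """Approximate users needed per variant for a two-proportion test.

    z_alpha: two-tailed 5% significance; z_beta: 80% power.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 2% baseline, 10% relative lift (2.0% -> 2.2%), 80% power
n = sample_size_per_variant(baseline=0.02, relative_lift=0.10)
print(f"Users needed per variant: {n:,}")
```

With a 2% baseline and a 10% relative lift, this works out to roughly 80,000 users per variant, which is why sample size, not the confidence interval itself, is the knob to adjust in a calculator.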
Multi‑armed bandit algorithms trade some statistical purity for faster ______.
reward optimisation
p‑value control
UX wireframes
SQL queries
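A minimal epsilon-greedy sketch shows the bandit trade-off in that question: instead of a fixed 50/50 split, it shifts traffic toward the better-performing arm as evidence accumulates, sacrificing clean p-value control for faster reward optimisation (arm rates and the `epsilon_greedy` helper below are illustrative).

```python
import random

def epsilon_greedy(true_rates, pulls=10000, epsilon=0.1, seed=7):
    """Explore a random arm with probability epsilon; otherwise
    exploit the arm with the best observed conversion rate."""
    rng = random.Random(seed)
    counts = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))
        else:
            # Highest empirical rate; unpulled arms get tried first
            arm = max(range(len(true_rates)),
                      key=lambda i: wins[i] / counts[i] if counts[i] else float("inf"))
        counts[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]
    return counts, wins

# Two arms: a 2% control and a 5% variant
counts, wins = epsilon_greedy([0.02, 0.05])
print("Pulls per arm:", counts)
```

The better arm ends up with the bulk of the traffic, which is the reward-optimisation behaviour a fixed-split A/B test deliberately forgoes.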
The term ‘lift’ in an A/B result most often refers to ______.
standard deviation
absolute traffic
time on page
percentage change between control and variant metrics
Starter
Keep learning the testing basics.
Solid
You’re on your way to optimisation mastery.
Expert!
Phenomenal command of experimentation.