A/B & Multivariate Testing

A/B Testing Fundamentals

A/B testing lets teams compare two versions objectively. Master the core concepts that keep experiments reliable.

In a standard two‑tailed A/B test, what p‑value threshold is most commonly used to claim statistical significance?

0.05

0.5

0.95

0.005

Most experimentation playbooks still use a 5 % significance threshold (α = 0.05); a stricter threshold cuts false wins, but it costs power unless you collect more data.
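
A minimal sketch of that decision rule, using a two‑proportion z‑test on made‑up conversion counts (the numbers are purely illustrative):

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical results: conversions and visitors per arm (not real data)
conv_a, n_a = 200, 10_000   # control
conv_b, n_b = 245, 10_000   # variant

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)                # pooled rate under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))                           # two-tailed p-value

alpha = 0.05
print(f"z = {z:.2f}, p = {p_value:.4f}, significant: {p_value < alpha}")
```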

Random assignment of users into control and variant primarily prevents what threat to validity?

Maturation

History effects

Selection bias

Instrumentation bias

When users self‑select, differences could stem from who joined rather than the change; randomisation removes that bias.

Which metric indicates the minimum effect size an experiment can reliably detect?

Lift

R‑squared

MDE (minimum detectable effect)

p‑value

Given the baseline rate, sample size, alpha and power, a power analysis yields the smallest change the test can reliably detect, known as the minimum detectable effect (MDE).
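
One way to see this in practice is a back‑of‑envelope MDE calculation using the standard normal‑approximation formula for two proportions; the baseline rate and per‑arm sample size below are assumed purely for illustration:

```python
from math import sqrt
from scipy.stats import norm

# Assumed inputs (illustrative only): baseline rate, users per arm, alpha, power
baseline = 0.05        # 5% baseline conversion
n_per_arm = 20_000
alpha, power = 0.05, 0.80

z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96
z_power = norm.ppf(power)           # ~0.84

# Normal-approximation MDE for a two-proportion test
se = sqrt(2 * baseline * (1 - baseline) / n_per_arm)
mde_abs = (z_alpha + z_power) * se
print(f"Absolute MDE: {mde_abs:.4f} ({mde_abs / baseline:.1%} relative lift)")
```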

If a test ends early because it looks significant, the biggest statistical risk is ______.

missing data

reduced sample variance

an inflated Type I error

confounding

Repeatedly peeking and stopping at the first significant reading inflates the false‑positive rate well beyond the nominal 5 % threshold.
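
A quick simulation makes the inflation concrete: both arms below are identical (no true effect), yet stopping at the first of 20 interim looks that crosses p < 0.05 produces false wins far more than 5 % of the time. All numbers are illustrative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
p_true, n_total, checks = 0.10, 10_000, 20    # identical arms, 20 interim looks
false_positives = 0

for _ in range(2_000):                        # simulated experiments
    a = rng.random(n_total) < p_true          # control conversions
    b = rng.random(n_total) < p_true          # "variant" with no real effect
    for n in np.linspace(n_total // checks, n_total, checks, dtype=int):
        pa, pb = a[:n].mean(), b[:n].mean()
        pool = (a[:n].sum() + b[:n].sum()) / (2 * n)
        se = np.sqrt(pool * (1 - pool) * 2 / n)
        if se > 0 and 2 * norm.sf(abs(pb - pa) / se) < 0.05:
            false_positives += 1              # stopped early on a false win
            break

print(f"False-positive rate with peeking: {false_positives / 2_000:.1%}")
```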

A/A tests are mainly run to check ______ before launching true experiments.

the experimentation platform’s randomness

segment lift

seasonality

creative performance

If two identical variants show a statistically significant difference more often than chance allows, something is wrong with assignment or measurement.
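
As a sketch, one common A/A sanity check is a sample‑ratio test: does the observed split match the intended 50/50 allocation? The counts below are hypothetical:

```python
from scipy.stats import chisquare

# Hypothetical A/A assignment counts from the platform (illustrative numbers)
observed = [50_120, 49_880]            # users bucketed into the two identical arms
total = sum(observed)
expected = [total / 2, total / 2]      # a 50/50 split is intended

stat, p = chisquare(observed, f_exp=expected)
print(f"Sample-ratio check: chi2 = {stat:.2f}, p = {p:.4f}")
# A very small p-value here points to broken randomisation, not a real effect.
```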

In most SaaS apps, which user level is the safest unit of randomisation?

Session ID cookie

Click event

Account or user‑ID

Pageview

Assigning at a stable identity keeps each person in one arm, avoiding cross‑contamination across sessions and devices.
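
A minimal sketch of stable‑identity bucketing: hash the user ID together with an experiment name (the name and function below are hypothetical) so the same person always lands in the same arm:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "checkout_test") -> str:
    """Deterministically bucket a user so every session and device sees the same arm."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # stable value in [0, 100)
    return "variant" if bucket < 50 else "control"

# Same user always lands in the same arm, regardless of session or device
print(assign_variant("user-12345"))
print(assign_variant("user-12345"))
```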

Power increases when you ______ sample size, all else equal.

halve

decrease

ignore

increase

Bigger samples shrink standard error, making true differences easier to detect.
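
A tiny illustration of why: for a proportion metric the standard error scales as 1/sqrt(n), so quadrupling the sample halves it (the rate below is chosen for illustration):

```python
from math import sqrt

p = 0.10                          # illustrative conversion rate
for n in (1_000, 4_000, 16_000):  # quadrupling the sample each time
    se = sqrt(p * (1 - p) / n)    # standard error of a proportion
    print(f"n = {n:>6}: SE = {se:.4f}")
# Each 4x increase in n halves the standard error.
```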

If the baseline conversion is 2 % and you want 80 % power to detect a 10 % relative lift, you should focus on adjusting ______ in a calculator.

sample size per variant

confidence interval

color palette

traffic source mix

Calculators output required users per arm based on baseline, lift, alpha and power.
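
Here is roughly what such a calculator does under the hood, using the common normal‑approximation formula for two proportions with the inputs from this question (2 % baseline, 10 % relative lift, alpha = 0.05, 80 % power); with these inputs it comes out to roughly 80,000 users per arm:

```python
from math import ceil
from scipy.stats import norm

# Scenario from the question: 2% baseline, 10% relative lift, alpha 0.05, 80% power
p1 = 0.02
p2 = p1 * 1.10                     # 10% relative lift -> 2.2% target rate
alpha, power = 0.05, 0.80

z_alpha = norm.ppf(1 - alpha / 2)
z_power = norm.ppf(power)

# Standard normal-approximation sample-size formula for two proportions
n = ((z_alpha + z_power) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))) / (p2 - p1) ** 2
print(f"Required users per variant: {ceil(n):,}")
```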

Multi‑armed bandit algorithms trade some statistical purity for faster ______.

reward optimisation

p‑value control

UX wireframes

SQL queries

Bandits shift traffic toward winners on the fly, maximising conversions during the test.
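
A minimal epsilon‑greedy sketch shows the idea: the algorithm mostly exploits the current best arm while still exploring occasionally, so traffic drifts toward the better performer during the test. The arms and conversion rates below are made up:

```python
import random

true_rates = {"A": 0.10, "B": 0.13}          # unknown to the algorithm
counts = {arm: 0 for arm in true_rates}
rewards = {arm: 0 for arm in true_rates}
epsilon = 0.10                               # 10% of traffic explores at random

for _ in range(10_000):                      # simulated visitors
    if random.random() < epsilon or not all(counts.values()):
        arm = random.choice(list(true_rates))                    # explore
    else:
        arm = max(counts, key=lambda a: rewards[a] / counts[a])  # exploit current winner
    counts[arm] += 1
    rewards[arm] += random.random() < true_rates[arm]            # 1 if converted

print(counts)   # most traffic ends up on the better-converting arm
```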

The term ‘lift’ in an A/B result most often refers to ______.

standard deviation

absolute traffic

time on page

percentage change between control and variant metrics

Reporting lift as a percent communicates the proportional improvement over baseline.
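
For example (hypothetical rates), note the difference between absolute and relative lift:

```python
# Hypothetical conversion rates for control and variant
control_rate, variant_rate = 0.040, 0.046

absolute_lift = variant_rate - control_rate                   # in percentage points
relative_lift = (variant_rate - control_rate) / control_rate  # what "lift" usually means
print(f"Absolute: {absolute_lift * 100:.1f} pp, relative lift: {relative_lift:.0%}")
```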

Starter

Keep learning the testing basics.

Solid

You’re on your way to optimisation mastery.

Expert!

Phenomenal command of experimentation.
