A/B & Multivariate Testing

Multi‑Armed Bandit Algorithms

From streaming thumbnails to pricing widgets, adaptive bandits chase rewards in real time. Let’s test how well you know the algorithms behind the buzz.

A multi‑armed bandit balances exploration with ______ to maximise cumulative reward.

denormalisation

exploitation

pagination

compression

Exploration gathers knowledge; exploitation leverages that knowledge to serve the best‑known option.
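
For a concrete feel, here is a minimal Python sketch of epsilon-greedy, one simple way to strike that balance (epsilon-greedy isn't named in the question; the 10% exploration rate and the list of per-arm estimates are illustrative assumptions):

import random

def epsilon_greedy_pick(estimates, epsilon=0.1):
    # Explore with probability epsilon: play a random arm to keep gathering knowledge.
    if random.random() < epsilon:
        return random.randrange(len(estimates))
    # Otherwise exploit: play the arm with the best estimated reward so far.
    return max(range(len(estimates)), key=lambda arm: estimates[arm])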

The regret of a bandit algorithm measures the gap between the reward it actually earned and the reward from the ______ arm.

inactive

deleted

optimal

random

Lower regret means the algorithm quickly converged on showing the best‑performing choice.
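
As a rough worked example, cumulative regret can be computed from a reward log like this (the function name and the conversion numbers in the comment are illustrative, not from the quiz):

def cumulative_regret(rewards_earned, optimal_arm_mean):
    # Regret after T pulls: the reward the optimal arm would have produced on average,
    # minus the reward the algorithm actually earned.
    T = len(rewards_earned)
    return T * optimal_arm_mean - sum(rewards_earned)

# Example: 1000 impressions earning 80 conversions, while the best arm converts at 10%:
# regret = 1000 * 0.10 - 80 = 20 "missed" conversions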

Thompson Sampling draws a random value from each arm’s ______ distribution to choose which arm to serve next.

sprite sheet

progressive JPEG

posterior

DNS cache

Bayesian posterior beliefs encode the algorithm’s uncertainty about true conversion rates.
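
A minimal Beta-Bernoulli Thompson Sampling sketch in Python might look like this (the per-arm success/failure counts and the +1 priors are illustrative assumptions):

import random

def thompson_pick(successes, failures):
    # One Beta(successes + 1, failures + 1) posterior per arm; sample each and play the largest draw.
    draws = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda arm: draws[arm])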

Upper Confidence Bound (UCB) methods add an uncertainty bonus to each arm’s mean to favour those with ______ data.

compressed

fewer DOM nodes

less

TLS

By favouring arms with sparse observations, UCB maintains exploration that decays over time.
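
A small sketch of the UCB1 scoring rule, one common instance of this idea (the function and argument names are illustrative):

import math

def ucb1_pick(means, counts, total_pulls):
    # Score = observed mean + an uncertainty bonus that grows when an arm has fewer observations.
    def score(arm):
        if counts[arm] == 0:
            return float("inf")  # unplayed arms are tried first
        return means[arm] + math.sqrt(2 * math.log(total_pulls) / counts[arm])
    return max(range(len(means)), key=score)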

Business Insider’s 2025 report says HBO Max uses a bandit to optimise ______ images shown to viewers.

thumbnail

alt‑text

DNS A‑records

subtitle

Different poster art is served, and the bandit learns which version drives more clicks into each show.

Contextual bandits extend the classic model by incorporating ______ features when choosing an arm.

checksum

ISO date

voltage

user or session

By conditioning on real‑time context, the algorithm can serve different winners to different segments.
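
A very reduced sketch: treat a single segment label as the "context" and keep one posterior per (segment, arm) pair. A real contextual bandit would use richer features and a model such as LinUCB; everything named here is illustrative.

import random
from collections import defaultdict

# One Beta posterior per (segment, arm) pair; "segment" stands in for richer user/session features.
posterior = defaultdict(lambda: [0, 0])  # [successes, failures]

def contextual_pick(segment, arms):
    draws = {arm: random.betavariate(posterior[(segment, arm)][0] + 1,
                                     posterior[(segment, arm)][1] + 1)
             for arm in arms}
    return max(draws, key=draws.get)

def record_outcome(segment, arm, converted):
    posterior[(segment, arm)][0 if converted else 1] += 1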

In a rapidly changing market, algorithms such as sliding‑window Thompson Sampling handle ______ environments better.

lossless

deterministic

monolithic

non‑stationary

They discount stale data so new trends dominate the decision process.
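
One way to sketch the sliding-window idea in Python: rebuild each posterior from only the most recent observations (the window size and arm names are arbitrary placeholders):

import random
from collections import deque

WINDOW = 500  # only the most recent 500 rewards per arm are kept
history = {"A": deque(maxlen=WINDOW), "B": deque(maxlen=WINDOW)}

def sliding_window_thompson_pick():
    # Posteriors are rebuilt from recent 0/1 rewards only, so stale data ages out automatically.
    draws = {}
    for arm, recent in history.items():
        wins = sum(recent)
        draws[arm] = random.betavariate(wins + 1, len(recent) - wins + 1)
    return max(draws, key=draws.get)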

Compared with a 50‑50 A/B split, bandits usually expose fewer users to ______ variants.

losing

XML

SSL

PNG

Because allocation shifts toward the leading arm, under‑performing options see less traffic over time.

The explore‑then‑commit strategy runs a short exploration phase and then ______.

locks in the current best arm

restarts the test hourly

drops all arms

switches to random

Once confidence is sufficient, traffic is permanently assigned to the top performer.
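
A bare-bones explore-then-commit sketch, assuming a pull(arm) callback that returns a 0/1 reward for one impression (the round count is arbitrary):

def explore_then_commit(arms, pull, rounds_per_arm=100):
    # Exploration phase: pull every arm a fixed number of times.
    totals = {arm: 0.0 for arm in arms}
    for arm in arms:
        for _ in range(rounds_per_arm):
            totals[arm] += pull(arm)
    # Commit phase: all remaining traffic is locked onto the arm with the best observed mean.
    return max(arms, key=lambda arm: totals[arm] / rounds_per_arm)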

A key downside of bandits versus classic hypothesis tests is the difficulty of computing simple ______ values.

RGB

p‑

SHA‑256

TTL

Adaptive allocation breaks many assumptions behind fixed‑sample statistical significance.

Starter

You’re new to multi‑armed bandit algorithms. Revisit the fundamentals and try running a few simple tests to build confidence.

Solid

Solid grasp of multi‑armed bandit concepts; refine the details with more hands‑on practice.

Expert!

Expert level! You can design, run, and interpret advanced multi‑armed bandit experiments like a pro.
