
Multi-Armed Bandit

A class of algorithms (Thompson sampling, UCB, ε-greedy) that balance exploring uncertain options against exploiting known winners.

Last updated: 2026-05-04

Definition

The metaphor: a row of slot machines (bandits) with unknown payouts. You want to maximize total reward. Spend too long exploring and you waste pulls on losers; switch to exploitation too early and you might miss the best machine. A bandit algorithm decides when to explore versus exploit based on its current estimate of each arm's payout and how uncertain that estimate still is (in Thompson sampling, a posterior distribution per arm).
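As a minimal sketch of one of the algorithms named above, here is Thompson sampling for arms with win/lose (Bernoulli) payouts. The payout probabilities, function name, and simulation loop are illustrative assumptions, not anything specified by this page.

```python
# Minimal Thompson sampling sketch for Bernoulli-reward arms.
# The arm payouts below are hypothetical, chosen only to demonstrate the loop.
import random

def thompson_sampling(true_payouts, rounds=1000):
    """Each round, sample from every arm's Beta posterior and pull the arm
    with the highest sample, then update that arm with the observed reward."""
    n_arms = len(true_payouts)
    successes = [0] * n_arms   # observed wins per arm
    failures = [0] * n_arms    # observed losses per arm
    total_reward = 0

    for _ in range(rounds):
        # Sample each arm's Beta(successes+1, failures+1) posterior; arms with
        # little data produce spread-out samples, so they still get explored.
        samples = [random.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        arm = samples.index(max(samples))

        # Pull the chosen arm and update its posterior counts.
        reward = 1 if random.random() < true_payouts[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward

    return total_reward, successes, failures

if __name__ == "__main__":
    # Three hypothetical "slot machines" with payouts unknown to the algorithm.
    reward, wins, losses = thompson_sampling([0.05, 0.12, 0.20])
    print("total reward:", reward)
    print("pulls per arm:", [w + l for w, l in zip(wins, losses)])
```

Run it a few times and the pull counts concentrate on the highest-paying arm as its posterior tightens, which is the explore/exploit trade-off the definition describes.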

How it applies in India

No India-specific behavior.


Try WatEase free

Run your WhatsApp commerce on the platform built for India — Cloud API, GST invoices, UPI checkout, opt-in tracking, and a Free Forever plan.

Start Free Today