Multi-Armed Bandit
A class of algorithms (Thompson sampling, UCB, ε-greedy) that balance exploring uncertain options against exploiting known winners.
Last updated: 2026-05-04
Definition
The metaphor: a row of slot machines ("one-armed bandits") with unknown payout rates. You want to maximize total reward over many pulls. Spend too long exploring and you waste pulls on losing arms; switch to exploitation too early and you may lock onto a suboptimal arm. A bandit algorithm decides when to explore versus exploit based on how uncertain it still is about each arm's expected reward: Thompson sampling draws from each arm's posterior, UCB adds a confidence bonus to each arm's average, and ε-greedy explores at a fixed rate.
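The explore/exploit balance described above can be sketched with Thompson sampling over Bernoulli arms (think click / no-click). This is an illustrative sketch, not any specific platform's implementation; the class name and the conversion rates in the demo are made up for the example.

```python
import random

class ThompsonSamplingBandit:
    """Thompson sampling for Bernoulli-reward arms.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    payout rate. To pick an arm we draw one sample from every posterior and
    play the argmax, so uncertain arms still get tried (exploration) while
    proven winners are picked more and more often (exploitation).
    """

    def __init__(self, n_arms: int):
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms

    def select_arm(self) -> int:
        # One posterior sample per arm; play the highest draw.
        samples = [
            random.betavariate(s + 1, f + 1)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm: int, reward: int) -> None:
        # Bayesian update: a 0/1 reward increments one Beta parameter.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


if __name__ == "__main__":
    random.seed(0)
    true_rates = [0.05, 0.12, 0.30]  # hypothetical per-arm conversion rates
    bandit = ThompsonSamplingBandit(len(true_rates))
    for _ in range(5000):
        arm = bandit.select_arm()
        bandit.update(arm, 1 if random.random() < true_rates[arm] else 0)
    pulls = [s + f for s, f in zip(bandit.successes, bandit.failures)]
    print(pulls)  # the best arm should receive the large majority of pulls
```

Note how no explicit explore/exploit switch is coded: early on the posteriors are wide, so any arm can win the sampling draw; as evidence accumulates, the best arm's samples dominate and exploration fades on its own.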
How it applies in India
No India-specific behavior.