The Multi-Armed Bandit Problem

I wrote an article in the May 2016 issue of Microsoft MSDN Magazine, titled “The Multi-Armed Bandit Problem”. See

Imagine you’re in Las Vegas, standing in front of three slot machines. You have 20 tokens to use. For each play, you drop a token into any one of the three machines, pull the handle, and are paid a random amount. The machines pay out differently, but you initially have no knowledge of the payout schedules they follow. What strategies can you use to try to maximize your gain?

This is an example of what’s called the multi-armed bandit problem, so named because a slot machine is informally called a one-armed bandit. The problem is not as whimsical as it might first seem. There are many important real-life problems, such as clinical drug trials, that are similar to the slot machine example.


In my article I present a short but complete demo program written in C#. There are several algorithms that can be used on the multi-armed bandit problem. My demo uses the simplest reasonable algorithm, which is called explore-exploit.
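To give a rough feel for the explore-exploit idea, here is a minimal sketch in Python. It is not the C# demo from the article. The function names, the n_explore parameter, and the simulated machines are all hypothetical, and I assume one common reading of the algorithm: spend a fixed number of tokens trying the machines in round-robin order, then spend every remaining token on whichever machine had the best average payout.

```python
import random


def explore_exploit(pull, n_machines, n_pulls, n_explore):
    """Spend n_explore tokens sampling machines round-robin, then put the
    remaining tokens on the machine with the best observed average payout.

    pull(m) is a caller-supplied function (hypothetical) that plays
    machine m once and returns its payout.
    """
    if not 0 <= n_explore <= n_pulls:
        raise ValueError("n_explore must be between 0 and n_pulls")

    totals = [0.0] * n_machines  # total payout observed per machine
    counts = [0] * n_machines    # number of plays per machine

    # Explore phase: cycle through the machines in order.
    for t in range(n_explore):
        m = t % n_machines
        totals[m] += pull(m)
        counts[m] += 1

    # Average payout seen so far; a machine never tried counts as 0.0.
    averages = [totals[m] / counts[m] if counts[m] else 0.0
                for m in range(n_machines)]
    best = max(range(n_machines), key=lambda m: averages[m])

    # Exploit phase: all remaining tokens go to the apparent best machine.
    for _ in range(n_pulls - n_explore):
        totals[best] += pull(best)
        counts[best] += 1

    return best, counts, sum(totals)


def make_machines(win_probs, seed=0):
    """Simulated machines (an assumption for this sketch): machine m pays
    1.0 with hidden probability win_probs[m], otherwise 0.0."""
    rng = random.Random(seed)

    def pull(m):
        return 1.0 if rng.random() < win_probs[m] else 0.0

    return pull


if __name__ == "__main__":
    pull = make_machines([0.3, 0.7, 0.5], seed=1)
    best, counts, gain = explore_exploit(pull, n_machines=3, n_pulls=20,
                                         n_explore=6)
    print("machine chosen:", best, "plays:", counts, "total gain:", gain)
```

The main design decision is how many tokens to spend exploring. With too few, a lucky streak on a poor machine can fool you. With too many, there is little left to exploit the machine that looks best.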

The multi-armed bandit problem is an interesting combination of math, economics, and computer science.
