We consider multiarmed bandit problems where the expected reward isunimodal over a partially ordered set of arms. In particular, thearms may belong to a continuous interval or correspond to verticesin a graph. We present efficient algorithms to minimize the regretin these bandit problems and also to detect abrupt changes in thereward distributions. The unimodality assumption has an importantadvantage: we can determine if a given arm is optimal by samplingthe possible directions around it. This property allows us toquickly find the optimal arm in a graph and detect changes. Notably,our method incurs only a regret that depends logarithmically on thediameter of the graph.
- SMILE in Paris