Journal of Applied Mathematics and Stochastic Analysis
Volume 12 (1999), Issue 2, Pages 151-160
doi:10.1155/S1048953399000155
Exact solution of the Bellman equation for a β-discounted reward in a two-armed bandit with switching arms
Higher Institute of Food and Flavor Industries, 26, Maritza str., Plovdiv 4002, Bulgaria
Received 1 May 1998; Revised 1 October 1998
Copyright © 1999 Doncho S. Donchev. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
We consider the symmetric Poissonian two-armed bandit problem. For the case of switching arms, only one of which creates reward, we explicitly solve the Bellman equation for a β-discounted reward and prove that a myopic policy is optimal.