A central learning problem in dynamic environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the notion of value of information (VoI), i.e., the expected improvement in future decision quality arising from the information acquired through exploration.
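As a concrete illustration of this notion, the value of perfect information in a simple two-action setting gives an upper bound on what exploration can be worth. The following sketch is a hypothetical example, not taken from the chapter; the payoff values are invented for illustration.

```python
# Hypothetical two-action example: value of (perfect) information, an upper
# bound on the expected benefit of exploration.
# Action A pays 1.0 surely; action B pays 3.0 or 0.0 with equal probability.
p_high = 0.5
payoff_a = 1.0
payoff_b_high, payoff_b_low = 3.0, 0.0

# Best expected payoff if we must commit before learning B's outcome.
best_without_info = max(payoff_a, p_high * payoff_b_high + (1 - p_high) * payoff_b_low)

# Best expected payoff if we could observe B's outcome before choosing.
best_with_info = (p_high * max(payoff_a, payoff_b_high)
                  + (1 - p_high) * max(payoff_a, payoff_b_low))

voi = best_with_info - best_without_info  # 2.0 - 1.5 = 0.5
```

Here exploration is worth at most 0.5 in expectation, so any exploration scheme whose cost exceeds that bound cannot pay off.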
In this chapter we study games with numerical, noisy payoffs. This form of payoff learning is sometimes referred to as Q-learning [188, 189]. Here we focus on specific classes of stochastic games with incomplete information in which the state transitions are action-independent. We develop fully distributed iterative schemes to learn expected ...
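The flavor of payoff learning described above can be sketched with a standard stochastic-approximation update: a player maintains an estimate of each action's expected payoff and corrects it toward each noisy observation with a decreasing step size. This is a minimal generic sketch, not the book's specific scheme; the payoff means, noise level, and epsilon-greedy exploration rule are all illustrative assumptions.

```python
import random

def payoff_learning(true_means, steps=5000, eps=0.1, seed=0):
    """Estimate expected payoffs from noisy observations via
    the stochastic-approximation update q <- q + lambda * (r - q)."""
    rng = random.Random(seed)
    n = len(true_means)
    q = [0.0] * n        # per-action payoff estimates
    counts = [0] * n     # visits per action, used for the step size

    for _ in range(steps):
        # epsilon-greedy: occasionally explore an untested action
        if rng.random() < eps:
            a = rng.randrange(n)
        else:
            a = max(range(n), key=lambda i: q[i])

        # noisy payoff observation for the played action (Gaussian noise assumed)
        r = true_means[a] + rng.gauss(0.0, 0.5)

        counts[a] += 1
        lam = 1.0 / counts[a]        # decreasing learning rate
        q[a] += lam * (r - q[a])     # move estimate toward the observation

    return q

# With enough samples the estimates converge to the true expected payoffs.
estimates = payoff_learning([1.0, 2.0, 1.5])
```

With a step size of `1/counts[a]`, each estimate is exactly the running average of the payoffs observed for that action, which converges to the true mean as the action is sampled repeatedly.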