4 Balancing the gathering and use of information

In this chapter

You will learn about the challenges of learning from evaluative feedback and how to properly balance the gathering and utilization of information.
You will develop exploration strategies that accumulate low levels of regret in problems with an unknown transition function and unknown reward signals.
You will write code for trial-and-error learning agents that learn to optimize their behavior through their own experience in many-options, one-choice environments known as multi-armed bandits.
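
To make "regret" and "multi-armed bandit" concrete, here is a rough sketch. It is not the book's code. It uses epsilon-greedy exploration, one common strategy chosen only for illustration, on a hypothetical Bernoulli bandit, where each arm pays 1 with a fixed probability that the agent never sees. The function name `epsilon_greedy_bandit` and its parameters are assumptions. Regret is tallied here as the sum, over steps, of the gap between the best arm's true expected reward and the chosen arm's true expected reward.

```python
import random


def epsilon_greedy_bandit(true_probs, n_steps=1000, epsilon=0.1, seed=0):
    """Hypothetical sketch: epsilon-greedy on a Bernoulli multi-armed bandit.

    true_probs[a] is arm a's payout probability (hidden from the agent).
    Returns (estimated values, pull counts, total regret).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    q = [0.0] * n_arms        # agent's running estimate of each arm's value
    counts = [0] * n_arms     # how many times each arm has been pulled
    best_value = max(true_probs)
    total_regret = 0.0

    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Explore: gather information by pulling a uniformly random arm.
            action = rng.randrange(n_arms)
        else:
            # Exploit: use current estimates (max() breaks ties toward the lowest index).
            action = max(range(n_arms), key=lambda a: q[a])

        # Evaluative feedback: a single 0/1 reward for the chosen arm only.
        reward = 1.0 if rng.random() < true_probs[action] else 0.0

        # Incremental sample-average update of the chosen arm's estimate.
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]

        # Regret uses the true values, which only the experimenter knows.
        total_regret += best_value - true_probs[action]

    return q, counts, total_regret
```

Calling `epsilon_greedy_bandit([0.2, 0.5, 0.8], n_steps=5000, epsilon=0.0)` shows why gathering information matters. A purely greedy agent locks onto the first arm and keeps pulling it, so every step adds about 0.6 to its regret. With `epsilon=0.1`, the occasional random pulls let the agent discover the better arms.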

Uncertainty and expectation are the joys of life. Security is an insipid thing.

— William Congreve, English playwright and poet of the Restoration period and political figure in the British Whig Party ...

Grokking Deep Reinforcement Learning by Miguel Morales