3 Balancing immediate and long-term goals

In this chapter

You will learn about the challenges of learning from sequential feedback and how to properly balance immediate and long-term goals.
You will develop algorithms that can find the best policies of behavior in sequential decision-making problems modeled with MDPs.
You will find the optimal policies for all environments for which you built MDPs in the previous chapter.

In preparing for battle I have always found that plans are useless, but planning is indispensable.

— Dwight D. Eisenhower, United States Army five-star general and 34th President of the United States

In the last chapter, you built MDPs for the BW, BSW, and FL environments. MDPs are the motors moving RL environments. They ...


Grokking Deep Reinforcement Learning by Miguel Morales