Chapter 8. Improving How an Agent Learns

Complex industrial problems can often be decomposed into directed acyclic graphs (DAGs). This improves development productivity by splitting one long project into many smaller projects that are easier to solve. It also helps reinforcement learning (RL), because smaller components are often easier and quicker to train and tend to be more robust. “Hierarchical Reinforcement Learning” shows one common formalism, which is to derive a hierarchy of policies, where low-level policies are responsible for fine-grained “skills” and high-level policies handle long-term planning.
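
To make the idea concrete, here is a minimal sketch of a two-level hierarchy, in which a high-level manager commits to one low-level skill for a few steps at a time. The toy environment, the skill names, and the random selection rules are illustrative assumptions, not the formalism developed in “Hierarchical Reinforcement Learning.”

import random

class ToyEnv:
    """Tiny stand-in environment: reach position 10 on a number line."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                # action is -1 or +1
        self.pos += action
        done = self.pos >= 10
        return self.pos, (1.0 if done else -0.01), done

class SkillPolicy:
    """Low-level policy: maps an observation to a primitive action."""
    def __init__(self, name, action):
        self.name, self.action = name, action

    def act(self, observation):
        return self.action                 # a trained skill would use the observation

class ManagerPolicy:
    """High-level policy: chooses which skill to commit to for a few steps."""
    def __init__(self, skills, horizon=5):
        self.skills, self.horizon = skills, horizon

    def select_skill(self, observation):
        return random.choice(self.skills)  # a trained manager would plan here

def run_episode(env, manager, max_steps=200):
    obs, total = env.reset(), 0.0
    for step in range(max_steps):
        if step % manager.horizon == 0:    # the manager only acts every few steps
            skill = manager.select_skill(obs)
        obs, reward, done = env.step(skill.act(obs))
        total += reward
        if done:
            break
    return total

skills = [SkillPolicy("left", -1), SkillPolicy("right", +1)]
print(run_episode(ToyEnv(), ManagerPolicy(skills)))

The key structural point is the separation of concerns: the manager reasons over which skill to use, while each skill handles the moment-to-moment control.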

So far in this book I have considered only single-agent problems. Some problems need teams of agents, or at least involve multiple agents operating within the same environment. “Multi-Agent Reinforcement Learning” shows you how agents can cooperate or compete to solve multi-agent problems using global or local rewards.
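
As a rough illustration, the following sketch runs two independent agents in one shared environment and computes both a per-agent (local) reward and a summed (global) team reward. The environment and the random policies are hypothetical placeholders, not the algorithms covered in “Multi-Agent Reinforcement Learning.”

import random

class GridTeamEnv:
    """Two agents, each trying to reach its own goal on a 1-D line."""
    def __init__(self):
        self.goals = [5, -5]

    def reset(self):
        self.positions = [0, 0]
        return list(self.positions)

    def step(self, actions):                         # one action per agent
        local_rewards = []
        for i, a in enumerate(actions):
            self.positions[i] += a
            local_rewards.append(1.0 if self.positions[i] == self.goals[i] else -0.01)
        global_reward = sum(local_rewards)           # shared team signal
        done = all(p == g for p, g in zip(self.positions, self.goals))
        return list(self.positions), local_rewards, global_reward, done

def random_policy(observation):
    return random.choice([-1, +1])

env = GridTeamEnv()
obs, done, steps = env.reset(), False, 0
while not done and steps < 100:
    actions = [random_policy(obs), random_policy(obs)]
    obs, local_rewards, global_reward, done = env.step(actions)
    steps += 1

Cooperative training would feed the global reward to every agent; competitive or fully decentralized training would feed each agent only its local reward.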

Another rapidly evolving area of RL redefines how you should think about rewards. Traditionally, it is the agent’s sole responsibility to use the reward signal to learn a policy. But you can augment this process by providing extra, potentially external information in the form of expert guidance. “Expert Guidance” discusses how to incorporate expertise into policies, which ultimately speeds up learning by improving exploration. It’s even possible to learn the reward itself, which aims to remove the need for reward engineering.
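
One simple way to inject expert guidance, sketched below under an assumed transition format and buffer design, is to seed a replay buffer with expert demonstrations so that early updates are biased toward expert-visited states. This is only an illustration of the idea, not the specific techniques discussed in “Expert Guidance.”

import random

expert_demonstrations = [
    # (state, action, reward, next_state) tuples recorded from an expert
    (0, +1, -0.01, 1),
    (1, +1, -0.01, 2),
    (2, +1, 1.00, 3),
]

class DemoSeededBuffer:
    """Replay buffer that mixes agent experience with expert demonstrations."""
    def __init__(self, demos, demo_fraction=0.25):
        self.demos = list(demos)
        self.agent_experience = []
        self.demo_fraction = demo_fraction

    def add(self, transition):
        self.agent_experience.append(transition)

    def sample(self, batch_size):
        n_demo = int(batch_size * self.demo_fraction)
        batch = random.choices(self.demos, k=n_demo)
        if self.agent_experience:
            batch += random.choices(self.agent_experience, k=batch_size - n_demo)
        return batch

buffer = DemoSeededBuffer(expert_demonstrations)
buffer.add((0, -1, -0.01, -1))          # the agent's own (possibly poor) experience
print(buffer.sample(8))                 # updates now see a mix of expert and agent data

The fraction of demonstration data is a design choice: a higher fraction leans harder on the expert early on, at the cost of slower adaptation to the agent’s own experience.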

But first, RL is inextricably ...
