As discussed in Chapter 1, reinforcement learning involves sequential decision-making. This chapter formalizes the notion of using stochastic processes under the branch of probability that models sequential decision-making behavior. Although most of the problems you’ll study in reinforcement learning are modeled as Markov decision processes (MDP), this chapter starts by introducing Markov chains (MC) followed by Markov reward processes (MRP). Next, the chapter discusses MDP in-depth while ...
2. The Foundation: Markov Decision Processes
Get Deep Reinforcement Learning with Python: RLHF for Chatbots and Large Language Models now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.