© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2024
N. Sanghi, Deep Reinforcement Learning with Python, https://doi.org/10.1007/979-8-8688-0273-7_2

2. The Foundation: Markov Decision Processes

Nimish Sanghi, Bangalore, India

As discussed in Chapter 1, reinforcement learning involves sequential decision-making. This chapter formalizes that notion using stochastic processes, the branch of probability used to model sequential decision-making behavior. Although most of the problems you'll study in reinforcement learning are modeled as Markov decision processes (MDPs), this chapter starts by introducing Markov chains (MCs), followed by Markov reward processes (MRPs). Next, the chapter discusses MDPs in depth while ...
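To make the progression from Markov chain to Markov reward process concrete, here is a minimal sketch. It assumes a hypothetical two-state "weather" chain with made-up transition probabilities, per-state rewards, and a discount factor of 0.9; none of these names or numbers come from the book. `sample_chain` simulates the chain (the next state depends only on the current one), and `mrp_values` estimates each state's value by repeatedly applying v(s) = R(s) + γ Σ P(s, s′) v(s′) until the estimates stop changing.

```python
import random

# Hypothetical two-state Markov chain (illustrative numbers, not from the book).
STATES = ["sunny", "rainy"]
# P[s][s2] = probability of moving from state s to state s2; each row sums to 1.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
# Attaching a reward to each state plus a discount factor turns the chain into an MRP.
R = {"sunny": 1.0, "rainy": -1.0}
GAMMA = 0.9


def sample_chain(start, steps, rng):
    """Simulate the Markov chain: each next state is drawn using only the current state."""
    path = [start]
    for _ in range(steps):
        current = path[-1]
        next_states = list(P[current])
        weights = [P[current][s] for s in next_states]
        path.append(rng.choices(next_states, weights=weights)[0])
    return path


def mrp_values(tol=1e-10):
    """Iterate v(s) = R(s) + GAMMA * sum_s' P(s, s') * v(s') until successive sweeps agree."""
    v = {s: 0.0 for s in STATES}
    while True:
        new_v = {
            s: R[s] + GAMMA * sum(P[s][s2] * v[s2] for s2 in STATES)
            for s in STATES
        }
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return new_v
        v = new_v


if __name__ == "__main__":
    print(sample_chain("sunny", 5, random.Random(0)))
    print(mrp_values())
```

For these illustrative numbers, solving the linear system (I − γP)v = R directly gives v(sunny) = 4.375 and v(rainy) = 1.25, which the iteration approaches because γ < 1 makes each update a contraction. An MDP would add actions, so that P and R also depend on the action the agent chooses.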
