N. Sanghi, Deep Reinforcement Learning with Python: RLHF for Chatbots and Large Language Models, https://doi.org/10.1007/979-8-8688-0273-7_2

2. The Foundation: Markov Decision Processes

Nimish Sanghi, Bangalore, India

As discussed in Chapter 1, reinforcement learning involves sequential decision-making. This chapter formalizes that notion using stochastic processes, the branch of probability that models sequential decision-making behavior. Although most of the problems you'll study in reinforcement learning are modeled as Markov decision processes (MDPs), this chapter starts by introducing Markov chains (MCs), followed by Markov reward processes (MRPs). Next, the chapter discusses MDPs in depth while ...
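The progression described above (Markov chain, then Markov reward process, then MDP) can be sketched as three small Python classes, each adding one ingredient to the previous one: a Markov chain has only states and transition probabilities, an MRP adds rewards and a discount factor, and an MDP makes transitions and rewards depend on an action. This is a minimal illustration under our own assumptions, not the book's code: the class and method names, the dictionary-based transition tables, the convention of collecting the reward of the state being visited, and the two-state toy example are all hypothetical.

```python
import random
from dataclasses import dataclass

# Illustrative sketch only: states and actions are plain strings (an assumption).
State = str
Action = str


@dataclass
class MarkovChain:
    """States plus transition probabilities P[s][s2]; no rewards, no actions."""
    P: dict[State, dict[State, float]]

    def step(self, s: State, rng: random.Random) -> State:
        # Sample the next state using the probabilities in row P[s].
        nxt = list(self.P[s])
        return rng.choices(nxt, weights=[self.P[s][n] for n in nxt])[0]


@dataclass
class MarkovRewardProcess(MarkovChain):
    """A Markov chain plus a reward R[s] per state and a discount factor gamma."""
    R: dict[State, float]
    gamma: float

    def discounted_return(self, s: State, horizon: int, rng: random.Random) -> float:
        # Sum gamma**t * R[s_t] along one sampled trajectory of `horizon` steps
        # (assumed convention: the reward of each visited state is collected).
        total = 0.0
        for t in range(horizon):
            total += self.gamma ** t * self.R[s]
            s = self.step(s, rng)
        return total


@dataclass
class MarkovDecisionProcess:
    """Transitions and rewards also depend on an action a: P[s][a][s2], R[s][a]."""
    P: dict[State, dict[Action, dict[State, float]]]
    R: dict[State, dict[Action, float]]
    gamma: float

    def step(self, s: State, a: Action, rng: random.Random) -> tuple[State, float]:
        # The agent chooses `a`; the environment samples s2 and returns the reward.
        nxt = list(self.P[s][a])
        s2 = rng.choices(nxt, weights=[self.P[s][a][n] for n in nxt])[0]
        return s2, self.R[s][a]


if __name__ == "__main__":
    rng = random.Random(0)
    flip = {"A": {"B": 1.0}, "B": {"A": 1.0}}  # hypothetical deterministic chain
    mrp = MarkovRewardProcess(P=flip, R={"A": 1.0, "B": 0.0}, gamma=0.5)
    print(mrp.discounted_return("A", horizon=4, rng=rng))  # 1 + 0 + 0.25 + 0 = 1.25
```

Note how `MarkovRewardProcess` reuses the chain's `step` unchanged and only adds `R` and `gamma`, while the MDP has to change the shape of `P` itself, because the next-state distribution now depends on which action the agent takes.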
