© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2024
N. Sanghi, Deep Reinforcement Learning with Python, https://doi.org/10.1007/979-8-8688-0273-7_3

3. Model-Based Approaches

Nimish Sanghi (1)
(1) Bangalore, India

Chapter 2 described which parts of the setup form the agent and which form the environment. To recap, the agent observes the state S_t = s and follows a policy π(a | s) that maps states to actions. The agent uses this policy to take an action A_t = a when in state S_t = s. The system then moves to the next time step, t + 1. The environment responds to the action (A_t = a) by putting the agent in a new state S_{t+1} = s′ and providing feedback to the agent in the form of a reward, R_{t+1}. The agent ...
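The interaction cycle recapped above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the book: the `ToyEnv` corridor environment, the fixed "go right with probability 0.8" policy, and the helper `run_episode` are all hypothetical names and choices made for this example. The point is only to show the order of events: observe S_t, sample A_t from the policy, then receive R_{t+1} and S_{t+1} from the environment.

```python
import random


class ToyEnv:
    """Hypothetical 1-D corridor with states 0..n_states-1; the last state is terminal."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        # Start every episode in the leftmost state.
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right (clipped at the corridor ends).
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0  # R_{t+1}: reward only on reaching the goal
        return self.state, reward, done


def policy(state, rng):
    """Assumed stochastic policy pi(a | s): move right with probability 0.8 in every state."""
    return 1 if rng.random() < 0.8 else 0


def run_episode(env, rng, max_steps=100):
    """Run one episode and return a list of (S_t, A_t, R_{t+1}, S_{t+1}) tuples."""
    s = env.reset()
    trajectory = []
    for _ in range(max_steps):
        a = policy(s, rng)             # agent picks A_t = a in state S_t = s
        s_next, r, done = env.step(a)  # environment returns S_{t+1} and R_{t+1}
        trajectory.append((s, a, r, s_next))
        s = s_next
        if done:
            break
    return trajectory


rng = random.Random(0)
episode = run_episode(ToyEnv(), rng)
for s, a, r, s_next in episode:
    print(f"S_t={s} A_t={a} R_t+1={r} S_t+1={s_next}")
```

Each printed row is one time step of the loop. The state in the last column of one row is the state in the first column of the next, which is the transition from t to t + 1 described in the recap.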
