State-action value function (Q function)

The state-action value function, also called the Q function, specifies how good it is for an agent to take a particular action in a state while following a policy π. It is denoted by $Q^{\pi}(s, a)$: the value of taking action a in state s and following policy π thereafter.

We can define the Q function as follows:

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ R_t \mid s_t = s,\ a_t = a \right]$$

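This is the expected return when the agent starts in state s, takes action a, and follows policy π thereafter. Substituting the value of $R_t$ from (2) into the Q function gives:

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \mid s_t = s,\ a_t = a \right]$$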
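To make the expectation concrete, here is a minimal sketch that estimates the Q function by averaging sampled returns (a first-visit Monte Carlo estimate). It assumes that equation (2) is the usual discounted return, that is, the sum of rewards weighted by increasing powers of a discount factor gamma. The function names, the episode format, and the toy episodes are hypothetical and chosen only for illustration; they are not from the book.

```python
from collections import defaultdict


def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], the assumed form of the return R_t."""
    total = 0.0
    # Accumulate from the last reward backwards: R = r + gamma * R_next.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_q(episodes, gamma):
    """First-visit Monte Carlo estimate of Q(s, a).

    Each episode is a list of (state, action, reward) steps, where reward
    is the reward received after taking `action` in `state`. All episodes
    are assumed to have been generated by the same policy.
    """
    returns = defaultdict(list)
    for episode in episodes:
        seen = set()
        for t, (state, action, _) in enumerate(episode):
            if (state, action) in seen:
                continue  # only the first visit in each episode counts
            seen.add((state, action))
            rewards = [r for _, _, r in episode[t:]]
            returns[(state, action)].append(discounted_return(rewards, gamma))
    # Q(s, a) is approximated by the average return observed after (s, a).
    return {key: sum(vals) / len(vals) for key, vals in returns.items()}


# Hypothetical toy episodes, all assumed to come from one fixed policy.
episodes = [
    [("s0", "right", 0.0), ("s1", "right", 1.0)],
    [("s0", "right", 0.0), ("s1", "left", 0.0)],
    [("s0", "left", 0.0)],
]
q = estimate_q(episodes, gamma=0.5)
print(q[("s0", "right")])  # 0.25
```

With more sampled episodes, the average for each state-action pair approaches the expected return that the Q function defines, which is why the estimate depends on the policy that generated the episodes.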

The difference between the ...
