Although Monte Carlo (MC) methods and Temporal Difference (TD) learning have similarities, TD learning has inherent advantages over MC methods.

| Monte Carlo methods | Temporal Difference learning |
| --- | --- |
| MC must wait until the end of the episode, when the return is known, before it can update. | TD can learn online after every step and does not need to wait until the end of the episode. |
| MC has high variance and zero bias: the sampled return is an unbiased estimate of the value. | TD has lower variance but some bias, because its target bootstraps from the current value estimate. |
| MC does not exploit the Markov property. | TD exploits the Markov property. |
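
The difference in when each method can update is easiest to see side by side. The following is a minimal sketch, not a reference implementation: the function names `mc_update` and `td0_update`, the step size `alpha`, the discount `gamma`, and the two-step episode are all illustrative assumptions. It uses every-visit constant-step-size MC and one-step TD (TD(0)) on a tabular value function, and treats any state missing from the table as terminal with value 0.

```python
from typing import Dict, List, Tuple

State = str
Transition = Tuple[State, float, State]  # (state, reward, next_state)


def mc_update(V: Dict[State, float], episode: List[Transition],
              alpha: float = 0.5, gamma: float = 1.0) -> Dict[State, float]:
    """Every-visit constant-alpha MC: requires the complete episode."""
    G = 0.0
    # Walk backwards so G accumulates the discounted return from each step.
    for state, reward, _ in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])
    return V


def td0_update(V: Dict[State, float], transition: Transition,
               alpha: float = 0.5, gamma: float = 1.0) -> Dict[State, float]:
    """TD(0): updates from a single transition, bootstrapping from V."""
    state, reward, next_state = transition
    # Assumption: states absent from V are terminal and have value 0.
    target = reward + gamma * V.get(next_state, 0.0)
    V[state] += alpha * (target - V[state])
    return V


# Hypothetical episode: A -> B -> terminal, with a reward of 1 on the last step.
episode = [("A", 0.0, "B"), ("B", 1.0, "T")]

v_mc = mc_update({"A": 0.0, "B": 0.0}, episode)

v_td = {"A": 0.0, "B": 0.0}
for step in episode:  # TD can update as soon as each step is observed
    td0_update(v_td, step)
```

After one episode the MC table moves both `A` and `B` toward the observed return of 1, while the TD table moves only `B`: the update for `A` bootstrapped from the still-zero estimate of `B`. That bootstrapped target is the source of TD's bias, and its dependence on a single reward rather than the whole return is the source of its lower variance.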