In the previous chapter, we formalized the agent-environment interaction as a Markov decision process (MDP).
We began with the Markov property, which states that the future depends on the past only through the present state. This is the assumption that makes the entire framework tractable: once you know the current state, you can discard the history.
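Stated formally, in the standard state-action notation (the symbols $S_t$ and $A_t$ are the conventional choices, assumed here rather than taken from this section), the Markov property says that conditioning on the full history adds nothing beyond the current state and action:

```latex
\[
\Pr\bigl(S_{t+1} = s' \mid S_t, A_t, S_{t-1}, A_{t-1}, \ldots, S_0, A_0\bigr)
= \Pr\bigl(S_{t+1} = s' \mid S_t, A_t\bigr)
\]
```

This equality is what licenses discarding the history: any policy or value computation that would depend on the past can be rewritten to depend on the present state alone.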
We then...