Reinforcement learning has become a powerful learning framework now capable of learning complex policies in high dimensional environments. At every time step t, the agent observes the current state of the environment, sₜ , and uses the policy it is learning π(s), to inform a decision about which action, aₜ…