MDPs are frequently encountered when they are used as a framework for setting up reinforcement learning problems. In this post, we will define what they are, highlight their universality, and present two methods to solve them.
ReadHow can an agent learn a policy when it doesn't have access to the underlying reward structure of it's environment? We cover one method to solve this problem called Behavioral Cloning, while providing the required theory and an implemented example of it in practice.
ReadDoes information theory have a role to play in reinforcement learning? Find out how adding entropy to the reinforcement learning formulation can help robots deal with unexpected obstacles and more.
Read