Episode 5, demystifying the exploration-exploitation dilemma and the greedy, ε-greedy, and UCB algorithms in the multi-armed bandit setting.
Episode 4, demystifying dynamic programming, policy evaluation, policy iteration, and value iteration with code examples.
Episode 3, demystifying Bellman Expectation Equation, Bellman Optimality Equation, Optimal Policy, and Optimal Value Function.
Exploring the terrifying symbols used to differentiate and integrate.
Episode 2, demystifying Markov Processes, Markov Reward Processes, the Bellman Equation, and Markov Decision Processes.
Episode 1, demystifying agent/environment interaction and the components of a reinforcement learning agent.