Episode 6, demystifying model-free prediction, Monte Carlo (MC) methods, temporal-difference (TD) learning, and the properties of both algorithms in RL problems.
Episode 5, demystifying the exploration-exploitation dilemma, greedy, ε-greedy, and UCB algorithms in the multi-armed bandit setting.
Episode 4, demystifying dynamic programming, policy evaluation, policy iteration, and value iteration with code examples.