Episode 6, demystifying model-free prediction, Monte Carlo (MC) methods, Temporal-Difference (TD) learning, and the properties of both algorithms in RL problems.
Episode 5, demystifying the exploration-exploitation dilemma and the greedy, ε-greedy, and UCB algorithms in the multi-armed bandit setting.
Episode 4, demystifying dynamic programming, policy evaluation, policy iteration, and value iteration with code examples.
Episode 3, demystifying the Bellman Expectation Equation, the Bellman Optimality Equation, the Optimal Policy, and the Optimal Value Function.
Episode 2, demystifying Markov Processes, Markov Reward Processes, the Bellman Equation, and Markov Decision Processes.
Episode 1, demystifying agent/environment interaction and the components of a reinforcement learning agent.