Lecture 1: Markov Chain, Part I

Note: Lecture Note 1 from Prof. Dimitrios Katselis

Lecture 2: Markov Chain, Part II

Note: Lecture Note 2 from Prof. Dimitrios Katselis

Lecture 3: MDPs, Part I

Note: Sections 9.1-9.2 in Lecture Note 9 from Prof. Dimitrios Katselis

Lecture 4: MDPs, Part II

Note: Sections 9.2-9.3 in Lecture Note 9 from Prof. Dimitrios Katselis

Lecture 5: MDPs, Part III

Note: Section 9.3 in Lecture Note 9 from Prof. Dimitrios Katselis

Lecture 6: VI, PI, and Neural Networks

Note: Section 9.4 in Lecture Note 9 and Section 7.1 in Lecture Note 7 from Prof. Dimitrios Katselis

Lecture 7: A High-Level Introduction to RL, Q-Learning, Policy Optimization, and Zeroth-Order Optimization

Ref: Lilian Weng's Blog on RL: A (Long) Peek into Reinforcement Learning

Lecture 8: TD Learning for Policy Evaluation

Note: Section 10.6 in Lecture Note 10 from Prof. Dimitrios Katselis, Section 3.1 of Prof. Srikant's Paper on TD Learning

Lecture 9: Q Factors

Lecture 10: Q-Learning, SARSA, Approximate PI

Notes: Algorithms 1-8 in the survey paper by Busoniu et al., Sections 1.6-1.7 in the LQR note, Algorithm 1 in Prof. Matni's Class Note on API