ECE586RL: Markov Decision Processes and Reinforcement Learning

Course Information

  • Office Hours: M/Th 3-4pm, 145 CSL

  • Lectures: M/W 11:00-12:20pm, Room 3081 ECEB

  • For a complete syllabus, see the course website.

Course Description

This course covers techniques for solving dynamic optimization problems when the system dynamics are unknown. It first introduces dynamic programming for Markov decision process (MDP) problems, and then focuses on solving the dynamic programming equations approximately when the underlying parameters of the Markov chain are unknown. While the emphasis is on techniques with provable performance bounds, heuristics used in reinforcement learning will also be presented to show their relationship to existing theory and to identify open theoretical problems.

Topics

  • Markov Chains

  • Markov Decision Processes

  • Dynamic Programming

  • Value and Policy Iteration

  • Temporal Difference Learning, Q-Learning, SARSA

  • Linear Approximation: ODE Method, Finite Sample Bounds, etc.

  • Neural Networks and DQN

  • Policy Gradient for Control: REINFORCE, Natural Policy Gradient, DDPG, TRPO, PPO, Actor-Critic, GAE

  • RL for Linear Quadratic Regulator

  • Robust Adversarial RL and Robust Control
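To give a flavor of the dynamic programming material early in the course, the sketch below runs value iteration on a small hypothetical MDP (the two-state example, rewards, and discount factor are illustrative assumptions, not taken from the course notes):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP for illustration only.
# P[a][s, s'] = transition probability under action a;
# R[a][s]     = expected one-step reward in state s under action a.
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5],
               [0.6, 0.4]])]   # action 1
R = [np.array([1.0, 0.0]),     # action 0
     np.array([0.5, 2.0])]     # action 1
gamma = 0.9                     # discount factor (assumed)

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality update:
    # V(s) <- max_a [ R(s,a) + gamma * sum_{s'} P(s'|s,a) V(s') ]
    Q = np.array([R[a] + gamma * P[a] @ V for a in range(2)])
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=0)  # greedy policy w.r.t. the converged values
print("V* ≈", V, "greedy policy:", policy)
```

Because the update is a gamma-contraction in the sup norm, the iterates converge geometrically to the optimal value function; the greedy policy extracted at the fixed point is optimal.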

Required Materials

There is no required textbook for the class. All course material will be presented in class and/or provided online as notes. Links for relevant papers will be listed in the course website. One useful reference is the book “Dynamic Programming and Optimal Control, Vol. II: Approximate Dynamic Programming” by D. Bertsekas.


Prerequisites

ECE 534. ECE 555 is recommended, but not required.


Grading

  • Class participation: 10%

  • Homework (2 sets): 30%

  • Final project: 60%