ECE586RL: Markov Decision Processes and Reinforcement Learning

Course Information

  • Office Hours: M/Th 3-4pm, 145 CSL

  • Lectures: M/W 11:00-12:20pm, Room 3081 ECEB

  • For a complete syllabus, see the course website.

Course Description

This course covers techniques for solving dynamic optimization problems when the system dynamics are unknown. It first introduces dynamic programming for Markov decision process (MDP) problems, and then focuses on solving the dynamic programming equations approximately when the underlying parameters of the Markov chain are unknown. While the emphasis is on techniques with provable performance bounds, heuristics used in reinforcement learning will also be presented to show their relationship to existing theory and to identify open theoretical problems.

Topics

  • Markov Chains

  • Markov Decision Processes

  • Dynamic Programming

  • Value and Policy Iteration

  • Temporal Difference Learning, Q-Learning, SARSA

  • Linear Approximation: ODE Method, Finite Sample Bounds, etc.

  • Neural Networks and DQN

  • Policy Gradient for Control: REINFORCE, Natural Policy Gradient, DDPG, TRPO, PPO, Actor-Critic, GAE

  • RL for Linear Quadratic Regulator

  • Robust Adversarial RL and Robust Control
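To give a flavor of the dynamic programming material early in the course, the sketch below runs value iteration on a small hypothetical MDP (the two-state example, rewards, and discount factor are illustrative assumptions, not taken from the course notes):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP for illustration only.
# P[a][s, s'] = transition probability under action a;
# R[a][s]     = expected one-step reward in state s under action a.
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5],
               [0.6, 0.4]])]   # action 1
R = [np.array([1.0, 0.0]),     # action 0
     np.array([0.5, 2.0])]     # action 1
gamma = 0.9                     # discount factor (assumed)

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality update:
    # V(s) <- max_a [ R(s,a) + gamma * sum_{s'} P(s'|s,a) V(s') ]
    Q = np.array([R[a] + gamma * P[a] @ V for a in range(2)])
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=0)  # greedy policy w.r.t. the converged values
print("V* ≈", V, "greedy policy:", policy)
```

Because the update is a gamma-contraction in the sup norm, the iterates converge geometrically to the optimal value function; the greedy policy extracted at the fixed point is optimal.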

Required Materials

There is no required textbook for the class. All course material will be presented in class and/or provided online as notes. Links for relevant papers will be listed in the course website. One useful reference is the book “Dynamic Programming and Optimal Control, Vol. II: Approximate Dynamic Programming” by D. Bertsekas.


Prerequisites

ECE 534. ECE 555 is recommended, but not required.


Grading

  • Class participation: 10%

  • Homework (2 sets): 30%

  • Final project: 60%