Joshua's slides on TRPO:
http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_13_advanced_pg.pdf

Original natural policy gradient (NPG) paper by Sham Kakade:
https://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf

Deep RL survey:
https://arxiv.org/pdf/1811.12560.pdf

Lilian's blog on policy gradient:
https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html

Lilian's blog on evolution strategies (ES, zeroth-order optimization):
https://lilianweng.github.io/lil-log/2019/09/05/evolution-strategies.html