That was the exact thing I said to myself when I finished the lecture. Btw, in case you didn't know, Professor Jordan recommended this list of books to machine learning scientists:
http://www.statsblogs.com/2014/12/3...-suggested-by-michael-i-jordan-from-berkeley/
I have started to read Casella's book on statistics and Golub's book on linear algebra, but they are far from trivial, and the others are even more difficult (I actually read the first 9 chapters or so of the Cover and Thomas book on information theory a couple of years ago). My bet is that Professor Jordan has read so much mathematics during his life that he now finds trivial things that other researchers in the field have never even heard of.
On a side note, he mentioned quite a lot the accelerated gradient method developed by Nesterov a lifetime ago (1983, to be precise). It is a gem that the community in the US and Europe had overlooked until 2012 or so, when Ilya Sutskever (back then a PhD student of Geoffrey Hinton, now a leading researcher at OpenAI) rediscovered it and used it in place of classical momentum for training neural networks. While other methods seem to be preferred at the moment (Adam, which is essentially just RMSProp with momentum), Nesterov's momentum was the state of the art in optimization for a couple of years or so. I think that a student of Ng developed an algorithm called Nadam, which replaces classical momentum with Nesterov momentum in Adam, but I have never used it. Funnily enough, that was done just as a project for Ng's course at Stanford, CS229.
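In case the difference between the two momentum flavors isn't obvious, here is a minimal sketch in plain NumPy (the learning rate, momentum coefficient, and toy objective are illustrative choices of mine, not tuned or canonical values). The only change in Nesterov's variant is that the gradient is evaluated at the "look-ahead" point w + mu*v instead of at w itself:

```python
import numpy as np

def classical_momentum_step(w, v, grad_fn, lr=0.01, mu=0.9):
    """One step of classical (heavy-ball) momentum."""
    v = mu * v - lr * grad_fn(w)   # gradient taken at the current point w
    w = w + v
    return w, v

def nesterov_momentum_step(w, v, grad_fn, lr=0.01, mu=0.9):
    """One step of Nesterov momentum (Sutskever-style formulation)."""
    v = mu * v - lr * grad_fn(w + mu * v)  # gradient at the look-ahead point
    w = w + v
    return w, v

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is simply w.
grad_fn = lambda w: w
w, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(100):
    w, v = nesterov_momentum_step(w, v, grad_fn)
print(w)  # should end up close to the minimum at the origin
```

That look-ahead gradient is what buys the improved convergence rate on smooth convex problems, O(1/k^2) instead of the O(1/k) of plain gradient descent.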
Btw, I am applying for a summer school this year where Jordan is one of the lecturers (the other famous ones are Ghahramani, Leskovec, Schölkopf and Salakhutdinov).