These are Machine Learning at Berkeley’s notes for the Fall 2017 session of Berkeley’s CS 189 class, taught by Professors Anant Sahai and Stella Yu. These notes are unaffiliated with the class, but since (as of now) there will not be any webcasts or official class notes, we thought it might be helpful if these existed.
That being said, we are also just students with very busy lives. We’ll try to keep this as up to date as possible and devoid of errors. If you spot an error here, please feel free to email us at ml.at.berkeley@gmail.com.
Office hours are after lecture, currently in 400 Cory, though the location is subject to change
Discussions are on Friday only, all day, first come first served. Check the website for the full list of discussion times and locations
There is no required book for the class. However, The Elements of Statistical Learning is suggested if you’re into that kind of stuff (textbooks, that is…)
The grading can be found here
The midterm is Oct. 13, 7 pm
The final is Dec. 14, 3 pm
“This is an ‘Advanced Upper Division Course’” quoth Sahai
That means you should probably have taken (and mastered) EE16A, EE16B, CS70, and Math 53 for sure
CS170, EE126, and EE127 would be nice to have as well, and your “maturity” (again quoth Sahai) should certainly be at the level of those classes
This is not a programming class, although it is assumed you have knowledge of material taught in the CS61 series
Python will be used in this course
The structure of the course has shifted to be more conceptual, with a bit more focus on advanced neural network topics
Almost all ML problems are solved through these (increasingly detailed) levels of abstraction:
Let’s say we have a set of data points $\{(\vec{x}_i, \vec{y}_i)\}_{i=1}^{n}$, where $\vec{x}_i$ is $d$-dimensional and $\vec{y}_i$ is $k$-dimensional. We believe $\vec{y}_i \approx W\vec{x}_i$, where $W$ is a $k \times d$ matrix. This is our model.
To turn this into an optimization problem we would like to have something of the form
$$\min_{\vec{w}} \|A\vec{w} - \vec{b}\|^2$$
To do this we let $A$ be the following (monstrosity of a) matrix
$$A = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_n \end{bmatrix}, \qquad A_i = \begin{bmatrix} \vec{x}_i^T & & \\ & \ddots & \\ & & \vec{x}_i^T \end{bmatrix}$$
where each block $A_i$ is $k \times dk$, with $\vec{x}_i^T$ repeated $k$ times along its diagonal and zeros everywhere else.
Note that the $\vec{x}_i$’s are $d$-dimensional and not just scalars.
We let $\vec{w}$ be
$$\vec{w} = \begin{bmatrix} \vec{w}_1 \\ \vdots \\ \vec{w}_k \end{bmatrix}$$
where $\vec{w}_j^T$ is the $j$-th row of $W$. So $\vec{w}$ must be of dimension $dk$, and $A$ is of dimension $nk \times dk$.
Finally let $\vec{b}$ be
$$\vec{b} = \begin{bmatrix} \vec{y}_1 \\ \vec{y}_2 \\ \vdots \\ \vec{y}_n \end{bmatrix}$$
which is $nk$-dimensional (note this is also a block matrix, built from the $\vec{y}_i$’s)
Now if we minimize $\|A\vec{w} - \vec{b}\|^2$ we’ll actually be minimizing the sum of the squared errors for each data point (do the math!).
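If you’d rather not do the math by hand, here’s a quick NumPy sketch that builds the block matrix and checks the claim numerically. The sizes ($n$, $d$, $k$), the random data, and the variable names are all just our own illustration, with $\vec{x}_i \in \mathbb{R}^d$, $\vec{y}_i \in \mathbb{R}^k$, $A$ stacking one $k \times dk$ block per data point, $\vec{w}$ the stacked rows of $W$, and $\vec{b}$ the stacked $\vec{y}_i$’s:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 5, 3, 2                  # n data points, x_i in R^d, y_i in R^k

X = rng.standard_normal((n, d))    # rows are the x_i's
Y = rng.standard_normal((n, k))    # rows are the y_i's
W = rng.standard_normal((k, d))    # candidate model: y_i ≈ W x_i

# Build A: one k x (dk) block per data point, with x_i^T repeated
# k times along the block's diagonal (i.e., I_k ⊗ x_i^T).
A = np.vstack([np.kron(np.eye(k), X[i]) for i in range(n)])  # shape (nk, dk)

w = W.reshape(-1)   # rows of W stacked into a dk-vector
b = Y.reshape(-1)   # y_i's stacked into an nk-vector

# ||Aw - b||^2 should equal the sum of squared per-point errors ||W x_i - y_i||^2.
lhs = np.sum((A @ w - b) ** 2)
rhs = sum(np.sum((W @ X[i] - Y[i]) ** 2) for i in range(n))
print(np.isclose(lhs, rhs))  # True
```

The two quantities agree for any $W$, which is exactly why minimizing $\|A\vec{w} - \vec{b}\|^2$ minimizes the total squared error.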
There are two approaches to minimizing this quantity. One involves vector calculus (which is the topic of discussion), and one involves projections. Let’s look at projections.
In order to minimize the above, we note that $A\vec{w} - \vec{b}$ must be perpendicular to the column space of $A$. Thus we can write
$$A^T(A\vec{w} - \vec{b}) = \vec{0}$$
Then, rearranging, we arrive at the normal equations
$$A^TA\vec{w} = A^T\vec{b}$$
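To see the normal equations in action, here’s a small NumPy sketch (our own toy example, not from lecture) that solves $A^TA\vec{w} = A^T\vec{b}$ directly and checks the answer against NumPy’s built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 3
A = rng.standard_normal((n, d))   # tall matrix: more equations than unknowns
b = rng.standard_normal(n)

# Solve the normal equations A^T A w = A^T b.
# (np.linalg.solve is preferable to explicitly inverting A^T A.)
w = np.linalg.solve(A.T @ A, A.T @ b)

# Sanity checks: the residual is orthogonal to the columns of A,
# and the answer matches NumPy's least-squares solver.
print(np.allclose(A.T @ (A @ w - b), 0))   # True
w_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(w, w_lstsq))             # True
```

Note that in practice you’d use `np.linalg.lstsq` (or a QR/SVD-based solver) rather than forming $A^TA$, since the normal equations square the condition number of the problem.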