Machine Learning Notes: Logistic Regression
This is a collection of notes I’m taking as I progress through the Introduction to Machine Learning course provided by Duke University on Coursera. This is a working document that may eventually be split into multiple documents, but it exists largely to aid my memorisation of key ideas from the course (and as a chance to try out the cool Math and Diagram rendering in Hugo using Mermaid and LaTeX).
Machine Learning Outcomes
Given a training set @x@ and an outcome set @y@, we want a model that can predict @y@ given @x@; that is, a model of @P(y|x)@.
Logistic Regression
Suppose each example @i@ has features @x_{i1} \ldots x_{iM}@ (the data), and we have the parameter set @b_1 \ldots b_M@ and a bias @b_0@.
The linear model is defined as: @@ z_i = (b_1 \times x_{i1}) + (b_2 \times x_{i2}) + \ldots + (b_M \times x_{iM}) + b_0 @@ Applying this to every example @x_i@ produces a value @z_i@; together these values form the set @z@.
We now want to calculate @P(y_i = 1|x_i)@ using @z@ and the sigmoid function: @P(y_i = 1|x_i) = \sigma(z_i)@, where @\sigma(z) = \frac{1}{1 + e^{-z}}@.
The sigmoid function’s output always lies between @0@ and @1@, which we can interpret as the model’s confidence that @y_i = 1@.
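A minimal sketch of these two steps in Python with NumPy (the feature values, parameters, and bias below are made up purely for illustration):

```python
import numpy as np

def sigmoid(z):
    """Squash z into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up example: M = 3 features for a single data point x_i,
# with arbitrary parameters b and bias b_0.
x_i = np.array([0.5, -1.2, 3.0])
b = np.array([0.8, 0.1, -0.4])
b_0 = 0.2

z_i = np.dot(b, x_i) + b_0   # the linear model
p = sigmoid(z_i)             # P(y_i = 1 | x_i)

print(f"z_i = {z_i:.3f}, P(y_i = 1|x_i) = {p:.3f}")
```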
Diagram of Logistic Regression
```mermaid
graph BT
    sigma_z(("$$\sigma(z_i)$$"))
    zi(("$$z_i$$"))
    zi --> sigma_z
    xi1(("$$x_{i1}$$")) -->|"$$b_1$$"| zi
    xi2(("$$x_{i2}$$")) --> zi
    xi3(("$$ ... $$")) --> zi
    xi4(("$$x_{iM}$$")) -->|"$$b_M$$"| zi
```
Applying this to a Real Dataset
A common application of this is Optical Character Recognition (OCR), where the feature set is the pixels of an image of a character; a popular dataset for this is MNIST, a collection of images of handwritten digits.
This means taking each image, treating each pixel as a feature @x_{im}@, applying the learned parameters @b_m@ and bias @b_0@, and using logistic regression to determine the confidence that the image shows a particular character.
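As a rough sketch of how this might look in code, here is a from-scratch gradient-descent version of the above for a binary question such as "is this image the target character or not?". The random arrays stand in for flattened 28x28 MNIST pixels and their labels; they are placeholders, so this only illustrates the shapes and steps involved, not a real OCR result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: N "images" of 28x28 pixels flattened to M = 784 features.
# Real MNIST pixel values and labels would go here instead.
N, M = 1000, 28 * 28
X = rng.random((N, M))            # rows are x_{i1} ... x_{iM}
y = rng.integers(0, 2, size=N)    # 1 = "this is the target character", 0 = not

b = np.zeros(M)                   # parameters b_1 ... b_M
b_0 = 0.0                         # bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Plain batch gradient descent on the logistic (cross-entropy) loss.
learning_rate = 0.1
for _ in range(200):
    z = X @ b + b_0               # z_i for every image at once
    p = sigmoid(z)                # P(y_i = 1 | x_i)
    error = p - y                 # gradient of the loss w.r.t. z
    b -= learning_rate * (X.T @ error) / N
    b_0 -= learning_rate * error.mean()

# Confidence that the first image shows the target character.
print(sigmoid(X[0] @ b + b_0))
```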
Inner Product Notation
The inner product (@\odot@) of the vector @x_i@ and the parameter vector @b@ can be written in several equivalent forms (a quick numerical check of these forms follows below):
Full linear model (without the final sigmoid function applied): @@ z_i = (b_1 \times x_{i1}) + (b_2 \times x_{i2}) + \ldots + (b_M \times x_{iM}) + b_0 @@
Inner Product: @\displaystyle\sum_{m=1}^M x_{im} \times b_m@
Compact Notation: @x_i \odot b@
Compact notation with the bias term: @z_i = b_0 + x_i \odot b@
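These are just different spellings of the same computation. A quick numerical check in Python (with made-up values for @x_i@, @b@, and @b_0@) that the explicit sum and the compact form agree:

```python
import numpy as np

x_i = np.array([2.0, -1.0, 0.5])   # features x_{i1} ... x_{iM}
b = np.array([0.3, 0.7, -0.2])     # parameters b_1 ... b_M
b_0 = 0.1                          # bias
M = len(x_i)

# Explicit sum over m = 1 .. M
z_sum = sum(x_i[m] * b[m] for m in range(M)) + b_0

# Compact inner-product form: b_0 + x_i (.) b
z_dot = b_0 + np.dot(x_i, b)

print(z_sum, z_dot)  # both print the same value
```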
Why move on from Logistic Regression?
Logistic Regression can only perform binary classification, i.e. classify data that belongs to one of two classes, @1@ or @0@.