A method of linear regression that aims to minimise the sum of the squares of the errors.
The sum of the squares of the errors for $n$ data points is notated as:

$$S = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Bad notation

$\hat{y}_i$ is not $y_i$. This is just an example of sloppy notation, because we should all know that $\hat{y}_i$ is the value the model predicts for the $i$-th data point:

$$\hat{y}_i = \beta_0 + \beta_1 x_i$$

Similarly, $y_i$ is the observed value, so the error being squared is:

$$y_i - \hat{y}_i = y_i - (\beta_0 + \beta_1 x_i)$$
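To make the definition concrete, here is a minimal sketch in Python with NumPy. The data points and the coefficient values (beta0 = 0.5, beta1 = 1.8) are made up purely for illustration; it just computes $S$ directly from the sum above.

```python
import numpy as np

# Made-up example data points (x_i, y_i), purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Candidate coefficients (not necessarily the least-squares ones)
beta0, beta1 = 0.5, 1.8

# Predicted values: y_hat_i = beta0 + beta1 * x_i
y_hat = beta0 + beta1 * x

# Sum of squared errors: S = sum_i (y_i - y_hat_i)^2
S = np.sum((y - y_hat) ** 2)
print(S)
```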
Using Linear Algebra
Recall that

$$S = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

can be written as:

$$S = \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert^2$$

where:

$$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad
\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$$
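As a quick sanity check, here is a sketch (reusing the made-up data from earlier; the column of ones in X corresponds to the intercept beta0) showing that the matrix form gives the same value of S as the explicit sum.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
beta = np.array([0.5, 1.8])          # (beta0, beta1)

# Design matrix: a column of ones (intercept) next to the x values
X = np.column_stack([np.ones_like(x), x])

# ||y - X beta||^2 matches the elementwise sum of squared errors
S_matrix = np.linalg.norm(y - X @ beta) ** 2
S_sum = np.sum((y - (beta[0] + beta[1] * x)) ** 2)
print(np.isclose(S_matrix, S_sum))   # True
```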
Then, $S$ is reduced whenever $X\boldsymbol{\beta}$ is close to $\mathbf{y}$. Since we can't actually change the values of $\mathbf{y}$, we will have to adjust $X\boldsymbol{\beta}$ (and through it, $\beta_0$ and $\beta_1$).
We do so by considering the column space of $X$. Let $W$ be the column space of $X$. We assume that all values are real, so $W$ is a subspace of $\mathbb{R}^n$.
We also define $\mathbb{R}^n$ to be an inner product space, with its inner product being the dot product.
Then, our goal is to find a value of $\boldsymbol{\beta}$ such that the distance between $X\boldsymbol{\beta}$ and $\mathbf{y}$ is minimised.
We can use the properties of orthogonal projection:

$$\lVert \mathbf{y} - \operatorname{proj}_W(\mathbf{y}) \rVert \le \lVert \mathbf{y} - \mathbf{w} \rVert \quad \text{for every } \mathbf{w} \in W$$
So the vector closest to $\mathbf{y}$ in $W$ is $\operatorname{proj}_W(\mathbf{y})$. Let $X\boldsymbol{\beta} = \operatorname{proj}_W(\mathbf{y})$; such a $\boldsymbol{\beta}$ exists because $\operatorname{proj}_W(\mathbf{y})$ lies in $W$, the column space of $X$. Another property of orthogonal projection can be used (a quick numerical check of it is sketched below):

$$\langle \mathbf{w},\, \mathbf{y} - X\boldsymbol{\beta} \rangle = 0 \quad \text{for every } \mathbf{w} \in W$$
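The sketch below reuses the made-up data from earlier and obtains the projection with NumPy's np.linalg.lstsq (only as a stand-in, since we have not derived the closed form yet). It then verifies that $\mathbf{y} - \operatorname{proj}_W(\mathbf{y})$ is orthogonal to every column of $X$, and hence to all of $W$.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
X = np.column_stack([np.ones_like(x), x])

# Obtain the projection of y onto W = col(X) via a least-squares solve
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
proj = X @ beta_hat                  # proj_W(y)

residual = y - proj
# Dot product of the residual with each column of X is (numerically) zero
print(X.T @ residual)                # ~ [0, 0]
```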
Let $\mathbf{w}$ be any vector in $W$ (i.e. $\mathbf{w} = X\mathbf{v}$ for some vector $\mathbf{v}$). Then

$$\langle X\mathbf{v},\, \mathbf{y} - X\boldsymbol{\beta} \rangle = 0$$
Since the inner product of $\mathbb{R}^n$ is the dot product:

$$(X\mathbf{v}) \cdot (\mathbf{y} - X\boldsymbol{\beta}) = 0$$
Now we use a property unique to the dot product, namely that it can be written as a matrix product, $\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^\mathsf{T}\mathbf{b}$:

$$(X\mathbf{v})^\mathsf{T} (\mathbf{y} - X\boldsymbol{\beta}) = 0$$
Then, using properties of matrix transposition:

$$\mathbf{v}^\mathsf{T} X^\mathsf{T} (\mathbf{y} - X\boldsymbol{\beta}) = 0 \quad\Longrightarrow\quad \mathbf{v}^\mathsf{T} \left( X^\mathsf{T}\mathbf{y} - X^\mathsf{T}X\boldsymbol{\beta} \right) = 0$$
Remember that $\mathbf{w}$ is any vector in $W$, which means $\mathbf{v}$ can be any vector at all; in particular, $\mathbf{v}$ can also be $X^\mathsf{T}\mathbf{y} - X^\mathsf{T}X\boldsymbol{\beta}$. In that case the left-hand side becomes $\langle \mathbf{v}, \mathbf{v} \rangle$, so we can use the positivity of the inner product ($\langle \mathbf{v}, \mathbf{v} \rangle = 0$ only when $\mathbf{v} = \mathbf{0}$):

$$X^\mathsf{T}\mathbf{y} - X^\mathsf{T}X\boldsymbol{\beta} = \mathbf{0} \quad\Longrightarrow\quad X^\mathsf{T}X\boldsymbol{\beta} = X^\mathsf{T}\mathbf{y}$$
And, if $X^\mathsf{T}X$ is an invertible matrix, then:

$$\boldsymbol{\beta} = (X^\mathsf{T}X)^{-1} X^\mathsf{T}\mathbf{y}$$
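Putting the result together, here is a minimal sketch of solving for beta on the made-up data from earlier. Rather than forming the inverse of $X^\mathsf{T}X$ explicitly, it solves the normal equations $X^\mathsf{T}X\boldsymbol{\beta} = X^\mathsf{T}\mathbf{y}$ with np.linalg.solve, which is the numerically safer route, and cross-checks against np.linalg.lstsq.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
X = np.column_stack([np.ones_like(x), x])

# beta = (X^T X)^{-1} X^T y, computed by solving X^T X beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)                          # [beta0, beta1] minimising S

# Same answer from NumPy's built-in least-squares routine
beta_check, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_check)) # True
```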