A method of linear regression that fits a line by minimising the sum of the squares of the errors.

The sum of the square of errors for $n$ data-points is notated as:

$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

where $e_i = y_i - \hat{y}_i$ is the error of the $i$-th data-point and $\hat{y}_i$ is the value predicted by the regression line.

Bad notation

$\left( \sum_{i=1}^{n} e_i \right)^2$ is not $\sum_{i=1}^{n} e_i^2$. This is just an example of shitty notation, because we should all know that:

$$\sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + \cdots + e_n^2$$

Similarly:

$$\left( \sum_{i=1}^{n} e_i \right)^2 = \left( e_1 + e_2 + \cdots + e_n \right)^2$$
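
To see the difference concretely, here is a minimal NumPy sketch; the error values are made up purely for illustration:

```python
import numpy as np

# Hypothetical errors for five data-points (illustrative values only)
e = np.array([1.0, -2.0, 0.5, 3.0, -1.5])

sum_of_squares = np.sum(e ** 2)   # e_1^2 + e_2^2 + ... + e_n^2
square_of_sum = np.sum(e) ** 2    # (e_1 + e_2 + ... + e_n)^2

print(sum_of_squares)  # 16.5
print(square_of_sum)   # 1.0, a very different number
```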

Using Linear Algebra

Recall that

$$S = \sum_{i=1}^{n} \left( y_i - (a + b x_i) \right)^2$$

can be written as:

$$S = \left\lVert \mathbf{y} - X\boldsymbol{\beta} \right\rVert^2$$

where:

$$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad \boldsymbol{\beta} = \begin{pmatrix} a \\ b \end{pmatrix}$$
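
As a sanity check on the matrix form, here is a minimal NumPy sketch; the data-points and the candidate $\boldsymbol{\beta}$ are made up for illustration and are not part of the derivation:

```python
import numpy as np

# Hypothetical data-points (x_i, y_i)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Design matrix: a column of ones (for a) and the x values (for b)
X = np.column_stack([np.ones_like(x), x])

# An arbitrary candidate beta = (a, b)
beta = np.array([1.0, 2.0])

# S = ||y - X beta||^2, the same number as sum((y_i - (a + b*x_i))^2)
S = np.sum((y - X @ beta) ** 2)
print(S)  # about 0.1, up to floating-point error
```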

Then, $S$ is reduced whenever $X\boldsymbol{\beta}$ is close to $\mathbf{y}$. Since we can't actually change the values of $\mathbf{y}$, we will have to adjust $\boldsymbol{\beta}$ (and through it, $a$ and $b$).

We do so by taking the column space of $X$. Let $W$ be the column space of $X$. We assume that all values are real, so $W$ is a subspace of $\mathbb{R}^n$.

We also define $\mathbb{R}^n$ to be an inner product space, with its inner product being the dot product.

Then, our goal is to find a value of $\boldsymbol{\beta}$ such that the distance between $X\boldsymbol{\beta}$ and $\mathbf{y}$, namely $\lVert \mathbf{y} - X\boldsymbol{\beta} \rVert$, is minimised.

We can use the properties of orthogonal projection:

$$\left\lVert \mathbf{y} - \operatorname{proj}_W \mathbf{y} \right\rVert \leq \left\lVert \mathbf{y} - \mathbf{w} \right\rVert \quad \text{for all } \mathbf{w} \in W$$

So the vector closest to $\mathbf{y}$ in $W$ is $\operatorname{proj}_W \mathbf{y}$. Let $X\boldsymbol{\beta} = \operatorname{proj}_W \mathbf{y}$. Another property of orthogonal projection can be used:

$$\left\langle \mathbf{y} - \operatorname{proj}_W \mathbf{y},\, \mathbf{w} \right\rangle = 0 \quad \text{for all } \mathbf{w} \in W$$

Let $\mathbf{w} = X\mathbf{v}$ for an arbitrary $\mathbf{v} \in \mathbb{R}^2$ (i.e. $\mathbf{w} \in W$). Then

$$\left\langle \mathbf{y} - X\boldsymbol{\beta},\, X\mathbf{v} \right\rangle = 0$$

Since the inner product of $\mathbb{R}^n$ is the dot product:

$$\left( X\mathbf{v} \right) \cdot \left( \mathbf{y} - X\boldsymbol{\beta} \right) = 0$$

Now we use a property unique to the dot product, the matrix formula $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^{\mathsf{T}} \mathbf{v}$:

$$\left( X\mathbf{v} \right)^{\mathsf{T}} \left( \mathbf{y} - X\boldsymbol{\beta} \right) = 0$$
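
A quick numeric check of that identity, with arbitrarily chosen vectors:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# For 1-D arrays, u @ v already computes u^T v as a scalar
print(np.dot(u, v))  # 32.0
print(u @ v)         # 32.0, the same value
```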

Then using properties of matrix transposition:

$$\mathbf{v}^{\mathsf{T}} X^{\mathsf{T}} \left( \mathbf{y} - X\boldsymbol{\beta} \right) = 0 \quad \Longrightarrow \quad \mathbf{v}^{\mathsf{T}} \left( X^{\mathsf{T}}\mathbf{y} - X^{\mathsf{T}}X\boldsymbol{\beta} \right) = 0$$

Remember that $\mathbf{v}$ is any vector in $\mathbb{R}^2$. That means $\mathbf{v}$ can also be $X^{\mathsf{T}}\mathbf{y} - X^{\mathsf{T}}X\boldsymbol{\beta}$. In that case, we can take the positivity of the inner product (property 4, positivity): $\langle \mathbf{u}, \mathbf{u} \rangle \geq 0$, with $\langle \mathbf{u}, \mathbf{u} \rangle = 0$ if and only if $\mathbf{u} = \mathbf{0}$. Here

$$\left( X^{\mathsf{T}}\mathbf{y} - X^{\mathsf{T}}X\boldsymbol{\beta} \right)^{\mathsf{T}} \left( X^{\mathsf{T}}\mathbf{y} - X^{\mathsf{T}}X\boldsymbol{\beta} \right) = 0$$

so

$$X^{\mathsf{T}}X\boldsymbol{\beta} = X^{\mathsf{T}}\mathbf{y}$$

(since we just look at the case where the inner product is zero, which forces $X^{\mathsf{T}}\mathbf{y} - X^{\mathsf{T}}X\boldsymbol{\beta} = \mathbf{0}$)

And, if $X^{\mathsf{T}}X$ is an invertible matrix, then:

$$\boldsymbol{\beta} = \left( X^{\mathsf{T}}X \right)^{-1} X^{\mathsf{T}} \mathbf{y}$$
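
Putting the whole derivation into code: a minimal NumPy sketch (with made-up data) that solves the normal equations $X^{\mathsf{T}}X\boldsymbol{\beta} = X^{\mathsf{T}}\mathbf{y}$ and checks the two facts we used, that the residual $\mathbf{y} - X\boldsymbol{\beta}$ is orthogonal to the column space of $X$, and that the answer agrees with NumPy's own least-squares solver:

```python
import numpy as np

# Hypothetical data-points
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 2.8, 5.1, 7.2, 8.9])

# Design matrix for the line y-hat = a + b x
X = np.column_stack([np.ones_like(x), x])

# Solve X^T X beta = X^T y directly
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # [a, b]

# The residual should be orthogonal to every column of X
residual = y - X @ beta
print(X.T @ residual)  # ~[0, 0], up to floating-point error

# Cross-check against NumPy's built-in least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_lstsq))  # True
```

Solving the normal equations with `np.linalg.solve` rather than computing $(X^{\mathsf{T}}X)^{-1}$ explicitly is a standard design choice: it gives the same $\boldsymbol{\beta}$ but is cheaper and more numerically stable.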