There are various methods for finding the minimum of a loss function (in practice, gradient-based methods may settle in a local minimum rather than the global one):

Gradient descent

Gradient descent is an iterative numerical method used to train neural networks: it tries to find a minimum of a multivariable loss function, which measures the network's error, by repeatedly adjusting the weights on the inputs.
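The iterative process above can be sketched in a few lines. This is a minimal illustration, not tied to any particular network: the loss L(w) = (w - 4)² and the learning rate are assumptions chosen so the minimum (w = 4) is easy to see.

```python
def gradient_descent(grad, w=0.0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to move toward a minimum."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Illustrative loss L(w) = (w - 4)**2, whose gradient is 2*(w - 4);
# the minimum sits at w = 4.
w = gradient_descent(lambda w: 2.0 * (w - 4.0))
print(w)  # close to 4
```

Each step moves the weight a small amount in the direction that decreases the loss; the learning rate `lr` controls the step size.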

Because a neural network’s output is the activation value, a function applied to the weighted sum of the inputs, gradient descent uses chain-rule derivatives, with:

  • L = Loss Function
  • w = weight of a given input
  • A = Activation value (output)
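With these symbols, the chain rule relates the loss to each weight as ∂L/∂w = (∂L/∂A)(∂A/∂w). A minimal sketch for a single neuron, assuming a squared-error loss and a sigmoid activation (both are illustrative choices, not specified in the notes above):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(x, target, w=0.0, b=0.0, lr=0.5, steps=200):
    """One input x, one weight w: A = sigmoid(w*x + b), L = (A - target)**2."""
    for _ in range(steps):
        z = w * x + b
        A = sigmoid(z)
        # Chain rule: dL/dw = dL/dA * dA/dz * dz/dw
        dL_dA = 2.0 * (A - target)       # derivative of squared error
        dA_dz = A * (1.0 - A)            # derivative of sigmoid
        dz_dw = x                        # derivative of the weighted sum
        w -= lr * dL_dA * dA_dz * dz_dw
        b -= lr * dL_dA * dA_dz          # dz/db = 1
    return w, b

w, b = train(x=1.0, target=0.8)
print(sigmoid(w * 1.0 + b))  # approaches the target 0.8
```

The three local derivatives are multiplied together, which is exactly how backpropagation extends this idea through many layers.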

Gradient descent optimisation

#todo

Stochastic Gradient Descent
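A minimal sketch of the difference from full-batch gradient descent: SGD updates the weight using the gradient from one randomly ordered sample at a time instead of the whole dataset. The linear model, loss, and data below are illustrative assumptions.

```python
import random

def sgd_fit(data, lr=0.05, epochs=100, seed=0):
    """Fit y = w*x with squared-error loss, updating on one sample at a time."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit samples in a random order each epoch
        for x, y in data:
            pred = w * x
            # Gradient of (pred - y)**2 w.r.t. w for this single sample
            grad = 2.0 * (pred - y) * x
            w -= lr * grad
    return w

# Data generated from y = 3x; SGD should recover w close to 3
data = [(x, 3.0 * x) for x in [-2.0, -1.0, 0.5, 1.0, 2.0]]
print(sgd_fit(data))
```

Because each update uses a single sample, the steps are noisy but cheap, which is what makes SGD practical on large datasets.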