There are various methods to find the minimum of a loss function; in practice, gradient descent converges to a local minimum, which for convex functions is also the global one:
Gradient descent
Gradient descent is an iterative numerical method used to train neural networks: it seeks a minimum of a multivariable loss function, which measures the error, by repeatedly adjusting the weights on the inputs.
Because a neural network's output is the activation value, a function of the weighted sum of its inputs, gradient descent uses chain-rule derivatives to relate the loss to each weight:

- L = loss function
- w = weight of a given input
- A = activation value (output)

The chain rule then gives the gradient of the loss with respect to a weight: ∂L/∂w = (∂L/∂A) · (∂A/∂w).
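The chain-rule update described above can be sketched with a toy example. The setup below is an assumption for illustration: a single neuron with one weight, an identity activation (A = w · x), a squared-error loss, and invented data following y = 3x, so the weight should converge toward 3.

```python
import numpy as np

# Toy setup (hypothetical): one neuron, one weight, identity activation.
# Loss per sample: L = (A - y)^2, where A = w * x.
# Chain rule: dL/dw = (dL/dA) * (dA/dw) = 2*(A - y) * x.

x = np.array([1.0, 2.0, 3.0])
y = 3.0 * x        # invented targets: true relationship is y = 3x

w = 0.0            # initial weight
lr = 0.05          # learning rate

for _ in range(200):
    A = w * x                      # activation (output) for each sample
    dL_dA = 2.0 * (A - y)          # derivative of squared loss w.r.t. output
    dA_dw = x                      # derivative of output w.r.t. weight
    grad = np.mean(dL_dA * dA_dw)  # chain rule, averaged over the samples
    w -= lr * grad                 # gradient descent step

print(round(w, 3))  # w has moved toward 3.0
```

Each iteration follows the gradient downhill; with a suitably small learning rate the weight settles at the value that minimizes the loss.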