Loss functions¶
- class mumott.optimization.loss_functions.SquaredLoss(residual_calculator, use_weights=False, preconditioner=None, residual_norm_multiplier=1)[source]¶
Class object for obtaining the squared loss function and gradient from a given residual_calculator.
This loss function can be written as \(L(r(x, d)) = 0.5 r(x, d)^2\), where \(r\) is the residual, a function of \(x\), the optimization coefficients, and \(d\), the data. The gradient with respect to \(x\) is then \(\frac{\partial r}{\partial x}\). The partial derivative of \(r\) with respect to \(x\) is the responsibility of the
residual_calculator
to compute.Generally speaking, the squared loss function is easy to compute and has a well-behaved gradient, but it is not robust against outliers in the data. Using weights to normalize residuals by the variance can mitigate this somewhat.
- Parameters
residual_calculator (ResidualCalculator) – The residual calculator instance from which the residuals, weights, and gradient terms are obtained.
use_weights (bool) – Whether to use weighting in the computation of the residual norm and gradient. Default is
False
.preconditioner (np.ndarray) – A preconditioner to be applied to the gradient. Must have the same shape as
residual_calculator.coefficients
or it must be possible to broadcast by multiplication.residual_norm_multiplier (float) – A multiplier that is applied to the residual norm and gradient. Useful in cases where a very small or large loss function value changes the optimizer behaviour.
- add_regularizer(name, regularizer, regularization_weight)¶
Add a regularizer to the loss function.
- Parameters
- Return type
- get_loss(coefficients=None, get_gradient=False)¶
Returns loss function value and possibly gradient based on the given
coefficients
.Notes
This method simply calls the methods
get_residual_norm()
andget_regularization_norm()
and sums up their respective contributions.- Parameters
coefficients (
Optional
[ndarray
[Any
,dtype
[float
]]]) – AnNDArray
of values of the same shape asresidual_calculator.coefficients
. Default value isNone
, which leaves it up toget_residual_norm()
to handle the choice of coefficients, which in general defaults to using the coefficients of the attachedresidual_calculator
.get_gradient (
bool
) – IfTrue
, returns a'gradient'
of the same shape asresidual_calculator.coefficients
. Otherwise, the entry'gradient'
will beNone
.
- Returns
A dictionary with at least two entries,
loss
andgradient
.
- get_regularization_norm(coefficients, get_gradient=False)¶
Returns the regularization norm, and if requested, the gradient, from all regularizers attached to this instance, based on the provided :attr`coefficients`. If no coefficients are provided, the ones from the attached
residual_calculator
are used.- Parameters
- Return type
- Returns
A dictionary with one entry for each regularizer in
regularizers
, containing'regularization_norm'
and'gradient'
as entries.
- get_residual_norm(coefficients=None, get_gradient=False)¶
Returns residual norm and possibly gradient based on the attached
residual_calculator
. Ifcoefficients
is given,residual_calculator.coefficients
will be updated with these new values, otherwise, the residual norm and possibly the gradient will just be calculated using the current coefficients.- Parameters
- Return type
- Returns
A dictionary with at least two entries,
residual_norm
andgradient
.
- property initial_values: ndarray[Any, dtype[float]]¶
Initial coefficient values for optimizer; defaults to zeros.
- property preconditioner: ndarray[Any, dtype[float]]¶
Preconditioner that is applied to the gradient by multiplication.
- property regularization_weights: Dict[str, float]¶
The dictionary of regularization weights appended to this loss function.
- property regularizers: Dict[str, Regularizer]¶
The dictionary of regularizers appended to this loss function.
- property residual_norm_multiplier: float¶
Multiplicative factor by which the residual norm will be scaled. Can be used, together with any
regularization_weights
, to scale the loss function, in order to address unexpected behaviour that arises when some optimizers are given very small or very large loss functions.
- class mumott.optimization.loss_functions.HuberLoss(residual_calculator, use_weights=False, preconditioner=None, residual_norm_multiplier=1.0, delta=1.0)[source]¶
Class object for obtaining the Huber loss function and gradient from a given residual_calculator.
This loss function is used for so-called robust regression and can be written as
\[\begin{split}L(r(x, D)) = \begin{Bmatrix} \vert r(x, D) \vert - 0.5 \delta & \quad \text{if } \vert r(x, D) \vert > \delta \\ \dfrac{r(x, D)^2}{2 \delta} & \quad \text{if } \vert r(x, D) \vert < \delta \end{Bmatrix},\end{split}\]where \(r\) is the residual, a function of \(x\), the optimization coefficients, and \(D\), the data. The gradient with respect to \(x\) is then \(\sigma(\frac{\partial r}{\partial x})\) for large \(r\), where \(\sigma(x)\) is the sign function, and \(\frac{\partial r}{\partial x}\) for small \(r\). The partial derivative of \(r\) with respect to \(x\) is the responsibility of the
residual_calculator
to compute.Broadly speaking, the Huber loss function is less sensitive to outliers than the squared (or \(L_2\)) loss function, while it is easier to minimize than the \(L_1\) loss function since it its derivative is continuous in the entire domain.
See also the Wikipedia articles on robust regression and the Huber loss.
- Parameters
residual_calculator (ResidualCalculator) – The residual calculator instance from which the residuals, weights, and gradient terms are obtained.
use_weights (bool) – Whether to use weighting in the computation of the residual norm and gradient. Default is
False
.preconditioner (np.ndarray) – A preconditioner to be applied to the gradient. Must have the same shape as
residual_calculator.coefficients
or it must be possible to broadcast by multiplication.residual_norm_multiplier (float) – A multiplier that is applied to the residual norm and gradient. Useful in cases where a very small or large loss function value changes the optimizer behaviour.
delta (float) – The cutoff value where the \(L_1\) loss function is spliced with the \(L_2\) loss function. The default value is
1.
, but the appropriate value to use depends on the data and the chosen representation.
- add_regularizer(name, regularizer, regularization_weight)¶
Add a regularizer to the loss function.
- Parameters
- Return type
- get_loss(coefficients=None, get_gradient=False)¶
Returns loss function value and possibly gradient based on the given
coefficients
.Notes
This method simply calls the methods
get_residual_norm()
andget_regularization_norm()
and sums up their respective contributions.- Parameters
coefficients (
Optional
[ndarray
[Any
,dtype
[float
]]]) – AnNDArray
of values of the same shape asresidual_calculator.coefficients
. Default value isNone
, which leaves it up toget_residual_norm()
to handle the choice of coefficients, which in general defaults to using the coefficients of the attachedresidual_calculator
.get_gradient (
bool
) – IfTrue
, returns a'gradient'
of the same shape asresidual_calculator.coefficients
. Otherwise, the entry'gradient'
will beNone
.
- Returns
A dictionary with at least two entries,
loss
andgradient
.
- get_regularization_norm(coefficients, get_gradient=False)¶
Returns the regularization norm, and if requested, the gradient, from all regularizers attached to this instance, based on the provided :attr`coefficients`. If no coefficients are provided, the ones from the attached
residual_calculator
are used.- Parameters
- Return type
- Returns
A dictionary with one entry for each regularizer in
regularizers
, containing'regularization_norm'
and'gradient'
as entries.
- get_residual_norm(coefficients=None, get_gradient=False)¶
Returns residual norm and possibly gradient based on the attached
residual_calculator
. Ifcoefficients
is given,residual_calculator.coefficients
will be updated with these new values, otherwise, the residual norm and possibly the gradient will just be calculated using the current coefficients.- Parameters
- Return type
- Returns
A dictionary with at least two entries,
residual_norm
andgradient
.
- property initial_values: ndarray[Any, dtype[float]]¶
Initial coefficient values for optimizer; defaults to zeros.
- property preconditioner: ndarray[Any, dtype[float]]¶
Preconditioner that is applied to the gradient by multiplication.
- property regularization_weights: Dict[str, float]¶
The dictionary of regularization weights appended to this loss function.
- property regularizers: Dict[str, Regularizer]¶
The dictionary of regularizers appended to this loss function.
- property residual_norm_multiplier: float¶
Multiplicative factor by which the residual norm will be scaled. Can be used, together with any
regularization_weights
, to scale the loss function, in order to address unexpected behaviour that arises when some optimizers are given very small or very large loss functions.