# Loss functions

class mumott.optimization.loss_functions.SquaredLoss(residual_calculator, use_weights=False, preconditioner=None, residual_norm_multiplier=1)[source]

Class object for obtaining the squared loss function and gradient from a given residual_calculator.

This loss function can be written as $$L(r(x, d)) = 0.5 r(x, d)^2$$, where $$r$$ is the residual, a function of $$x$$, the optimization coefficients, and $$d$$, the data. The gradient with respect to $$x$$ is then $$r(x, d) \frac{\partial r}{\partial x}$$. Computing the partial derivative of $$r$$ with respect to $$x$$ is the responsibility of the residual_calculator.

Generally speaking, the squared loss function is easy to compute and has a well-behaved gradient, but it is not robust against outliers in the data. Using weights to normalize residuals by the variance can mitigate this somewhat.
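To make the chain rule above concrete, here is a minimal NumPy illustration (independent of the mumott API) for a hypothetical linear residual $$r(x, d) = Ax - d$$, where the matrix $$A$$ stands in for the forward model that a residual_calculator would provide:

```python
import numpy as np

# Hypothetical linear residual: r(x, d) = A @ x - d, so dr/dx = A.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
d = np.array([1.0, 2.0, 3.0])
x = np.array([0.5, -0.5])

r = A @ x - d                 # residual vector
loss = 0.5 * np.sum(r ** 2)   # L = 0.5 * r^2, summed over all residuals
gradient = A.T @ r            # chain rule: sum_i r_i * (dr_i / dx)

print(loss, gradient)
```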

Parameters
• residual_calculator (ResidualCalculator) – The residual calculator instance from which the residuals, weights, and gradient terms are obtained.

• use_weights (bool) – Whether to use weighting in the computation of the residual norm and gradient. Default is False.

• preconditioner (np.ndarray) – A preconditioner to be applied to the gradient. Must have the same shape as residual_calculator.coefficients, or be broadcastable to that shape by multiplication.

• residual_norm_multiplier (float) – A multiplier that is applied to the residual norm and gradient. Useful in cases where a very small or large loss function value changes the optimizer behaviour.
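A minimal construction sketch. Here residual_calculator is a placeholder for an already configured ResidualCalculator instance (its construction is outside the scope of this page), and the keyword values are arbitrary:

```python
from mumott.optimization.loss_functions import SquaredLoss

# `residual_calculator` is a placeholder for an already configured
# ResidualCalculator instance, built elsewhere in the pipeline.
loss_function = SquaredLoss(
    residual_calculator,
    use_weights=True,               # weight residuals, e.g. by inverse variance
    residual_norm_multiplier=1e-3,  # rescale the loss seen by the optimizer
)
```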

add_regularizer(name, regularizer, regularization_weight)

Add a regularizer to the loss function.

Parameters
• name (str) – Name of the regularizer, to be used as its key.

• regularizer (Regularizer) – The Regularizer instance to be attached.

• regularization_weight (float) – The regularization weight (often denoted $$\lambda$$), by which the regularization norm and gradient will be scaled.

Return type

None
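A usage sketch for attaching a regularizer. Here my_regularizer is a placeholder for any Regularizer instance, and the weight value is arbitrary:

```python
# `my_regularizer` stands in for any Regularizer instance.
loss_function.add_regularizer(
    name='my_regularizer',
    regularizer=my_regularizer,
    regularization_weight=1e-2,  # the weight lambda
)
```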

get_loss(coefficients=None, get_gradient=False)

Returns the loss function value, and possibly the gradient, based on the given coefficients.

Notes

This method simply calls the methods get_residual_norm() and get_regularization_norm() and sums up their respective contributions.

Return type

Dict

Returns

A dictionary with at least two entries, loss and gradient.
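A short usage sketch of this method, continuing the hypothetical loss_function from above; the keyword defaults are assumptions, and the gradient step shown is purely illustrative:

```python
# Evaluate loss and gradient at the current coefficients.
result = loss_function.get_loss(get_gradient=True)
print(result['loss'])                     # scalar loss value
descent_direction = -result['gradient']   # e.g. for a gradient-based optimizer
```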

get_regularization_norm(coefficients=None, get_gradient=False)

Returns the regularization norm, and if requested the gradient, from all regularizers attached to this instance, based on the provided coefficients. If no coefficients are provided, the ones from the attached residual_calculator are used.

Return type

Dict

Returns

A dictionary with one entry for each regularizer in regularizers, containing 'regularization_norm' and 'gradient' as entries.

get_residual_norm(coefficients=None, get_gradient=False)

Returns the residual norm, and possibly the gradient, based on the attached residual_calculator. If coefficients is given, residual_calculator.coefficients will be updated with these new values; otherwise, the residual norm (and possibly the gradient) is calculated using the current coefficients.

Return type

Dict

Returns

A dictionary with at least two entries, residual_norm and gradient.
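A sketch of evaluating the data term and the regularization terms separately, again continuing the hypothetical loss_function from the examples above:

```python
# Data term: residual norm and its gradient.
res = loss_function.get_residual_norm(get_gradient=True)
print(res['residual_norm'])

# Regularization terms: one entry per attached regularizer.
reg = loss_function.get_regularization_norm(get_gradient=True)
for name, entry in reg.items():
    print(name, entry['regularization_norm'])
```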

property initial_values: ndarray[Any, dtype[float]]

Initial coefficient values for optimizer; defaults to zeros.

property preconditioner: ndarray[Any, dtype[float]]

Preconditioner that is applied to the gradient by multiplication.

property preconditioner_hash: str

Hash of the preconditioner.

property regularization_weights: Dict[str, float]

The dictionary of regularization weights appended to this loss function.

property regularizers: Dict[str, Regularizer]

The dictionary of regularizers appended to this loss function.

property residual_norm_multiplier: float

Multiplicative factor by which the residual norm will be scaled. Can be used, together with any regularization_weights, to scale the loss function, in order to address unexpected behaviour that arises when some optimizers are given very small or very large loss function values.

property use_weights: bool

Whether to use weights or not in calculating the residual and gradient.
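The properties can be used to inspect the configuration; a short sketch, continuing the hypothetical loss_function from the examples above:

```python
x0 = loss_function.initial_values            # zeros, a suitable optimizer start
print(loss_function.use_weights)
print(loss_function.residual_norm_multiplier)
print(loss_function.regularization_weights)  # e.g. {'my_regularizer': 0.01}
```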

class mumott.optimization.loss_functions.HuberLoss(residual_calculator, use_weights=False, preconditioner=None, residual_norm_multiplier=1.0, delta=1.0)[source]

Class object for obtaining the Huber loss function and gradient from a given residual_calculator.

This loss function is used for so-called robust regression and can be written as

$$L(r(x, D)) = \begin{cases} \vert r(x, D) \vert - 0.5 \delta & \quad \text{if } \vert r(x, D) \vert > \delta \\ \dfrac{r(x, D)^2}{2 \delta} & \quad \text{if } \vert r(x, D) \vert \leq \delta \end{cases},$$

where $$r$$ is the residual, a function of $$x$$, the optimization coefficients, and $$D$$, the data. The gradient with respect to $$x$$ is then $$\mathrm{sgn}(r) \frac{\partial r}{\partial x}$$ for $$\vert r \vert > \delta$$, where $$\mathrm{sgn}$$ is the sign function, and $$\dfrac{r}{\delta} \frac{\partial r}{\partial x}$$ for $$\vert r \vert \leq \delta$$. Computing the partial derivative of $$r$$ with respect to $$x$$ is the responsibility of the residual_calculator.

Broadly speaking, the Huber loss function is less sensitive to outliers than the squared (or $$L_2$$) loss function, while it is easier to minimize than the $$L_1$$ loss function, since its derivative is continuous over the entire domain.

See also the Wikipedia articles on robust regression and the Huber loss.
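The piecewise definition and its derivative can be checked with a few lines of NumPy (an illustration of the formula above, not the mumott API):

```python
import numpy as np

delta = 1.0
r = np.linspace(-3.0, 3.0, 13)

# Loss as defined above: linear tails outside delta, quadratic core inside.
loss = np.where(np.abs(r) > delta,
                np.abs(r) - 0.5 * delta,
                r ** 2 / (2.0 * delta))

# Derivative with respect to r: sgn(r) outside, r / delta inside.
dloss_dr = np.where(np.abs(r) > delta, np.sign(r), r / delta)

# The two branches agree at |r| = delta, so the derivative is continuous.
print(loss)
print(dloss_dr)
```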

Parameters
• residual_calculator (ResidualCalculator) – The residual calculator instance from which the residuals, weights, and gradient terms are obtained.

• use_weights (bool) – Whether to use weighting in the computation of the residual norm and gradient. Default is False.

• preconditioner (np.ndarray) – A preconditioner to be applied to the gradient. Must have the same shape as residual_calculator.coefficients, or be broadcastable to that shape by multiplication.

• residual_norm_multiplier (float) – A multiplier that is applied to the residual norm and gradient. Useful in cases where a very small or large loss function value changes the optimizer behaviour.

• delta (float) – The cutoff value at which the $$L_1$$ loss function is spliced with the $$L_2$$ loss function. The default value is 1.0, but the appropriate value depends on the data and the chosen representation.
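A construction sketch analogous to the SquaredLoss example above, again with residual_calculator as a placeholder for a configured instance; the delta value is arbitrary:

```python
from mumott.optimization.loss_functions import HuberLoss

# `residual_calculator` is again a placeholder for a configured instance.
loss_function = HuberLoss(
    residual_calculator,
    use_weights=True,
    delta=2.0,  # residuals beyond this magnitude are treated as outliers
)
```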

add_regularizer(name, regularizer, regularization_weight)

Add a regularizer to the loss function.

Parameters
• name (str) – Name of the regularizer, to be used as its key.

• regularizer (Regularizer) – The Regularizer instance to be attached.

• regularization_weight (float) – The regularization weight (often denoted $$\lambda$$), by which the regularization norm and gradient will be scaled.

Return type

None

get_loss(coefficients=None, get_gradient=False)

Returns the loss function value, and possibly the gradient, based on the given coefficients.

Notes

This method simply calls the methods get_residual_norm() and get_regularization_norm() and sums up their respective contributions.

Return type

Dict

Returns

A dictionary with at least two entries, loss and gradient.

get_regularization_norm(coefficients=None, get_gradient=False)

Returns the regularization norm, and if requested the gradient, from all regularizers attached to this instance, based on the provided coefficients. If no coefficients are provided, the ones from the attached residual_calculator are used.

Return type

Dict

Returns

A dictionary with one entry for each regularizer in regularizers, containing 'regularization_norm' and 'gradient' as entries.

get_residual_norm(coefficients=None, get_gradient=False)

Returns the residual norm, and possibly the gradient, based on the attached residual_calculator. If coefficients is given, residual_calculator.coefficients will be updated with these new values; otherwise, the residual norm (and possibly the gradient) is calculated using the current coefficients.

Return type

Dict

Returns

A dictionary with at least two entries, residual_norm and gradient.

property initial_values: ndarray[Any, dtype[float]]

Initial coefficient values for optimizer; defaults to zeros.

property preconditioner: ndarray[Any, dtype[float]]

Preconditioner that is applied to the gradient by multiplication.

property preconditioner_hash: str

Hash of the preconditioner.

property regularization_weights: Dict[str, float]

The dictionary of regularization weights appended to this loss function.

property regularizers: Dict[str, Regularizer]

The dictionary of regularizers appended to this loss function.

property residual_norm_multiplier: float

Multiplicative factor by which the residual norm will be scaled. Can be used, together with any regularization_weights, to scale the loss function, in order to address unexpected behaviour that arises when some optimizers are given very small or very large loss function values.

property use_weights: bool

Whether to use weights or not in calculating the residual and gradient.