Loss functions

class mumott.optimization.loss_functions.SquaredLoss(residual_calculator, use_weights=False, preconditioner=None, residual_norm_multiplier=1)[source]

Class object for obtaining the squared loss function and gradient from a given residual_calculator.

This loss function can be written as \(L(r(x, d)) = 0.5 r(x, d)^2\), where \(r\) is the residual, a function of \(x\), the optimization coefficients, and \(d\), the data. The gradient with respect to \(x\) is then \(r(x, d) \frac{\partial r}{\partial x}\). Computing the partial derivative of \(r\) with respect to \(x\) is the responsibility of the residual_calculator.

Generally speaking, the squared loss function is easy to compute and has a well-behaved gradient, but it is not robust against outliers in the data. Using weights to normalize residuals by the variance can mitigate this somewhat.
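
To make the expressions above concrete, here is a minimal NumPy sketch (not part of mumott) of the squared loss and its gradient for a linear residual \(r(x, d) = Ax - d\); the matrix A, coefficients x, and data d are illustrative stand-ins for what the residual_calculator provides:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 3))    # stand-in forward model
    x = rng.normal(size=3)         # optimization coefficients
    d = rng.normal(size=5)         # data

    r = A @ x - d                  # residual r(x, d)
    loss = 0.5 * np.sum(r ** 2)    # L = 0.5 * r^2, summed over data points
    gradient = A.T @ r             # dL/dx = (dr/dx)^T r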

Parameters
  • residual_calculator (ResidualCalculator) – The residual calculator instance from which the residuals, weights, and gradient terms are obtained.

  • use_weights (bool) – Whether to use weighting in the computation of the residual norm and gradient. Default is False.

  • preconditioner (np.ndarray) – A preconditioner to be applied to the gradient. Must have the same shape as residual_calculator.coefficients or it must be possible to broadcast by multiplication.

  • residual_norm_multiplier (float) – A multiplier that is applied to the residual norm and gradient. Useful in cases where a very small or large loss function value changes the optimizer behaviour.
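
A hedged construction sketch follows. The residual_calculator is assumed to have been prepared beforehand (in mumott this would typically be a ResidualCalculator instance such as GradientResidualCalculator); here it is only a placeholder name, so the snippet is illustrative rather than self-contained:

    from mumott.optimization.loss_functions import SquaredLoss

    # `residual_calculator` is a placeholder for a prepared ResidualCalculator.
    loss_function = SquaredLoss(residual_calculator,
                                use_weights=True,             # weight the residuals
                                residual_norm_multiplier=1.0)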

add_regularizer(name, regularizer, regularization_weight)

Add a regularizer to the loss function.

Parameters
  • name (str) – Name of the regularizer, to be used as its key.

  • regularizer (Regularizer) – The Regularizer instance to be attached.

  • regularization_weight (float) – The regularization weight (often denoted \(\lambda\)), by which the residual norm and gradient will be scaled.

Return type

None
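
Continuing the sketch above, a regularizer can be attached as follows; the L1Norm import path is an assumption based on mumott's regularizer module:

    from mumott.optimization.regularizers import L1Norm

    # Attach an L1 regularizer under the key 'l1'; the weight scales its
    # contribution to the total loss and gradient.
    loss_function.add_regularizer(name='l1',
                                  regularizer=L1Norm(),
                                  regularization_weight=1e-2)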

get_estimate_of_lifschitz_constant()[source]

Calculate an estimate of the Lipschitz constant of this loss function. Used to determine a safe step size for certain optimization algorithms.

Returns

Lipschitz constant.

Return type

float
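
A common use of this estimate, shown here as a sketch continuing the example above, is to choose a gradient-descent step size of \(1/L\), which is safe for an \(L\)-smooth loss:

    # Use the Lipschitz estimate to choose a conservative step size.
    lipschitz_constant = loss_function.get_estimate_of_lifschitz_constant()
    step_size = 1.0 / lipschitz_constant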

get_loss(coefficients=None, get_gradient=False, gradient_part=None)

Returns loss function value and possibly gradient based on the given coefficients.

Notes

This method simply calls the methods get_residual_norm() and get_regularization_norm() and sums up their respective contributions.

Parameters
  • coefficients (Optional[ndarray[float]]) – An np.ndarray of values of the same shape as residual_calculator.coefficients. Default value is None, in which case get_residual_norm() handles the choice of coefficients; in general, it defaults to the coefficients of the attached residual_calculator.

  • get_gradient (bool) – If True, returns a 'gradient' of the same shape as residual_calculator.coefficients. Otherwise, the entry 'gradient' will be None.

Returns

A dictionary with at least two entries, loss and gradient.
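
For illustration, continuing the sketch above, the loss and gradient can be retrieved as follows:

    # Evaluate the loss and gradient at the residual calculator's
    # current coefficients.
    result = loss_function.get_loss(get_gradient=True)
    print(result['loss'])             # scalar loss value
    print(result['gradient'].shape)   # same shape as residual_calculator.coefficients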

get_regularization_norm(coefficients=None, get_gradient=False, gradient_part=None)

Returns the regularization norm and, if requested, the gradient, from all regularizers attached to this instance, based on the provided coefficients. If no coefficients are provided, the ones from the attached residual_calculator are used.

Parameters
  • coefficients (Optional[ndarray[float]]) – An np.ndarray of values of the same shape as residual_calculator.coefficients.

  • get_gradient (bool) – Whether to compute and return the gradient. Default is False.

  • gradient_part (Optional[str]) – Used for the zonal harmonics reconstructions to determine which part of the gradient is being calculated. Default is None.

Return type

dict[str, dict[str, Any]]

Returns

A dictionary with one entry for each regularizer in regularizers, containing 'regularization_norm' and 'gradient' as entries.
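
As a brief sketch continuing the example above, the nested return structure can be read per regularizer key:

    result = loss_function.get_regularization_norm(get_gradient=True)
    # One entry per attached regularizer, keyed by the name given to
    # add_regularizer(); 'l1' is the key from the earlier sketch.
    print(result['l1']['regularization_norm'])
    print(result['l1']['gradient'].shape)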

get_residual_norm(coefficients=None, get_gradient=False, gradient_part=None)

Returns the residual norm and possibly the gradient based on the attached residual_calculator. If coefficients is given, residual_calculator.coefficients will be updated with these new values; otherwise, the residual norm and possibly the gradient are calculated using the current coefficients.

Parameters
  • coefficients (Optional[ndarray[float]]) – An np.ndarray of values of the same shape as residual_calculator.coefficients.

  • get_gradient (bool) – If True, returns a 'gradient' of the same shape as residual_calculator.coefficients. Otherwise, the entry 'gradient' will be None.

Return type

dict

Returns

A dictionary with at least two entries, residual_norm and gradient.
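
As a brief sketch, the data-fidelity term alone, without any regularizer contributions, can be evaluated like this:

    # Uses the residual calculator's current coefficients, since none are given.
    result = loss_function.get_residual_norm(get_gradient=False)
    print(result['residual_norm'])    # the 'gradient' entry is None here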

property initial_values: ndarray[float]

Initial coefficient values for optimizer; defaults to zeros.

property preconditioner: ndarray[float]

Preconditioner that is applied to the gradient by multiplication.

property preconditioner_hash: str

Hash of the preconditioner.

property regularization_weights: dict[str, float]

The dictionary of regularization weights attached to this loss function.

property regularizers: dict[str, mumott.optimization.regularizers.base_regularizer.Regularizer]

The dictionary of regularizers attached to this loss function.

property residual_norm_multiplier: float

Multiplicative factor by which the residual norm will be scaled. Can be used, together with any regularization_weights, to scale the loss function, in order to address unexpected behaviour that arises when some optimizers are given very small or very large loss function values.

property use_preconditioner_mask: bool

Determines whether a mask is calculated from the preconditioner.

property use_weights: bool

Whether to use weights when calculating the residual and gradient.

class mumott.optimization.loss_functions.HuberLoss(residual_calculator, use_weights=False, preconditioner=None, residual_norm_multiplier=1.0, delta=1.0)[source]

Class object for obtaining the Huber loss function and gradient from a given residual_calculator.

This loss function is used for so-called robust regression and can be written as

\[L(r(x, D)) = \begin{cases} \vert r(x, D) \vert - 0.5 \delta & \quad \text{if } \vert r(x, D) \vert > \delta \\ \dfrac{r(x, D)^2}{2 \delta} & \quad \text{if } \vert r(x, D) \vert \leq \delta \end{cases},\]

where \(r\) is the residual, a function of \(x\), the optimization coefficients, and \(D\), the data. The gradient with respect to \(x\) is then \(\sigma(r) \frac{\partial r}{\partial x}\) for \(\vert r \vert > \delta\), where \(\sigma\) is the sign function, and \(\frac{r}{\delta} \frac{\partial r}{\partial x}\) for \(\vert r \vert \leq \delta\). Computing the partial derivative of \(r\) with respect to \(x\) is the responsibility of the residual_calculator.

Broadly speaking, the Huber loss function is less sensitive to outliers than the squared (or \(L_2\)) loss function, while being easier to minimize than the \(L_1\) loss function, since its derivative is continuous over the entire domain.

See also the Wikipedia articles on robust regression and the Huber loss.
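
The following minimal NumPy sketch (not part of mumott) implements the piecewise definition above, along with its derivative factor with respect to the residual; note how the two branches join continuously at \(\vert r \vert = \delta\):

    import numpy as np

    def huber(r, delta=1.0):
        # Quadratic branch for |r| <= delta, linear branch otherwise.
        small = np.abs(r) <= delta
        return np.where(small, r ** 2 / (2 * delta), np.abs(r) - 0.5 * delta)

    def huber_derivative(r, delta=1.0):
        # dL/dr: r / delta in the quadratic branch, sign(r) in the linear branch.
        small = np.abs(r) <= delta
        return np.where(small, r / delta, np.sign(r))

    r = np.linspace(-3.0, 3.0, 7)
    print(huber(r))
    print(huber_derivative(r))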

Parameters
  • residual_calculator (ResidualCalculator) – The residual calculator instance from which the residuals, weights, and gradient terms are obtained.

  • use_weights (bool) – Whether to use weighting in the computation of the residual norm and gradient. Default is False.

  • preconditioner (np.ndarray) – A preconditioner to be applied to the gradient. Must have the same shape as residual_calculator.coefficients or it must be possible to broadcast by multiplication.

  • residual_norm_multiplier (float) – A multiplier that is applied to the residual norm and gradient. Useful in cases where a very small or large loss function value changes the optimizer behaviour.

  • delta (float) – The cutoff value where the \(L_1\) loss function is spliced with the \(L_2\) loss function. The default value is 1.0, but the appropriate value depends on the data and the chosen representation.
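
As with SquaredLoss, a hedged construction sketch follows; residual_calculator is again a placeholder for a prepared ResidualCalculator instance, so the snippet is illustrative rather than self-contained:

    from mumott.optimization.loss_functions import HuberLoss

    # delta controls where the loss switches from the quadratic to the
    # linear branch; the right value depends on the scale of the residuals.
    loss_function = HuberLoss(residual_calculator, use_weights=True, delta=0.5)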

add_regularizer(name, regularizer, regularization_weight)

Add a regularizer to the loss function.

Parameters
  • name (str) – Name of the regularizer, to be used as its key.

  • regularizer (Regularizer) – The Regularizer instance to be attached.

  • regularization_weight (float) – The regularization weight (often denoted \(\lambda\)), by which the residual norm and gradient will be scaled.

Return type

None

get_loss(coefficients=None, get_gradient=False, gradient_part=None)

Returns loss function value and possibly gradient based on the given coefficients.

Notes

This method simply calls the methods get_residual_norm() and get_regularization_norm() and sums up their respective contributions.

Parameters
  • coefficients (Optional[ndarray[float]]) – An np.ndarray of values of the same shape as residual_calculator.coefficients. Default value is None, in which case get_residual_norm() handles the choice of coefficients; in general, it defaults to the coefficients of the attached residual_calculator.

  • get_gradient (bool) – If True, returns a 'gradient' of the same shape as residual_calculator.coefficients. Otherwise, the entry 'gradient' will be None.

Returns

A dictionary with at least two entries, loss and gradient.

get_regularization_norm(coefficients=None, get_gradient=False, gradient_part=None)

Returns the regularization norm and, if requested, the gradient, from all regularizers attached to this instance, based on the provided coefficients. If no coefficients are provided, the ones from the attached residual_calculator are used.

Parameters
  • coefficients (Optional[ndarray[float]]) – An np.ndarray of values of the same shape as residual_calculator.coefficients.

  • get_gradient (bool) – Whether to compute and return the gradient. Default is False.

  • gradient_part (Optional[str]) – Used for the zonal harmonics reconstructions to determine which part of the gradient is being calculated. Default is None.

Return type

dict[str, dict[str, Any]]

Returns

A dictionary with one entry for each regularizer in regularizers, containing 'regularization_norm' and 'gradient' as entries.

get_residual_norm(coefficients=None, get_gradient=False, gradient_part=None)

Returns the residual norm and possibly the gradient based on the attached residual_calculator. If coefficients is given, residual_calculator.coefficients will be updated with these new values; otherwise, the residual norm and possibly the gradient are calculated using the current coefficients.

Parameters
  • coefficients (Optional[ndarray[float]]) – An np.ndarray of values of the same shape as residual_calculator.coefficients.

  • get_gradient (bool) – If True, returns a 'gradient' of the same shape as residual_calculator.coefficients. Otherwise, the entry 'gradient' will be None.

Return type

dict

Returns

A dictionary with at least two entries, residual_norm and gradient.

property initial_values: ndarray[float]

Initial coefficient values for optimizer; defaults to zeros.

property preconditioner: ndarray[float]

Preconditioner that is applied to the gradient by multiplication.

property preconditioner_hash: str

Hash of the preconditioner.

property regularization_weights: dict[str, float]

The dictionary of regularization weights attached to this loss function.

property regularizers: dict[str, mumott.optimization.regularizers.base_regularizer.Regularizer]

The dictionary of regularizers attached to this loss function.

property residual_norm_multiplier: float

Multiplicative factor by which the residual norm will be scaled. Can be used, together with any regularization_weights, to scale the loss function, in order to address unexpected behaviour that arises when some optimizers are given very small or very large loss function values.

property use_preconditioner_mask: bool

Determines whether a mask is calculated from the preconditioner.

property use_weights: bool

Whether to use weights when calculating the residual and gradient.