Data handling

The data_handling module provides functionality for loading and inspecting data. Instances (objects) of the class DataContainer are created by loading data from a file. Afterwards one can access, e.g., the geometry via the DataContainer.geometry property, which is a Geometry object. The series of measurements (if available) is accessible via the DataContainer.projections property, which is a ProjectionStack object. The latter acts as a list of individual measurements, which are provided as Projection objects. The mumott.data_handling.utilities module provides convenience functions for the calculation of transmittivities and absorbances from diode data.

class mumott.data_handling.DataContainer(data_path=None, data_type='h5', skip_data=False, nonfinite_replacement_value=None)[source]

Instances of this class represent data read from an input file in a format suitable for further analysis. The two core components are geometry and projections. The latter comprises a list of Projection instances, each of which corresponds to a single measurement.

By default all data is read, which can be rather time-consuming and is unnecessary in some cases, e.g., when aligning data. In those cases, one can skip loading the actual measurements by setting skip_data to True. The geometry information and supplementary information such as the diode data will still be read.

Example

The following code snippet illustrates the basic use of the DataContainer class.

First we create a DataContainer instance, providing the path to the data file to be read.

>>> from mumott.data_handling import DataContainer
>>> dc = DataContainer('tests/test_full_circle.h5')

One can then print a short summary of the content of the DataContainer instance.

>>> print(dc)
==========================================================================
                              DataContainer
--------------------------------------------------------------------------
Corrected for transmission : False
...

To access individual measurements we can use the projections attribute. The latter behaves like a list, where the elements of the list are Projection objects, each of which represents an individual measurement. We can print a summary of the content of the first projection.

>>> print(dc.projections[0])
--------------------------------------------------------------------------
                                Projection
--------------------------------------------------------------------------
hash_data          : 3f0ba8
hash_diode         : 808328
hash_weights       : 088d39
rotation           : [1. 0. 0.], [ 0. -1.  0.], [ 0.  0. -1.]
j_offset           : 0.0
k_offset           : 0.3
inner_angle        : None
outer_angle        : None
inner_axis         : 0.0, 0.0, -1.0
outer_axis         : 1.0, 0.0, 0.0
--------------------------------------------------------------------------

Parameters:

data_path (str, optional) – Path of the data file relative to the directory of execution. If None, a data container with an empty projections attribute will be initialized.
data_type (str, optional) – The type (or format) of the data file. Supported values are h5 (default) for hdf5 format and None for an empty DataContainer that can be manually populated.
skip_data (bool, optional) – If True, will skip data from individual measurements when loading the file. This will result in a functioning geometry instance as well as diode and weights entries in each projection, but data will be empty.
nonfinite_replacement_value (float, optional) – Value to replace nonfinite values (np.nan, np.inf, and -np.inf) with in the data, diode, and weights. If None (default), an error is raised if any nonfinite values are present in these input fields.
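The behaviour described for nonfinite_replacement_value can be illustrated with plain NumPy. The following is a minimal sketch, not the mumott implementation: the helper name replace_nonfinite is hypothetical, and the choice of ValueError as the raised error is an assumption, since the documentation only states that an error is raised.

```python
import numpy as np


def replace_nonfinite(values, replacement=None):
    """Hypothetical helper mirroring the documented behaviour; not mumott code."""
    values = np.asarray(values, dtype=np.float64)
    nonfinite = ~np.isfinite(values)  # True for nan, inf, and -inf
    if not nonfinite.any():
        return values.copy()
    if replacement is None:
        # Assumption: the exact exception type used by mumott is not documented here.
        raise ValueError('Nonfinite values found and no replacement value given.')
    return np.where(nonfinite, replacement, values)


# Replace every nonfinite entry with 0.0.
cleaned = replace_nonfinite([1.0, np.nan, np.inf, -np.inf, 2.5], replacement=0.0)
print(cleaned)
```

With replacement=None, the same input would raise instead of being silently modified, which corresponds to the default described above.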

append(f)[source]

Appends a Projection to the projections attached to this DataContainer instance.

Return type: None

correct_for_transmission()[source]

Applies correction from the input provided in the diode field. Should only be used if this correction has not been applied yet.

Return type: None

property data: ndarray[tuple[int, ...], dtype[float64]]: The data in the projections object attached to this DataContainer instance.

property diode: ndarray[tuple[int, ...], dtype[float64]]: The diode data in the projections object attached to this DataContainer instance.

property geometry: Geometry: Container of geometry information.

property projections: ProjectionStack: The projections, containing data and geometry.

property weights: ndarray[tuple[int, ...], dtype[float64]]: The weights in the projections object attached to this DataContainer instance.

write(filename)[source]

Save data and geometry information to a mumott .h5 file.

Parameters: filename (str) – Path of the data file.
Raises: ValueError – If the file name does not end in “.h5”.
Return type: None

Utilities

mumott.data_handling.utilities.get_absorbances(diode, **kwargs)[source]

Calculates the absorbance based on the transmittivity of the diode data.

Notes

The absorbance is defined as the negative base-10 logarithm of the transmittivity. Specifically,

\[A(i, j, k) = -\log_{10}(T(i, j, k))\]

where \(T\) is the transmittivity, normalized to the half-open interval \((0, 1]\). This formula shows why \(T(i, j, k)\) must not take values equal to or smaller than \(0\), as these would give a non-finite absorbance. Similarly, values greater than \(1\) would result in physically impossible negative absorbances.
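As a small numerical illustration of the definition above, the following sketch evaluates the formula with NumPy on hypothetical transmittivity values. It does not call get_absorbances() and only demonstrates the arithmetic.

```python
import numpy as np

# Hypothetical transmittivities, already normalized to (0, 1].
transmittivity = np.array([1.0, 0.1, 0.01, 1e-4])

# A = -log10(T): T = 1 gives zero absorbance, smaller T gives larger absorbance.
absorbance = -np.log10(transmittivity)
print(absorbance)

# A transmittivity of exactly 0 yields a non-finite absorbance.
with np.errstate(divide='ignore'):
    absorbance_at_zero = -np.log10(np.float64(0.0))
print(np.isinf(absorbance_at_zero))
```

The lower default cutoff of 1e-4 therefore corresponds to a maximum absorbance of 4.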

The transmittivity is calculated directly from diode readouts, which may or may not already be normalized and clipped. When using already normalized diode values, it is best to set the keyword argument normalization_percentile to 100, and the argument cutoff_values to whatever cutoff your normalization used.

See get_transmittivities() for more details on the transmittivity calculation.

Parameters:

diode (ndarray) – An array of diode readouts.
kwargs – Keyword arguments which are passed on to get_transmittivities().

Return type:

Dict[str, Any]

Returns:

A dictionary with the absorbance, and the union of the cutoff masks from get_transmittivities().

mumott.data_handling.utilities.get_transmittivities(diode, normalize_per_projection=False, normalization_percentile=99.9, cutoff_values=(0.0001, 1.0))[source]

Calculates the transmittivity from the diode, i.e., the fraction of transmitted intensity relative to a high percentile.

Notes

Diode readouts may be given in various formats such as a count or a current. When doing absorption tomography, one is generally interested in the fraction of transmitted intensity. Since we do not generally have access to the incoming flux, or its theoretical readout at complete transmission, we can instead normalize the diode readout based on the largest values, where the beam has only passed through some air. We thus want to compute

\[T(i, j, k) = \frac{I_T(i, j, k)}{I_0}\]

where \(I_T(i, j, k)\) is the diode readout value at projection \(i\) and pixel \((j, k)\), with the approximation \(I_0 \approx \text{max}(I_T(i, j, k))\).

To avoid normalizing based on individual spurious readouts (from, e.g., hot pixels), the normalization is by default based on the 99.9th percentile rather than the strict maximum. The normalized values are then clipped to the interval specified by cutoff_values, by default (1e-4, 1.0). Masks are returned that mark any values outside this range, which can be useful for masking out spurious readouts.
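The normalization, clipping, and masking steps described above can be sketched with NumPy as follows. This is an illustrative re-implementation under stated assumptions, not the mumott source: the function name sketch_transmittivities is hypothetical, and the mask convention (True for values inside the permitted range) is assumed rather than taken from the library.

```python
import numpy as np


def sketch_transmittivities(diode, percentile=99.9, cutoff_values=(1e-4, 1.0)):
    """Illustrative sketch of the steps described above; not the mumott source."""
    diode = np.asarray(diode, dtype=np.float64)
    # Approximate I_0 by a high percentile over all projections and pixels.
    i_0 = np.percentile(diode, percentile)
    normalized = diode / i_0
    lower, upper = cutoff_values
    # Assumed mask convention: True marks values inside the permitted range.
    mask_lower = normalized >= lower
    mask_upper = normalized <= upper
    transmittivity = np.clip(normalized, lower, upper)
    return transmittivity, mask_lower, mask_upper


# Hypothetical diode stack: 2 projections of 2 x 3 pixels,
# containing one dead pixel (0) and one hot pixel (1000).
diode = np.array([[[100., 50., 100.], [10., 0., 100.]],
                  [[100., 100., 25.], [100., 100., 1000.]]])

# A lower percentile than the default is used because the example array is tiny.
t, mask_lower, mask_upper = sketch_transmittivities(diode, percentile=90)
print(t)
```

In this example the hot pixel no longer determines the normalization. It is clipped to the upper cutoff and flagged by the upper mask, while the dead pixel is raised to the lower cutoff and flagged by the lower mask.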

If the transmittivities are to be used to normalize SAXS data, one should leave the normalize_per_projection option at False, because the SAXS data also scales with the incoming flux, and thus we want any variations in flux between projections to be preserved in the transmittivities.

However, if the transmittivities are to be used for transmission (or absorption) tomography, then the normalize_per_projection option should be set to True. In that case we are interested in the transmittivity of the sample irrespective of the incoming flux, so it is better to assume that the flux is constant over each projection. This corresponds to the slightly modified computation

\[T(i, j, k) = \frac{I_T(i, j, k)}{I_0(i)}\]

with the approximation \(I_0(i) \approx \text{max}_{j, k}(I_T(i, j, k))\), with the understanding that we take the maximum value for each projection \(i\).
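The difference between the two normalization modes can be seen in a small NumPy sketch. The array below is hypothetical, the projection index is assumed to be the first axis, and the strict maximum is used in place of a percentile for brevity (equivalent to a percentile of 100).

```python
import numpy as np

# Hypothetical diode stack of shape (projection, j, k); the second projection
# was recorded at half the incoming flux of the first.
diode = np.array([[[200., 100.], [200., 50.]],
                  [[100., 50.], [100., 25.]]])

# Global normalization: a single I_0 for the whole stack.
t_global = diode / diode.max()

# Per-projection normalization: one I_0(i) per projection, broadcast over (j, k).
i_0 = diode.max(axis=(1, 2), keepdims=True)
t_per_projection = diode / i_0

print(t_global[1].max())           # flux variation is preserved
print(t_per_projection[1].max())   # flux variation is removed
```

With global normalization the second projection keeps its factor-of-two flux deficit, which is what one wants when normalizing SAXS data. With per-projection normalization both projections yield identical transmittivities, as appropriate for absorption tomography.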

Parameters:

diode (ndarray) – An array of diode readouts.
normalize_per_projection (bool) – If True, the diode will be normalized projection-wise. This is the appropriate choice for absorption tomography. For SAXS, it is preferable to normalize the diode across the entire set of diode measurements in order to account for possible variations in flux. Default value is False.
normalization_percentile (float) – The percentile of values in either the entire set of diode measurements or each projection (depending on normalize_per_projection) to use for normalization. The default value is 99.9. Values above this percentile will be clipped. If you are certain that you do not have spuriously large diode readout values, you can specify 100. as the percentile instead.
cutoff_values (Tuple[float, float]) – The cutoffs to use for the transmittivity calculation. Default value is (1e-4, 1.0), i.e., one part in ten thousand for the lower bound, and a hundred percent for the upper bound. It may be desirable to mask out values outside of this range during any calculation. For this purpose, the masks cutoff_mask_lower and cutoff_mask_upper are included in the return dictionary, with the same shape as the weights in projections. In some cases, you may wish to specify other bounds. For example, if you know that your sample is embedded in a substrate which reduces the maximum possible transmittivity, you may wish to lower the upper bound. If you know that your sample has extremely low transmittivity (perhaps compensated with a very long exposure time), then you can set the lower cutoff even lower. The cutoffs must lie within the half-open interval (0, 1]. A lower bound of 0 is not permitted since this would lead to an invalid absorbance.

Return type:

Dict[str, Any]

Returns:

A dictionary with three entries, transmittivity, cutoff_mask_lower, and cutoff_mask_upper.