Workflow

There are two main levels of access to mumott. Most routine tasks related to alignment and reconstruction can be accomplished via pipelines, which are provided in the form of functions.

Internally the pipelines are constructed using objects. The latter provide more fine-grained control over alignment or reconstruction, and can be used to construct custom pipelines.

Pipelines

Reconstruction workflows are most easily accessed via reconstruction pipelines. A pipeline represents a series of subtasks, each of which is handled by an object. This structure makes it possible to replace individual components of the pipeline with alternatives preferred by the user.

[Figure: reconstruction pipeline workflow. Measured data and metadata (including geometry information) are loaded into a DataContainer, which together with user preferences is passed to a pipeline; the pipeline is built from component objects and yields a reconstruction, which is processed via a BasisSet into tensor field properties (anisotropy, orientation, ...).]

The user interaction with the pipeline can be understood as follows:

  1. A DataContainer instance is created from input.

  2. The DataContainer is passed to a pipeline function, e.g., the MITRA pipeline function, along with user-specified parameters as keyword arguments.

  3. For example, one might want to add a Total Variation regularizer, which requires submitting a list with a dictionary containing the regularizer, its name, and weight. In addition, the user will probably pass values for the arguments use_gpu (depending on whether they have a CUDA-capable GPU) and use_absorbances (True if they want to reconstruct the absorbances from the diode measurement, False if they want to carry out tensor tomography).

  4. The MITRA pipeline executes and returns a dict which contains the entry 'result' with the optimized coefficients. In addition, it contains the entries optimizer, loss_function, residual_calculator, basis_set, and projector, each containing the instance of the respective object used in the pipeline.

  5. The optimized coefficients can then be processed via the basis set method get_output() to generate tensor field properties, such as the anisotropy or the orientation distribution, returned as a dict.

  6. The function dict_to_h5 can be used to convert this dictionary of properties into an h5 file to be further processed or visualized.
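The interaction above can be sketched schematically. The following stdlib-only mock mirrors the call pattern and the structure of the returned dict; the function name run_pipeline, the keyword names, and the placeholder strings are illustrative stand-ins, not the actual mumott API:

```python
# Schematic mock of the pipeline call pattern described above. The
# function name run_pipeline, the keyword names, and the placeholder
# strings are illustrative stand-ins, not the actual mumott API.

def run_pipeline(data_container, use_gpu=False, use_absorbances=True,
                 regularizers=None):
    """Toy stand-in for a reconstruction pipeline function such as MITRA."""
    # The real pipeline would build a projector, basis set, residual
    # calculator, loss function, and optimizer, then run the optimization.
    coefficients = [0.0] * 8  # placeholder for the optimized coefficients
    return dict(result=coefficients,
                optimizer='<Optimizer instance>',
                loss_function='<LossFunction instance>',
                residual_calculator='<ResidualCalculator instance>',
                basis_set='<BasisSet instance>',
                projector='<Projector instance>')

# A regularizer is submitted as a list with one dict per regularizer,
# containing the regularizer object, its name, and its weight:
regularizers = [dict(name='tv',
                     regularizer='<TotalVariation instance>',
                     regularization_weight=1e-3)]

output = run_pipeline('<DataContainer instance>',
                      use_gpu=False,           # no CUDA-capable GPU
                      use_absorbances=False,   # i.e., tensor tomography
                      regularizers=regularizers)
print(sorted(output))
```

The point of returning the component instances alongside the result is that the user can inspect or reuse them, e.g., call get_output() on the returned basis set.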

Alignment workflows can be accessed via alignment pipelines. These are used similarly to reconstruction pipelines, but their output consists of the parameters needed to align the projections of the data set.

[Figure: alignment pipeline workflow. Measured data and metadata (including geometry information) are loaded into a DataContainer, which together with user preferences and a reconstruction pipeline is passed to the alignment pipeline, which produces the alignment parameters.]

The interaction is similar to that of the reconstruction pipelines:

  1. A DataContainer instance is created from input.

  2. The DataContainer is passed to a pipeline function, e.g., the phase matching alignment, along with some user-specified parameters.

  3. For example, the user might want to reduce the initial number of iterations, or increase the upsampling rate. It is also possible to pass a specific reconstruction pipeline, e.g., the filtered back-projection pipeline.

  4. The alignment executes and returns a dictionary of aligned reconstructions. For the phase-matching alignment, the parameters are automatically saved in the Geometry attached to the DataContainer. For the optical-flow alignment, they are returned as a tuple in three-dimensional space and must be translated into projection space; see the docstring of the optical flow pipeline for more details.
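The kind of output an alignment produces can be illustrated with a toy shift search: a brute-force 1-D cross-correlation that recovers how far a projection has moved relative to a reference. This is not the actual phase-matching or optical-flow algorithm, only an illustration of the kind of alignment parameter they estimate:

```python
# Toy stand-in for alignment: find the integer shift that best matches a
# 1-D "projection" to a reference, the kind of parameter an alignment
# pipeline would store in the Geometry. Brute-force cross-correlation,
# not the actual phase-matching algorithm.

def best_shift(reference, projection, max_shift=3):
    def score(shift):
        # cross-correlation over the overlapping region
        return sum(projection[i] * reference[i - shift]
                   for i in range(len(projection))
                   if 0 <= i - shift < len(reference))
    # the recovered shift says how far the projection has moved
    # relative to the reference
    return max(range(-max_shift, max_shift + 1), key=score)

reference = [0, 0, 1, 5, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 5, 1, 0]   # reference moved right by two pixels
print(best_shift(reference, shifted))  # → 2
```

In the real pipelines the same idea is applied per projection and per axis, and the resulting shifts are what end up in the Geometry (or in the returned tuple for the optical-flow alignment).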

Object structure

The following figure illustrates the mumott object structure. Here, classes are shown in blue, input parameters and data in orange, and output data in green.

[Figure: mumott object structure. Measured data and metadata (including geometry information) are loaded into a DataContainer, which holds a Geometry object. The Geometry feeds a Projector, while a bandwidth or resolution parameter defines a BasisSet. The Projector, the BasisSet, and the data from the DataContainer are combined in a ResidualCalculator, which enters a LossFunction together with any attached Regularizers; the LossFunction is passed to an Optimizer, whose output is processed via the BasisSet into tensor field properties (anisotropy, orientation, ...).]

A typical workflow involves the following steps:

  1. First the measured data along with its metadata is loaded into a DataContainer object. The latter allows one to access, inspect, and modify the data in various ways, as shown in the tutorial on loading and inspecting data. Note that it is possible to skip loading the full data when instantiating a DataContainer object. In that case only geometry and diode data are read, which is much faster and sufficient for alignment.

  2. The DataContainer object holds the information pertaining to the geometry of the data. The latter is stored in the geometry property of the DataContainer object in the form of a Geometry object.

  3. The geometry information is then used to set up a projector object, e.g., SAXSProjector. Projector objects allow one to transform tensor fields from three-dimensional space to projection space.

  4. Next a basis set object, e.g., SphericalHarmonics, is set up.

  5. One can then combine the projector object, the basis set, and the data from the DataContainer object to set up a residual calculator object. Residual calculator objects hold the coefficients that need to be optimized and allow one to compute the residuals of the current representation.

  6. To find the optimal coefficients a loss function object is set up, using, e.g., the SquaredLoss or HuberLoss classes. The loss function can include one or several regularization terms, which are defined by regularizer objects such as L1Norm, L2Norm or TotalVariation.

  7. The loss function object is then handed over to an optimizer object, such as LBFGS or GradientDescent, which updates the coefficients of the residual calculator object.

  8. The get_output() method of the basis set can then be used to generate tensor field properties, as in the pipeline workflow.
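The steps above can be sketched with a toy, stdlib-only analogue of the object structure. The class names mirror the roles described in the text, but the implementations are placeholders (a fixed 2x2 linear map instead of the John transform, plain gradient descent instead of the real optimizers); this is not the mumott API:

```python
# Toy analogue of the object structure: projector -> residual
# calculator -> loss function -> optimizer. All implementations are
# deliberately minimal stand-ins for the real mumott classes.

class Projector:
    """Forward-projects coefficients: here, a fixed 2x2 linear map."""
    matrix = [[2.0, 0.0], [0.0, 3.0]]

    def forward(self, coeffs):
        return [sum(m * c for m, c in zip(row, coeffs))
                for row in self.matrix]

class ResidualCalculator:
    """Holds the coefficients and computes residuals against the data."""
    def __init__(self, data, projector):
        self.data = data
        self.projector = projector
        self.coefficients = [0.0, 0.0]

    def residuals(self):
        projection = self.projector.forward(self.coefficients)
        return [p - d for p, d in zip(projection, self.data)]

class SquaredLoss:
    """Squared loss and its gradient with respect to the coefficients."""
    def __init__(self, residual_calculator):
        self.rc = residual_calculator

    def gradient(self):
        r = self.rc.residuals()
        A = self.rc.projector.matrix
        # chain rule through the linear projector: grad = 2 * A^T r
        return [2.0 * sum(A[i][j] * r[i] for i in range(2))
                for j in range(2)]

class GradientDescent:
    """Updates the coefficients held by the residual calculator."""
    def __init__(self, loss, step=0.05, iterations=200):
        self.loss, self.step, self.iterations = loss, step, iterations

    def run(self):
        for _ in range(self.iterations):
            g = self.loss.gradient()
            c = self.loss.rc.coefficients
            self.loss.rc.coefficients = [ci - self.step * gi
                                         for ci, gi in zip(c, g)]
        return self.loss.rc.coefficients

data = [4.0, 9.0]                      # "measured" projections
rc = ResidualCalculator(data, Projector())
coeffs = GradientDescent(SquaredLoss(rc)).run()
print(coeffs)  # approaches [2.0, 3.0], since A @ [2, 3] == [4, 9]
```

The key point is the division of labor: the residual calculator owns the coefficients, the loss function only knows how to score them, and the optimizer only knows how to update them, which is what makes the components interchangeable.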

Asynchronous reconstruction

The regular object-oriented structure and pipelines can leverage the GPU to carry out some computations for increased efficiency, but still operate synchronously. This means the CPU synchronizes with the GPU twice or more per iteration, which can cause a large computational overhead.

A more computationally efficient approach is to carry out all computations using the GPU. This is made possible through the use of CUDA kernels not only for the John transform but for all arithmetic and linear-algebraic computations necessary for an optimization.

Using specialized kernels allows us not only to carry out asynchronous operations (meaning the CPU sends instructions to the GPU ahead of time, and only waits for them to complete at relatively long intervals), but also to optimize memory usage by pre-allocating arrays and exploiting in-place operations. However, much of the standard object structure can no longer be used - functions for, e.g., calculating regularization gradients and loss functions must be rewritten in CUDA. Additionally, asynchronous optimizers cannot make heavy use of conditional behavior (such as if-statements that lead to different branches) in each iteration, since this typically requires a synchronization. Instead, it is most straightforward to use gradient descent-like methods that terminate after a fixed maximum number of iterations.

Asynchronous pipelines use some functions and properties of standard objects, like basis set objects and projectors, to generate the necessary kernels, but do not use object methods in the actual optimization. Therefore, each pipeline uses a predefined set of features such as regularizers, a maximum number of iterations, and so on, which are directly configurable via keyword arguments.

  1. A DataContainer instance is created from input.

  2. The DataContainer is passed to an asynchronous pipeline, along with optional arguments such as the number of iterations, and how often the optimization should be synchronized to update the user on its progress.

  3. The pipeline carries out all of the allocations of device-side (GPU) arrays, and the computation is carried out asynchronously on the GPU.

  4. The pipeline returns a dictionary with the reconstruction and some additional properties, such as the evolution of the loss function. The reconstructed coefficients can be processed similarly to the result of any other reconstruction.
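The core constraints of such a loop (pre-allocated buffers, in-place updates, and a fixed iteration count with no data-dependent branching) can be illustrated with a plain-Python stand-in; no CUDA is involved, but each inner loop corresponds to what would be a kernel launch on the GPU:

```python
# Stand-in for an asynchronous optimization loop: all buffers are
# allocated once up front, every iteration reuses them in place, and
# the loop body contains no data-dependent branching, mirroring the
# constraints on GPU-side kernels described above.

def solve_inplace(A, data, step, iterations):
    """Least-squares gradient descent with all buffers allocated up front."""
    n = len(data)
    coeffs = [0.0] * n   # pre-allocated once, like device-side arrays
    resid = [0.0] * n
    grad = [0.0] * n
    # Fixed iteration count and no convergence test in the loop body:
    # nothing here would force a host/device synchronization.
    for _ in range(iterations):
        for i in range(n):
            resid[i] = sum(A[i][j] * coeffs[j] for j in range(n)) - data[i]
        for j in range(n):
            grad[j] = 2.0 * sum(A[i][j] * resid[i] for i in range(n))
        for j in range(n):
            coeffs[j] -= step * grad[j]   # in-place update, no new arrays
    return coeffs

A = [[1.0, 0.0], [0.0, 2.0]]
coefficients = solve_inplace(A, [1.0, 4.0], step=0.05, iterations=200)
print(coefficients)  # approaches [1.0, 2.0]
```

A convergence check would require copying the loss back to the CPU every iteration; terminating after a fixed number of iterations instead is what lets the GPU run ahead of the CPU.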

Expert users may wish to construct their own asynchronous pipelines by following the structure of the asynchronous pipelines and making use of CUDA kernels.