Delira Introduction

Last updated: 09.05.2019

Authors: Justus Schock, Christoph Haarburger

Loading Data

To train your network you first need to load your training data (and probably also your validation data). This chapter will therefore deal with delira’s capabilities to load your data (and apply some augmentation).

The Dataset

There are mainly two ways to load your data: Lazy or non-lazy. Loading in a lazy way means that you load the data just in time and keep the used memory to a bare minimum. This has, however, the disadvantage that your loading function could be a bottleneck since all postponed operations may have to wait until the needed data samples are loaded. In a no-lazy way, one would preload all data to RAM before starting any other operations. This has the advantage that there cannot be a loading bottleneck during latter operations. This advantage comes at cost of a higher memory usage and a (possibly) huge latency at the beginning of each experiment. Both ways to load your data are implemented in delira and they are named BaseLazyDatasetand BaseCacheDataset. In the following steps you will only see the BaseLazyDataset since exchanging them is trivial. All Datasets (including the ones you might want to create yourself later) must be derived of delira.data_loading.AbstractDataset to ensure a minimum common API.

The dataset’s __init__ has the following signature:

def __init__(self, data_path, load_fn, **load_kwargs):

This means, you have to pass the path to the directory containing your data (data_path), a function to load a single sample of your data (load_fn). To get a single sample of your dataset after creating it, you can index it like this: dataset[0]. Additionally you can iterate over your dataset just like over any other python iterator via

for sample in dataset:
    # do your stuff here

or enumerate it via

for idx, sample in enumerate(dataset):
    # do your stuff here


The missing argument **load_kwargs accepts an arbitrary amount of additional keyword arguments which are directly passed to your loading function.

An example of how loading your data may look like is given below:

from delira.data_loading import BaseLazyDataset, default_load_fn_2d
dataset_train = BaseLazyDataset("/images/datasets/external/mnist/train",
                                default_load_fn_2d, img_shape=(224, 224))

In this case all data lying in /images/datasets/external/mnist/train is loaded by default_load_fn_2d. The files containing the data must be PNG-files, while the groundtruth is defined in TXT-files. The default_load_fn_2d needs the additional argument img_shape which is passed as keyword argument via **load_kwargs.

Note: for reproducability we decided to use some wrapped PyTorch datasets for this introduction.

Now, let’s just initialize our trainset:

from delira.data_loading import TorchvisionClassificationDataset
dataset_train = TorchvisionClassificationDataset("mnist", train=True,
                                                 img_shape=(224, 224))

Getting a single sample of your dataset with dataset_train[0] will produce:


which means, that our data is stored in a dictionary containing the keys data and label, each of them holding the corresponding numpy arrays. The dataloading works on numpy purely and is thus backend agnostic. It does not matter in which format or with which library you load/preprocess your data, but at the end it must be converted to numpy arrays For validation purposes another dataset could be created with the test data like this:

dataset_val = TorchvisionClassificationDataset("mnist", train=False,
                                               img_shape=(224, 224))

The Dataloader

The Dataloader wraps your dataset to privode the ability to load whole batches with an abstract interface. To create a dataloader, one would have to pass the following arguments to it’s __init__: the previously created dataset.Additionally, it is possible to pass the batch_size defining the number of samples per batch, the total number of batches (num_batches), which will be the number of samples in your dataset devided by the batchsize per default, a random seedfor always getting the same behaviour of random number generators and a `sampler <>`__ defining your sampling strategy. This would create a dataloader for your dataset_train:

from delira.data_loading import BaseDataLoader

batch_size = 32

loader_train = BaseDataLoader(dataset_train, batch_size)

Since the batch_size has been set to 32, the loader will load 32 samples as one batch.

Even though it would be possible to train your network with an instance of BaseDataLoader, malira also offers a different approach that covers multithreaded data loading and augmentation:

The Datamanager

The data manager is implemented as delira.data_loading.BaseDataManager and wraps a DataLoader. It also encapsulates augmentations. Having a view on the BaseDataManager’s signature, it becomes obvious that it accepts the same arguments as the `DataLoader <#The-Dataloader>`__. You can either pass a dataset or a combination of path, dataset class and load function. Additionally, you can pass a custom dataloder class if necessary and a sampler class to choose a sampling algorithm.

The parameter transforms accepts augmentation transformations as implemented in batchgenerators. Augmentation is applied on the fly using n_process_augmentation threads.

All in all the DataManager is the recommended way to generate batches from your dataset.

The following example shows how to create a data manager instance:

from delira.data_loading import BaseDataManager
from batchgenerators.transforms.abstract_transforms import Compose
from batchgenerators.transforms.sample_normalization_transforms import MeanStdNormalizationTransform

batchsize = 64
transforms = Compose([MeanStdNormalizationTransform(mean=1*[0], std=1*[1])])

data_manager_train = BaseDataManager(dataset_train,  # dataset to use
                                    batchsize,  # batchsize
                                    n_process_augmentation=1,  # number of augmentation processes
                                    transforms=transforms)  # augmentation transforms

The approach to initialize a DataManager from a datapath takes more arguments since, in opposite to initializaton from dataset, it needs all the arguments which are necessary to internally create a dataset.

Since we want to validate our model we have to create a second manager containing our dataset_val:

data_manager_val = BaseDataManager(dataset_val,

That’s it - we just finished loading our data!

Iterating over a DataManager is possible in simple loops:

from import tqdm # utility for progress bars

# create actual batch generator from DataManager
batchgen = data_manager_val.get_batchgen()

for data in tqdm(batchgen):
    pass # here you can access the data of the current batch


In previous section samplers have been already mentioned but not yet explained. A sampler implements an algorithm how a batch should be assembled from single samples in a dataset. delira provides the following sampler classes in it’s subpackage delira.data_loading.sampler:

  • AbstractSampler

  • SequentialSampler

  • PrevalenceSequentialSampler

  • RandomSampler

  • PrevalenceRandomSampler

  • WeightedRandomSampler

  • LambdaSampler

The AbstractSampler implements no sampling algorithm but defines a sampling API and thus all custom samplers must inherit from this class. The Sequential sampler builds batches by just iterating over the samples’ indices in a sequential way. Following this, the RandomSampler builds batches by randomly drawing the samples’ indices with replacement. If the class each sample belongs to is known for each sample at the beginning, the PrevalenceSequentialSampler and the PrevalenceRandomSampler perform a per-class sequential or random sampling and building each batch with the exactly same number of samples from each class. The WeightedRandomSampleraccepts custom weights to give specific samples a higher probability during random sampling than others.

The LambdaSampler is a wrapper for a custom sampling function, which can be passed to the wrapper during it’s initialization, to ensure API conformity.

It can be passed to the DataLoader or DataManager as class (argument sampler_cls) or as instance (argument sampler).


Since the purpose of this framework is to use machine learning algorithms, there has to be a way to define them. Defining models is straight forward. delira provides a class delira.models.AbstractNetwork. All models must inherit from this class.

To inherit this class four functions must be implemented in the subclass:

  • __init__

  • closure

  • prepare_batch

  • __call__


The __init__function is a classes constructor. In our case it builds the entire model (maybe using some helper functions). If writing your own custom model, you have to override this method.

Note: If you want the best experience for saving your model and completely recreating it during the loading process you need to take care of a few things: * if using torchvision.models to build your model, always import it with from torchvision import models as t_models * register all arguments in your custom __init__ in the abstract class. A init_prototype could look like this:

def __init__(self, in_channels: int, n_outputs: int, **kwargs):

    in_channels: int
        number of input_channels
    n_outputs: int
        number of outputs (usually same as number of classes)
    # register params by passing them as kwargs to parent class __init__
    # only params registered like this will be saved!


The closurefunction defines one batch iteration to train the network. This function is needed for the framework to provide a generic trainer function which works with all kind of networks and loss functions.

The closure function must implement all steps from forwarding, over loss calculation, metric calculation, logging (for which delira.logging_handlers provides some extensions for pythons logging module), and the actual backpropagation.

It is called with an empty optimizer-dict to evaluate and should thus work with optional optimizers.


The prepare_batchfunction defines the transformation from loaded data to match the networks input and output shape and pushes everything to the right device.

Abstract Networks for specific Backends


At the time of writing, PyTorch is the only backend which is supported, but other backends are planned. In PyTorch every network should be implemented as a subclass of torch.nn.Module, which also provides a __call__ method.

This results in sloghtly different requirements for PyTorch networks: instead of implementing a __call__ method, we simply call the torch.nn.Module.__call__ and therefore have to implement the forward method, which defines the module’s behaviour and is internally called by torch.nn.Module.__call__ (among other stuff). To give a default behaviour suiting most cases and not have to care about internals, delira provides the AbstractPyTorchNetwork which is a more specific case of the AbstractNetwork for PyTorch modules.


The forward function defines what has to be done to forward your input through your network and must return a dictionary. Assuming your network has three convolutional layers stored in self.conv1, self.conv2 and self.conv3 and a ReLU stored in self.relu, a simple forward function could look like this:

def forward(self, input_batch: torch.Tensor):
    out_1 = self.relu(self.conv1(input_batch))
    out_2 = self.relu(self.conv2(out_1))
    out_3 = self.conv3(out2)

    return {"pred": out_3}


The default prepare_batch function for PyTorch networks looks like this:

def prepare_batch(batch: dict, input_device, output_device):
    Helper Function to prepare Network Inputs and Labels (convert them to
    correct type and shape and push them to correct devices)

    batch : dict
        dictionary containing all the data
    input_device : torch.device
        device for network inputs
    output_device : torch.device
        device for network outputs

        dictionary containing data in correct type and shape and on correct

    return_dict = {"data": torch.from_numpy(batch.pop("data")).to(

    for key, vals in batch.items():
        return_dict[key] = torch.from_numpy(vals).to(output_device)

    return return_dict

and can be customized by subclassing the AbstractPyTorchNetwork.

closure example

A simple closure function for a PyTorch module could look like this:

    def closure(model: AbstractPyTorchNetwork, data_dict: dict,
                optimizers: dict, criterions={}, metrics={},
                fold=0, **kwargs):
        closure method to do a single backpropagation step

        model : :class:`ClassificationNetworkBasePyTorch`
            trainable model
        data_dict : dict
            dictionary containing the data
        optimizers : dict
            dictionary of optimizers to optimize model's parameters
        criterions : dict
            dict holding the criterions to calculate errors
            (gradients from different criterions will be accumulated)
        metrics : dict
            dict holding the metrics to calculate
        fold : int
            Current Fold in Crossvalidation (default: 0)
            additional keyword arguments

            Metric values (with same keys as input dict metrics)
            Loss values (with same keys as input dict criterions)
            Arbitrary number of predictions as torch.Tensor

            if optimizers or criterions are empty or the optimizers are not


        assert (optimizers and criterions) or not optimizers, \
            "Criterion dict cannot be emtpy, if optimizers are passed"

        loss_vals = {}
        metric_vals = {}
        total_loss = 0

        # choose suitable context manager:
        if optimizers:
            context_man = torch.enable_grad

            context_man = torch.no_grad

        with context_man():

            inputs = data_dict.pop("data")
            # obtain outputs from network
            preds = model(inputs)["pred"]

            if data_dict:

                for key, crit_fn in criterions.items():
                    _loss_val = crit_fn(preds, *data_dict.values())
                    loss_vals[key] = _loss_val.detach()
                    total_loss += _loss_val

                with torch.no_grad():
                    for key, metric_fn in metrics.items():
                        metric_vals[key] = metric_fn(
                            preds, *data_dict.values())

        if optimizers:


            # add prefix "val" in validation mode
            eval_loss_vals, eval_metrics_vals = {}, {}
            for key in loss_vals.keys():
                eval_loss_vals["val_" + str(key)] = loss_vals[key]

            for key in metric_vals:
                eval_metrics_vals["val_" + str(key)] = metric_vals[key]

            loss_vals = eval_loss_vals
            metric_vals = eval_metrics_vals

        for key, val in {**metric_vals, **loss_vals}.items():
  {"value": {"value": val.item(), "name": key,
                                    "env_appendix": "_%02d" % fold
                                    }}){'image_grid': {"images": inputs, "name": "input_images",
                                     "env_appendix": "_%02d" % fold}})

        return metric_vals, loss_vals, preds

**Note:** This closure is taken from the

Other examples

In delira.models you can find exemplaric implementations of generative adversarial networks, classification and regression approaches or segmentation networks.



Training-parameters (often called hyperparameters) can be defined in the class.

The class accepts the parameters batch_size and num_epochs to define the batchsize and the number of epochs to train, the parameters optimizer_cls and optimizer_params to create an optimizer or training, the parameter criterions to specify the training criterions (whose gradients will be accumulated by defaut), the parameters lr_sched_cls and lr_sched_params to define the learning rate scheduling and the parameter metrics to specify evaluation metrics.

Additionally, it is possible to pass an aritrary number of keyword arguments to the class

It is good practice to create a Parameters object at the beginning and then use it for creating other objects which are needed for training, since you can use the classes attributes and changes in hyperparameters only have to be done once:

import torch
from import Parameters
from delira.data_loading import RandomSampler, SequentialSampler

params = Parameters(fixed_params={
    "model": {},
    "training": {
        "batch_size": 64, # batchsize to use
        "num_epochs": 2, # number of epochs to train
        "optimizer_cls": torch.optim.Adam, # optimization algorithm to use
        "optimizer_params": {'lr': 1e-3}, # initialization parameters for this algorithm
        "criterions": {"CE": torch.nn.CrossEntropyLoss()}, # the loss function
        "lr_sched_cls": None,  # the learning rate scheduling algorithm to use
        "lr_sched_params": {}, # the corresponding initialization parameters
        "metrics": {} # and some evaluation metrics

# recreating the data managers with the batchsize of the params object
manager_train = BaseDataManager(dataset_train, params.nested_get("batch_size"), 1,
                                transforms=None, sampler_cls=RandomSampler,
manager_val = BaseDataManager(dataset_val, params.nested_get("batch_size"), 3,
                              transforms=None, sampler_cls=SequentialSampler,


The class provides functions to train a single network by passing attributes from your parameter object, a save_freq to specify how often your model should be saved (save_freq=1 indicates every epoch, save_freq=2 every second epoch etc.) and gpu_ids. If you don’t pass any ids at all, your network will be trained on CPU (and probably take a lot of time). If you specify 1 id, the network will be trained on the GPU with the corresponding index and if you pass multiple gpu_ids your network will be trained on multiple GPUs in parallel.

Note: The GPU indices are refering to the devices listed in CUDA_VISIBLE_DEVICES. E.g if CUDA_VISIBLE_DEVICES lists GPUs 3, 4, 5 then gpu_id 0 will be the index for GPU 3 etc.

Note: training on multiple GPUs is not recommended for easy and small networks, since for these networks the synchronization overhead is far greater than the parallelization benefit.

Training your network might look like this:

from import PyTorchNetworkTrainer
from delira.models.classification import ClassificationNetworkBasePyTorch

# path where checkpoints should be saved
save_path = "./results/checkpoints"

model = ClassificationNetworkBasePyTorch(in_channels=1, n_outputs=10)

trainer = PyTorchNetworkTrainer(network=model,

#trainer.train(params.nested_get("num_epochs"), manager_train, manager_val)


The class needs an experiment name, a path to save it’s results to, a parameter object, a model class and the keyword arguments to create an instance of this class. It provides methods to perform a single training and also a method for running a kfold-cross validation. In order to create it, you must choose the PyTorchExperiment, which is basically just a subclass of the AbstractExperiment to provide a general setup for PyTorch modules. Running an experiment could look like this:

from import PyTorchExperiment
from import create_optims_default_pytorch

# Add model parameters to Parameter class
params.fixed.model = {"in_channels": 1, "n_outputs": 10}

experiment = PyTorchExperiment(params=params,
                               gpu_ids=[0]), manager_val)

An Experiment is the most abstract (and recommended) way to define, train and validate your network.


Previous class and function definitions used pythons’s logging library. As extensions for this library delira provides a package (delira.logging) containing handlers to realize different logging methods.

To use these handlers simply add them to your logger like this:


Nowadays, delira mainly relies on trixi for logging and provides only a MultiStreamHandler and a TrixiHandler, which is a binding to trixi’s loggers and integrates them into the python logging module


The MultiStreamHandler accepts an arbitrary number of streams during initialization and writes the message to all of it’s streams during logging.

Logging with Visdom - The trixi Loggers

`Visdom <>`__ is a tool designed to visualize your logs. To use this tool you need to open a port on the machine you want to train on via visdom -port YOUR_PORTNUMBER Afterwards just add the handler of your choice to the logger. For more detailed information and customization have a look at this website.

Logging the scalar tensors containing 1, 2, 3, 4 (at the beginning; will increase to show epochwise logging) with the corresponding keys "one", "two", "three", "four" and two random images with the keys "prediction" and "groundtruth" would look like this:


# import logging handler and logging module
from delira.logging import TrixiHandler
from trixi.logger import PytorchVisdomLogger
import logging

# configure logging module (and root logger)
logger_kwargs = {
    'name': 'test_env', # name of loggin environment
    'port': 9999 # visdom port to connect to
logger_cls = PytorchVisdomLogger

# configure logging module (and root logger)
                    handlers=[TrixiHandler(logger_cls, **logger_kwargs)])
# derive logger from root logger
# (don't do `logger = logging.Logger("...")` since this will create a new
# logger which is unrelated to the root logger
logger = logging.getLogger("Test Logger")

# create dict containing the scalar numbers as torch.Tensor
scalars = {"one": torch.Tensor([1]),
           "two": torch.Tensor([2]),
           "three": torch.Tensor([3]),
           "four": torch.Tensor([4])}

# create dict containing the images as torch.Tensor
# pytorch awaits tensor dimensionality of
# batchsize x image channels x height x width
images = {"prediction": torch.rand(1, 3, 224, 224),
          "groundtruth": torch.rand(1, 3, 224, 224)}

# Simulate 4 Epochs
for i in range(4*NUM_ITERS):{"image_grid": {"images": images["prediction"], "name": "predictions"}})

    for key, val_tensor in scalars.items():{"value": {"value": val_tensor.item(), "name": key}})
        scalars[key] += 1