Delira Introduction¶
Authors: Justus Schock, Christoph Haarburger
Loading Data¶
To train your network you first need to load your training data (and
probably also your validation data). This chapter will therefore deal
with delira’s capabilities to load your data (and apply some
augmentation).
The Dataset¶
There are mainly two ways to load your data: lazy or non-lazy. Loading
in a lazy way means that you load the data just in time and keep the
used memory to a bare minimum. This has, however, the disadvantage that
your loading function could be a bottleneck, since all postponed
operations may have to wait until the needed data samples are loaded. In
a non-lazy way, one would preload all data to RAM before starting any
other operations. This has the advantage that there cannot be a loading
bottleneck during later operations. This advantage comes at the cost of a
higher memory usage and a (possibly) huge latency at the beginning of
each experiment. Both ways to load your data are implemented in
delira and they are named BaseLazyDataset and
BaseCacheDataset. In the following steps you will only see the
BaseLazyDataset since exchanging them is trivial. All Datasets
(including the ones you might want to create yourself later) must be
derived from delira.data_loading.AbstractDataset to ensure a minimum
common API.
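The trade-off between the two strategies can be illustrated with a minimal, framework-free sketch. The classes LazySketchDataset and CacheSketchDataset and the loader counting_load_fn are hypothetical stand-ins written for this illustration; they are not delira's implementation.

```python
class LazySketchDataset:
    """Loads a sample only when it is indexed (hypothetical stand-in)."""

    def __init__(self, sample_ids, load_fn):
        self.sample_ids = list(sample_ids)
        self.load_fn = load_fn

    def __len__(self):
        return len(self.sample_ids)

    def __getitem__(self, index):
        # the loading cost is paid on every access
        return self.load_fn(self.sample_ids[index])


class CacheSketchDataset(LazySketchDataset):
    """Loads every sample once up front and keeps it in memory."""

    def __init__(self, sample_ids, load_fn):
        super().__init__(sample_ids, load_fn)
        # the loading cost is paid once, before any other operation
        self.cache = [load_fn(sample_id) for sample_id in self.sample_ids]

    def __getitem__(self, index):
        return self.cache[index]


load_calls = []


def counting_load_fn(sample_id):
    """Toy loader that records each call and returns a sample dict."""
    load_calls.append(sample_id)
    return {"data": [sample_id] * 4, "label": sample_id % 2}


lazy = LazySketchDataset(range(3), counting_load_fn)
calls_after_lazy_init = len(load_calls)      # nothing has been loaded yet
lazy[0]
lazy[0]
calls_after_lazy_access = len(load_calls)    # one load per access

load_calls.clear()
cached = CacheSketchDataset(range(3), counting_load_fn)
calls_after_cache_init = len(load_calls)     # everything loaded up front
cached[0]
cached[0]
calls_after_cache_access = len(load_calls)   # no further loads
```

Both classes expose the same indexing interface, which is why exchanging one for the other requires no further changes in the calling code.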
The dataset’s __init__ has the following signature:
def __init__(self, data_path, load_fn, img_extensions, gt_extensions,
             **load_kwargs):
This means you have to pass the path to the directory containing your
data (data_path), a function to load a single sample of your data
(load_fn), the file extensions for valid images (img_extensions)
and the extensions for valid groundtruth files (gt_extensions). The
defined extensions are used to index all data files in the given
data_path. To get a single sample of your dataset after creating it,
you can index it like this: dataset[0].
The remaining argument **load_kwargs accepts an arbitrary number of
additional keyword arguments which are directly passed to your loading
function.
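How file indexing by extension and the forwarding of keyword arguments could work is sketched below. The helpers index_files and load_sample are hypothetical, and pairing an image with its groundtruth file by a shared file stem in a flat directory is an assumption made for this sketch, not necessarily what delira does internally.

```python
import os
import tempfile


def index_files(data_path, img_extensions, gt_extensions):
    """Pair every image file with a groundtruth file of the same stem."""
    pairs = []
    for name in sorted(os.listdir(data_path)):
        stem, ext = os.path.splitext(name)
        if ext not in img_extensions:
            continue
        for gt_ext in gt_extensions:
            gt_file = os.path.join(data_path, stem + gt_ext)
            if os.path.isfile(gt_file):
                pairs.append((os.path.join(data_path, name), gt_file))
                break
    return pairs


def load_sample(img_file, gt_file, **load_kwargs):
    """Toy loader: returns file names plus the forwarded keyword arguments."""
    return {"data": os.path.basename(img_file),
            "label": os.path.basename(gt_file),
            "kwargs": load_kwargs}


with tempfile.TemporaryDirectory() as data_path:
    # create a few empty files; notes.md matches neither extension list
    for name in ("a.png", "a.txt", "b.png", "b.txt", "notes.md"):
        open(os.path.join(data_path, name), "w").close()

    pairs = index_files(data_path, [".png"], [".txt"])
    # img_shape is not known to the dataset; it is simply passed through
    samples = [load_sample(img, gt, img_shape=(224, 224))
               for img, gt in pairs]
```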
An example of how loading your data may look is given below:
from delira.data_loading import BaseLazyDataset, default_load_fn_2d
dataset_train = BaseLazyDataset("/images/datasets/external/mnist/train",
                                default_load_fn_2d, img_extensions=[".png"],
                                gt_extensions=[".txt"], img_shape=(224, 224))
In this case all data lying in /images/datasets/external/mnist/train
is loaded by default_load_fn_2d. The files containing the data must
be PNG-files, while the groundtruth is defined in TXT-files. The
default_load_fn_2d needs the additional argument img_shape which
is passed as keyword argument via **load_kwargs.
Note: for reproducibility we decided to use some wrapped PyTorch datasets for this introduction.
Now, let’s just initialize our trainset:
Getting a single sample of your dataset with dataset_train[0] will produce:
which means that our data is stored in a dictionary containing the keys
data and label, each of them holding the corresponding numpy
arrays. The data loading works purely on numpy and is thus backend
agnostic. It does not matter in which format or with which library you
load/preprocess your data, but at the end it must be converted to numpy
arrays. For validation purposes another dataset could be created with the
test data like this:
The Dataloader¶
The Dataloader wraps your dataset to provide the ability to load whole
batches with an abstract interface. To create a dataloader, you have to
pass the previously created dataset to its __init__. Additionally, it is
possible to pass the batch_size defining the number of samples per
batch, the total number of batches (num_batches), which defaults to the
number of samples in your dataset divided by the batch size, a random
seed for always getting the same behaviour of random number
generators and a sampler defining your sampling strategy.
This would create a dataloader for your dataset_train:
Since the batch_size has been set to 32, the loader will load 32 samples as one batch.
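The interplay of batch size, number of batches and seed can be sketched without any framework. The function iterate_batches below is a hypothetical illustration, not delira's BaseDataLoader; using integer division for the default number of batches and plain lists instead of stacked numpy arrays are simplifying assumptions.

```python
import random


def iterate_batches(dataset, batch_size, num_batches=None, seed=1):
    """Yield batches as dicts that map each key to a list of sample values."""
    if num_batches is None:
        # assumed default: number of samples divided by the batch size
        num_batches = len(dataset) // batch_size
    rng = random.Random(seed)  # fixed seed, so the sampling is reproducible
    for _ in range(num_batches):
        indices = [rng.randrange(len(dataset)) for _ in range(batch_size)]
        samples = [dataset[i] for i in indices]
        # collate: one list per key instead of one dict per sample
        yield {key: [s[key] for s in samples] for key in samples[0]}


toy_dataset = [{"data": [i, i], "label": i % 10} for i in range(100)]
batches = list(iterate_batches(toy_dataset, batch_size=32))
same_batches = list(iterate_batches(toy_dataset, batch_size=32))
```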
Even though it would be possible to train your network with an instance
of BaseDataLoader, delira also offers a different approach that
covers multithreaded data loading and augmentation:
The Datamanager¶
The data manager is implemented as
delira.data_loading.BaseDataManager and wraps a DataLoader. It
also encapsulates augmentations. Looking at the
BaseDataManager’s signature, it becomes obvious that it accepts the
same arguments as the DataLoader. You can
either pass a dataset or a combination of path, dataset class and
load function. Additionally, you can pass a custom dataloader class if
necessary and a sampler class to choose a sampling algorithm.
The parameter transforms accepts augmentation transformations as
implemented in batchgenerators. Augmentation is applied on the fly
using n_process_augmentation parallel workers.
All in all the DataManager is the recommended way to generate batches from your dataset.
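What applying augmentation on the fly means can be shown with a small generator. The names augmented_batches and mirror are hypothetical, transforms are modelled as plain callables on a batch dict, and the parallel workers are left out; this is a sketch of the idea rather than of the BaseDataManager itself.

```python
def augmented_batches(batch_iter, transforms):
    """Apply every transform to each batch right before it is handed out."""
    for batch in batch_iter:
        for transform in transforms:
            batch = transform(batch)
        yield batch


def mirror(batch):
    """Toy augmentation: reverse every sample, leave the labels untouched."""
    return {**batch, "data": [sample[::-1] for sample in batch["data"]]}


raw_batches = [{"data": [[1, 2, 3], [4, 5, 6]], "label": [0, 1]}]

# nothing is augmented until the generator is actually consumed
augmented = list(augmented_batches(iter(raw_batches), [mirror]))
```

Because the transform returns a new dictionary, the stored samples stay unchanged and each epoch can see a differently augmented version of the same data.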
The following example shows how to create a data manager instance:
Initializing a DataManager from a data path takes more arguments since, in contrast to initialization from a dataset, it needs all the arguments which are necessary to internally create a dataset.
Since we want to validate our model we have to create a second manager
containing our dataset_val:
That’s it - we just finished loading our data!
Iterating over a DataManager is possible in simple loops:
Sampler¶
In the previous sections samplers have already been mentioned but not yet
explained. A sampler implements an algorithm that defines how a batch
should be assembled from single samples in a dataset. delira provides the
following sampler classes in its subpackage
delira.data_loading.sampler:
* AbstractSampler
* SequentialSampler
* PrevalenceSequentialSampler
* RandomSampler
* PrevalenceRandomSampler
* WeightedRandomSampler
* LambdaSampler
The AbstractSampler implements no sampling algorithm but defines a
sampling API and thus all custom samplers must inherit from this class.
The SequentialSampler builds batches by just iterating over the
samples’ indices in a sequential way. The
RandomSampler, in contrast, builds batches by randomly drawing the samples’
indices with replacement. If the class of each sample is known
in advance, the PrevalenceSequentialSampler
and the PrevalenceRandomSampler perform a per-class sequential or
random sampling and build each batch with exactly the same number of
samples from each class. The WeightedRandomSampler accepts custom
weights to give specific samples a higher probability during random
sampling than others.
The LambdaSampler is a wrapper for a custom sampling function, which
can be passed to the wrapper during its initialization, to ensure API
conformity.
A sampler can be passed to the DataLoader or DataManager as a class (argument
sampler_cls) or as an instance (argument sampler).
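The sampling strategies described above can be sketched as plain functions that return the indices of one batch. All function names are hypothetical and do not mirror the API of delira.data_loading.sampler; dropping the remainder when the batch size is not divisible by the number of classes is an assumption of this sketch.

```python
import random
from collections import defaultdict


def sequential_indices(num_samples, batch_size, start=0):
    """Consecutive indices, wrapping around at the end of the dataset."""
    return [(start + i) % num_samples for i in range(batch_size)]


def random_indices(num_samples, batch_size, rng):
    """Indices drawn uniformly with replacement."""
    return [rng.randrange(num_samples) for _ in range(batch_size)]


def prevalence_random_indices(labels, batch_size, rng):
    """The same number of randomly drawn indices from every class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    per_class = batch_size // len(by_class)
    batch = []
    for label in sorted(by_class):
        batch += rng.choices(by_class[label], k=per_class)
    return batch


def weighted_random_indices(weights, batch_size, rng):
    """Indices drawn with probability proportional to their weight."""
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


rng = random.Random(0)
labels = [0, 0, 0, 0, 0, 0, 1, 1]  # class 1 is under-represented

seq = sequential_indices(len(labels), 4)
uniform = random_indices(len(labels), 4, rng)
balanced = prevalence_random_indices(labels, 4, rng)
# all probability mass on sample 2, so only this index can be drawn
weighted = weighted_random_indices([0, 0, 1, 0, 0, 0, 0, 0], 4, rng)
```

In the balanced batch, half of the indices point to class 1 although only a quarter of the dataset belongs to it, which is the purpose of prevalence-based sampling.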
Models¶
Since the purpose of this framework is to use machine learning
algorithms, there has to be a way to define them. Defining models is
straightforward. delira provides a class
delira.models.AbstractNetwork. All models must inherit from this
class.
To inherit from this class, four functions must be implemented in the subclass:
* __init__
* closure
* prepare_batch
* __call__
__init__¶
The __init__ function is a class’s constructor. In our case it
builds the entire model (maybe using some helper functions). If writing
your own custom model, you have to override this method.
Note: If you want the best experience for saving your model and completely recreating it during the loading process, you need to take care of a few things:
* if using torchvision.models to build your model, always import it with from torchvision import models as t_models
* register all arguments of your custom __init__ in the abstract class.
A prototype of such an __init__ could look like this:
def __init__(self, in_channels: int, n_outputs: int, **kwargs):
    """
    Parameters
    ----------
    in_channels: int
        number of input_channels
    n_outputs: int
        number of outputs (usually same as number of classes)
    """
    # register params by passing them as kwargs to parent class __init__
    # only params registered like this will be saved!
    super().__init__(in_channels=in_channels,
                     n_outputs=n_outputs,
                     **kwargs)
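Why registering the constructor arguments matters can be seen in the following sketch. SketchNetwork, TinyClassifier and recreate are hypothetical names, and storing the arguments in an init_kwargs attribute is an assumption about how such a registration could be implemented, not a description of delira's internals.

```python
class SketchNetwork:
    """Hypothetical base class that remembers its constructor arguments."""

    def __init__(self, **kwargs):
        self.init_kwargs = dict(kwargs)


class TinyClassifier(SketchNetwork):
    def __init__(self, in_channels: int, n_outputs: int, **kwargs):
        # register the arguments so the model can be rebuilt later
        super().__init__(in_channels=in_channels,
                         n_outputs=n_outputs,
                         **kwargs)
        self.in_channels = in_channels
        self.n_outputs = n_outputs


def recreate(model):
    """Build a fresh instance from the registered constructor arguments."""
    return type(model)(**model.init_kwargs)


original = TinyClassifier(in_channels=1, n_outputs=10)
clone = recreate(original)
```

An argument that is not forwarded to the parent constructor would be missing from init_kwargs, and the model could not be rebuilt without supplying it by hand.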
closure¶
The closure function defines one batch iteration to train the
network. This function is needed for the framework to provide a generic
trainer function which works with all kinds of networks and loss
functions.
The closure function must implement all steps: forwarding, loss
calculation, metric calculation, logging (for which
delira.logging_handlers provides some extensions for Python’s logging
module) and the actual backpropagation.
For evaluation it is called with an empty optimizer dict and should therefore also work without optimizers.
prepare_batch¶
The prepare_batch function defines the transformation from loaded
data to match the network’s input and output shape and pushes everything
to the right device.
Abstract Networks for specific Backends¶
PyTorch¶
At the time of writing, PyTorch is the only backend which is supported,
but other backends are planned. In PyTorch every network should be
implemented as a subclass of torch.nn.Module, which also provides a
__call__ method.
This results in slightly different requirements for PyTorch networks:
instead of implementing a __call__ method, we simply call the
torch.nn.Module.__call__ and therefore have to implement the
forward method, which defines the module’s behaviour and is
internally called by torch.nn.Module.__call__ (among other things).
To give a default behaviour suiting most cases and not have to care
about internals, delira provides the AbstractPyTorchNetwork
which is a more specific case of the AbstractNetwork for PyTorch
modules.
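The relation between __call__ and forward can be illustrated with a strongly simplified stand-in. SketchModule is hypothetical and only mimics the dispatch; the real torch.nn.Module.__call__ does considerably more (for example running registered hooks).

```python
class SketchModule:
    """Simplified stand-in for a module whose __call__ dispatches to forward."""

    def __init__(self):
        self.call_count = 0

    def __call__(self, *args, **kwargs):
        # bookkeeping around the actual computation
        self.call_count += 1
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError("subclasses define the computation here")


class Doubler(SketchModule):
    def forward(self, values):
        return [2 * v for v in values]


net = Doubler()
output = net([1, 2, 3])  # calls __call__, which in turn calls forward
```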
forward¶
The forward function defines what has to be done to forward your
input through your network. Assuming your network has three
convolutional layers stored in self.conv1, self.conv2 and
self.conv3 and a ReLU stored in self.relu, a simple forward
function could look like this:
def forward(self, input_batch: torch.Tensor):
    out_1 = self.relu(self.conv1(input_batch))
    out_2 = self.relu(self.conv2(out_1))
    out_3 = self.conv3(out_2)
    return out_3
prepare_batch¶
The default prepare_batch function for PyTorch networks looks like
this:
@staticmethod
def prepare_batch(batch: dict, input_device, output_device):
    """
    Helper Function to prepare Network Inputs and Labels (convert them to
    correct type and shape and push them to correct devices)

    Parameters
    ----------
    batch : dict
        dictionary containing all the data
    input_device : torch.device
        device for network inputs
    output_device : torch.device
        device for network outputs

    Returns
    -------
    dict
        dictionary containing data in correct type and shape and on correct
        device
    """
    return_dict = {"data": torch.from_numpy(batch.pop("data")).to(
        input_device)}
    for key, vals in batch.items():
        return_dict[key] = torch.from_numpy(vals).to(output_device)
    return return_dict
and can be customized by subclassing the AbstractPyTorchNetwork.
closure example¶
A simple closure function for a PyTorch module could look like this:
@staticmethod
def closure(model: AbstractPyTorchNetwork, data_dict: dict,
            optimizers: dict, criterions={}, metrics={},
            fold=0, **kwargs):
    """
    closure method to do a single backpropagation step

    Parameters
    ----------
    model : :class:`ClassificationNetworkBasePyTorch`
        trainable model
    data_dict : dict
        dictionary containing the data
    optimizers : dict
        dictionary of optimizers to optimize model's parameters
    criterions : dict
        dict holding the criterions to calculate errors
        (gradients from different criterions will be accumulated)
    metrics : dict
        dict holding the metrics to calculate
    fold : int
        Current Fold in Crossvalidation (default: 0)
    **kwargs:
        additional keyword arguments

    Returns
    -------
    dict
        Metric values (with same keys as input dict metrics)
    dict
        Loss values (with same keys as input dict criterions)
    list
        Arbitrary number of predictions as torch.Tensor

    Raises
    ------
    AssertionError
        if optimizers or criterions are empty or the optimizers are not
        specified
    """
    assert (optimizers and criterions) or not optimizers, \
        "Criterion dict cannot be empty, if optimizers are passed"
    loss_vals = {}
    metric_vals = {}
    total_loss = 0
    # choose suitable context manager:
    if optimizers:
        context_man = torch.enable_grad
    else:
        context_man = torch.no_grad
    with context_man():
        inputs = data_dict.pop("data")
        preds = model(inputs)
        if data_dict:
            for key, crit_fn in criterions.items():
                _loss_val = crit_fn(preds, *data_dict.values())
                loss_vals[key] = _loss_val.detach()
                total_loss += _loss_val
            with torch.no_grad():
                for key, metric_fn in metrics.items():
                    metric_vals[key] = metric_fn(
                        preds, *data_dict.values())
    if optimizers:
        optimizers['default'].zero_grad()
        total_loss.backward()
        optimizers['default'].step()
    else:
        # add prefix "val" in validation mode
        eval_loss_vals, eval_metrics_vals = {}, {}
        for key in loss_vals.keys():
            eval_loss_vals["val_" + str(key)] = loss_vals[key]
        for key in metric_vals:
            eval_metrics_vals["val_" + str(key)] = metric_vals[key]
        loss_vals = eval_loss_vals
        metric_vals = eval_metrics_vals
    for key, val in {**metric_vals, **loss_vals}.items():
        logging.info({"value": {"value": val.item(), "name": key,
                                "env_appendix": "_%02d" % fold
                                }})
    logging.info({'image_grid': {"images": inputs, "name": "input_images",
                                 "env_appendix": "_%02d" % fold}})
    return metric_vals, loss_vals, [preds]
Note: This closure is taken from
delira.models.classification.ClassificationNetworkBasePyTorch.
Other examples¶
In delira.models you can find example implementations of
generative adversarial networks, classification and regression
approaches or segmentation networks.
Training¶
Parameters¶
Training-parameters (often called hyperparameters) can be defined in the
delira.training.Parameters class.
The class accepts the parameters batch_size and num_epochs to
define the batch size and the number of epochs to train, the parameters
optimizer_cls and optimizer_params to create an optimizer for
training, the parameter criterions to specify the training
criterions (whose gradients will be accumulated by default), the
parameters lr_sched_cls and lr_sched_params to define the
learning rate scheduling and the parameter metrics to specify
evaluation metrics.
Additionally, it is possible to pass an arbitrary number of keyword arguments to the class.
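The benefit of collecting all hyperparameters in one object can be sketched with a small dataclass. HyperParams is a hypothetical stand-in with a reduced set of fields; it does not reproduce the signature of delira.training.Parameters.

```python
from dataclasses import dataclass, field


@dataclass
class HyperParams:
    """Hypothetical stand-in for a hyperparameter container."""
    batch_size: int
    num_epochs: int
    optimizer_params: dict = field(default_factory=dict)
    extra: dict = field(default_factory=dict)  # arbitrary additional settings


params = HyperParams(batch_size=32, num_epochs=2,
                     optimizer_params={"lr": 1e-3},
                     extra={"dropout": 0.5})

# every consumer reads from the same object, so a change is made in one place
loader_config = {"batch_size": params.batch_size}
trainer_config = {"num_epochs": params.num_epochs, **params.optimizer_params}
```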
It is good practice to create a Parameters object at the beginning
and then use it for creating other objects which are needed for
training, since you can use the class’s attributes and changes in
hyperparameters only have to be made once:
Trainer¶
The delira.training.NetworkTrainer class provides functions to train
a single network by passing attributes from your parameter object, a
save_freq to specify how often your model should be saved
(save_freq=1 indicates every epoch, save_freq=2 every second
epoch etc.) and gpu_ids. If you don’t pass any ids at all, your
network will be trained on CPU (and probably take a lot of time). If you
specify 1 id, the network will be trained on the GPU with the
corresponding index and if you pass multiple gpu_ids your network
will be trained on multiple GPUs in parallel.
Note: The GPU indices refer to the devices listed in
CUDA_VISIBLE_DEVICES. E.g. if CUDA_VISIBLE_DEVICES lists GPUs 3, 4, 5, then gpu_id 0 will be the index for GPU 3 etc.
Note: Training on multiple GPUs is not recommended for easy and small networks, since for these networks the synchronization overhead is far greater than the parallelization benefit.
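The index mapping and the meaning of save_freq can be made concrete with two small helper functions. Both are hypothetical illustrations: physical_gpu assumes that CUDA_VISIBLE_DEVICES contains a comma-separated list of numeric ids, and should_save assumes that epochs are counted from 1.

```python
def physical_gpu(gpu_id, cuda_visible_devices):
    """Translate a gpu_id into the device listed in CUDA_VISIBLE_DEVICES."""
    visible = [int(d) for d in cuda_visible_devices.split(",") if d.strip()]
    return visible[gpu_id]


def should_save(epoch, save_freq):
    """True if a checkpoint is due, assuming epochs are counted from 1."""
    return epoch % save_freq == 0


first_device = physical_gpu(0, "3,4,5")
last_device = physical_gpu(2, "3,4,5")
saved_epochs = [epoch for epoch in range(1, 7) if should_save(epoch, 2)]
```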
Training your network might look like this:
Experiment¶
The delira.training.AbstractExperiment class needs an experiment
name, a path to save its results to, a parameter object, a model class
and the keyword arguments to create an instance of this class. It
provides methods to perform a single training and also a method for
running a k-fold cross-validation. In order to create it, you must choose
the PyTorchExperiment, which is basically just a subclass of the
AbstractExperiment to provide a general setup for PyTorch modules.
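How a k-fold cross-validation splits a dataset can be sketched as follows. The function kfold_indices is a hypothetical illustration without shuffling; it is not the splitting routine used by the experiment classes.

```python
def kfold_indices(num_samples, num_splits):
    """Yield (train_indices, val_indices) for every fold."""
    indices = list(range(num_samples))
    fold_sizes = [num_samples // num_splits] * num_splits
    for i in range(num_samples % num_splits):
        fold_sizes[i] += 1  # spread the remainder over the first folds
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(kfold_indices(10, 3))
val_sizes = [len(val) for _, val in folds]
```

Every sample is used for validation exactly once, and a separate model is trained for each fold on the remaining samples.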
Running an experiment could look like this:
An Experiment is the most abstract (and recommended) way to define,
train and validate your network.
Logging¶
Previous class and function definitions used Python’s logging
library. As extensions for this library delira provides a package
(delira.logging) containing handlers to realize different logging
methods.
To use these handlers simply add them to your logger like this:
logger.addHandler(logging.StreamHandler())
Nowadays, delira mainly relies on
trixi for logging and provides
only a MultiStreamHandler and a TrixiHandler, which is a binding
to trixi’s loggers and integrates them into the Python logging
module.
MultiStreamHandler¶
The MultiStreamHandler accepts an arbitrary number of streams during
initialization and writes the message to all of its streams during
logging.
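The idea of writing one log record to several streams can be sketched with the standard logging module alone. MultiStreamSketchHandler is a hypothetical class written for this illustration and is not delira's MultiStreamHandler.

```python
import io
import logging


class MultiStreamSketchHandler(logging.Handler):
    """Hypothetical handler that writes each record to several streams."""

    def __init__(self, *streams, level=logging.NOTSET):
        super().__init__(level)
        self.streams = streams

    def emit(self, record):
        message = self.format(record)
        for stream in self.streams:
            stream.write(message + "\n")


first, second = io.StringIO(), io.StringIO()

logger = logging.getLogger("multi_stream_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
logger.addHandler(MultiStreamSketchHandler(first, second))

logger.info("epoch finished")
```

In practice the streams would typically be sys.stdout and an open log file instead of in-memory buffers.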
Logging with Visdom - The trixi Loggers¶
Visdom (https://github.com/facebookresearch/visdom) is a tool
designed to visualize your logs. To use this tool you need to open a
port on the machine you want to train on via
visdom -port YOUR_PORTNUMBER. Afterwards, just add the handler of your
choice to the logger. For more detailed information and customization
have a look at this
website.
Logging the scalar tensors containing 1, 2, 3, 4 (at the
beginning; will increase to show epochwise logging) with the
corresponding keys "one", "two", "three", "four" and two
random images with the keys "prediction" and "groundtruth" would
look like this:
Types of VisdomHandlers¶
The abilities of a handler can be derived from its name: A
VisdomImageHandler is the pure visdom logger, whereas the
VisdomImageSaveHandler combines the abilities of a
VisdomImageHandler and an ImgSaveHandler. Together with a
StreamHandler (built-in handler) you get the
VisdomImageStreamHandler and if you also want to add the option to
save images to disk, you should use the VisdomImageSaveStreamHandler.
The provided handlers are:
* ImgSaveHandler
* MultiStreamHandler
* VisdomImageHandler
* VisdomImageSaveHandler
* VisdomImageSaveStreamHandler
* VisdomStreamHandler
More Examples¶
More examples can be found in
* the classification example
* the 2d segmentation example
* the 3d segmentation example
* the generative adversarial example