Datasets

The Dataset is the most basic class and implements the loading of your dataset elements. You can either load your data lazily, i.e. only at the moment it is needed, or preload it and keep it cached.

Datasets can be indexed by integers and return single samples.

To implement a custom dataset, derive from AbstractDataset.
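
As a rough illustration of deriving a custom dataset, the sketch below uses a stand-in base class so that it runs without the library installed. AbstractDatasetSketch and ListDataset are hypothetical names, and the stand-in only mirrors the documented constructor signature; it is not the library's implementation. In real code you would presumably derive from delira.data_loading.dataset.AbstractDataset instead.

```python
class AbstractDatasetSketch:
    """Stand-in for AbstractDataset (assumption: NOT the library's code)."""

    def __init__(self, data_path, load_fn, img_extensions, gt_extensions):
        self.data_path = data_path
        self._load_fn = load_fn
        self._img_extensions = img_extensions
        self._gt_extensions = gt_extensions
        self.data = []

    def __getitem__(self, index):
        # Subclasses decide how a sample is produced for an integer index.
        raise NotImplementedError

    def __len__(self):
        return len(self.data)


class ListDataset(AbstractDatasetSketch):
    """Hypothetical custom dataset that wraps samples already in memory."""

    def __init__(self, samples):
        # No files are involved, so the path and extension arguments stay empty.
        super().__init__(data_path=None, load_fn=None,
                         img_extensions=[], gt_extensions=[])
        self.data = list(samples)

    def __getitem__(self, index):
        # Integer index in, single sample out.
        return self.data[index]


dataset = ListDataset([{"data": 0, "label": "a"}, {"data": 1, "label": "b"}])
print(len(dataset))  # 2
print(dataset[1])    # {'data': 1, 'label': 'b'}
```

The subclass only has to provide the samples and the integer indexing; everything else is inherited from the base class.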

AbstractDataset

class AbstractDataset(data_path, load_fn, img_extensions, gt_extensions)[source]

Bases: object

Base Class for Dataset

_make_dataset(path)[source]

Create dataset

Parameters: path (str) – path to data samples
Returns: data: list of sample paths if lazy; list of samples if not
Return type: list

get_sample_from_index(index)[source]

Returns the data sample for a given index, without performing any loading that might still be necessary. This implements the base case and can be overridden in subclasses to provide index mappings. The actual loading behaviour (lazy or cached) should be implemented in __getitem__.

See also

ConcatDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__

Parameters: index (int) – index corresponding to the targeted sample
Returns: sample corresponding to the given index
Return type: Any

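
The split between index lookup and loading described above might look roughly like the following. This is a minimal sketch under assumptions: IndexedDatasetSketch and ReversedDatasetSketch are hypothetical names, and the reversed mapping is only an invented example of an index mapping, not something the library provides.

```python
class IndexedDatasetSketch:
    """Hypothetical dataset separating index lookup from loading."""

    def __init__(self, data, load_fn):
        self.data = data
        self._load_fn = load_fn

    def get_sample_from_index(self, index):
        # Base case: the index maps directly onto the stored entry,
        # and nothing is loaded here.
        return self.data[index]

    def __getitem__(self, index):
        # Loading behaviour lives here, on top of the index lookup.
        return self._load_fn(self.get_sample_from_index(index))


class ReversedDatasetSketch(IndexedDatasetSketch):
    """Invented index mapping: index 0 refers to the last stored entry."""

    def get_sample_from_index(self, index):
        return self.data[len(self.data) - 1 - index]


paths = ["a.png", "b.png", "c.png"]
base = IndexedDatasetSketch(paths, load_fn=str.upper)
flipped = ReversedDatasetSketch(paths, load_fn=str.upper)

print(base.get_sample_from_index(0))  # a.png
print(base[0])                        # A.PNG
print(flipped[0])                     # C.PNG
```

Only get_sample_from_index is overridden in the subclass, so the loading step in __getitem__ is reused unchanged.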
get_subset(indices)[source]

Returns a subset of the current dataset based on the given indices.

Parameters: indices (iterable) – valid indices used to extract the subset from the current dataset
Returns: the subset
Return type: BlankDataset

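
Conceptually, taking a subset amounts to collecting the samples at the given indices into a new, smaller dataset. The sketch below assumes exactly that; SubsetSketch is a hypothetical stand-in for the returned BlankDataset, whose actual constructor is not described here, and a plain list plays the role of the source dataset.

```python
class SubsetSketch:
    """Hypothetical stand-in for the dataset type returned by get_subset."""

    def __init__(self, samples):
        self.data = samples

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)


def get_subset(dataset, indices):
    # Pick the samples at the requested positions, preserving their order.
    return SubsetSketch([dataset[i] for i in indices])


full = ["s0", "s1", "s2", "s3"]
subset = get_subset(full, [0, 2])
print(len(subset))  # 2
print(subset[1])    # s2
```

The subset is re-indexed from zero, so index 1 of the subset corresponds to index 2 of the original dataset in this example.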
train_test_split(*args, **kwargs)[source]

Splits the dataset into train and test data.

Deprecated since version 0.3: this method will be removed in the next major release.

Parameters:
  • *args – positional arguments of train_test_split
  • **kwargs – keyword arguments of train_test_split
Returns:

  • BlankDataset – train dataset
  • BlankDataset – test dataset

See also

sklearn.model_selection.train_test_split

BaseLazyDataset

class BaseLazyDataset(data_path, load_fn, img_extensions, gt_extensions, **load_kwargs)[source]

Bases: delira.data_loading.dataset.AbstractDataset

Dataset to load data in a lazy way
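
To make the lazy behaviour concrete, here is a minimal sketch, assuming the dataset stores only sample paths and calls the load function when an item is requested. LazyDatasetSketch and fake_load are hypothetical names; the real class additionally discovers the files under data_path.

```python
class LazyDatasetSketch:
    """Hypothetical lazy dataset: keeps paths, loads on access."""

    def __init__(self, sample_paths, load_fn, **load_kwargs):
        self.data = list(sample_paths)
        self._load_fn = load_fn
        self._load_kwargs = load_kwargs

    def __getitem__(self, index):
        # The load function runs only now, for this one sample.
        return self._load_fn(self.data[index], **self._load_kwargs)

    def __len__(self):
        return len(self.data)


calls = []  # records which paths were actually loaded


def fake_load(path, scale=1):
    calls.append(path)
    return {"data": len(path) * scale}


lazy = LazyDatasetSketch(["a.png", "bb.png"], fake_load, scale=2)
print(calls)   # []
sample = lazy[1]
print(sample)  # {'data': 12}
print(calls)   # ['bb.png']
```

Nothing is loaded at construction time, which keeps memory usage low at the cost of repeating the load on every access.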

_is_valid_image_file(fname)[source]

Helper function to check whether a file is an image file and has at least one label file.

Parameters: fname (str) – path of the image file
Returns: whether the file is a valid data sample
Return type: bool

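
One plausible way to perform such a check is sketched below. It assumes that a label file shares the image's base name and differs only in its extension; that pairing rule, the function name is_valid_image_file and its extra parameters are assumptions made for illustration, not the library's documented behaviour.

```python
import os
import tempfile


def is_valid_image_file(fname, img_extensions, gt_extensions):
    """Return True if fname is an image with at least one label file."""
    base, ext = os.path.splitext(fname)
    if ext not in img_extensions or not os.path.isfile(fname):
        return False
    # Assumption: labels sit next to the image, e.g. a.png -> a.txt.
    return any(os.path.isfile(base + gt_ext) for gt_ext in gt_extensions)


with tempfile.TemporaryDirectory() as tmp:
    names = ("a.png", "a.txt", "b.png")
    for name in names:
        open(os.path.join(tmp, name), "w").close()
    results = {
        name: is_valid_image_file(os.path.join(tmp, name), [".png"], [".txt"])
        for name in names
    }

print(results)  # {'a.png': True, 'a.txt': False, 'b.png': False}
```

Here a.png is valid because a.txt exists, a.txt is rejected because it is not an image, and b.png is rejected because it has no label file.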
_make_dataset(path)[source]

Helper function to build a dataset containing the paths to all images in a given directory.

Parameters: path (str) – path to data samples
Returns: list of sample paths
Return type: list
Raises: AssertionError – if path is not a valid directory

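
A minimal sketch of such a helper, assuming that images are recognised by file extension and sit directly inside the directory. The function name make_lazy_dataset and the extension filter are assumptions; the actual helper may apply further validity checks.

```python
import os
import tempfile


def make_lazy_dataset(path, img_extensions):
    """Collect the paths of all image files directly inside path."""
    # As documented, an invalid directory results in an AssertionError.
    assert os.path.isdir(path), "%s is not a valid directory" % path
    return [
        os.path.join(path, fname)
        for fname in sorted(os.listdir(path))
        if os.path.splitext(fname)[1] in img_extensions
    ]


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.png", "a.png", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in make_lazy_dataset(tmp, [".png"])]

print(found)  # ['a.png', 'b.png']
```

Only paths are collected at this stage; the files themselves are not opened until a sample is requested.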
get_sample_from_index(index)

Returns the data sample for a given index, without performing any loading that might still be necessary. This implements the base case and can be overridden in subclasses to provide index mappings. The actual loading behaviour (lazy or cached) should be implemented in __getitem__.

See also

ConcatDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__

Parameters: index (int) – index corresponding to the targeted sample
Returns: sample corresponding to the given index
Return type: Any

get_subset(indices)

Returns a subset of the current dataset based on the given indices.

Parameters: indices (iterable) – valid indices used to extract the subset from the current dataset
Returns: the subset
Return type: BlankDataset

train_test_split(*args, **kwargs)

Splits the dataset into train and test data.

Deprecated since version 0.3: this method will be removed in the next major release.

Parameters:
  • *args – positional arguments of train_test_split
  • **kwargs – keyword arguments of train_test_split
Returns:

  • BlankDataset – train dataset
  • BlankDataset – test dataset

See also

sklearn.model_selection.train_test_split

BaseCacheDataset

class BaseCacheDataset(data_path, load_fn, img_extensions, gt_extensions, **load_kwargs)[source]

Bases: delira.data_loading.dataset.AbstractDataset

Dataset to preload and cache data

Notes

The data needs to fit completely into RAM!
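
The difference from lazy loading can be sketched as follows, assuming that all samples are loaded once at construction time and then served from memory. CacheDatasetSketch and fake_load are hypothetical names used only for this illustration.

```python
class CacheDatasetSketch:
    """Hypothetical cached dataset: loads everything up front."""

    def __init__(self, sample_paths, load_fn, **load_kwargs):
        # Every sample is loaded here, so all of them must fit into RAM.
        self.data = [load_fn(path, **load_kwargs) for path in sample_paths]

    def __getitem__(self, index):
        # No loading on access, only a lookup in the cached list.
        return self.data[index]

    def __len__(self):
        return len(self.data)


calls = []  # records which paths were actually loaded


def fake_load(path):
    calls.append(path)
    return {"data": path.upper()}


cached = CacheDatasetSketch(["a.png", "b.png"], fake_load)
print(calls)           # ['a.png', 'b.png']
first = cached[0]
again = cached[0]
print(first is again)  # True
print(len(calls))      # 2
```

Repeated access returns the same cached object without calling the load function again, which is what makes this variant fast but memory-hungry.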

_is_valid_image_file(fname)[source]

Helper function to check whether a file is an image file and has at least one label file.

Parameters: fname (str) – path of the image file
Returns: whether the file is a valid data sample
Return type: bool

_make_dataset(path)[source]

Helper function to build a dataset containing all samples in a given directory.

Parameters: path (str) – path to data samples
Returns: list of samples
Return type: list
Raises: AssertionError – if path is not a valid directory

get_sample_from_index(index)

Returns the data sample for a given index, without performing any loading that might still be necessary. This implements the base case and can be overridden in subclasses to provide index mappings. The actual loading behaviour (lazy or cached) should be implemented in __getitem__.

See also

ConcatDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__

Parameters: index (int) – index corresponding to the targeted sample
Returns: sample corresponding to the given index
Return type: Any

get_subset(indices)

Returns a subset of the current dataset based on the given indices.

Parameters: indices (iterable) – valid indices used to extract the subset from the current dataset
Returns: the subset
Return type: BlankDataset

train_test_split(*args, **kwargs)

Splits the dataset into train and test data.

Deprecated since version 0.3: this method will be removed in the next major release.

Parameters:
  • *args – positional arguments of train_test_split
  • **kwargs – keyword arguments of train_test_split
Returns:

  • BlankDataset – train dataset
  • BlankDataset – test dataset

See also

sklearn.model_selection.train_test_split

ConcatDataset

class ConcatDataset(*datasets)[source]

Bases: delira.data_loading.dataset.AbstractDataset

_make_dataset(path)

Create dataset

Parameters: path (str) – path to data samples
Returns: data: list of sample paths if lazy; list of samples if not
Return type: list

get_sample_from_index(index)[source]

Returns the data sample for a given index, without performing any loading that might still be necessary. This method maps a global index to the matching dataset and the local index within it. The actual loading behaviour (lazy or cached) should be implemented in __getitem__.

See also

AbstractDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__

Parameters: index (int) – index corresponding to the targeted sample
Returns: sample corresponding to the given index
Return type: Any

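
The mapping from a global index to a position inside one of the concatenated datasets could be sketched as below. ConcatSketch is a hypothetical name, plain lists stand in for the wrapped datasets, and the linear walk over dataset lengths is an assumption about one straightforward way to do the mapping rather than the library's actual code.

```python
class ConcatSketch:
    """Hypothetical concatenation of several indexable datasets."""

    def __init__(self, *datasets):
        self.datasets = datasets

    def __len__(self):
        return sum(len(dataset) for dataset in self.datasets)

    def get_sample_from_index(self, index):
        if not 0 <= index < len(self):
            raise IndexError("index %d out of range" % index)
        # Walk through the datasets, shifting the global index by the
        # length of every dataset that is skipped.
        for dataset in self.datasets:
            if index < len(dataset):
                return dataset[index]
            index -= len(dataset)


concat = ConcatSketch(["a", "b"], ["c"], ["d", "e"])
print(len(concat))                      # 5
print(concat.get_sample_from_index(2))  # c
print(concat.get_sample_from_index(4))  # e
```

Global index 2 skips the first dataset (length 2) and lands on local index 0 of the second one; global index 4 skips two datasets and lands on local index 1 of the third.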
get_subset(indices)

Returns a subset of the current dataset based on the given indices.

Parameters: indices (iterable) – valid indices used to extract the subset from the current dataset
Returns: the subset
Return type: BlankDataset

train_test_split(*args, **kwargs)

Splits the dataset into train and test data.

Deprecated since version 0.3: this method will be removed in the next major release.

Parameters:
  • *args – positional arguments of train_test_split
  • **kwargs – keyword arguments of train_test_split
Returns:

  • BlankDataset – train dataset
  • BlankDataset – test dataset

See also

sklearn.model_selection.train_test_split