Datasets

The Dataset is the most basic class and implements the loading of your dataset elements. You can either load your data lazily, i.e. load each sample only at the moment it is needed, or preload all samples and cache them.

Datasets can be indexed by integers and return single samples.

To implement a custom dataset, derive from AbstractDataset.
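
The following is a minimal sketch of what deriving a custom dataset might look like. It does not import delira: `SketchAbstractDataset` and `FakeFilesDataset` are hypothetical stand-ins that only mirror the constructor signature and method names documented on this page, and the sample names are fabricated so that no filesystem access is needed.

```python
from typing import Any, Callable, List


class SketchAbstractDataset:
    """Hypothetical stand-in for AbstractDataset (not the delira class)."""

    def __init__(self, data_path: str, load_fn: Callable):
        self.data_path = data_path
        self._load_fn = load_fn
        self.data: List[Any] = []

    def _make_dataset(self, path: str) -> list:
        raise NotImplementedError

    def get_sample_from_index(self, index: int) -> Any:
        # Base case: return the stored entry without loading anything.
        return self.data[index]

    def __getitem__(self, index: int) -> Any:
        raise NotImplementedError

    def __len__(self) -> int:
        return len(self.data)


class FakeFilesDataset(SketchAbstractDataset):
    """Toy custom dataset that invents sample names below a root path."""

    def __init__(self, data_path: str, load_fn: Callable, num_samples: int = 3):
        super().__init__(data_path, load_fn)
        self.num_samples = num_samples
        self.data = self._make_dataset(data_path)

    def _make_dataset(self, path: str) -> list:
        # Only lightweight descriptors are stored here (fabricated names).
        return [f"{path}/sample_{i}.txt" for i in range(self.num_samples)]

    def __getitem__(self, index: int) -> dict:
        # The loading behaviour lives in __getitem__.
        return self._load_fn(self.get_sample_from_index(index))


def fake_load(path: str) -> dict:
    # Pretend loader: returns a dict, as dataset samples typically are.
    return {"data": path.upper()}


dataset = FakeFilesDataset("root", fake_load)
raw_entry = dataset.get_sample_from_index(1)  # stored entry, nothing loaded
sample = dataset[1]                           # integer index -> single sample
```

In this sketch `get_sample_from_index` returns the stored descriptor, while integer indexing runs the loader and returns one sample.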

AbstractDataset

class AbstractDataset(data_path: str, load_fn: Callable)

Bases: object

Base Class for Dataset

_make_dataset(path: str)

Create dataset

Parameters: path (str) – path to data samples
Returns: data: List of sample paths if lazy; List of samples if not
Return type: list

get_sample_from_index(index)

Returns the data sample for a given index without performing any loading, even if loading would be necessary. This implements the base case and can be overridden in subclasses for index mappings. The actual loading behaviour (lazy or cached) should be implemented in __getitem__.

See also

ConcatDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__

Parameters: index (int) – index corresponding to targeted sample
Returns: sample corresponding to given index
Return type: Any

get_subset(indices)

Returns a subset of the current dataset based on the given indices

Parameters: indices (iterable) – valid indices to extract subset from current dataset
Returns: the subset
Return type: BlankDataset
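
As a rough illustration of subset extraction, the sketch below collects the stored entries for the requested indices into a new container. `BlankSketch` and `get_subset_sketch` are hypothetical names, and going through `get_sample_from_index` (so that nothing is loaded while the subset is built) is an assumption rather than documented behaviour of BlankDataset.

```python
from typing import Any, Iterable, List


class BlankSketch:
    """Hypothetical container for already-selected entries."""

    def __init__(self, data: List[Any]):
        self.data = data

    def get_sample_from_index(self, index: int) -> Any:
        return self.data[index]

    def __len__(self) -> int:
        return len(self.data)


def get_subset_sketch(dataset, indices: Iterable[int]) -> BlankSketch:
    # Copy the stored entries in the order given by `indices`.
    return BlankSketch([dataset.get_sample_from_index(i) for i in indices])


full = BlankSketch(["a", "b", "c", "d"])
subset = get_subset_sketch(full, [3, 0])
```

The subset follows the order of `indices`, so here it holds the entries at positions 3 and 0 of the full dataset.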

train_test_split(*args, **kwargs)

Splits the dataset into train and test data

Deprecated since version 0.3: This method will be removed in the next major release.

Parameters

*args – positional arguments of train_test_split
**kwargs – keyword arguments of train_test_split

Returns

BlankDataset – train dataset
BlankDataset – test dataset

See also

sklearn.model_selection.train_test_split

BaseLazyDataset

class BaseLazyDataset(data_path: Union[str, list], load_fn: Callable, **load_kwargs)

Bases: delira.data_loading.dataset.AbstractDataset

Dataset to load data in a lazy way

_make_dataset(path: Union[str, list])

Helper Function to make a dataset containing paths to all images in a certain directory

Parameters: path (str or list) – path to data samples
Returns: list of sample paths
Return type: list
Raises: AssertionError – if path is not a valid directory
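
The lazy behaviour described above can be sketched as follows. `LazySketch` and `read_text` are hypothetical and are not delira's implementation; the sketch only handles the case where the path is a directory, and it counts loader calls to show that files are read on access rather than at construction.

```python
import os
import tempfile
from typing import Callable, List


class LazySketch:
    """Hypothetical illustration of lazy loading (not BaseLazyDataset)."""

    def __init__(self, data_path: str, load_fn: Callable, **load_kwargs):
        self._load_fn = load_fn
        self._load_kwargs = load_kwargs
        self.data = self._make_dataset(data_path)

    def _make_dataset(self, path: str) -> List[str]:
        # Only paths are collected here; no file content is read.
        assert os.path.isdir(path), f"{path} is not a valid directory"
        return [os.path.join(path, name) for name in sorted(os.listdir(path))]

    def __getitem__(self, index: int):
        # Loading is deferred until the sample is requested.
        return self._load_fn(self.data[index], **self._load_kwargs)

    def __len__(self) -> int:
        return len(self.data)


load_calls = []


def read_text(path: str, encoding: str = "utf-8") -> dict:
    load_calls.append(path)
    with open(path, encoding=encoding) as handle:
        return {"data": handle.read()}


with tempfile.TemporaryDirectory() as root:
    for i in range(3):
        file_path = os.path.join(root, f"sample_{i}.txt")
        with open(file_path, "w", encoding="utf-8") as handle:
            handle.write(f"content {i}")

    lazy = LazySketch(root, read_text, encoding="utf-8")
    calls_after_init = len(load_calls)    # nothing has been read yet
    second = lazy[1]                      # triggers a single read
    calls_after_access = len(load_calls)
```

Because only paths are kept in memory, this pattern suits datasets that do not fit into RAM, at the cost of reading from disk on every access.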

get_sample_from_index(index)

Returns the data sample for a given index without performing any loading, even if loading would be necessary. This implements the base case and can be overridden in subclasses for index mappings. The actual loading behaviour (lazy or cached) should be implemented in __getitem__.

See also

ConcatDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__

Parameters: index (int) – index corresponding to targeted sample
Returns: sample corresponding to given index
Return type: Any

get_subset(indices)

Returns a subset of the current dataset based on the given indices

Parameters: indices (iterable) – valid indices to extract subset from current dataset
Returns: the subset
Return type: BlankDataset

train_test_split(*args, **kwargs)

Splits the dataset into train and test data

Deprecated since version 0.3: This method will be removed in the next major release.

Parameters

*args – positional arguments of train_test_split
**kwargs – keyword arguments of train_test_split

Returns

BlankDataset – train dataset
BlankDataset – test dataset

See also

sklearn.model_selection.train_test_split

BaseCacheDataset

class BaseCacheDataset(data_path: Union[str, list], load_fn: Callable, **load_kwargs)

Bases: delira.data_loading.dataset.AbstractDataset

Dataset to preload and cache data

Notes

data needs to fit completely into RAM!

_make_dataset(path: Union[str, list])

Helper Function to make a dataset containing all samples in a certain directory

Parameters: path (str or list) – if data_path is a string, _sample_fn is called for all items inside the specified directory; if data_path is a list, _sample_fn is called for the elements in the list
Returns: list of items which were returned from _sample_fn (typically dict)
Return type: list
Raises: AssertionError – if path is not a list and is not a valid directory
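
A minimal sketch of the preload-and-cache behaviour, following the string-or-list rule described above. `CacheSketch` and `fake_load` are hypothetical names and this is not delira's implementation; a list is passed here so that the example needs no files.

```python
import os
from typing import Callable, Union


class CacheSketch:
    """Hypothetical illustration of preloading (not BaseCacheDataset)."""

    def __init__(self, data_path: Union[str, list], load_fn: Callable,
                 **load_kwargs):
        self._load_fn = load_fn
        self._load_kwargs = load_kwargs
        self.data = self._make_dataset(data_path)

    def _make_dataset(self, path: Union[str, list]) -> list:
        if isinstance(path, list):
            items = path
        else:
            assert os.path.isdir(path), f"{path} is not a valid directory"
            items = [os.path.join(path, n) for n in sorted(os.listdir(path))]
        # Every item is loaded once, up front, and kept in memory.
        return [self._load_fn(item, **self._load_kwargs) for item in items]

    def __getitem__(self, index: int):
        # Access is a plain list lookup; the loader is not called again.
        return self.data[index]

    def __len__(self) -> int:
        return len(self.data)


load_calls = []


def fake_load(item, scale=1):
    load_calls.append(item)
    return {"data": item * scale}


cached = CacheSketch([1, 2, 3], fake_load, scale=10)
calls_after_init = len(load_calls)    # all three items already loaded
first = cached[0]
again = cached[0]
calls_after_access = len(load_calls)  # unchanged by the two accesses
```

All loading cost is paid at construction, which is why the whole dataset has to fit into RAM.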

get_sample_from_index(index)

Returns the data sample for a given index without performing any loading, even if loading would be necessary. This implements the base case and can be overridden in subclasses for index mappings. The actual loading behaviour (lazy or cached) should be implemented in __getitem__.

See also

ConcatDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__

Parameters: index (int) – index corresponding to targeted sample
Returns: sample corresponding to given index
Return type: Any

get_subset(indices)

Returns a subset of the current dataset based on the given indices

Parameters: indices (iterable) – valid indices to extract subset from current dataset
Returns: the subset
Return type: BlankDataset

train_test_split(*args, **kwargs)

Splits the dataset into train and test data

Deprecated since version 0.3: This method will be removed in the next major release.

Parameters

*args – positional arguments of train_test_split
**kwargs – keyword arguments of train_test_split

Returns

BlankDataset – train dataset
BlankDataset – test dataset

See also

sklearn.model_selection.train_test_split

ConcatDataset

class ConcatDataset(*datasets)

Bases: delira.data_loading.dataset.AbstractDataset

_make_dataset(path: str)

Create dataset

Parameters: path (str) – path to data samples
Returns: data: List of sample paths if lazy; List of samples if not
Return type: list

get_sample_from_index(index)

Returns the data sample for a given index without performing any loading, even if loading would be necessary. This method implements the mapping of a global index to the subindices of each dataset. The actual loading behaviour (lazy or cached) should be implemented in __getitem__.

See also

AbstractDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__

Parameters: index (int) – index corresponding to targeted sample
Returns: sample corresponding to given index
Return type: Any
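
One common way to map a global index to a dataset and a subindex is a binary search over cumulative lengths, sketched below. `ConcatSketch` and its `locate` helper are hypothetical, and plain lists stand in for the concatenated datasets; whether delira uses this exact approach is not stated here.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatSketch:
    """Hypothetical illustration of global-to-local index mapping."""

    def __init__(self, *datasets):
        self.datasets = datasets
        # Cumulative end offsets, e.g. lengths (2, 3) give [2, 5].
        self.offsets = list(accumulate(len(d) for d in datasets))

    def __len__(self) -> int:
        return self.offsets[-1] if self.offsets else 0

    def locate(self, index: int):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # First dataset whose end offset lies beyond the global index.
        dataset_idx = bisect_right(self.offsets, index)
        start = self.offsets[dataset_idx - 1] if dataset_idx > 0 else 0
        return dataset_idx, index - start

    def get_sample_from_index(self, index: int):
        dataset_idx, sub_idx = self.locate(index)
        # With real datasets this would delegate to the sub-dataset's
        # own get_sample_from_index; plain lists are indexed directly.
        return self.datasets[dataset_idx][sub_idx]


concat = ConcatSketch(["a", "b"], ["c", "d", "e"])
first_of_second = concat.locate(2)
last_sample = concat.get_sample_from_index(4)
```

Global index 2 is the first entry of the second list, and global index 4 is its last entry.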

get_subset(indices)

Returns a subset of the current dataset based on the given indices

Parameters: indices (iterable) – valid indices to extract subset from current dataset
Returns: the subset
Return type: BlankDataset

train_test_split(*args, **kwargs)

Splits the dataset into train and test data

Deprecated since version 0.3: This method will be removed in the next major release.

Parameters

*args – positional arguments of train_test_split
**kwargs – keyword arguments of train_test_split

Returns

BlankDataset – train dataset
BlankDataset – test dataset

See also

sklearn.model_selection.train_test_split