Datasets¶
The Dataset is the most basic class and implements the loading of your dataset elements. You can either load your data lazily, i.e. only at the moment each element is needed, or preload all of it and keep it cached.
Datasets can be indexed by integers and return single samples.
To implement custom datasets you should derive from AbstractDataset.
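The difference between lazy and cached loading can be illustrated with a minimal sketch. It deliberately does not import delira: `SketchDataset`, `LazySketchDataset` and `CachedSketchDataset` are hypothetical stand-ins that only mirror the interface described on this page (`get_sample_from_index` plus `__getitem__`), and the constructor taking a list of sample identifiers is an assumption made for brevity.

```python
from typing import Any, Callable, List


class SketchDataset:
    """Hypothetical stand-in for a dataset base class (not delira's code)."""

    def __init__(self, load_fn: Callable[[str], Any]):
        self._load_fn = load_fn
        self.data: List[Any] = []

    def get_sample_from_index(self, index: int) -> Any:
        # Return whatever is stored at this index, without loading anything.
        return self.data[index]

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> Any:
        raise NotImplementedError


class LazySketchDataset(SketchDataset):
    """Stores only sample identifiers; loads a sample when it is accessed."""

    def __init__(self, sample_ids: List[str], load_fn: Callable[[str], Any]):
        super().__init__(load_fn)
        self.data = list(sample_ids)

    def __getitem__(self, index: int) -> Any:
        # Loading happens here, once per access.
        return self._load_fn(self.get_sample_from_index(index))


class CachedSketchDataset(SketchDataset):
    """Loads every sample up front and keeps all of them in memory."""

    def __init__(self, sample_ids: List[str], load_fn: Callable[[str], Any]):
        super().__init__(load_fn)
        self.data = [load_fn(sample_id) for sample_id in sample_ids]

    def __getitem__(self, index: int) -> Any:
        # The stored item already is the loaded sample.
        return self.get_sample_from_index(index)
```

In this sketch the lazy variant calls `load_fn` on every access, while the cached variant calls it exactly once per sample at construction time, which is why cached data has to fit into memory.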
AbstractDataset¶
-
class
AbstractDataset
(data_path: str, load_fn: Callable)[source]¶ Bases:
object
Base Class for Dataset
-
get_sample_from_index
(index)[source]¶ Returns the data sample for a given index without performing any loading, even where loading would be needed. This implements the base case and can be overridden in subclasses for index mappings. The actual loading behaviour (lazy or cached) should be implemented in
__getitem__
See also
ConcatDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__
- Parameters
index (int) – index corresponding to targeted sample
- Returns
sample corresponding to given index
- Return type
Any
-
get_subset
(indices)[source]¶ Returns a Subset of the current dataset based on given indices
- Parameters
indices (iterable) – valid indices to extract subset from current dataset
- Returns
the subset
- Return type
BlankDataset
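As a rough illustration of what selecting a subset by indices means, the sketch below picks samples out of a plain list. The helper `take_subset` is hypothetical, and it returns a list rather than a BlankDataset, whose constructor is not documented on this page.

```python
from typing import Any, Iterable, List


def take_subset(samples: List[Any], indices: Iterable[int]) -> List[Any]:
    """Collect the samples at the given indices, in the order given."""
    return [samples[i] for i in indices]


samples = ["s0", "s1", "s2", "s3"]
subset = take_subset(samples, [3, 0])
```

Here `subset` holds the samples at positions 3 and 0, in that order; an invalid index raises an `IndexError`, which is why the indices must be valid for the current dataset.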
-
train_test_split
(*args, **kwargs)[source]¶ split dataset into train and test data
Deprecated since version 0.3: method will be removed in next major release
- Parameters
*args – positional arguments of train_test_split
**kwargs – keyword arguments of train_test_split
- Returns
BlankDataset – train dataset
BlankDataset – test dataset
See also
sklearn.model_selection.train_test_split
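One way to picture such a split is to partition the sample indices and then build a subset from each part. The sketch below uses only the standard library as a stand-in for `sklearn.model_selection.train_test_split`; the function `split_indices` and its parameters are hypothetical and are not the method's real signature.

```python
import math
import random
from typing import List, Tuple


def split_indices(
    n_samples: int, test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle the indices 0..n_samples-1 and cut them into train and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(10, test_fraction=0.25, seed=0)
```

Each index list could then be handed to `get_subset` to obtain the corresponding train and test datasets.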
-
BaseLazyDataset¶
-
class
BaseLazyDataset
(data_path: Union[str, list], load_fn: Callable, **load_kwargs)[source]¶ Bases:
delira.data_loading.dataset.AbstractDataset
Dataset to load data in a lazy way
-
_make_dataset
(path: Union[str, list])[source]¶ Helper Function to make a dataset containing paths to all images in a certain directory
- Parameters
- Returns
list of sample paths
- Return type
list
- Raises
AssertionError – if path is not a valid directory
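A minimal sketch of such a helper, assuming that every entry of the directory counts as one sample and that a list argument is passed through unchanged (neither detail is stated above). The name `collect_sample_paths` is hypothetical.

```python
import os
from typing import List, Union


def collect_sample_paths(path: Union[str, List[str]]) -> List[str]:
    """Return sample paths without loading any of the samples."""
    if isinstance(path, list):
        # Assumption: a list is already a collection of sample paths.
        return list(path)
    assert os.path.isdir(path), f"{path} is not a valid directory"
    # Sort for a stable order, since os.listdir gives no ordering guarantee.
    return sorted(os.path.join(path, name) for name in os.listdir(path))
```

Only the paths are stored, so the memory cost stays small regardless of how large the individual samples are.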
-
get_sample_from_index
(index)¶ Returns the data sample for a given index without performing any loading, even where loading would be needed. This implements the base case and can be overridden in subclasses for index mappings. The actual loading behaviour (lazy or cached) should be implemented in
__getitem__
See also
ConcatDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__
- Parameters
index (int) – index corresponding to targeted sample
- Returns
sample corresponding to given index
- Return type
Any
-
get_subset
(indices)¶ Returns a Subset of the current dataset based on given indices
- Parameters
indices (iterable) – valid indices to extract subset from current dataset
- Returns
the subset
- Return type
BlankDataset
-
train_test_split
(*args, **kwargs)¶ split dataset into train and test data
Deprecated since version 0.3: method will be removed in next major release
- Parameters
*args – positional arguments of train_test_split
**kwargs – keyword arguments of train_test_split
- Returns
BlankDataset – train dataset
BlankDataset – test dataset
See also
sklearn.model_selection.train_test_split
-
BaseCacheDataset¶
-
class
BaseCacheDataset
(data_path: Union[str, list], load_fn: Callable, **load_kwargs)[source]¶ Bases:
delira.data_loading.dataset.AbstractDataset
Dataset to preload and cache data
Notes
data needs to fit completely into RAM!
-
_make_dataset
(path: Union[str, list])[source]¶ Helper Function to make a dataset containing all samples in a certain directory
- Parameters
path (str or list) – if data_path is a string, _sample_fn is called for all items inside the specified directory; if data_path is a list, _sample_fn is called for the elements in the list
- Returns
list of items which were returned from _sample_fn (typically dict)
- Return type
list
- Raises
AssertionError – if path is not a list and is not a valid directory
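The two cases described above (a directory versus a list) might look roughly like the following sketch. `preload_samples` is a hypothetical name, and `load_fn` plays the role of the loading function that is called once per item.

```python
import os
from typing import Any, Callable, List, Union


def preload_samples(
    path: Union[str, List[str]], load_fn: Callable[[str], Any]
) -> List[Any]:
    """Load every sample up front and return the loaded items."""
    if isinstance(path, list):
        items = path
    else:
        assert os.path.isdir(path), f"{path} is not a valid directory"
        items = sorted(os.path.join(path, name) for name in os.listdir(path))
    # Everything is loaded here, so the result must fit into memory.
    return [load_fn(item) for item in items]


loaded = preload_samples(["a.png", "b.png"], lambda p: {"data": p})
```

Because all loading happens inside this one call, later index access only has to return the stored item.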
-
get_sample_from_index
(index)¶ Returns the data sample for a given index without performing any loading, even where loading would be needed. This implements the base case and can be overridden in subclasses for index mappings. The actual loading behaviour (lazy or cached) should be implemented in
__getitem__
See also
ConcatDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__
- Parameters
index (int) – index corresponding to targeted sample
- Returns
sample corresponding to given index
- Return type
Any
-
get_subset
(indices)¶ Returns a Subset of the current dataset based on given indices
- Parameters
indices (iterable) – valid indices to extract subset from current dataset
- Returns
the subset
- Return type
BlankDataset
-
train_test_split
(*args, **kwargs)¶ split dataset into train and test data
Deprecated since version 0.3: method will be removed in next major release
- Parameters
*args – positional arguments of train_test_split
**kwargs – keyword arguments of train_test_split
- Returns
BlankDataset – train dataset
BlankDataset – test dataset
See also
sklearn.model_selection.train_test_split
-
ConcatDataset¶
-
class
ConcatDataset
(*datasets)[source]¶ Bases:
delira.data_loading.dataset.AbstractDataset
-
_make_dataset
(path: str)¶ Create dataset
-
get_sample_from_index
(index)[source]¶ Returns the data sample for a given index without performing any loading, even where loading would be needed. This method maps a global index to the sub-index within the corresponding dataset. The actual loading behaviour (lazy or cached) should be implemented in
__getitem__
See also
AbstractDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__
- Parameters
index (int) – index corresponding to targeted sample
- Returns
sample corresponding to given index
- Return type
Any
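The mapping from a global index to a dataset and a local sub-index can be sketched with cumulative lengths. This is one possible way to do it, not necessarily how ConcatDataset does it internally; the function `locate` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence, Tuple


def locate(global_index: int, lengths: Sequence[int]) -> Tuple[int, int]:
    """Map a global index to (dataset position, index inside that dataset)."""
    if not 0 <= global_index < sum(lengths):
        raise IndexError(f"index {global_index} is out of range")
    # Running totals, e.g. [3, 5, 9] for lengths [3, 2, 4].
    cumulative = list(accumulate(lengths))
    # First dataset whose running total exceeds the global index.
    dataset_pos = bisect_right(cumulative, global_index)
    start = cumulative[dataset_pos - 1] if dataset_pos > 0 else 0
    return dataset_pos, global_index - start


position = locate(4, [3, 2, 4])  # second dataset, second sample
```

With dataset lengths 3, 2 and 4, global indices 0 to 2 fall into the first dataset, 3 to 4 into the second and 5 to 8 into the third.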
-
get_subset
(indices)¶ Returns a Subset of the current dataset based on given indices
- Parameters
indices (iterable) – valid indices to extract subset from current dataset
- Returns
the subset
- Return type
BlankDataset
-
train_test_split
(*args, **kwargs)¶ split dataset into train and test data
Deprecated since version 0.3: method will be removed in next major release
- Parameters
*args – positional arguments of train_test_split
**kwargs – keyword arguments of train_test_split
- Returns
BlankDataset – train dataset
BlankDataset – test dataset
See also
sklearn.model_selection.train_test_split
-