Datasets
The Dataset is the most basic class and implements the loading of your dataset elements. You can either load your data lazily, i.e. load each sample only at the moment it is needed, or preload all samples and cache them.
Datasets can be indexed by integers and return single samples.
To implement a custom dataset, derive from AbstractDataset.
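The difference between lazy and cached loading can be illustrated with a minimal, library-independent sketch. The classes `LazySketch` and `CachedSketch` and the `load_fn` callable below are hypothetical stand-ins and do not reproduce the actual AbstractDataset interface; they only show where the loading happens in each strategy.

```python
from typing import Any, Callable, List


class LazySketch:
    """Hypothetical example: stores sample identifiers, loads on access."""

    def __init__(self, sample_ids: List[Any], load_fn: Callable[[Any], Any]):
        self.data = list(sample_ids)
        self._load_fn = load_fn

    def __getitem__(self, index: int) -> Any:
        # Loading happens here, every time a sample is requested.
        return self._load_fn(self.data[index])

    def __len__(self) -> int:
        return len(self.data)


class CachedSketch:
    """Hypothetical example: loads every sample once and keeps it in memory."""

    def __init__(self, sample_ids: List[Any], load_fn: Callable[[Any], Any]):
        # All samples are loaded up front, so they must fit into RAM.
        self.data = [load_fn(sample_id) for sample_id in sample_ids]

    def __getitem__(self, index: int) -> Any:
        # No loading here: the sample is already cached.
        return self.data[index]

    def __len__(self) -> int:
        return len(self.data)
```

The lazy variant keeps memory usage low at the cost of repeated loading, while the cached variant pays the loading cost once during construction.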
AbstractDataset
-
class AbstractDataset(data_path, load_fn, img_extensions, gt_extensions)
Bases: object
Base class for datasets.
-
_make_dataset(path)
Create the dataset.
Parameters: path (str) – path to data samples
Returns: data: list of sample paths if lazy; list of samples if not
Return type: list
-
get_sample_from_index(index)
Returns the data sample for a given index (without loading it, even if loading would be necessary). This implements the base case and can be overridden in subclasses for index mappings. The actual loading behaviour (lazy or cached) should be implemented in __getitem__.
See also: ConcatDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__
Parameters: index (int) – index corresponding to the targeted sample
Returns: sample corresponding to the given index
Return type: Any
-
get_subset(indices)
Returns a subset of the current dataset based on the given indices.
Parameters: indices (iterable) – valid indices to extract the subset from the current dataset
Returns: the subset
Return type: BlankDataset
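As a rough illustration of index-based subsetting, the following sketch builds a new dataset-like object from selected indices. `ListDataset` is a hypothetical helper invented for this example; it is not BlankDataset, whose constructor is not described here.

```python
from typing import Any, Iterable, List


class ListDataset:
    """Hypothetical in-memory dataset used only for illustration."""

    def __init__(self, samples: Iterable[Any]):
        self.data: List[Any] = list(samples)

    def __getitem__(self, index: int) -> Any:
        return self.data[index]

    def __len__(self) -> int:
        return len(self.data)

    def get_subset(self, indices: Iterable[int]) -> "ListDataset":
        # Pick the requested samples in the order given by `indices`.
        return ListDataset(self.data[i] for i in indices)
```

The subset is itself indexable by integers starting at zero, independent of the positions the samples had in the original dataset.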
-
train_test_split(*args, **kwargs)
Split the dataset into train and test data.
Deprecated since version 0.3: method will be removed in the next major release.
Parameters:
- *args – positional arguments of train_test_split
- **kwargs – keyword arguments of train_test_split
Returns:
- BlankDataset – train dataset
- BlankDataset – test dataset
See also: sklearn.model_selection.train_test_split
BaseLazyDataset
-
class BaseLazyDataset(data_path, load_fn, img_extensions, gt_extensions, **load_kwargs)
Bases: delira.data_loading.dataset.AbstractDataset
Dataset to load data in a lazy way.
-
_is_valid_image_file(fname)
Helper function to check whether a file is an image file and has at least one label file.
Parameters: fname (str) – filename of image path
Returns: whether the file is a valid data sample
Return type: bool
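One way such a check could look is sketched below. This is an assumption, not the library's implementation: the standalone function `is_valid_image_file` and the rule that a label file shares the image's base name and differs only in its extension are hypothetical choices made for this example.

```python
import os
from typing import Sequence


def is_valid_image_file(fname: str,
                        img_extensions: Sequence[str],
                        gt_extensions: Sequence[str]) -> bool:
    """Hypothetical check: image extension matches and a label file exists."""
    stem, ext = os.path.splitext(fname)

    # The file must carry one of the accepted image extensions.
    if ext.lower() not in {e.lower() for e in img_extensions}:
        return False

    # Assumed convention: a label file has the same base name as the image
    # and one of the ground-truth extensions.
    return any(os.path.isfile(stem + gt_ext) for gt_ext in gt_extensions)
```

For example, with `img_extensions=[".png"]` and `gt_extensions=[".txt"]`, a file `a.png` would be accepted only if `a.txt` exists next to it.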
-
_make_dataset(path)
Helper function to make a dataset containing the paths to all images in a certain directory.
Parameters: path (str) – path to data samples
Returns: list of sample paths
Return type: list
Raises: AssertionError – if path is not a valid directory
-
get_sample_from_index(index)
Returns the data sample for a given index (without loading it, even if loading would be necessary). This implements the base case and can be overridden in subclasses for index mappings. The actual loading behaviour (lazy or cached) should be implemented in __getitem__.
See also: ConcatDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__
Parameters: index (int) – index corresponding to the targeted sample
Returns: sample corresponding to the given index
Return type: Any
-
get_subset(indices)
Returns a subset of the current dataset based on the given indices.
Parameters: indices (iterable) – valid indices to extract the subset from the current dataset
Returns: the subset
Return type: BlankDataset
-
train_test_split(*args, **kwargs)
Split the dataset into train and test data.
Deprecated since version 0.3: method will be removed in the next major release.
Parameters:
- *args – positional arguments of train_test_split
- **kwargs – keyword arguments of train_test_split
Returns:
- BlankDataset – train dataset
- BlankDataset – test dataset
See also: sklearn.model_selection.train_test_split
BaseCacheDataset
-
class BaseCacheDataset(data_path, load_fn, img_extensions, gt_extensions, **load_kwargs)
Bases: delira.data_loading.dataset.AbstractDataset
Dataset to preload and cache data.
Notes
The data needs to fit completely into RAM!
-
_is_valid_image_file(fname)
Helper function to check whether a file is an image file and has at least one label file.
Parameters: fname (str) – filename of image path
Returns: whether the file is a valid data sample
Return type: bool
-
_make_dataset(path)
Helper function to make a dataset containing all samples in a certain directory.
Parameters: path (str) – path to data samples
Returns: list of sample paths
Return type: list
Raises: AssertionError – if path is not a valid directory
-
get_sample_from_index(index)
Returns the data sample for a given index (without loading it, even if loading would be necessary). This implements the base case and can be overridden in subclasses for index mappings. The actual loading behaviour (lazy or cached) should be implemented in __getitem__.
See also: ConcatDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__
Parameters: index (int) – index corresponding to the targeted sample
Returns: sample corresponding to the given index
Return type: Any
-
get_subset(indices)
Returns a subset of the current dataset based on the given indices.
Parameters: indices (iterable) – valid indices to extract the subset from the current dataset
Returns: the subset
Return type: BlankDataset
-
train_test_split(*args, **kwargs)
Split the dataset into train and test data.
Deprecated since version 0.3: method will be removed in the next major release.
Parameters:
- *args – positional arguments of train_test_split
- **kwargs – keyword arguments of train_test_split
Returns:
- BlankDataset – train dataset
- BlankDataset – test dataset
See also: sklearn.model_selection.train_test_split
ConcatDataset
-
class ConcatDataset(*datasets)
Bases: delira.data_loading.dataset.AbstractDataset
-
_make_dataset(path)
Create the dataset.
Parameters: path (str) – path to data samples
Returns: data: list of sample paths if lazy; list of samples if not
Return type: list
-
get_sample_from_index(index)
Returns the data sample for a given index (without loading it, even if loading would be necessary). This method maps a global index to the sub-index within the corresponding dataset. The actual loading behaviour (lazy or cached) should be implemented in __getitem__.
See also: AbstractDataset.get_sample_from_index, BaseLazyDataset.__getitem__, BaseCacheDataset.__getitem__
Parameters: index (int) – index corresponding to the targeted sample
Returns: sample corresponding to the given index
Return type: Any
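The mapping from a global index to a dataset and a local sub-index can be sketched with cumulative lengths and a binary search. The class `ConcatSketch` below is a hypothetical, self-contained illustration of that idea; it is an assumption about how such a mapping may be done, not the actual ConcatDataset code.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Any, List, Sequence


class ConcatSketch:
    """Hypothetical example of global-to-local index mapping."""

    def __init__(self, *datasets: Sequence[Any]):
        self.datasets: List[Sequence[Any]] = list(datasets)
        # Running totals of the lengths, e.g. lengths 3, 2, 4 give [3, 5, 9].
        self.cumulative_lengths = list(
            accumulate(len(d) for d in self.datasets))

    def __len__(self) -> int:
        return self.cumulative_lengths[-1] if self.cumulative_lengths else 0

    def get_sample_from_index(self, index: int) -> Any:
        if not 0 <= index < len(self):
            raise IndexError("index %d out of range" % index)
        # First dataset whose cumulative length exceeds the global index.
        dataset_idx = bisect_right(self.cumulative_lengths, index)
        # Number of samples in all preceding datasets.
        offset = self.cumulative_lengths[dataset_idx - 1] if dataset_idx else 0
        return self.datasets[dataset_idx][index - offset]
```

With three datasets of lengths 3, 2 and 4, global index 3 resolves to the first sample of the second dataset, and global index 8 to the last sample of the third.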
-
get_subset(indices)
Returns a subset of the current dataset based on the given indices.
Parameters: indices (iterable) – valid indices to extract the subset from the current dataset
Returns: the subset
Return type: BlankDataset
-
train_test_split(*args, **kwargs)
Split the dataset into train and test data.
Deprecated since version 0.3: method will be removed in the next major release.
Parameters:
- *args – positional arguments of train_test_split
- **kwargs – keyword arguments of train_test_split
Returns:
- BlankDataset – train dataset
- BlankDataset – test dataset
See also: sklearn.model_selection.train_test_split