datasets.base_dataset_builder
In MMF, adding a new dataset requires adding a dataset builder for it. A new dataset builder must inherit from the BaseDatasetBuilder class and implement the load and build functions.
build is used to build a dataset when it is not available, e.g. by downloading the ImDBs for a dataset. In the future, we plan to add a build to add dataset builder to ease setup of MMF.
load is used to load a dataset from a specific path. load needs to return an instance of a subclass of mmf.datasets.base_dataset.BaseDataset.
See the complete example for VQA2DatasetBuilder here.
Example:

    from torch.utils.data import Dataset

    from mmf.datasets.base_dataset_builder import BaseDatasetBuilder
    from mmf.common.registry import registry


    @registry.register_builder("my")
    class MyBuilder(BaseDatasetBuilder):
        def __init__(self):
            super().__init__("my")

        def load(self, config, dataset_type, *args, **kwargs):
            ...
            return Dataset()

        def build(self, config, dataset_type, *args, **kwargs):
            ...
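
The register_builder decorator in the example associates a builder class with a string key. The following is only a rough sketch of how such a name-to-class registry can work in plain Python. SketchRegistry, its methods, and MySketchBuilder are hypothetical stand-ins written for illustration; this is not MMF's registry implementation.

```python
class SketchRegistry:
    """Hypothetical name-to-class registry (an illustration, not MMF's registry)."""

    _builders = {}

    @classmethod
    def register_builder(cls, name):
        # Return a decorator that stores the decorated class under `name`.
        def wrap(builder_cls):
            cls._builders[name] = builder_cls
            return builder_cls  # hand the class back unchanged

        return wrap

    @classmethod
    def get_builder_class(cls, name):
        # Look up a previously registered class; None if the key is unknown.
        return cls._builders.get(name)


@SketchRegistry.register_builder("my")
class MySketchBuilder:
    pass


found = SketchRegistry.get_builder_class("my")
missing = SketchRegistry.get_builder_class("unknown")
print(found is MySketchBuilder, missing)
```

Because the decorator returns the class unchanged, registering a builder this way has no effect on how the class itself behaves.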
class mmf.datasets.base_dataset_builder.BaseDatasetBuilder(dataset_name)
Base class for implementing dataset builders. See more information at the top. Child classes need to implement build and load.
Parameters: dataset_name (str) – Name of the dataset passed from the child class.
build(config, dataset_type='train', *args, **kwargs)
Used to build a dataset the first time. Implement this method in your child dataset builder class.
Parameters:
- config (DictConfig) – Configuration of this dataset loaded from config.
- dataset_type (str) – Type of dataset: train|val|test
build_dataset(config, dataset_type='train', *args, **kwargs)
Similar to the load function; used by MMF to build a dataset the first time, when it is not available. This internally calls the build function. Override that function in your child class.
Parameters:
- config (DictConfig) – Configuration of this dataset loaded from config.
- dataset_type (str) – Type of dataset: train|val|test
Warning: DO NOT OVERRIDE in child class. Instead, override build.
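
The split between build_dataset (called by the framework, not overridden) and build (overridden by the child) can be sketched as follows. This is a simplified stand-in, not the MMF source: SketchBuilderBase and MySketchBuilder are hypothetical classes, a plain dict stands in for the DictConfig, and the "data_dir" key and the placeholder file are assumptions made for the example.

```python
import os
import tempfile


class SketchBuilderBase:
    """Hypothetical stand-in for BaseDatasetBuilder (not the MMF class)."""

    def __init__(self, dataset_name):
        self.dataset_name = dataset_name

    def build_dataset(self, config, dataset_type="train", *args, **kwargs):
        # Wrapper used by the framework; children leave this method alone.
        self.build(config, dataset_type, *args, **kwargs)

    def build(self, config, dataset_type="train", *args, **kwargs):
        raise NotImplementedError("Child classes must implement build")


class MySketchBuilder(SketchBuilderBase):
    def __init__(self):
        super().__init__("my")
        self.builds = 0  # counts how many times data was actually created

    def build(self, config, dataset_type="train", *args, **kwargs):
        path = os.path.join(config["data_dir"], f"{dataset_type}.txt")
        if os.path.exists(path):
            return  # data already available, nothing to build
        os.makedirs(config["data_dir"], exist_ok=True)
        with open(path, "w") as f:
            f.write("placeholder annotations\n")  # stands in for a download
        self.builds += 1


data_dir = tempfile.mkdtemp()
builder = MySketchBuilder()
builder.build_dataset({"data_dir": data_dir}, "train")
builder.build_dataset({"data_dir": data_dir}, "train")  # no-op the second time
built_path = os.path.join(data_dir, "train.txt")
print(os.path.exists(built_path), builder.builds)
```

Keeping the wrapper in the base class means any framework-level behavior added around the build step applies to every builder without each child having to repeat it.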
load(config, dataset_type='train', *args, **kwargs)
Used to prepare the dataset and load it from a path. Override this method in your child dataset builder class.
Parameters:
- config (DictConfig) – Configuration of this dataset loaded from config.
- dataset_type (str) – Type of dataset: train|val|test
Returns: Dataset containing data to be trained on
Return type: dataset (BaseDataset)
load_dataset(config, dataset_type='train', *args, **kwargs)
Main load function used by MMF. This internally calls the load function, then calls init_processors and try_fast_read on the dataset returned from load.
Parameters:
- config (DictConfig) – Configuration of this dataset loaded from config.
- dataset_type (str) – Type of dataset: train|val|test
Returns: Dataset containing data to be trained on
Return type: dataset (BaseDataset)
Warning: DO NOT OVERRIDE in child class. Instead, override load.
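
The call order described for load_dataset (load first, then init_processors and try_fast_read on the returned dataset) can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins that only record which hooks ran. They are not the MMF implementation, and the real BaseDataset methods do actual work.

```python
class SketchDataset:
    """Hypothetical stand-in for a BaseDataset subclass; records hook calls."""

    def __init__(self, dataset_type):
        self.dataset_type = dataset_type
        self.calls = []

    def init_processors(self):
        self.calls.append("init_processors")

    def try_fast_read(self):
        self.calls.append("try_fast_read")


class SketchBuilderBase:
    """Hypothetical stand-in for BaseDatasetBuilder (not the MMF class)."""

    def __init__(self, dataset_name):
        self.dataset_name = dataset_name

    def load_dataset(self, config, dataset_type="train", *args, **kwargs):
        # Wrapper: delegate to the child's load, then run the post-load hooks.
        dataset = self.load(config, dataset_type, *args, **kwargs)
        dataset.init_processors()
        dataset.try_fast_read()
        return dataset

    def load(self, config, dataset_type="train", *args, **kwargs):
        raise NotImplementedError("Child classes must implement load")


class MySketchBuilder(SketchBuilderBase):
    def __init__(self):
        super().__init__("my")

    def load(self, config, dataset_type="train", *args, **kwargs):
        # A real builder would read data from a path given in config.
        return SketchDataset(dataset_type)


dataset = MySketchBuilder().load_dataset(config={}, dataset_type="val")
print(dataset.dataset_type, dataset.calls)
# prints: val ['init_processors', 'try_fast_read']
```

A child that overrode load_dataset instead of load would skip the two post-load calls unless it repeated them, which is the situation the warning above guards against.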