datasets.base_dataset_builder

In MMF, adding a new dataset requires adding a dataset builder for it. A new dataset builder must inherit from the BaseDatasetBuilder class and implement the load and build functions.

build is used to build a dataset when it is not available, for example by downloading the ImDBs for a dataset. In the future, we plan to add a build to add dataset builder to ease setup of MMF.

load is used to load a dataset from a specific path. load needs to return an instance of a subclass of mmf.datasets.base_dataset.BaseDataset.

See complete example for VQA2DatasetBuilder here.

Example:

from torch.utils.data import Dataset

from mmf.datasets.base_dataset_builder import BaseDatasetBuilder
from mmf.common.registry import registry

@registry.register_builder("my")
class MyBuilder(BaseDatasetBuilder):
    def __init__(self):
        super().__init__("my")

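    def load(self, config, dataset_type, *args, **kwargs):
        ...
        # Placeholder: a real builder returns an instance of a BaseDataset subclass.
        return Dataset()

    def build(self, config, dataset_type, *args, **kwargs):
        # Build the dataset when it is not available, e.g. download it.
        ...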

class mmf.datasets.base_dataset_builder.BaseDatasetBuilder(dataset_name)

Base class for implementing dataset builders. See the module description above for more information. Child classes need to implement build and load.

Parameters: dataset_name (str) – Name of the dataset passed from the child class.

build(config, dataset_type='train', *args, **kwargs)

This is used to build a dataset for the first time. Implement this method in your child dataset builder class.

Parameters:
  • config (DictConfig) – Configuration of this dataset loaded from config.
  • dataset_type (str) – Type of dataset, train|val|test

build_dataset(config, dataset_type='train', *args, **kwargs)

Similar to the load function, this is used by MMF to build a dataset for the first time when it is not available. It internally calls the build function; override build in your child class.

Parameters:
  • config (DictConfig) – Configuration of this dataset loaded from config.
  • dataset_type (str) – Type of dataset, train|val|test

Warning

DO NOT OVERRIDE in child class. Instead override build.
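
The split between build_dataset and build can be illustrated with a minimal, self-contained sketch. It does not import MMF: SketchBuilder and MySketchBuilder are hypothetical stand-ins, the "data_dir" config key is an assumption made for illustration, and the real build_dataset may do more than delegate. The sketch only shows the idea that the wrapper stays untouched while the child's build creates the data when it is not yet available.

```python
import os
import tempfile


class SketchBuilder:
    """Hypothetical stand-in for BaseDatasetBuilder; not the MMF class."""

    def __init__(self, dataset_name):
        self.dataset_name = dataset_name

    def build_dataset(self, config, dataset_type="train", *args, **kwargs):
        # Wrapper called by the framework; child classes leave it alone.
        self.build(config, dataset_type, *args, **kwargs)

    def build(self, config, dataset_type="train", *args, **kwargs):
        raise NotImplementedError("child classes implement build")


class MySketchBuilder(SketchBuilder):
    def __init__(self):
        super().__init__("my")
        self.build_calls = 0

    def build(self, config, dataset_type="train", *args, **kwargs):
        self.build_calls += 1
        # "data_dir" is an assumed config key used only in this sketch.
        path = os.path.join(config["data_dir"], f"{dataset_type}.txt")
        if os.path.exists(path):
            return  # data already available, nothing to build
        os.makedirs(config["data_dir"], exist_ok=True)
        with open(path, "w") as f:
            f.write("placeholder for downloaded data\n")


data_dir = tempfile.mkdtemp()
builder = MySketchBuilder()
builder.build_dataset({"data_dir": data_dir}, "train")
builder.build_dataset({"data_dir": data_dir}, "train")  # file exists, so no rewrite
```

Checking for existing data inside build keeps repeated calls cheap, which fits the description of build as something used when the dataset is not available.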

load(config, dataset_type='train', *args, **kwargs)

This is used to prepare the dataset and load it from a path. Override this method in your child dataset builder class.

Parameters:
  • config (DictConfig) – Configuration of this dataset loaded from config.
  • dataset_type (str) – Type of dataset, train|val|test
Returns:

Dataset containing data to be trained on

Return type:

dataset (BaseDataset)

load_dataset(config, dataset_type='train', *args, **kwargs)

Main load function used by MMF. It internally calls the load function, then calls init_processors and try_fast_read on the dataset returned from load.

Parameters:
  • config (DictConfig) – Configuration of this dataset loaded from config.
  • dataset_type (str) – Type of dataset, train|val|test
Returns:

Dataset containing data to be trained on

Return type:

dataset (BaseDataset)

Warning

DO NOT OVERRIDE in child class. Instead override load.
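
The call order described for load_dataset can be sketched without MMF installed. SketchDataset and the two builder classes below are hypothetical stand-ins, not the library's implementation; the real method may include extra checks. The sketch only mirrors the documented sequence: call load, then init_processors and try_fast_read on the returned dataset.

```python
class SketchDataset:
    """Hypothetical stand-in for a BaseDataset subclass."""

    def __init__(self, dataset_type):
        self.dataset_type = dataset_type
        self.calls = []  # records the order of post-load hooks

    def init_processors(self):
        self.calls.append("init_processors")

    def try_fast_read(self):
        self.calls.append("try_fast_read")


class SketchBuilder:
    """Hypothetical stand-in for BaseDatasetBuilder; not the MMF class."""

    def load_dataset(self, config, dataset_type="train", *args, **kwargs):
        # Wrapper called by the framework; child classes override load instead.
        dataset = self.load(config, dataset_type, *args, **kwargs)
        dataset.init_processors()
        dataset.try_fast_read()
        return dataset

    def load(self, config, dataset_type="train", *args, **kwargs):
        raise NotImplementedError("child classes implement load")


class MySketchBuilder(SketchBuilder):
    def load(self, config, dataset_type="train", *args, **kwargs):
        return SketchDataset(dataset_type)


dataset = MySketchBuilder().load_dataset({}, "val")
```

Because the hooks run in the wrapper, a child class that overrides only load still gets them applied to whatever dataset it returns.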
