Configuration System¶
MMF relies on OmegaConf for its configuration system and adds some sugar on top of it. We have developed MMF as a config-first framework. Most of the parameters/settings in MMF are configurable. MMF defines some default configuration settings for its system including datasets and models. Users can then update these values via their own config or a command line dotlist.
TL;DR
- MMF uses OmegaConf for its configuration system with some sugar on top.
- MMF defines a base defaults config containing all MMF-specific parameters; each dataset and model then defines its own config (example configs: [model] [dataset]).
- The user can define their own config, specified via `config=<x>` on the command line, for each unique experiment or training setup. This has higher priority than the base, model and dataset default configs and can override anything in them.
- Finally, the user can override (highest priority) the final config generated by merging all of the above configs, by specifying config parameters as a dotlist in their command. This is the recommended way of overriding config parameters in MMF.
- How does MMF know which config to pick for the dataset and model? The user specifies them in their command as `model=x` and `dataset=y`.
- Some of the MMF config parameters under the `env` field can be overridden by environment variables. Have a look at them.
OmegaConf¶
To understand and use the MMF configuration system to its full extent, have a look at the OmegaConf docs, especially the sections on interpolation, access and configuration flags. MMF's config is currently in struct mode by default, and we plan to make it read-only in the future.
Hierarchy¶
MMF follows a set of hierarchy rules to determine the final configuration values. The following list shows the building blocks of MMF's configuration in increasing order of priority (higher ranks override lower ranks).
- Base Defaults Config
- Dataset's Config (defined in the dataset's `config_path` classmethod)
- Model's Config (defined in the model's `config_path` classmethod)
- User's Config (passed by the user as `config=x` in the command)
- Command Line DotList (passed by the user as `x.y.z=v` dotlist in the command)
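The priority order above can be illustrated with a small plain-Python sketch of recursive dict merging. This is a simplification of what OmegaConf's merge does under the hood, and all config values here are made up for illustration:

```python
# Sketch of priority-ordered merging, mimicking OmegaConf.merge semantics:
# later configs override earlier ones, nested dicts merge recursively.
def merge(base, override):
    result = dict(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result

base_defaults = {"training": {"batch_size": 512, "num_workers": 4}}
dataset_config = {"dataset_config": {"hateful_memes": {"use_images": True}}}
model_config = {"model_config": {"mmbt": {"num_classes": 2}}}
user_config = {"training": {"batch_size": 64}}
dotlist = {"training": {"batch_size": 32}}  # highest priority

# Apply in increasing order of priority, exactly as in the list above.
final = base_defaults
for cfg in (dataset_config, model_config, user_config, dotlist):
    final = merge(final, cfg)

print(final["training"]["batch_size"])   # dotlist wins: 32
print(final["training"]["num_workers"])  # untouched base default: 4
```

Note how `num_workers` survives from the base defaults because no higher-priority config touches that path, while `batch_size` is overridden twice and ends at the dotlist value.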
Note
Configs other than the base defaults can still add new nodes that are not in the base defaults config, so users can add their own config parameters if needed without changing the base defaults. If nodes share the same path, the node in the higher priority config will override the one in the lower priority config.
Base Defaults¶
The full base defaults config can be seen below. This config is the base of MMF's configuration system and is included in all experiments. It sets up nodes for training-related configuration and for values that need to be filled in by the other configs specified by the user. The main configuration parameters that the base defaults define are:
- training parameters
- distributed training parameters
- env parameters
- evaluation parameters
- checkpoint parameters
- run_type parameters
Dataset Config¶
Each dataset registered to MMF can define its default config by specifying it in the `config_path` classmethod (example). If a `processors` key whose value is a dictionary is specified, the processors will be initialized by the dataset builder. If the dataset builder inherits from `MMFDatasetBuilder`, it will also look for the `annotations`, `features` and `images` fields in the configuration. A sample config for a builder inheriting from `MMFDatasetBuilder` would look like:
dataset_config:
dataset_registry_key:
use_images: true
use_features: true
annotations:
train:
- ...
val:
- ...
test:
- ...
images:
train:
- ...
val:
- ...
test:
- ...
features:
train:
- ...
val:
- ...
test:
- ...
processors:
text_processor:
type: x
params: ...
Configs for the datasets packaged with MMF are present at mmf/configs/datasets. Each dataset also provides composable configs, which can be used to select a standard variation of the dataset that differs from the default. These can be included directly in a user config via the `includes` directive.
The user needs to specify the dataset they are using by adding the `dataset=<dataset_key>` option to their command.
Model Config¶
Similar to the dataset config, each model registered to MMF can define its config. This is defined by the model's `config_path` classmethod (example). Configs for models live at mmf/configs/models. Like datasets, models also provide some variations, which can be used by including the configs for those variations in the user config.
The user needs to specify the model they want to use by adding the `model=<model_key>` option to their command. A sample model config would look like:
model_config:
model_key:
random_module: ...
User Config¶
The user can specify a configuration specific to an experiment or training setup by adding the `config=<config_path>` argument to their command. A user config can specify, for example, training parameters for the experiment such as the batch size via `training.batch_size`. The most common use case for a user config is to specify the optimizer, scheduler and training parameters. Beyond that, a user config can also include configs for the variations of models and datasets the user wants to test. Have a look at an example user config here.
Command Line Dot List Override¶
Updating the configuration through the dotlist syntax is very helpful when running multiple versions of an experiment without actually editing a config. For example, to override the batch size from the command line you can add `training.batch_size=x` at the end of your command. Similarly, to override an annotation in the hateful memes dataset, you can add `dataset_config.hateful_memes.annotations.train[0]=x`.
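To see how a dotlist entry maps onto the nested config, here is a minimal stdlib sketch. It is illustrative only: MMF delegates the real parsing to OmegaConf's dotlist support, which also handles list indices like `train[0]` and full type coercion, neither of which this sketch attempts:

```python
# Minimal sketch of applying a dotlist override such as
# "training.batch_size=32" to a nested config dict.
def apply_dotlist(config, entry):
    path, _, raw = entry.partition("=")
    keys = path.split(".")
    node = config
    # Walk (or create) intermediate nodes down to the parent of the leaf.
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Naive value parsing; OmegaConf handles types far more thoroughly.
    try:
        value = int(raw)
    except ValueError:
        value = raw
    node[keys[-1]] = value
    return config

config = {"training": {"batch_size": 512}}
apply_dotlist(config, "training.batch_size=32")
apply_dotlist(config, "evaluation.predict_file_format=csv")
print(config["training"]["batch_size"])  # 32
```

Note that the second override creates the `evaluation` node on the fly, mirroring how overrides can add nodes that the lower-priority configs did not define.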
Note
Command line dotlist overrides are our recommended way of updating config parameters, instead of manually editing the config for every change.
Includes¶
MMF's configuration system, built on top of OmegaConf, allows building user configs by including the composable configs provided by datasets and models. You can include them with the following syntax:
includes:
- path/to/first/yaml/to/be/included.yaml
- second.yaml
The configs will be overridden in the sequence in which they appear in the directive. Finally, the config parameters defined in the current config will override what is present in the includes. For example:
First file, `a.yaml`:
# a.yaml
dataset_config:
hateful_memes:
max_features: 80
use_features: true
vqa2:
use_features: true
model_config:
mmbt:
num_classes: 4
features_dim: 2048
Second file, `b.yaml`:
# b.yaml
optimizer:
type: adam
dataset_config:
hateful_memes:
max_features: 90
use_features: false
use_images: true
vqa2:
depth_first: false
And the final user config, `user.yaml`:
# user.yaml
includes:
- a.yaml
- b.yaml
dataset_config:
hateful_memes:
max_features: 100
vqa2:
annotations:
train: x.npy
model_config:
mmbt:
num_classes: 2
would result in the final config:
dataset_config:
hateful_memes:
max_features: 100
use_features: false
use_images: true
vqa2:
use_features: true
depth_first: false
annotations:
train: x.npy
model_config:
mmbt:
num_classes: 2
features_dim: 2048
optimizer:
type: adam
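The merge semantics in this example can be checked mechanically. The sketch below replays the three files as plain dicts, using a simplified stand-in for OmegaConf's merge: includes are applied in order (`a.yaml`, then `b.yaml`), and the current file's own values win last:

```python
# Replaying the include example above with plain dicts. This merge
# function is a simplification of OmegaConf.merge for illustration.
def merge(base, override):
    result = dict(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result

a = {
    "dataset_config": {
        "hateful_memes": {"max_features": 80, "use_features": True},
        "vqa2": {"use_features": True},
    },
    "model_config": {"mmbt": {"num_classes": 4, "features_dim": 2048}},
}
b = {
    "optimizer": {"type": "adam"},
    "dataset_config": {
        "hateful_memes": {"max_features": 90, "use_features": False, "use_images": True},
        "vqa2": {"depth_first": False},
    },
}
user = {
    "dataset_config": {
        "hateful_memes": {"max_features": 100},
        "vqa2": {"annotations": {"train": "x.npy"}},
    },
    "model_config": {"mmbt": {"num_classes": 2}},
}

# Includes first (in order), then the current config on top.
final = merge(merge(a, b), user)
print(final["dataset_config"]["hateful_memes"])
# {'max_features': 100, 'use_features': False, 'use_images': True}
```

Observe that `use_features: false` from `b.yaml` survives because `user.yaml` never touches that path, while `max_features` is overridden at every level and ends at 100.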
Other overrides¶
We also support other useful override schemes at the same priority level as the command line dotlist override. For example, the user can specify overrides in demjson form as the value of the `--config_override` argument, which will then override each part of the config accordingly.
Environment Variables¶
MMF supports overriding some of the config parameters through environment variables. Have a look at them under the base defaults config's `env` parameters.
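The `${env:VAR, default}` interpolations you will see in the `env` section of the base config resolve roughly as sketched below. This is an illustrative stdlib stand-in for OmegaConf's env resolver, not MMF or OmegaConf code:

```python
import os

# Sketch of how an interpolation like ${env:MMF_SAVE_DIR, ./save}
# resolves: use the environment variable when it is set and non-empty,
# otherwise fall back to the default.
def resolve_env(var_name, default=""):
    value = os.environ.get(var_name, "").strip()
    return value if value else default

os.environ.pop("MMF_SAVE_DIR", None)
print(resolve_env("MMF_SAVE_DIR", "./save"))  # ./save

os.environ["MMF_SAVE_DIR"] = "/tmp/experiments"
print(resolve_env("MMF_SAVE_DIR", "./save"))  # /tmp/experiments
```

Entries like `${env:MMF_LOG_DIR,}` have an empty default, meaning MMF falls back to its own internal default location when the variable is unset.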
Base Defaults Config¶
Have a look at the base defaults config of MMF below, along with descriptions of the parameters, in case you need to override any of them for your experiments:
# Configuration version is useful in migrating older configs to new ones
config_version: 1.0
# Configuration for training
training:
# Name of the trainer class used to define the training/evaluation loop
trainer: base_trainer
# Seed to be used for training. -1 means a random seed between 1 and 100000.
# Either pass a fixed seed through your config or via command line arguments.
# Pass null if you don't want any seeding at all and
# want to leave it to the default
seed: -1
# Name of the experiment, will be used while saving checkpoints
# and generating reports
experiment_name: run
# Maximum number of iterations the training will run
max_updates: 22000
# Maximum epochs in case you don't want to use max_updates
# Can be mixed with max_updates, so training will stop at whichever is
# completed first. Default: null means epochs won't be used
max_epochs: null
# After `log_interval` iterations, current iteration's training loss will be
# reported. This will also report validation on a single batch from validation set
# to provide an estimate on validation side
log_interval: 100
# Level of logging, only logs which are >= to current level will be logged
logger_level: info
# Log format: json, simple
log_format: simple
# Whether to log detailed final configuration parameters
log_detailed_config: false
# Whether MMF should suppress logging. Default: false, which means
# MMF will log by default
should_not_log: false
# Tensorboard control, by default tensorboard is disabled
tensorboard: false
# Size of each batch. If distributed or data_parallel
# is used, this will be divided equally among GPUs
batch_size: 512
# Number of workers to be used in dataloaders
num_workers: 4
# Some datasets allow fast reading by loading everything in the memory
# Use this to enable it
fast_read: false
# Used in multi-tasking, when you want to sample tasks proportionally to their sizes
dataset_size_proportional_sampling: true
# Whether to pin memory in dataloader
pin_memory: false
# After `checkpoint_interval` iterations, MMF will make a snapshot
# which will involve creating a checkpoint for current training scenarios
checkpoint_interval: 1000
# This will evaluate evaluation metrics on whole validation set after
# evaluation interval
evaluation_interval: 1000
# Whether gradients should be clipped
clip_gradients: false
# Mode for clip norm
clip_norm_mode: all
early_stop:
# Whether to use early stopping, (Default: false)
enabled: false
# Patience for early stopping
patience: 4000
# Criteria to be monitored for early stopping
# total_loss will monitor combined loss from all of the tasks
# Criteria can also be an evaluation metric in this format `dataset/metric`
# for e.g. vqa2/vqa_accuracy
criteria: total_loss
# Whether the monitored criteria should be minimized for early stopping
# or not, for e.g. you would want to minimize loss but maximize an evaluation
# metric like accuracy etc.
minimize: true
# Should a lr scheduler be used
lr_scheduler: false
# DEPRECATED: Look at scheduler_attributes or
# Use PythiaScheduler directly instead
# Steps for LR scheduler, will be an array of iteration count
# when lr should be decreased
lr_steps: []
# DEPRECATED: Look at scheduler_attributes or
# Use PythiaScheduler directly instead
# Ratio for each lr step
lr_ratio: 0.1
# NOTE: Have a look at newer scheduler available in MMF (such as AdamW) before
# using these options
# Should use warmup for lr
use_warmup: false
# Warmup factor for learning rate warmup
warmup_factor: 0.2
# Iteration until which warmup should be done
warmup_iterations: 1000
# Device on which the model will be trained. Set 'cpu' to train/infer on CPU
device: cuda
# Local rank of the GPU device
local_rank: null
# If verbose dump is active, MMF will dump dataset, model specific
# information which can be useful in debugging
verbose_dump: false
# Turn on if you want to ignore unused parameters in case of DDP
find_unused_parameters: false
# By default metrics evaluation is turned off during training. Set this to true
# to enable evaluation every log_interval
evaluate_metrics: false
# Configuration for evaluation
evaluation:
# Metrics for evaluation
metrics: []
# Generate predictions in a file
predict: false
# Prediction file format (csv|json), default is json
predict_file_format: json
# Configuration for models, default configuration files for various models
# included in MMF can be found under configs directory in root folder
model_config: {}
# Configuration for datasets. Separate configuration
# for different datasets included in MMF are included in dataset folder
# which can be mixed and matched to train multiple datasets together
# An example for mixing all vqa datasets is present under vqa folder
dataset_config: {}
# Defines which datasets from the above tasks you want to train on
datasets: []
# Defines which model you want to train on
model: null
# Config file to be optionally passed by the user
config: null
# Type of run; train_inference by default means both the training and
# inference (test) stages will be run. If run_type contains 'val',
# inference will also be run on the val set.
run_type: train_inference
# Configuration for optimizer, examples can be found in models' configs in
# configs folder
optimizer: {}
# Configuration for scheduler, examples can be found in models' configs
scheduler: {}
# Common environment configurations for MMF
env:
# Universal cache directory for mmf
# This can be overridden by using MMF_CACHE_DIR environment variable
# or by directly setting this configuration attribute env.cache_dir
# If nothing is specified, default is set to "mmf" inside
# pytorch's cache folder
cache_dir: ${resolve_cache_dir:MMF_CACHE_DIR}
# Config path for dataset zoo, can be overridden via environment
# variable MMF_DATASET_ZOO as well.
dataset_zoo: ${env:MMF_DATASET_ZOO,configs/zoo/datasets.yaml}
# Similarly, config path for the model zoo; can be overridden via
# the MMF_MODEL_ZOO environment variable as well.
model_zoo: ${env:MMF_MODEL_ZOO, configs/zoo/models.yaml}
# Similar to cache_dir, but can be used if you specifically want to override
# where MMF stores your data. Default is cache_dir/data.
# We will auto download models and datasets in this folder
data_dir: ${resolve_dir:MMF_DATA_DIR, data}
# Directory for saving checkpoints and other metadata
# Use MMF_SAVE_DIR or env.save_dir to override
save_dir: ${env:MMF_SAVE_DIR, ./save}
# Directory for saving logs, default is "logs" inside the save folder
# If log_dir is specifically passed, logs will be written inside that folder
# Use MMF_LOG_DIR or env.log_dir to override
log_dir: ${env:MMF_LOG_DIR,}
# Directory for saving reports; if not passed, an opts-based folder will be generated
# inside save_dir/reports and reports will be saved there
# Use MMF_REPORT_DIR or env.report_dir to override
report_dir: ${env:MMF_REPORT_DIR,}
# Log directory for tensorboard, default points to same as logs
# Only used when training.tensorboard is enabled.
# Use MMF_TENSORBOARD_LOGDIR or env.tensorboard_logdir to override
tensorboard_logdir: ${env:MMF_TENSORBOARD_LOGDIR,}
# User directory where user can keep their own models independent of MMF
# This allows users to create projects which only include MMF as dependency
# Use MMF_USER_DIR or env.user_dir to specify
user_dir: ${env:MMF_USER_DIR,}
###
# Configuration for the distributed setup
distributed:
###
# Typically tcp://hostname:port that will be used to establish initial connection
init_method: null
# Rank of the current worker
rank: 0
# Port number, not required if using init_method
port: -1
# Backend for distributed setup
backend: nccl
# Total number of GPUs across all nodes (default: all visible GPUs)
world_size: ${device_count:}
# Set if you do not want to spawn multiple processes even if
# multiple GPUs are visible
no_spawn: false
# Configuration for checkpointing including resuming and loading pretrained models
checkpoint:
# If checkpoint.resume is true, MMF will try to automatically load
# the checkpoint and state from "current.ckpt" in env.save_dir
resume: false
# `checkpoint.resume_file` can be used to load a specific checkpoint from a file
# Can also be a zoo key
resume_file: null
# `checkpoint.resume_best` will load the best checkpoint according to
# training.early_stop.criteria instead of the last saved ckpt
resume_best: false
# `checkpoint.resume_pretrained` can be used in conjunction with `resume_file`
# or `resume_zoo` where you specify a checkpoint or .pth file to be loaded
# but it is mapped based on `checkpoint.pretrained_state_mapping`
# For e.g., if you want to resume from visual_bert pretrained on COCO,
# you would set `checkpoint.resume_zoo=visual_bert.pretrained.coco` and
# then set `checkpoint.resume_pretrained=True`, which will then pick up
# only the parts you define in `checkpoint.pretrained_state_mapping`
resume_pretrained: false
# `checkpoint.resume_zoo` can be used to resume from a pretrained model provided
# in zoo. Value maps to key from zoo. `checkpoint.resume_file` has higher
# priority compared to `checkpoint.resume_zoo`.
resume_zoo: null
# `checkpoint.zoo_config_override` will override the trainer's current model config
# with what is provided by the zoo checkpoint and will load the model
# using the .from_pretrained of the model passed
zoo_config_override: false
# `checkpoint.pretrained_state_mapping` specifies how exactly a pretrained
# model will be loaded and mapped to which keys of the target model
# Only use this if the keys of the model into which the pretrained model is being
# loaded don't match those of the pretrained model, or if you only want to load
# specific items from the pretrained model. `checkpoint.resume_pretrained` must be
# true to use this mapping. For e.g. you can specify
# text_embedding: text_embedding_pythia
# for loading the `text_embedding` module of your model from `text_embedding_pythia` of
# pretrained file specified in `checkpoint.resume_file`.
pretrained_state_mapping: {}
# Whether to save git details or not
save_git_details: true
# `checkpoint.reset` configuration defines what exactly should be reset
# in case the file from which we are resuming is .ckpt and not .pth
reset:
# Everything will be reset except the state_dict of model
all: false
# Optimizer specifically will be reset
optimizer: false
# All counts such as best_update, current_iteration etc will be reset
counts: false
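As the comments for `max_updates` and `max_epochs` above note, the two limits can be mixed and training stops at whichever is completed first. A minimal sketch of that stopping rule (illustrative only, not MMF's actual trainer code):

```python
# Stop when either the update budget or the epoch budget is exhausted.
# max_epochs may be null (None), in which case only max_updates applies.
def should_stop(num_updates, current_epoch, max_updates=22000, max_epochs=None):
    if num_updates >= max_updates:
        return True
    if max_epochs is not None and current_epoch >= max_epochs:
        return True
    return False

print(should_stop(num_updates=22000, current_epoch=3))               # True
print(should_stop(num_updates=100, current_epoch=3))                 # False
print(should_stop(num_updates=100, current_epoch=3, max_epochs=2))   # True
```

The default `max_updates: 22000` matches the base defaults above; both limits can be overridden via the config or a dotlist such as `training.max_epochs=10`.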