Configuration System¶
MMF relies on OmegaConf for its configuration system and adds some sugar on top of it. We have developed MMF as a config-first framework. Most of the parameters/settings in MMF are configurable. MMF defines some default configuration settings for its system including datasets and models. Users can then update these values via their own config or a command line dotlist.
TL;DR
- MMF uses OmegaConf for its configuration system with some sugar on top.
- MMF defines a base defaults config containing all MMF-specific parameters; each dataset and model then defines its own config (example configs: [model] [dataset]).
- The user can define their own config, specified via `config=<x>` on the command line, for each unique experiment or training setup. This has higher priority than the base, model and dataset default configs and can override anything in them.
- Finally, the user can override (highest priority) the final config generated by merging all of the above configs, by specifying config parameters as a dotlist in their command. This is the recommended way of overriding config parameters in MMF.
- How does MMF know which config to pick for the dataset and model? The user specifies them in their command as `model=x` and `dataset=y`.
- Some of the MMF config parameters under the `env` field can be overridden by environment variables. Have a look at them.
OmegaConf¶
To understand and use the MMF configuration system to its full extent, have a look at the OmegaConf docs, especially the sections on interpolation, access and configuration flags. MMF's config is currently in struct mode by default, and we plan to make it read-only in the future.
Hierarchy¶
MMF follows a set of hierarchy rules to determine the final configuration values. The following list shows the building blocks of MMF's configuration in increasing order of priority (higher ranks override lower ranks).
- Base Defaults Config
- Dataset's Config (defined in the dataset's `config_path` classmethod)
- Model's Config (defined in the model's `config_path` classmethod)
- User's Config (passed by the user as `config=x` in the command)
- Command Line DotList (passed by the user as `x.y.z=v` dotlist in the command)
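The priority order above can be illustrated with a small plain-Python sketch of recursive dict merging. This is a simplification of what OmegaConf's merge does under the hood, and all config values here are made up for illustration:

```python
# Sketch of priority-ordered merging, mimicking OmegaConf.merge semantics:
# later configs override earlier ones, nested dicts merge recursively.
def merge(base, override):
    result = dict(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result

base_defaults = {"training": {"batch_size": 512, "num_workers": 4}}
dataset_config = {"dataset_config": {"hateful_memes": {"use_images": True}}}
model_config = {"model_config": {"mmbt": {"num_classes": 2}}}
user_config = {"training": {"batch_size": 64}}
dotlist = {"training": {"batch_size": 32}}  # highest priority

# Apply in increasing order of priority, exactly as in the list above.
final = base_defaults
for cfg in (dataset_config, model_config, user_config, dotlist):
    final = merge(final, cfg)

print(final["training"]["batch_size"])   # dotlist wins: 32
print(final["training"]["num_workers"])  # untouched base default: 4
```

Note how `num_workers` survives from the base defaults because no higher-priority config touches that path, while `batch_size` is overridden twice and ends at the dotlist value.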
Note
Configs other than the base defaults can still add new nodes that are not in the base defaults config, so users can add their own config parameters if needed without changing the base defaults. If nodes share the same path, the node in the higher priority config will override the one in the lower priority config.
Base Defaults¶
The full base defaults config can be seen below. This config is the base of MMF's configuration system and is included in all experiments. It sets up nodes for training-related configuration and for values that need to be filled in by the other configs specified by the user. The main configuration parameters that the base defaults define are:
- training parameters
- distributed training parameters
- env parameters
- evaluation parameters
- checkpoint parameters
- run_type parameters
Dataset Config¶
Each dataset registered to MMF can define its default config by specifying it in the `config_path` classmethod (example). If a `processors` key whose value is a dictionary is specified, the processors will be initialized by the dataset builder. If the dataset builder inherits from `MMFDatasetBuilder`, it will also look for the `annotations`, `features` and `images` fields in the configuration. A sample config for a builder inheriting from `MMFDatasetBuilder` would look like:
dataset_config:
dataset_registry_key:
use_images: true
use_features: true
annotations:
train:
- ...
val:
- ...
test:
- ...
images:
train:
- ...
val:
- ...
test:
- ...
features:
train:
- ...
val:
- ...
test:
- ...
processors:
text_processor:
type: x
params: ...
Configs for the datasets packaged with MMF are present at mmf/configs/datasets. Each dataset also provides composable configs, which can be used to select a standard variation of the dataset that differs from the default. These can be included directly in a user config via the `includes` directive.
The user needs to specify the dataset they are using by adding the `dataset=<dataset_key>` option to their command.
Model Config¶
Similar to the dataset config, each model registered to MMF can define its config. This is defined by the model's `config_path` classmethod (example). Configs for models live at mmf/configs/models. Like datasets, models also provide some variations, which can be used by including the configs for those variations in the user config.
The user needs to specify the model they want to use by adding the `model=<model_key>` option to their command. A sample model config would look like:
model_config:
model_key:
random_module: ...
User Config¶
The user can specify a configuration specific to an experiment or training setup by adding the `config=<config_path>` argument to their command. A user config can specify, for example, training parameters for the experiment such as the batch size via `training.batch_size`. The most common use case for a user config is to specify the optimizer, scheduler and training parameters. Beyond that, a user config can also include configs for the variations of models and datasets the user wants to test. Have a look at an example user config here.
Command Line Dot List Override¶
Updating the configuration through the dotlist syntax is very helpful when running multiple versions of an experiment without actually editing a config. For example, to override the batch size from the command line you can add `training.batch_size=x` at the end of your command. Similarly, to override an annotation in the hateful memes dataset, you can add `dataset_config.hateful_memes.annotations.train[0]=x`.
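To see how a dotlist entry maps onto the nested config, here is a minimal stdlib sketch. It is illustrative only: MMF delegates the real parsing to OmegaConf's dotlist support, which also handles list indices like `train[0]` and full type coercion, neither of which this sketch attempts:

```python
# Minimal sketch of applying a dotlist override such as
# "training.batch_size=32" to a nested config dict.
def apply_dotlist(config, entry):
    path, _, raw = entry.partition("=")
    keys = path.split(".")
    node = config
    # Walk (or create) intermediate nodes down to the parent of the leaf.
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Naive value parsing; OmegaConf handles types far more thoroughly.
    try:
        value = int(raw)
    except ValueError:
        value = raw
    node[keys[-1]] = value
    return config

config = {"training": {"batch_size": 512}}
apply_dotlist(config, "training.batch_size=32")
apply_dotlist(config, "evaluation.predict_file_format=csv")
print(config["training"]["batch_size"])  # 32
```

Note that the second override creates the `evaluation` node on the fly, mirroring how overrides can add nodes that the lower-priority configs did not define.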
Note
Command line dotlist overrides are our recommended way of updating config parameters, instead of manually editing the config for every change.
Includes¶
MMF's configuration system, built on top of OmegaConf, allows building user configs by including the composable configs provided by datasets and models. You can include them with the following syntax:
includes:
- path/to/first/yaml/to/be/included.yaml
- second.yaml
The configs will be overridden in the sequence in which they appear in the directive. Finally, the config parameters defined in the current config will override what is present in the includes. For example:
First file, `a.yaml`:
# a.yaml
dataset_config:
hateful_memes:
max_features: 80
use_features: true
vqa2:
use_features: true
model_config:
mmbt:
num_classes: 4
features_dim: 2048
Second file, `b.yaml`:
# b.yaml
optimizer:
type: adam
dataset_config:
hateful_memes:
max_features: 90
use_features: false
use_images: true
vqa2:
depth_first: false
And the final user config, `user.yaml`:
# user.yaml
includes:
- a.yaml
- b.yaml
dataset_config:
hateful_memes:
max_features: 100
vqa2:
annotations:
train: x.npy
model_config:
mmbt:
num_classes: 2
would result in the final config:
dataset_config:
hateful_memes:
max_features: 100
use_features: false
use_images: true
vqa2:
use_features: true
depth_first: false
annotations:
train: x.npy
model_config:
mmbt:
num_classes: 2
features_dim: 2048
optimizer:
type: adam
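The merge semantics in this example can be checked mechanically. The sketch below replays the three files as plain dicts, using a simplified stand-in for OmegaConf's merge: includes are applied in order (`a.yaml`, then `b.yaml`), and the current file's own values win last:

```python
# Replaying the include example above with plain dicts. This merge
# function is a simplification of OmegaConf.merge for illustration.
def merge(base, override):
    result = dict(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result

a = {
    "dataset_config": {
        "hateful_memes": {"max_features": 80, "use_features": True},
        "vqa2": {"use_features": True},
    },
    "model_config": {"mmbt": {"num_classes": 4, "features_dim": 2048}},
}
b = {
    "optimizer": {"type": "adam"},
    "dataset_config": {
        "hateful_memes": {"max_features": 90, "use_features": False, "use_images": True},
        "vqa2": {"depth_first": False},
    },
}
user = {
    "dataset_config": {
        "hateful_memes": {"max_features": 100},
        "vqa2": {"annotations": {"train": "x.npy"}},
    },
    "model_config": {"mmbt": {"num_classes": 2}},
}

# Includes first (in order), then the current config on top.
final = merge(merge(a, b), user)
print(final["dataset_config"]["hateful_memes"])
# {'max_features': 100, 'use_features': False, 'use_images': True}
```

Observe that `use_features: false` from `b.yaml` survives because `user.yaml` never touches that path, while `max_features` is overridden at every level and ends at 100.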
Other overrides¶
We also support other useful override schemes at the same priority level as the command line dotlist override. For example, the user can specify overrides in demjson form as the value of the `--config_override` argument, which will then override each part of the config accordingly.
Environment Variables¶
MMF supports overriding some of the config parameters through environment variables. Have a look at them under the base defaults config's `env` parameters.
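The `${env:VAR, default}` interpolations you will see in the `env` section of the base config resolve roughly as sketched below. This is an illustrative stdlib stand-in for OmegaConf's env resolver, not MMF or OmegaConf code:

```python
import os

# Sketch of how an interpolation like ${env:MMF_SAVE_DIR, ./save}
# resolves: use the environment variable when it is set and non-empty,
# otherwise fall back to the default.
def resolve_env(var_name, default=""):
    value = os.environ.get(var_name, "").strip()
    return value if value else default

os.environ.pop("MMF_SAVE_DIR", None)
print(resolve_env("MMF_SAVE_DIR", "./save"))  # ./save

os.environ["MMF_SAVE_DIR"] = "/tmp/experiments"
print(resolve_env("MMF_SAVE_DIR", "./save"))  # /tmp/experiments
```

Entries like `${env:MMF_LOG_DIR,}` have an empty default, meaning MMF falls back to its own internal default location when the variable is unset.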
Base Defaults Config¶
Have a look at the base defaults config of MMF below, along with descriptions of the parameters, in case you need to override any of them for your experiments:
# Configuration version is useful in migrating older configs to new ones
config_version: 1.0
# Configuration for training
training:
# Name of the trainer class used to define the training/evaluation loop
trainer: base_trainer
# Seed to be used for training. -1 means a random seed between 1 and 100000.
# Either pass a fixed seed through your config or via command line arguments.
# Pass null if you don't want any seeding at all and
# want to leave it to the default
seed: -1
# Name of the experiment, will be used while saving checkpoints
# and generating reports
experiment_name: run
# Maximum number of iterations the training will run
max_updates: 22000
# Maximum epochs in case you don't want to use max_updates
# Can be mixed with max_updates, so training will stop at whichever is
# completed first. Default: null means epochs won't be used
max_epochs: null
# After `log_interval` iterations, current iteration's training loss will be
# reported. This will also report validation on a single batch from validation set
# to provide an estimate on validation side
log_interval: 100
# Level of logging, only logs which are >= to current level will be logged
logger_level: info
# Log format: json, simple
log_format: simple
# Whether to log detailed final configuration parameters
log_detailed_config: false
# Whether MMF should suppress logging. Default: false, which means
# MMF will log by default
should_not_log: false
# Tensorboard control, by default tensorboard is disabled
tensorboard: false
# Size of each batch. If distributed or data_parallel
# is used, this will be divided equally among GPUs
batch_size: 512
# Number of workers to be used in dataloaders
num_workers: 4
# Some datasets allow fast reading by loading everything in the memory
# Use this to enable it
fast_read: false
# Used in multi-tasking, when you want to sample tasks proportionally to their sizes
dataset_size_proportional_sampling: true
# Whether to pin memory in dataloader
pin_memory: false
# After `checkpoint_interval` iterations, MMF will make a snapshot
# which will involve creating a checkpoint for current training scenarios
checkpoint_interval: 1000
# This will evaluate evaluation metrics on whole validation set after
# evaluation interval
evaluation_interval: 1000
# Whether gradients should be clipped
clip_gradients: false
# Mode for clip norm
clip_norm_mode: all
early_stop:
# Whether to use early stopping, (Default: false)
enabled: false
# Patience for early stopping
patience: 4000
# Criteria to be monitored for early stopping
# total_loss will monitor combined loss from all of the tasks
# Criteria can also be an evaluation metric in this format `dataset/metric`
# for e.g. vqa2/vqa_accuracy
criteria: total_loss
# Whether the monitored criteria should be minimized for early stopping
# or not, for e.g. you would want to minimize loss but maximize an evaluation
# metric like accuracy etc.
minimize: true
# Should a lr scheduler be used
lr_scheduler: false
# DEPRECATED: Look at scheduler_attributes or
# Use PythiaScheduler directly instead
# Steps for LR scheduler, will be an array of iteration count
# when lr should be decreased
lr_steps: []
# DEPRECATED: Look at scheduler_attributes or
# Use PythiaScheduler directly instead
# Ratio for each lr step
lr_ratio: 0.1
# NOTE: Have a look at newer scheduler available in MMF (such as AdamW) before
# using these options
# Should use warmup for lr
use_warmup: false
# Warmup factor for learning rate warmup
warmup_factor: 0.2
# Iteration until which warmup should be done
warmup_iterations: 1000
# Device on which the model will be trained. Set 'cpu' to train/infer on CPU
device: cuda
# Local rank of the GPU device
local_rank: null
# If verbose dump is active, MMF will dump dataset, model specific
# information which can be useful in debugging
verbose_dump: false
# Turn on if you want to ignore unused parameters in case of DDP
find_unused_parameters: false
# By default metrics evaluation is turned off during training. Set this to true
# to enable evaluation every log_interval
evaluate_metrics: false
# Configuration for evaluation
evaluation:
# Metrics for evaluation
metrics: []
# Generate predictions in a file
predict: false
# Prediction file format (csv|json), default is json
predict_file_format: json
# Configuration for models, default configuration files for various models
# included in MMF can be found under configs directory in root folder
model_config: {}
# Configuration for datasets. Separate configuration
# for different datasets included in MMF are included in dataset folder
# which can be mixed and matched to train multiple datasets together
# An example for mixing all vqa datasets is present under vqa folder
dataset_config: {}
# Defines which datasets from the above tasks you want to train on
datasets: []
# Defines which model you want to train on
model: null
# Config file to be optionally passed by the user
config: null
# Type of run; train_inference by default means both the training and
# inference (test) stages will be run. If run_type contains 'val',
# inference will also be run on the val set.
run_type: train_inference
# Configuration for optimizer, examples can be found in models' configs in
# configs folder
optimizer: {}
# Configuration for scheduler, examples can be found in models' configs
scheduler: {}
# Common environment configurations for MMF
env:
# Universal cache directory for mmf
# This can be overridden by using MMF_CACHE_DIR environment variable
# or by directly setting this configuration attribute env.cache_dir
# If nothing is specified, default is set to "mmf" inside
# pytorch's cache folder
cache_dir: ${resolve_cache_dir:MMF_CACHE_DIR}
# Config path for dataset zoo, can be overridden via environment
# variable MMF_DATASET_ZOO as well.
dataset_zoo: ${env:MMF_DATASET_ZOO,configs/zoo/datasets.yaml}
# Similarly, config path for the model zoo; can be overridden via
# the MMF_MODEL_ZOO environment variable as well.
model_zoo: ${env:MMF_MODEL_ZOO, configs/zoo/models.yaml}
# Similar to cache_dir, but can be used if you specifically want to override
# where MMF stores your data. Default is cache_dir/data.
# We will auto download models and datasets in this folder
data_dir: ${resolve_dir:MMF_DATA_DIR, data}
# Directory for saving checkpoints and other metadata
# Use MMF_SAVE_DIR or env.save_dir to override
save_dir: ${env:MMF_SAVE_DIR, ./save}
# Directory for saving logs, default is "logs" inside the save folder
# If log_dir is specifically passed, logs will be written inside that folder
# Use MMF_LOG_DIR or env.log_dir to override
log_dir: ${env:MMF_LOG_DIR,}
# Directory for saving reports; if not passed, an opts-based folder will be generated
# inside save_dir/reports and reports will be saved there
# Use MMF_REPORT_DIR or env.report_dir to override
report_dir: ${env:MMF_REPORT_DIR,}
# Log directory for tensorboard, default points to same as logs
# Only used when training.tensorboard is enabled.
# Use MMF_TENSORBOARD_LOGDIR or env.tensorboard_logdir to override
tensorboard_logdir: ${env:MMF_TENSORBOARD_LOGDIR,}
# User directory where user can keep their own models independent of MMF
# This allows users to create projects which only include MMF as dependency
# Use MMF_USER_DIR or env.user_dir to specify
user_dir: ${env:MMF_USER_DIR,}
###
# Configuration for the distributed setup
distributed:
###
# Typically tcp://hostname:port that will be used to establish initial connection
init_method: null
# Rank of the current worker
rank: 0
# Port number, not required if using init_method
port: -1
# Backend for distributed setup
backend: nccl
# Total number of GPUs across all nodes (default: all visible GPUs)
world_size: ${device_count:}
# Set if you do not want to spawn multiple processes even if
# multiple GPUs are visible
no_spawn: false
# Configuration for checkpointing including resuming and loading pretrained models
checkpoint:
# If checkpoint.resume is true, MMF will try to automatically load
# the checkpoint and state from "current.ckpt" in env.save_dir
resume: false
# `checkpoint.resume_file` can be used to load a specific checkpoint from a file
# Can also be a zoo key
resume_file: null
# `checkpoint.resume_best` will load the best checkpoint according to
# training.early_stop.criteria instead of the last saved ckpt
resume_best: false
# `checkpoint.resume_pretrained` can be used in conjunction with `resume_file`
# or `resume_zoo` where you specify a checkpoint or .pth file to be loaded
# but it is mapped based on `checkpoint.pretrained_state_mapping`
# For e.g., if you want to resume from visual_bert pretrained on COCO,
# you would set `checkpoint.resume_zoo=visual_bert.pretrained.coco` and
# then set `checkpoint.resume_pretrained=True`, which will then pick up
# only the parts you define in `checkpoint.pretrained_state_mapping`
resume_pretrained: false
# `checkpoint.resume_zoo` can be used to resume from a pretrained model provided
# in zoo. Value maps to key from zoo. `checkpoint.resume_file` has higher
# priority compared to `checkpoint.resume_zoo`.
resume_zoo: null
# `checkpoint.zoo_config_override` will override the trainer's current model config
# with what is provided by the zoo checkpoint and will load the model
# using the .from_pretrained of the model passed
zoo_config_override: false
# `checkpoint.pretrained_state_mapping` specifies how exactly a pretrained
# model will be loaded and mapped to which keys of the target model
# Only use this if the keys of the model into which the pretrained model is being
# loaded don't match those of the pretrained model, or if you only want to load
# specific items from the pretrained model. `checkpoint.resume_pretrained` must be
# true to use this mapping. For e.g. you can specify
# text_embedding: text_embedding_pythia
# for loading the `text_embedding` module of your model from `text_embedding_pythia` of
# pretrained file specified in `checkpoint.resume_file`.
pretrained_state_mapping: {}
# Whether to save git details or not
save_git_details: true
# `checkpoint.reset` configuration defines what exactly should be reset
# in case the file from which we are resuming is .ckpt and not .pth
reset:
# Everything will be reset except the state_dict of model
all: false
# Optimizer specifically will be reset
optimizer: false
# All counts such as best_update, current_iteration etc will be reset
counts: false
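As the comments for `max_updates` and `max_epochs` above note, the two limits can be mixed and training stops at whichever is completed first. A minimal sketch of that stopping rule (illustrative only, not MMF's actual trainer code):

```python
# Stop when either the update budget or the epoch budget is exhausted.
# max_epochs may be null (None), in which case only max_updates applies.
def should_stop(num_updates, current_epoch, max_updates=22000, max_epochs=None):
    if num_updates >= max_updates:
        return True
    if max_epochs is not None and current_epoch >= max_epochs:
        return True
    return False

print(should_stop(num_updates=22000, current_epoch=3))               # True
print(should_stop(num_updates=100, current_epoch=3))                 # False
print(should_stop(num_updates=100, current_epoch=3, max_epochs=2))   # True
```

The default `max_updates: 22000` matches the base defaults above; both limits can be overridden via the config or a dotlist such as `training.max_epochs=10`.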