datasets.processors

class mmf.datasets.processors.BaseProcessor(config, *args, **kwargs)[source]

Every processor in MMF needs to inherit this class for compatibility with MMF. The end user mainly needs to implement the __call__ function.

Parameters:config (DictConfig) – Config for this processor, containing type and params attributes if available.
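
A minimal sketch of a custom processor, assuming the usual MMF pattern of registering processors through the registry; the processor name "lowercase_text", the "text" key, and the import paths (taken from this page's module names) are illustrative assumptions rather than guaranteed API.

from mmf.common.registry import registry
from mmf.datasets.processors import BaseProcessor


# "lowercase_text" is a hypothetical processor name used only for illustration.
@registry.register_processor("lowercase_text")
class LowercaseTextProcessor(BaseProcessor):
    def __init__(self, config, *args, **kwargs):
        super().__init__(config, *args, **kwargs)

    def __call__(self, item, *args, **kwargs):
        # Processors take and return dicts; using a "text" key here follows
        # the convention of MMF's text processors (an assumption in this sketch).
        return {"text": item["text"].lower()}

Once registered, a processor can be referenced from a dataset config by its type, in the same way as the built-in processors documented below.
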
class mmf.datasets.processors.Processor(config, *args, **kwargs)[source]

Wrapper class used by MMF to initialize processors based on their type as passed in the configuration. It retrieves the processor class registered in the registry corresponding to the type key and initializes it with the params passed in the configuration. All functions and attributes of the initialized processor are directly available via this class.

Parameters:config (DictConfig) – DictConfig containing type of the processor to be initialized and params of that processor.
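
An illustrative sketch of how a type/params configuration maps onto this wrapper, assuming a plain OmegaConf DictConfig is passed in directly; the concrete values mirror the vocab example below, and the vocab file must be resolvable in your MMF data directory for the underlying processor to load.

from omegaconf import OmegaConf
from mmf.datasets.processors import Processor

# Build a DictConfig with the type/params layout described above.
config = OmegaConf.create(
    {
        "type": "vocab",
        "params": {
            "max_length": 14,
            "vocab": {
                "type": "intersected",
                "embedding_name": "glove.6B.300d",
                "vocab_file": "vocabs/vocabulary_100k.txt",
            },
        },
    }
)

# The wrapper looks up the processor class registered under "vocab" and
# instantiates it with the params; its methods are then available directly.
text_processor = Processor(config)
print(text_processor.get_vocab_size())
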
class mmf.datasets.processors.VocabProcessor(config, *args, **kwargs)[source]

Use VocabProcessor when you have a vocab file and you want to convert words to indices. Expects the UNK token as “<unk>” and pads sentences using the “<pad>” token. The config can have a preprocessor property, which is used to preprocess the item passed in, and a max_length property, which sets the maximum length of the sentence/tokens that can be converted to indices. If the sentence is shorter, it will be padded. Parameters for “vocab” must be passed. An illustrative usage sketch follows the method listing for this class.

Key: vocab

Example Config:

task_attributes:
    vqa:
        vqa2:
            processors:
              text_processor:
                type: vocab
                params:
                  max_length: 14
                  vocab:
                    type: intersected
                    embedding_name: glove.6B.300d
                    vocab_file: vocabs/vocabulary_100k.txt
Parameters:config (DictConfig) – node containing configuration parameters of the processor
vocab

Vocab class object which is an abstraction over the vocab file passed.

Type:Vocab
get_pad_index()[source]

Get index of padding <pad> token in vocabulary.

Returns:index of the padding token.
Return type:int
get_vocab_size()[source]

Get size of the vocabulary.

Returns:size of the vocabulary.
Return type:int
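
A hypothetical usage sketch for the processor configured above, assuming it is called with a dict carrying a "text" key and returns padded token indices under the same key; the exact keys should be verified against the implementation.

# text_processor is assumed to be the vocab processor built in the
# Processor example above (or obtained from a dataset's processors).
processed = text_processor({"text": "what is the man doing"})

# Under the assumptions above, this is a LongTensor of max_length (14)
# indices, padded with the <pad> index for short sentences.
print(processed["text"].shape)
print(text_processor.get_vocab_size(), text_processor.get_pad_index())
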
class mmf.datasets.processors.GloVeProcessor(config, *args, **kwargs)[source]

Inherits VocabProcessor and returns GloVe vectors for each of the words. Maps words to indices using the vocab processor and then gets the GloVe vectors corresponding to those indices.

Parameters:config (DictConfig) – Configuration parameters for GloVe, same as VocabProcessor.
class mmf.datasets.processors.FastTextProcessor(config, *args, **kwargs)[source]

FastText processor, similar to GloVe processor but returns FastText vectors.

Parameters:config (DictConfig) – Configuration values for the processor.
class mmf.datasets.processors.VQAAnswerProcessor(config, *args, **kwargs)[source]

Processor for generating answer scores for the answers passed, using the VQA accuracy formula. Uses the VocabDict class to represent the answer vocabulary, so the parameters must specify “vocab_file”. “num_answers” in the parameter config specifies the maximum number of answers possible. Takes in a dict containing “answers” or “answers_tokens”; “answers” are preprocessed to generate “answers_tokens” if passed.

Parameters:config (DictConfig) – Configuration for the processor
answer_vocab

Class representing answer vocabulary

Type:VocabDict
compute_answers_scores(answers_indices)[source]

Generate VQA-based answer scores for answers_indices.

Parameters:answers_indices (torch.LongTensor) – tensor containing indices of the answers
Returns:tensor containing scores.
Return type:torch.FloatTensor
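
The VQA accuracy formula gives each candidate answer a soft score of min(count / 3, 1), where count is the number of annotators who gave that answer. A simplified sketch of that rule follows; the reference VQA evaluation additionally averages this score over leave-one-out subsets of the ten ground-truth answers, which this sketch omits.

import torch


def vqa_soft_scores(answers_indices, vocab_size):
    """Simplified VQA soft scoring: score(answer) = min(count / 3, 1)."""
    scores = torch.zeros(vocab_size, dtype=torch.float)
    unique_indices, counts = torch.unique(answers_indices, return_counts=True)
    for idx, count in zip(unique_indices.tolist(), counts.tolist()):
        scores[idx] = min(count / 3.0, 1.0)
    return scores


# Ten annotator answers mapped to (hypothetical) answer-vocabulary indices.
answers = torch.tensor([5, 5, 5, 7, 5, 5, 9, 5, 5, 7])
print(vqa_soft_scores(answers, vocab_size=10))
# index 5 -> 1.0 (7 votes), index 7 -> ~0.67 (2 votes), index 9 -> ~0.33 (1 vote)
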
get_true_vocab_size()[source]

The true vocab size can be different from the normal vocab size in some cases, such as soft copy, where a dynamic answer space is added.

Returns:True vocab size.
Return type:int
get_vocab_size()[source]

Get vocab size of the answer vocabulary. Can also include soft copy dynamic answer space size.

Returns:size of the answer vocabulary
Return type:int
idx2word(idx)[source]

Index to word according to the vocabulary.

Parameters:idx (int) – Index to be converted to the word.
Returns:Word corresponding to the index.
Return type:str
word2idx(word)[source]

Convert a word to its index according to the vocabulary.

Parameters:word (str) – Word to be converted to index.
Returns:Index of the word.
Return type:int
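
An illustrative round trip through the two lookup methods above; answer_processor stands in for an initialized VQAAnswerProcessor, and "yes" is simply a likely vocabulary entry, not a guaranteed one.

# answer_processor is assumed to be an initialized VQAAnswerProcessor.
idx = answer_processor.word2idx("yes")   # word -> vocabulary index
word = answer_processor.idx2word(idx)    # vocabulary index -> word
assert word == "yes"

# Vocab size with and without any dynamic (soft copy) answer space.
print(answer_processor.get_vocab_size(), answer_processor.get_true_vocab_size())
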
class mmf.datasets.processors.MultiHotAnswerFromVocabProcessor(config, *args, **kwargs)[source]
compute_answers_scores(answers_indices)[source]

Generate VQA-based answer scores for answers_indices.

Parameters:answers_indices (torch.LongTensor) – tensor containing indices of the answers
Returns:tensor containing scores.
Return type:torch.FloatTensor
class mmf.datasets.processors.SoftCopyAnswerProcessor(config, *args, **kwargs)[source]

Similar to the answer processor but adds a soft copy dynamic answer space to it. Read https://arxiv.org/abs/1904.08920 for extra information on soft copy and LoRRA.

Parameters:config (DictConfig) – Configuration for soft copy processor.
get_true_vocab_size()[source]

Actual vocab size, which only includes the size of the vocabulary file.

Returns:Actual size of the vocab.
Return type:int
get_vocab_size()[source]

Size of Vocab + Size of Dynamic soft-copy based answer space

Returns:Size of vocab + size of dynamic soft-copy answer space.
Return type:int
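
The two accessors above differ only by the dynamic answer space, as in this usage sketch; soft_copy_processor stands in for an initialized SoftCopyAnswerProcessor.

# soft_copy_processor is assumed to be an initialized SoftCopyAnswerProcessor.
static_size = soft_copy_processor.get_true_vocab_size()  # vocabulary file only
total_size = soft_copy_processor.get_vocab_size()        # plus soft copy slots

# The difference is the size of the dynamic answer space, e.g. the number of
# OCR tokens that LoRRA-style models can copy answers from.
dynamic_space_size = total_size - static_size
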
class mmf.datasets.processors.SimpleWordProcessor(*args, **kwargs)[source]

Tokenizes a word and processes it.

tokenizer

Type of tokenizer to be used.

Type:function
class mmf.datasets.processors.SimpleSentenceProcessor(*args, **kwargs)[source]

Tokenizes a sentence and processes it.

tokenizer

Type of tokenizer to be used.

Type:function
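
Both simple processors expose the tokenizer as a plain function attribute. The sketch below illustrates that contract with a whitespace tokenizer; it is not the function MMF actually assigns.

def whitespace_tokenize(text):
    """Illustrative tokenizer: lowercase the text and split on whitespace."""
    return text.lower().split()


# The simple processors store a callable like this on `self.tokenizer` and
# apply it inside __call__ (a sketch of the contract, not MMF's code).
tokens = whitespace_tokenize("Is the man riding a horse?")
print(tokens)  # ['is', 'the', 'man', 'riding', 'a', 'horse?']
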
class mmf.datasets.processors.BBoxProcessor(config, *args, **kwargs)[source]

Generates bboxes in the proper format. Takes in a dict containing an “info” key, which is a list of dicts with the following for each of the bounding boxes:

Example bbox input:

{
    "info": [
        {
            "bounding_box": {
                "top_left_x": 100,
                "top_left_y": 100,
                "width": 200,
                "height": 300
            }
        },
        ...
    ]
}

This will further return a Sample in a dict with the key “bbox”, whose last dimension of 4 corresponds to “xyxy”; a standalone conversion sketch follows the example. The sample will look like the following:

Example Sample:

Sample({
    "coordinates": torch.Size(n, 4),
    "width": List[number], # size n
    "height": List[number], # size n
    "bbox_types": List[str] # size n, either xyxy or xywh.
    # currently only supports xyxy.
})
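
The conversion implied above, from the “xywh” boxes in the “info” entries to the “xyxy” coordinates in the returned sample, is simply x2 = x1 + width and y2 = y1 + height. A standalone sketch of that step (not MMF's internal code):

import torch


def xywh_to_xyxy(info):
    """Convert "info" entries like the example above into an (n, 4) xyxy tensor."""
    coords = []
    for entry in info:
        box = entry["bounding_box"]
        x1, y1 = box["top_left_x"], box["top_left_y"]
        coords.append([x1, y1, x1 + box["width"], y1 + box["height"]])
    return torch.tensor(coords, dtype=torch.float)


info = [{"bounding_box": {"top_left_x": 100, "top_left_y": 100, "width": 200, "height": 300}}]
print(xywh_to_xyxy(info))  # tensor([[100., 100., 300., 400.]])
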
class mmf.datasets.processors.CaptionProcessor(config, *args, **kwargs)[source]

Processes a caption with start, end and pad tokens and returns raw string.

Parameters:config (DictConfig) – Configuration for caption processor.
class mmf.datasets.processors.MaskedTokenProcessor(config, *args, **kwargs)[source]
_truncate_seq_pair(tokens_a, tokens_b, max_length)[source]

Truncates a sequence pair in place to the maximum length.
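
A common way to implement this kind of pair truncation (as in the original BERT preprocessing) is to repeatedly drop a token from the end of the longer sequence until the combined length fits. The sketch below shows that strategy; MMF's implementation may differ in details.

def truncate_seq_pair(tokens_a, tokens_b, max_length):
    """Truncate the pair in place until len(tokens_a) + len(tokens_b) <= max_length."""
    while len(tokens_a) + len(tokens_b) > max_length:
        # Always trim the longer sequence so both keep proportionate context.
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()
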

class mmf.datasets.processors.TorchvisionTransforms(config, *args, **kwargs)[source]
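
TorchvisionTransforms has no docstring here; conceptually it builds a composition of torchvision transforms from its config. The plain-torchvision sketch below shows the kind of pipeline such a processor produces; it is not MMF's config schema, so consult the processor's implementation for the exact configuration format.

from torchvision import transforms

# A typical image pipeline composed of torchvision transforms.
transform = transforms.Compose(
    [
        transforms.Resize((256, 256)),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)
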