modules.metrics
The metrics module contains implementations of metrics commonly used to measure how well our models are performing, for example accuracy, vqa_accuracy, and r@1.
To implement your own metric, follow these steps:
- Create your own metric class that inherits from the BaseMetric class.
- In the __init__ function of your class, make sure to call super().__init__('name'), where 'name' is the name of your metric. If you require any parameters in your __init__ function, you can use keyword arguments to represent them, and the metric constructor will take care of providing them to your class from the config.
- Implement a calculate function which takes a SampleList and model_output as input and returns a float tensor/number.
- Register your metric with the key 'name' by using the decorator @registry.register_metric('name').
Example:
    import torch

    from mmf.common.registry import registry
    from mmf.modules.metrics import BaseMetric

    @registry.register_metric("some")
    class SomeMetric(BaseMetric):
        def __init__(self, some_param=None):
            super().__init__("some")
            ....

        def calculate(self, sample_list, model_output):
            metric = torch.tensor(2, dtype=torch.float)
            return metric
Example config for above metric:
    model_config:
        pythia:
            metrics:
            - type: some
              params:
                some_param: a
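The registration and config steps above can be pictured with a small stand-alone sketch. Everything here (the `_METRICS` dict, `register_metric`, `build_metric`, and the stand-in `BaseMetric`) is hypothetical and only mimics the pattern; it is not MMF's actual registry or constructor code.

```python
# Hypothetical mini-registry that mimics the pattern described above.
# MMF's real registry lives in mmf.common.registry and may differ.
_METRICS = {}


def register_metric(name):
    """Return a decorator that stores the decorated class under `name`."""
    def wrap(cls):
        _METRICS[name] = cls
        return cls
    return wrap


class BaseMetric:
    """Stand-in base class that only records the metric name."""
    def __init__(self, name):
        self.name = name


@register_metric("some")
class SomeMetric(BaseMetric):
    def __init__(self, some_param=None):
        super().__init__("some")
        self.some_param = some_param

    def calculate(self, sample_list, model_output):
        return 2.0


def build_metric(entry):
    """Instantiate a metric from a config entry: look up `type`, pass `params` as kwargs."""
    cls = _METRICS[entry["type"]]
    return cls(**entry.get("params", {}))


metric = build_metric({"type": "some", "params": {"some_param": "a"}})
```

The point of the sketch is that the `params` mapping from the config ends up as keyword arguments of `__init__`, which is why metric parameters should be declared as keyword arguments.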
class mmf.modules.metrics.Accuracy
    Metric for calculating accuracy.
    Key: accuracy

    calculate(sample_list, model_output, *args, **kwargs)
        Calculate accuracy and return it.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the DataLoader for the current iteration.
        - model_output (Dict) – Dict returned by the model.
        Returns: accuracy.
        Return type: torch.FloatTensor
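As a rough illustration of what an accuracy metric computes, here is a plain-Python sketch. It assumes per-class scores and integer class targets; the function name and input layout are assumptions for illustration, not the signature of `Accuracy.calculate`.

```python
def accuracy(scores, targets):
    """Fraction of rows whose highest-scoring class equals the target index."""
    correct = 0
    for row, target in zip(scores, targets):
        # argmax over the class scores of this sample
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == target:
            correct += 1
    return correct / len(targets)


# Three samples, three classes each (made-up numbers).
scores = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9]]
targets = [1, 2, 2]
result = accuracy(scores, targets)  # predictions are 1, 0, 2, so 2 of 3 are correct
```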
class mmf.modules.metrics.AveragePrecision(*args, **kwargs)
    Metric for calculating Average Precision. See more details at sklearn.metrics.average_precision_score.
    Key: ap

    calculate(sample_list, model_output, *args, **kwargs)
        Calculate AP and return it. The function applies softmax to the provided logits and then calculates the AP.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the DataLoader for the current iteration.
        - model_output (Dict) – Dict returned by the model. This should contain a "scores" field pointing to the logits returned by the model.
        Returns: AP.
        Return type: torch.FloatTensor
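The softmax-then-AP flow can be sketched in plain Python for the binary case. This is an illustrative re-implementation under stated assumptions (two logits per sample, binary labels, no tied scores, at least one positive); the actual metric delegates to scikit-learn.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def average_precision(labels, probs):
    """Mean of precision@rank taken at each positive sample (binary labels, no ties)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up logits for four samples; column 1 is treated as the positive class.
logits = [[0.0, 2.0], [1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
labels = [1, 0, 0, 1]
probs = [softmax(row)[1] for row in logits]
ap = average_precision(labels, probs)  # positives land at ranks 1 and 4: (1/1 + 2/4) / 2
```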
class mmf.modules.metrics.BaseMetric(name, *args, **kwargs)
    Base class to be inherited by all metrics registered to MMF. See the description at the top of the file for more information. Child classes must implement the calculate function.
    Parameters: name (str) – Name of the metric.

    calculate(sample_list, model_output, *args, **kwargs)
        Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor/number indicating the value of this metric.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
        - model_output (Dict) – Output dict from the model for the current SampleList.
        Returns: Value of the metric.
        Return type: torch.Tensor|float
class mmf.modules.metrics.BinaryF1(*args, **kwargs)
    Metric for calculating Binary F1.
    Key: binary_f1
class mmf.modules.metrics.CaptionBleu4Metric
    Metric for calculating caption accuracy using the BLEU4 score.
    Key: caption_bleu4

    calculate(sample_list, model_output, *args, **kwargs)
        Calculate the BLEU4 score and return it.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the DataLoader for the current iteration.
        - model_output (Dict) – Dict returned by the model.
        Returns: BLEU4 score.
        Return type: torch.FloatTensor
class mmf.modules.metrics.F1(*args, **kwargs)
    Metric for calculating F1. Can be used with the type and params arguments for customization. params will be passed directly to the sklearn f1 function.
    Key: f1

    calculate(sample_list, model_output, *args, **kwargs)
        Calculate F1 and return it.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the DataLoader for the current iteration.
        - model_output (Dict) – Dict returned by the model.
        Returns: f1.
        Return type: torch.FloatTensor
class mmf.modules.metrics.MacroAP(*args, **kwargs)
    Metric for calculating Macro Average Precision.
    Key: macro_ap

class mmf.modules.metrics.MacroF1(*args, **kwargs)
    Metric for calculating Macro F1.
    Key: macro_f1

class mmf.modules.metrics.MacroROC_AUC(*args, **kwargs)
    Metric for calculating Macro ROC_AUC.
    Key: macro_roc_auc
class mmf.modules.metrics.MeanRank
    Calculate MeanRank, the average rank of the chosen candidate.
    Key: mean_r

    calculate(sample_list, model_output, *args, **kwargs)
        Calculate Mean Rank and return it.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the DataLoader for the current iteration.
        - model_output (Dict) – Dict returned by the model.
        Returns: mean rank
        Return type: torch.FloatTensor
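To make "average rank of the chosen candidate" concrete, here is a plain-Python sketch. It assumes each sample carries one score per candidate plus the index of the ground-truth candidate, and that ranks are 1-based; the helper names and input layout are assumptions, not MMF's implementation.

```python
def ranks_of_truth(scores, gt_indices):
    """1-based rank of the ground-truth candidate in each row (1 = highest score)."""
    ranks = []
    for row, gt in zip(scores, gt_indices):
        # rank is one plus the number of candidates that score strictly higher
        ranks.append(1 + sum(1 for s in row if s > row[gt]))
    return ranks


def mean_rank(ranks):
    """Average of the per-sample ranks."""
    return sum(ranks) / len(ranks)


# Made-up scores for three samples with three candidates each.
scores = [[0.9, 0.1, 0.5], [0.2, 0.7, 0.4], [0.3, 0.6, 0.8]]
gt = [0, 2, 0]
ranks = ranks_of_truth(scores, gt)
value = mean_rank(ranks)
```

A lower mean rank is better, with 1.0 meaning the ground-truth candidate was always scored highest.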
class mmf.modules.metrics.MeanReciprocalRank
    Calculate the mean reciprocal rank, i.e. the mean of the reciprocal ranks of the chosen candidates.
    Key: mean_rr

    calculate(sample_list, model_output, *args, **kwargs)
        Calculate Mean Reciprocal Rank and return it.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the DataLoader for the current iteration.
        - model_output (Dict) – Dict returned by the model.
        Returns: Mean Reciprocal Rank
        Return type: torch.FloatTensor
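Under the standard definition of mean reciprocal rank (assumed here to be what this class computes), the reciprocal is taken per sample and then averaged. The sketch below uses hypothetical 1-based ranks and also shows that this generally differs from taking the reciprocal of the mean rank.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all samples (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


ranks = [1, 2, 4]  # made-up ranks of the chosen candidate for three samples
mrr = mean_reciprocal_rank(ranks)                    # (1 + 1/2 + 1/4) / 3
reciprocal_of_mean = 1.0 / (sum(ranks) / len(ranks)) # 1 / (7/3), a different quantity
```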
class mmf.modules.metrics.Metrics(metric_list)
    Used internally by MMF, Metrics acts as a wrapper that handles the calculation of all metrics specified by the model in the config. It initializes all of the metrics and, when called, runs calculate on each of them one by one and returns a dict with properly named keys. For example, a dict returned by the Metrics class:
        {'val/vqa_accuracy': 0.3, 'val/r@1': 0.8}
    Parameters: metric_list (ListConfig) – List of DictConfigs where each DictConfig specifies the name and parameters of a metric to use.
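The wrapper behaviour described above can be sketched as follows. `ConstantMetric` and `MetricsSketch` are hypothetical stand-ins, and the `dataset_type` argument is an assumption made to reproduce the `val/` prefix; the real class builds its metrics from config entries.

```python
class ConstantMetric:
    """Hypothetical stand-in for a registered metric that returns a fixed value."""
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def calculate(self, sample_list, model_output):
        return self.value


class MetricsSketch:
    """Runs calculate on each metric and prefixes the key with the dataset type."""
    def __init__(self, metrics):
        self.metrics = metrics

    def __call__(self, sample_list, model_output, dataset_type="val"):
        return {
            f"{dataset_type}/{metric.name}": metric.calculate(sample_list, model_output)
            for metric in self.metrics
        }


wrapper = MetricsSketch([ConstantMetric("vqa_accuracy", 0.3), ConstantMetric("r@1", 0.8)])
values = wrapper(sample_list=None, model_output={})
```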
class mmf.modules.metrics.MicroAP(*args, **kwargs)
    Metric for calculating Micro Average Precision.
    Key: micro_ap

class mmf.modules.metrics.MicroF1(*args, **kwargs)
    Metric for calculating Micro F1.
    Key: micro_f1

class mmf.modules.metrics.MicroROC_AUC(*args, **kwargs)
    Metric for calculating Micro ROC_AUC.
    Key: micro_roc_auc

class mmf.modules.metrics.MultiLabelF1(*args, **kwargs)
    Metric for calculating Multilabel F1.
    Key: multilabel_f1

class mmf.modules.metrics.MultiLabelMacroF1(*args, **kwargs)
    Metric for calculating Multilabel Macro F1.
    Key: multilabel_macro_f1

class mmf.modules.metrics.MultiLabelMicroF1(*args, **kwargs)
    Metric for calculating Multilabel Micro F1.
    Key: multilabel_micro_f1
class mmf.modules.metrics.ROC_AUC(*args, **kwargs)
    Metric for calculating ROC_AUC. See more details at sklearn.metrics.roc_auc_score.
    Note: ROC_AUC is not defined when the expected tensor contains only one label. Make sure both labels are always present, or use it on the full val set only.
    Key: roc_auc

    calculate(sample_list, model_output, *args, **kwargs)
        Calculate ROC_AUC and return it. The function applies softmax to the provided logits and then calculates the ROC_AUC.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the DataLoader for the current iteration.
        - model_output (Dict) – Dict returned by the model. This should contain a "scores" field pointing to the logits returned by the model.
        Returns: ROC_AUC.
        Return type: torch.FloatTensor
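The single-label caveat is easy to see from the pairwise view of ROC AUC: it is the fraction of (positive, negative) pairs in which the positive sample scores higher, so with only one class present there are no pairs to compare. The function below is an illustrative plain-Python sketch for binary labels, not the scikit-learn routine the metric relies on.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        # No positive/negative pairs exist, so the quantity is undefined.
        raise ValueError("ROC AUC is undefined when only one class is present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and positive-class probabilities: 3 of the 4 pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A small batch can easily contain a single label, which is why evaluating on the full validation set is the safer choice.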
class mmf.modules.metrics.RecallAt1
    Calculate Recall@1, which specifies how often the chosen candidate was ranked first.
    Key: r@1

    calculate(sample_list, model_output, *args, **kwargs)
        Calculate Recall@1 and return it.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the DataLoader for the current iteration.
        - model_output (Dict) – Dict returned by the model.
        Returns: Recall@1
        Return type: torch.FloatTensor
class mmf.modules.metrics.RecallAt10
    Calculate Recall@10, which specifies how often the chosen candidate was among the first 10 ranks.
    Key: r@10

    calculate(sample_list, model_output, *args, **kwargs)
        Calculate Recall@10 and return it.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the DataLoader for the current iteration.
        - model_output (Dict) – Dict returned by the model.
        Returns: Recall@10
        Return type: torch.FloatTensor
class mmf.modules.metrics.RecallAt5
    Calculate Recall@5, which specifies how often the chosen candidate was among the first 5 ranks.
    Key: r@5

    calculate(sample_list, model_output, *args, **kwargs)
        Calculate Recall@5 and return it.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the DataLoader for the current iteration.
        - model_output (Dict) – Dict returned by the model.
        Returns: Recall@5
        Return type: torch.FloatTensor
class mmf.modules.metrics.RecallAtK(name='recall@k')

    calculate(sample_list, model_output, k, *args, **kwargs)
        Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor/number indicating the value of this metric.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
        - model_output (Dict) – Output dict from the model for the current SampleList.
        Returns: Value of the metric.
        Return type: torch.Tensor|float
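Recall@1, Recall@5, and Recall@10 differ only in the cutoff k. A minimal sketch, assuming the 1-based rank of the chosen candidate is already known for each sample (the `recall_at_k` helper is hypothetical and is not the `RecallAtK.calculate` signature):

```python
def recall_at_k(ranks, k):
    """Fraction of samples whose chosen candidate is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


ranks = [1, 3, 7, 12]  # made-up 1-based ranks for four samples
r_at_1 = recall_at_k(ranks, 1)    # only the first sample qualifies
r_at_5 = recall_at_k(ranks, 5)    # ranks 1 and 3 qualify
r_at_10 = recall_at_k(ranks, 10)  # ranks 1, 3 and 7 qualify
```

Recall@k can never decrease as k grows, so r@1 ≤ r@5 ≤ r@10 always holds for the same predictions.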
class mmf.modules.metrics.TextVQAAccuracy

    calculate(sample_list, model_output, *args, **kwargs)
        Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor/number indicating the value of this metric.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
        - model_output (Dict) – Output dict from the model for the current SampleList.
        Returns: Value of the metric.
        Return type: torch.Tensor|float
class mmf.modules.metrics.VQAAccuracy
    Calculate VQAAccuracy. Find more information here
    Key: vqa_accuracy

    calculate(sample_list, model_output, *args, **kwargs)
        Calculate VQA accuracy and return it.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the DataLoader for the current iteration.
        - model_output (Dict) – Dict returned by the model.
        Returns: VQA Accuracy
        Return type: torch.FloatTensor
class mmf.modules.metrics.VQAEvalAIAccuracy
    Calculate Eval AI VQAAccuracy. Find more information here. This is more accurate and a closer comparison to Eval AI, but it is slower than vqa_accuracy.
    Key: vqa_evalai_accuracy

    calculate(sample_list, model_output, *args, **kwargs)
        Calculate VQA accuracy and return it.
        Parameters:
        - sample_list (SampleList) – SampleList provided by the DataLoader for the current iteration.
        - model_output (Dict) – Dict returned by the model.
        Returns: VQA Accuracy
        Return type: torch.FloatTensor