What is the Mean Average Precision (MAP@k) metric? Explained with code.
MAP@k is a metric used in cases where information retrieval and the ranking of documents are important.
Feb 18, 2022
MAP@k is a commonly used metric, especially in use cases where information retrieval is involved and the ranking of documents is equally important. It is usually used in the following:
- Recommender Systems
- Ranking Models
- Search-based Models (e.g. Google Search)
"k" is not a hyperparameter as it is determined by business & stakeholders.
Evaluating Recommender Systems - MAP@k
If you are using the MAP metric in a recommender system, then you are implying that you treat recommendation as a ranking problem. This is reasonable because recommendations are usually presented as a list, and users pay the most attention to the items at the top of that list!
The MAP metric rewards the model for producing relevant recommendations, and it rewards the model even more when those relevant items are ranked near the top of the list.
Keep in mind
- We can recommend k items for each user.
- Because there is no penalty for poor recommendations, it never hurts to fill all k slots.
- The rank (order) of the recommendations matters a lot, since the best suggestions should be shown to the user first.
Explanation
Let us break down MAP first. M denotes Mean, so MAP is the Mean of AP, where AP is Average Precision. This means that if we have 2500 users, the sum of the Average Precision for each user divided by 2500 equals the MAP.
In order to understand MAP one should first understand Precision and Recall. More specifically Precision@k and Recall@k.
Precision
Precision is the fraction of retrieved results that are true positives. Precision is a good measure when the cost of a False Positive is high. Hence:
$$ precision = \frac{tp}{tp+fp} \tag{1} $$
Recall
Recall is the number of true positives divided by the total number of true positives and false negatives (i.e. all relevant items, including those that were not recommended).
$$ recall = \frac{tp}{tp+fn} \tag{2} $$
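As a quick illustration of equations 1 and 2, here is a minimal sketch in Python (the function names and counts below are made up for illustration, not from any library):

def precision(tp, fp):
    # fraction of retrieved items that are relevant
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of all relevant items that were retrieved
    return tp / (tp + fn)

print(precision(tp=4, fp=1))  # 0.8
print(recall(tp=4, fn=6))     # 0.4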
Precision@k (P@k)
In our case, $tp + fp$ is simply the number of retrieved results, i.e. the $k$ recommendations that we return. Substituting this into equation 1:
$$ precision@k = \frac{tp}{k} \tag{3} $$
P@k is calculated for one user's list of recommendations at a single cut-off point $k$.
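As a minimal sketch of P@k for a single user (the names recommended, relevant, and precision_at_k are illustrative, not from a library):

def precision_at_k(recommended, relevant, k):
    # fraction of the top-k recommended items that are relevant
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

recommended = ["a", "x", "b", "y", "z"]   # ordered recommendations
relevant = {"a", "b", "c"}                # items the user actually liked
print(precision_at_k(recommended, relevant, k=3))  # 2/3 ≈ 0.67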
Average Precision@k (AP@k)
Above we saw the definition of Precision. Average Precision, as the name says, is an average of the P@i values for i = 1, 2, 3, ..., k, where only the positions at which a relevant item appears contribute to the sum (see the worked example below).
AP@k is calculated for one user over that user's full recommendation list by combining P@1 to P@k in this way.
Mean Average Precision@k (MAP@k)
Coming to the last piece, i.e. the Mean of AP: MAP@k is simply the mean of AP@k over all users. Within each user's AP@k, the sum of precision values is divided by min(k, a), where a is the number of actually relevant items for that user and k is the number of items our algorithm recommends.
The 1/min(k, a) term is a normalization factor: it prevents AP scores from being unfairly suppressed when k recommendations could not possibly capture all the relevant items.
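Putting this together, one way to write the two quantities is shown below (using $rel(i)$ as a 0/1 indicator of whether the item at rank $i$ is relevant, $a$ as the number of relevant items for a user, and $U$ as the number of users; this notation is introduced here for convenience):
$$ AP@k = \frac{1}{\min(k, a)} \sum_{i=1}^{k} precision@i \cdot rel(i) \tag{4} $$
$$ MAP@k = \frac{1}{U} \sum_{u=1}^{U} AP@k_u \tag{5} $$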
For example:
Consider that there are a = 5 relevant items and we are making k = 10 recommendations with the following relevance pattern: 1 0 1 0 0 1 0 0 1 1. Let's calculate the mean average precision here (with a single user, MAP@10 is just AP@10).
MAP@10:
$$ (1*1 + 1/2*0 + 2/3*1 + 2/4*0 + 2/5*0 + 3/6*1 + 3/7*0 + 3/8*0 + 4/9*1 + 5/10*1) / 5 \approx 0.62 $$
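To sanity-check the arithmetic, here is a minimal sketch that recomputes the example directly from the binary relevance pattern (the variable names are made up for illustration):

rels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]   # relevance of each of the 10 recommendations
num_relevant = 5                        # a = 5 relevant items in total

hits, ap = 0, 0.0
for i, rel in enumerate(rels, start=1):
    hits += rel
    if rel:                             # only ranks with a hit contribute
        ap += hits / i                  # precision@i at that rank
ap /= min(len(rels), num_relevant)      # normalize by min(k, a) = min(10, 5)

print(round(ap, 2))                     # 0.62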
Proof - MAP is formulated to reward better-ranked recommendations more highly
Imagine two sets of predictions, each making 7 recommendations, with 8 relevant items in total:
- rec_a = 1 0 0 0 0 1 1
- rec_b = 1 1 1 0 0 0 0
Here $k=7$ and $a=8$, hence we divide by $\min(k, a) = 7$, as mentioned above.
Let us calculate MAP@7 for rec_a
$$ (1*1 + 1/2*0 + 1/3*0 + 1/4*0 + 1/5*0 + 2/6*1 + 3/7*1) / 7 \approx 0.25 $$
Similarly, let us calculate MAP@7 for rec_b
$$ (1*1 + 2/2*1 + 3/3*1 + 3/4*0 + 3/5*0 + 3/6*0 + 3/7*0) / 7 \approx 0.43 $$
MAP@7 for rec_a ≈ 0.25 &
MAP@7 for rec_b ≈ 0.43
Clearly, rec_b has a better score and hence provides better recommendations.
Code
Referring to benhamner's code:
import numpy as np

def apk(actual, predicted, k=10):
    """
    Computes the average precision at k.

    This function computes the average precision at k between two lists of
    items.

    Parameters
    ----------
    actual : list
        A list of elements that are to be predicted (order doesn't matter)
    predicted : list
        A list of predicted elements (order does matter)
    k : int, optional
        The maximum number of predicted elements

    Returns
    -------
    score : double
        The average precision at k over the input lists
    """
    if not actual:
        return 0.0

    # only the top-k predictions are scored
    if len(predicted) > k:
        predicted = predicted[:k]

    score = 0.0
    num_hits = 0.0

    for i, p in enumerate(predicted):
        # first condition checks whether it is a relevant prediction
        # second condition checks that the prediction is not repeated
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i + 1.0)  # precision@(i+1) at this hit

    # normalize by min(number of relevant items, k)
    return score / min(len(actual), k)

def mapk(actual, predicted, k=10):
    """
    Computes the mean average precision at k.

    This function computes the mean average precision at k between two lists
    of lists of items.

    Parameters
    ----------
    actual : list
        A list of lists of elements that are to be predicted
        (order doesn't matter in the lists)
    predicted : list
        A list of lists of predicted elements
        (order matters in the lists)
    k : int, optional
        The maximum number of predicted elements

    Returns
    -------
    score : double
        The mean average precision at k over the input lists
    """
    # average the per-user AP@k scores
    return np.mean([apk(a, p, k) for a, p in zip(actual, predicted)])
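A quick usage sketch that reproduces the examples above with these functions (the item names below are made up; the relevant items stand in for the 1s in the binary vectors):

# Worked example: 5 relevant items, relevance pattern 1 0 1 0 0 1 0 0 1 1
actual = ['a', 'b', 'c', 'd', 'e']
predicted = ['a', 'x1', 'b', 'x2', 'x3', 'c', 'x4', 'x5', 'd', 'e']
print(round(apk(actual, predicted, k=10), 2))                 # 0.62

# rec_a and rec_b from the proof section: 8 relevant items, 7 recommendations
actual = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
rec_a = ['a', 'x1', 'x2', 'x3', 'x4', 'b', 'c']               # hits at ranks 1, 6, 7
rec_b = ['a', 'b', 'c', 'x1', 'x2', 'x3', 'x4']               # hits at ranks 1, 2, 3
print(round(apk(actual, rec_a, k=7), 2))                      # 0.25
print(round(apk(actual, rec_b, k=7), 2))                      # 0.43

# MAP@7 over both lists treated as two "users"
print(round(mapk([actual, actual], [rec_a, rec_b], k=7), 2))  # 0.34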