What is the Mean Average Precision (MAP@k) metric? Explained with code.

MAP@k is a metric used in cases where information retrieval and the ranking of documents are important.

By Nandeshwar

Feb 18, 2022



MAP@k is a commonly used metric, especially in use cases where information retrieval is performed and the ranking of documents matters just as much. It is usually used in the following:

  • Recommender Systems
  • Ranking Models
  • Search based Models (Google search)
"k" is not a hyperparameter as it is determined by business & stakeholders.

Evaluating Recommender Systems - MAP@k

If you are using the MAP metric in a recommender system, you are implicitly treating recommendation as a ranking problem. This is usually reasonable, because recommendations are presented as a list and users mostly look at the items at the top of that list!

The MAP metric rewards the model for producing relevant recommendations, and rewards it even more for placing those relevant items near the top of the list.

Keep in mind

  1. We can recommend k items for each user.
  2. Because there is no penalty for bad recommendations, it is better to fill all k recommendation slots.
  3. The rank (order) of the recommendations matters a lot, since the better suggestions should be shown to the user first.

Explanation

Let us break down MAP first. The M denotes Mean, so MAP is the Mean of AP, where AP is Average Precision. This implies that if we have 2500 users, MAP is the sum of the Average Precision for each user divided by 2500.
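
In symbols, with $|U|$ denoting the number of users (2500 in this example):

$$ MAP@k = \frac{1}{|U|} \sum_{u=1}^{|U|} AP@k_u $$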

To understand MAP, one should first understand Precision and Recall, and more specifically Precision@k and Recall@k.


Precision

Precision is the proportion of true positives in the retrieved results. It is a good measure to use when the cost of a false positive is high. Hence:

$$ precision = \frac{tp}{tp+fp} \tag{1} $$


Recall

Recall is the number of true positives divided by the total number of true positives and false negatives (that is, all relevant items, including those that were never recommended).

$$ recall = \frac{tp}{tp+fn} \tag{2} $$
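
A minimal sketch in Python, with made-up confusion counts purely for illustration:

# Illustrative counts only (not from any real model)
tp = 8   # relevant items that were recommended (true positives)
fp = 2   # recommended items that were not relevant (false positives)
fn = 4   # relevant items that were never recommended (false negatives)

precision = tp / (tp + fp)   # equation (1) -> 0.8
recall = tp / (tp + fn)      # equation (2) -> 0.666...

print(precision, recall)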


Precision@k (P@k)

In our case $tp + fp$ is simply the number of retrieved results, which is $k$, the number of recommendations we make. Substituting this into equation 1:

$$ precision@k = \frac{tp}{k} \tag{3} $$

P@k is generally calculated for a single user's recommendation list at one cutoff position k.
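
A small sketch of P@k; the function name precision_at_k and the toy items are illustrative, not taken from any library:

def precision_at_k(actual, predicted, k):
    """Fraction of the top-k predicted items that are relevant."""
    top_k = predicted[:k]
    hits = sum(1 for item in top_k if item in actual)
    return hits / k

# Two of the top three predictions are relevant -> P@3 = 2/3
print(precision_at_k({'a', 'c'}, ['a', 'b', 'c', 'd'], k=3))   # 0.666...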


Average Precision@k (AP@k)

Above we saw the definition of Precision. Average Precision, as the name suggests, averages the values P@i for i = 1, 2, 3, ..., k, counting only the positions i where the recommended item is actually relevant. In other words, to calculate AP@k we sum P@1, P@2, ..., P@k over the relevant positions and then normalize the result (as described in the next section).

AP@k is typically calculated for one user, over that user's full list of k recommendations.
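
In code, a sketch of AP@k that accumulates P@i only at the relevant ranks and normalizes by min(k, a), as discussed in the next section (the function name is illustrative):

def average_precision_at_k(actual, predicted, k):
    """AP@k: precision accumulated at relevant ranks, normalized by min(k, a)."""
    top_k = predicted[:k]
    hits, precision_sum = 0, 0.0
    for i, item in enumerate(top_k, start=1):
        if item in actual:
            hits += 1
            precision_sum += hits / i   # P@i, counted only at relevant ranks
    return precision_sum / min(k, len(actual))

# Relevance pattern 1 0 1 ('a' and 'c' relevant): (1/1 + 2/3) / 2 = 0.833...
print(average_precision_at_k({'a', 'c'}, ['a', 'b', 'c'], k=3))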


Mean Average Precision@k (MAP@k)

Coming to the last piece, the Mean of AP: MAP@k is simply the mean of AP@k over all users. Within each user's AP@k, the sum of precisions is divided by $m = \min(k, a)$, where $a$ is the number of actually relevant items for that user and $k$ is the number of recommendations our algorithm is supposed to make.

The $1/\min(k, a)$ term is the normalization factor; it prevents AP scores from being unfairly suppressed when $k$ recommendations could not possibly capture all of the relevant items.
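
Putting the normalization factor into a formula, where rel(i) is 1 if the item at rank i is relevant and 0 otherwise (this matches the worked examples below):

$$ AP@k = \frac{1}{\min(k, a)} \sum_{i=1}^{k} P@i \cdot rel(i) $$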


For example:

Suppose there are 5 relevant items ($a = 5$) and we make $k = 10$ recommendations with the following relevance pattern: 1 0 1 0 0 1 0 0 1 1. Let us calculate the average precision here (with a single user, MAP@10 is simply AP@10).

MAP@10:

$$ (1 \cdot 1 + \tfrac{1}{2} \cdot 0 + \tfrac{2}{3} \cdot 1 + \tfrac{2}{4} \cdot 0 + \tfrac{2}{5} \cdot 0 + \tfrac{3}{6} \cdot 1 + \tfrac{3}{7} \cdot 0 + \tfrac{3}{8} \cdot 0 + \tfrac{4}{9} \cdot 1 + \tfrac{5}{10} \cdot 1) / 5 \approx 0.62 $$
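
The same arithmetic can be checked with a few lines of Python; the list below is just the relevance pattern from the example:

relevance = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]   # 1 = relevant, 0 = not relevant
k, a = len(relevance), sum(relevance)        # k = 10 recommendations, a = 5 relevant
hits, precision_sum = 0, 0.0
for i, rel in enumerate(relevance, start=1):
    if rel:
        hits += 1
        precision_sum += hits / i            # P@i at each relevant rank
print(precision_sum / min(k, a))             # 0.6222... ~= 0.62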


Proof - MAP is formulated to reward better-ranked recommendations more highly

Imagine there are two sets of predictions, each making 7 recommendations, with 8 actually relevant items:

  1. rec_a = 1 0 0 0 0 1 1
  2. rec_b = 1 1 1 0 0 0 0

Here $k = 7$ and $a = 8$, hence we divide by $\min(k, a) = 7$ as mentioned above.

Let us calculate MAP@7 for rec_a

$$ (1*1 + 1/2*0 + 1/3*0 + 1/4*0 + 1/5*0 + 2/6*1 + 3/7*1) / 7 = 0.25 $$

Similarly, let us calculate MAP@7 for rec_b

$$ (1*1 + 2/2*1 + 3/3*1 + 3/4*0 + 3/5*0 + 3/6*0 + 3/7*0) / 7 \approx 0.43 $$

MAP@7 for rec_a = 0.25 (unnormalized precision sum 1.76) &

MAP@7 for rec_b ≈ 0.43 (unnormalized precision sum 3)

Clearly, rec_b has the better score, since it ranks its relevant recommendations higher in the list.

Code

Referring to Benhamner's code:

import numpy as np

def apk(actual, predicted, k=10):
    """
    Computes the average precision at k.
    This function computes the average precision at k between two lists of
    items.
    Parameters
    ----------
    actual : list
             A list of elements that are to be predicted (order doesn't matter)
    predicted : list
                A list of predicted elements (order does matter)
    k : int, optional
        The maximum number of predicted elements
    Returns
    -------
    score : double
            The average precision at k over the input lists
    """
    if not actual:
        return 0.0

    if len(predicted)>k:
        predicted = predicted[:k]

    score = 0.0
    num_hits = 0.0

    for i,p in enumerate(predicted):
        # first condition: p is a relevant (valid) prediction
        # second condition: p has not already been counted earlier in the list
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i+1.0)

    return score / min(len(actual), k)


def mapk(actual, predicted, k=10):
    """
    Computes the mean average precision at k.
    This function computes the mean average precision at k between two lists
    of lists of items.
    Parameters
    ----------
    actual : list
             A list of lists of elements that are to be predicted 
             (order doesn't matter in the lists)
    predicted : list
                A list of lists of predicted elements
                (order matters in the lists)
    k : int, optional
        The maximum number of predicted elements
    Returns
    -------
    score : double
            The mean average precision at k over the input lists
    """
    return np.mean([apk(a,p,k) for a,p in zip(actual, predicted)])
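
As a quick sanity check, the two worked examples above can be reproduced with these functions; the item IDs below are made up purely for illustration:

# Relevance pattern 1 0 1 0 0 1 0 0 1 1 with 5 relevant items
actual = ['i1', 'i3', 'i6', 'i9', 'i10']
predicted = ['i1', 'i2', 'i3', 'i4', 'i5', 'i6', 'i7', 'i8', 'i9', 'i10']
print(apk(actual, predicted, k=10))                      # ~0.62

# rec_a vs rec_b: 8 relevant items, 7 recommendations each
relevant = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
rec_a = ['A', 'x1', 'x2', 'x3', 'x4', 'B', 'C']          # pattern 1 0 0 0 0 1 1
rec_b = ['A', 'B', 'C', 'y1', 'y2', 'y3', 'y4']          # pattern 1 1 1 0 0 0 0
print(apk(relevant, rec_a, k=7))                         # ~0.25
print(apk(relevant, rec_b, k=7))                         # ~0.43

# Mean over both lists treated as two "users"
print(mapk([relevant, relevant], [rec_a, rec_b], k=7))   # ~0.34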

Tags

recommender-system
evaluation-metric
Intermediate