Source
Precision and recall at your fingertips
The original post on which this chapter is based.
Precision and recall capture different aspects of classifier quality. Precision measures how trustworthy the positive predictions are; recall measures how completely the actual positives are detected. In real systems, there is almost always a trade-off between the two.
Formulas in simple language
Precision
Of everything the model labeled as positive, the share that is actually positive.
Precision = TP / (TP + FP)
Recall (completeness)
Of all the actual positives, the share that the model managed to find.
Recall = TP / (TP + FN)
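The two formulas can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names and the toy labels are hypothetical, and returning 0.0 when a denominator is zero is just one common convention.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision(tp, fp):
    # Undefined if the model never predicts positive; 0.0 is an assumed convention.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Undefined if there are no actual positives; 0.0 is an assumed convention.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)       # 3 1 1 3
print(precision(tp, fp))    # 0.75
print(recall(tp, fn))       # 0.75
```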
Visualization: Vasya, sheep and wolf
Vasya has to balance false alarms against missed wolves.
TP = 17: Vasya shouted, and there really was a wolf.
FP = 10: false alarm; Vasya shouted "wolf", but there was no wolf.
FN = 13: the wolf was there, but Vasya stayed silent.
TN = 60: Vasya stayed silent, and there really was no wolf.
Metrics at the current threshold
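With Vasya's four counts, the metrics follow directly from the formulas above. The snippet below is a small sketch that only plugs in the numbers from the example; accuracy is included for comparison and is not part of the original visualization.

```python
# Counts from the Vasya example.
tp, fp, fn, tn = 17, 10, 13, 60

p = tp / (tp + fp)                          # 17 / 27
r = tp / (tp + fn)                          # 17 / 30
accuracy = (tp + tn) / (tp + fp + fn + tn)  # 77 / 100

print(f"precision = {p:.3f}")        # precision = 0.630
print(f"recall    = {r:.3f}")        # recall    = 0.567
print(f"accuracy  = {accuracy:.3f}") # accuracy  = 0.770
```

So roughly 63% of Vasya's alarms were real, and he caught about 57% of the wolves.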
Which metric matters more, and where?
Code review assistant
Precision: High | Recall: Medium
False alarms quickly lead to fatigue and disregard for recommendations.
Medical screening
Precision: Medium | Recall: Very high
It is critical not to miss real cases of the disease (an FN costs more than an FP).
Antifraud in payments
Precision: High | Recall: High
Both matter: avoid blocking legitimate transactions while still catching fraud.
Practical recommendations
Always pin down the cost of a false positive and a false negative for the specific product before choosing a threshold.
Report precision and recall together with the confusion matrix, never in isolation.
Check metrics separately for each segment (clients, data types, languages) so that aggregate numbers do not hide degradation.
For review assistants, it often pays to keep precision higher in order to maintain user trust.
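The first recommendation, agreeing on FP and FN costs before choosing a threshold, can be sketched as a tiny search over candidate thresholds. Everything here is hypothetical: the scores, labels, cost values, and function names are invented for illustration, and a real system would estimate costs from product data and evaluate on a held-out set.

```python
def total_cost(scores, labels, threshold, fp_cost, fn_cost):
    """Cost of errors when predicting positive for score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp * fp_cost + fn * fn_cost


def best_threshold(scores, labels, fp_cost, fn_cost, candidates):
    """Return the candidate threshold with the lowest total error cost."""
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, fp_cost, fn_cost),
    )


# Hypothetical model scores and true labels (1 = positive).
scores = [0.95, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]

# Screening-like setting: a miss is assumed to cost 5x a false alarm.
print(best_threshold(scores, labels, fp_cost=1, fn_cost=5, candidates=candidates))  # 0.3

# Review-assistant-like setting: a false alarm is assumed to cost 5x a miss.
print(best_threshold(scores, labels, fp_cost=5, fn_cost=1, candidates=candidates))  # 0.9
```

The same scores lead to very different thresholds once the cost ratio changes, which is why the costs should be agreed on first.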
Common mistakes
Looking only at accuracy when the classes are heavily imbalanced.
Comparing models by precision without controlling for recall (and vice versa).
Fixing the threshold once and never revisiting it after the data changes.
Ignoring how users react to false positives in production.
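The first mistake is easy to reproduce. In the sketch below, a degenerate "model" that never predicts the positive class still reaches 99% accuracy on an imbalanced dataset while finding none of the positives. The dataset is hypothetical and the 990/10 split is chosen only to make the effect obvious.

```python
# Hypothetical imbalanced dataset: 10 positives, 990 negatives.
y_true = [1] * 10 + [0] * 990
# Degenerate "model" that always predicts negative.
y_pred = [0] * len(y_true)

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.99
print(f"recall   = {recall:.2f}")    # recall   = 0.00
# Precision is undefined here: the model made no positive predictions (tp + fp == 0).
```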
