System Design Space

Updated: February 21, 2026 at 11:59 PM

Precision and recall at your fingertips

Difficulty: easy

A simple, practical explanation of precision and recall, their trade-off, and threshold selection, using the example of “Vasya and the Wolf”.

Source

“Precision and recall at your fingertips”: the original post on which this chapter is based.

Precision and recall show different aspects of classifier quality. Precision measures the quality of positive predictions; recall measures the completeness of detection. In real systems there is almost always a trade-off between the two.

Formulas in simple language

Precision

Of all the things the model has labeled as positive, how many are actually positive.

Precision = TP / (TP + FP)

Recall (completeness)

Of all the truly positive examples, how many the model managed to find.

Recall = TP / (TP + FN)
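The two formulas above can be sketched in a few lines of Python. The counts passed in below are taken from the chapter's Vasya example; the zero-guard is a defensive assumption, not part of the original formulas.

```python
def precision(tp, fp):
    """Share of positive predictions that are actually correct."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Share of truly positive examples the model managed to find."""
    return tp / (tp + fn) if tp + fn else 0.0

# Counts from the chapter's example:
print(precision(17, 10))  # ≈ 0.63
print(recall(17, 13))     # ≈ 0.57
```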

Visualization: Vasya, the sheep, and the wolf

At a threshold of 50%, Vasya keeps a balance between false alarms and missed wolves.

TP = 17: Vasya shouted “wolf” and the wolf was really there

FP = 10: false alarm: Vasya shouted “wolf”, but there is no wolf

FN = 13: the wolf was there, but Vasya stayed silent

TN = 60: silence, and there really is no wolf

Metrics at the current threshold

Precision: 63.0%
Recall: 56.7%
F1-score: 59.6%

The example uses a fixed stream of 100 events in which the real wolf appears 30 times. Change the threshold and watch how TP, FP, and FN grow and shrink.
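The numbers above can be reproduced directly from the confusion-matrix counts of the example:

```python
# Confusion-matrix counts from the chapter's example at the 50% threshold
TP, FP, FN, TN = 17, 10, 13, 60

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.1%}")  # Precision: 63.0%
print(f"Recall:    {recall:.1%}")     # Recall:    56.7%
print(f"F1-score:  {f1:.1%}")         # F1-score:  59.6%
```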

Which metric matters more where?

Code review assistant

Precision: High | Recall: Moderate

False alarms quickly lead to fatigue and disregard for recommendations.

Medical screening

Precision: Moderate | Recall: Very high

It is critical not to miss real cases of the disease (FN is more expensive than FP).

Antifraud in payments

Precision: High | Recall: High

You need to balance both sides: do not block legitimate transactions, and do not let fraud through.

Practical recommendations

Always pin down the cost of an FP and an FN for the particular product before choosing a threshold.

Show precision and recall together with the confusion matrix, not in isolation from each other.

Check metrics separately per segment (clients, data types, languages) so that degradation is not hidden by averages.

For review assistants, it is often beneficial to keep precision higher in order to maintain user trust.
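The first recommendation (fix the FP and FN prices before picking a threshold) can be sketched as a simple cost-minimizing sweep. All names, costs, and the tiny event list below are invented for illustration:

```python
# Hypothetical per-error prices (assumption: a missed wolf costs 5x a false alarm)
COST_FP = 1.0
COST_FN = 5.0

# Hypothetical (score, is_wolf) pairs a classifier might produce
events = [(0.9, True), (0.8, True), (0.7, False), (0.6, True),
          (0.4, False), (0.3, True), (0.2, False), (0.1, False)]

def total_cost(threshold):
    """Expected price of all errors at a given decision threshold."""
    fp = sum(1 for s, wolf in events if s >= threshold and not wolf)
    fn = sum(1 for s, wolf in events if s < threshold and wolf)
    return COST_FP * fp + COST_FN * fn

# Sweep thresholds 0.0, 0.1, ..., 1.0 and keep the cheapest one
best = min((t / 10 for t in range(11)), key=total_cost)
print(best, total_cost(best))  # 0.3 2.0
```

With expensive FNs the sweep settles on a low threshold; raising COST_FP instead would push it up.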

Common mistakes

Looking only at accuracy when there is a strong class imbalance.

Comparing models by precision without controlling recall (and vice versa).

Fixing the threshold once and never revisiting it after the data changes.

Ignoring user reaction to false positives in production.


© 2026 Alexander Polomodov