System Design Space

Updated: April 5, 2026 at 7:44 PM

Fraud / Risk Scoring ML System

Difficulty: hard

Practical ML case: realtime scoring, review operations, delayed labels, threshold tuning, drift analysis, and the next calibration cycle.

Risk scoring is one of the best ML cases for system design because model quality immediately collides with error costs and decisions under tight latency budgets.

The chapter ties threshold choice, delayed labels, human review, and fallback behavior into one production system.

That is especially useful in interviews where you need to connect ML metrics, architecture decisions, and business outcomes.

Practical value of this chapter

Decision cost

Connect the model score to the product action and the real cost of a wrong decision.

Online path

Design scoring, thresholds, and fallback under a strict latency budget.

Delayed labels

Handle feedback that arrives too late for naive online-only schemes.

Interview material

Use a concrete risk-system story instead of generic ML theory.

Related chapter

Precision and recall basics

Foundation for discussing threshold choice and the cost of errors in risk systems.

Read the overview

Fraud / Risk Scoring ML System is a classic ML case where the cost of a mistake is immediate: a system that is too soft leaks loss, while a system that is too strict hurts conversion and user trust. In interviews, the goal is to show how you connect threshold tuning, realtime scoring, delayed labels, and human review into one working system.

Functional requirements

  • Compute a risk score for payments, logins, transfers, and new-device events in near real time.
  • Support a decision policy that can approve, challenge, route to manual review, or block based on the risk tier.
  • Use features from user history, device graph, geo patterns, velocity checks, and external risk signals.
  • Collect delayed labels from chargebacks, confirmed fraud, analyst review, and customer disputes.
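
The tiered decision policy from the requirements above can be sketched as a small mapping from score to action. This is a minimal illustration, not the chapter's implementation: the `Action` and `DecisionPolicy` names and the three cut-off values are hypothetical, and the score is assumed to be a calibrated probability in [0, 1].

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    APPROVE = "approve"
    CHALLENGE = "challenge"
    REVIEW = "manual_review"
    BLOCK = "block"


@dataclass(frozen=True)
class DecisionPolicy:
    # Hypothetical cut-offs; real values would come from calibration and cost analysis.
    challenge_at: float = 0.60
    review_at: float = 0.80
    block_at: float = 0.95

    def decide(self, score: float) -> Action:
        """Map a risk score in [0, 1] to the strictest tier it reaches."""
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        if score >= self.block_at:
            return Action.BLOCK
        if score >= self.review_at:
            return Action.REVIEW
        if score >= self.challenge_at:
            return Action.CHALLENGE
        return Action.APPROVE
```

Keeping the cut-offs in a versioned policy object rather than inside the model lets thresholds change without retraining or redeploying the scorer.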

Non-functional requirements

  • p95 scoring latency below 120 ms on the synchronous critical path.
  • The system must survive provider or feature-store degradation through fallback rules and safe default thresholds.
  • Full auditability of which features, model, and threshold produced a decision.
  • Support frequent recalibration and threshold updates without rebuilding the whole system.
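
One way to honor the latency budget and the degradation requirement together is to wrap the model call in a deadline and fall back to conservative rules. The sketch below assumes a thread pool and a 120 ms budget taken from the p95 target; `rule_based_score`, its rule, and `score_with_fallback` are hypothetical names for illustration.

```python
import concurrent.futures as cf

# Shared pool so a slow model call does not block the request thread.
_POOL = cf.ThreadPoolExecutor(max_workers=4)


def rule_based_score(event: dict) -> float:
    """Hypothetical fallback: a conservative rule that needs no online features."""
    risky = event.get("amount", 0) > 1000 and event.get("new_device", False)
    return 0.9 if risky else 0.2


def score_with_fallback(event: dict, model_score, budget_s: float = 0.120):
    """Return (score, source); source is recorded so the decision stays auditable."""
    future = _POOL.submit(model_score, event)
    try:
        return future.result(timeout=budget_s), "model"
    except Exception:
        # Covers both a missed deadline and a model or feature-store error.
        # Note: a timed-out call keeps running in its worker thread.
        return rule_based_score(event), "fallback_rules"
```

Returning the source alongside the score makes it possible to measure how often the fallback fires and to exclude those decisions from model-quality metrics.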

Scale assumptions

  • Transactions/day: 250M+. The event rate requires a cheap scoring path and streaming feature updates.
  • Peak scoring QPS: 75k. Peaks align with marketing campaigns, payroll windows, and holiday traffic.
  • Label delay: days to weeks. Chargebacks and confirmed-fraud events arrive long after the original decision.
  • False-positive cost: very high. Over-blocking legitimate activity hurts conversion, trust, and support load.
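
A quick back-of-the-envelope check shows how far the peak sits above the average load. It uses only the two figures above and treats 250M as the exact daily volume, so the ratio is an upper bound if the real volume is higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60

transactions_per_day = 250_000_000  # "250M+" taken at its lower bound
peak_qps = 75_000

average_qps = transactions_per_day / SECONDS_PER_DAY
peak_to_average = peak_qps / average_qps

print(f"average QPS ~ {average_qps:,.0f}")          # average QPS ~ 2,894
print(f"peak / average ~ {peak_to_average:.0f}x")   # peak / average ~ 26x
```

A peak roughly 26 times the average suggests sizing the scoring tier for bursts rather than for the daily mean.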

Reference architecture

It helps to read this system as a stack of layers: from incoming signals and feature state to review operations and the loop that updates the next release.

  • Signals and ingress: payments and logins, device events, external signals, pre-checks
  • Feature and state layer: online aggregates, device graph, velocity checks, freshness control
  • Scoring and decisioning: scoring, thresholds, policy rules, fallback
  • Review and case ops: manual review, reason codes, step-up checks, case outcomes
  • Feedback and tuning: delayed labels, drift, replay, recalibration

What to keep under control

It helps to view this system not only as a request path, but as a balance of error cost, live constraints, and how quickly the next tuning cycle can happen.

Error economics

false-positive cost, fraud leakage, conversion hit, support load

Live constraints

p95 latency, feature-store SLA, provider degradation, audit trail

Improvement loop

label delay, segment drift, threshold tuning, policy/model updates

Below, the chapter separates the synchronous decision path from the write path that carries delayed feedback, labels, and the next round of tuning.

How the system reads and writes fraud signals

Comparing the synchronous decision path with the delayed feedback path

Synchronous decision path

1. Event intake and pre-check

The system receives the event, validates the required fields, and decides whether it can enter the critical path.

Latency budget, Thresholds, Fallback
Latency-sensitive.
Must survive feature and provider degradation.
The cost of a false positive is immediately visible to the user.
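
The intake step can be sketched as a cheap validation gate that runs before any feature lookup. The field names, the event-type set, and the `pre_check` function are assumptions for illustration; the chapter does not specify an event schema.

```python
# Hypothetical schema: the fields an event needs before it may enter the critical path.
REQUIRED_FIELDS = {"event_id", "event_type", "user_id", "timestamp"}
SCORED_EVENT_TYPES = {"payment", "login", "transfer", "new_device"}


def pre_check(event: dict) -> tuple[bool, str]:
    """Return (accepted, reason) without touching the feature store or the model."""
    missing = REQUIRED_FIELDS - set(event)
    if missing:
        return False, "missing_fields:" + ",".join(sorted(missing))
    if event["event_type"] not in SCORED_EVENT_TYPES:
        return False, "unsupported_event_type"
    return True, "ok"
```

Rejected events would typically go to a default rule or a dead-letter queue instead of consuming the scoring latency budget.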

Key trade-offs

  • A lower threshold reduces fraud leakage but increases false positives and friction for legitimate users.
  • More realtime features improve quality but make freshness SLAs, debugging, and fallback paths more complex.
  • One global model is easier to operate, but segmented models are often more accurate for different markets and products.
  • Hard blocking high scores lowers loss risk but raises the cost of wrong decisions and increases support pressure.
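
The first trade-off can be made concrete by pricing each error type and picking the threshold with the lowest total cost on a labeled replay set. This is a minimal sketch under simplifying assumptions: a single flag/no-flag threshold, fixed per-error costs, and hypothetical function names.

```python
def expected_cost(scores, labels, threshold, fp_cost, fn_cost):
    """Total cost of flagging every event whose score is >= threshold."""
    cost = 0.0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and not is_fraud:
            cost += fp_cost  # legitimate user hit with friction or a block
        elif not flagged and is_fraud:
            cost += fn_cost  # fraud that leaked through
    return cost


def pick_threshold(scores, labels, candidates, fp_cost, fn_cost):
    """Choose the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: expected_cost(scores, labels, t, fp_cost, fn_cost))


# Toy replay set: scores with confirmed labels (1 = fraud).
scores = [0.10, 0.30, 0.55, 0.70, 0.90]
labels = [0, 0, 1, 0, 1]
candidates = [0.2, 0.5, 0.8]

print(pick_threshold(scores, labels, candidates, fp_cost=5, fn_cost=100))  # 0.5
print(pick_threshold(scores, labels, candidates, fp_cost=50, fn_cost=20))  # 0.8
```

The same model yields different thresholds once the cost ratio changes, which is why the cost inputs are usually owned by product and risk rather than by the modeling team.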

Common mistakes

  • Optimizing only ROC-AUC without translating quality into the business cost of false positives and fraud leakage.
  • Mixing online scoring with labels that arrive after the fact, without explicit handling of delayed feedback and label leakage.
  • Making the risk engine a black box with no explainability or audit trail for support and analysts.
  • Running without a fallback policy when online features, external providers, or the primary model are unavailable.
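
One common way to handle delayed labels explicitly is a maturity window: a transaction becomes a training row only after enough time has passed for a chargeback to arrive. The sketch below assumes a 45-day window and hypothetical field names (`decided_at`, `fraud_confirmed_at`); neither comes from the chapter.

```python
from datetime import datetime, timedelta


def mature_training_rows(rows, now, maturity=timedelta(days=45)):
    """Keep only rows old enough that a missing fraud report is a credible negative."""
    cutoff = now - maturity
    matured = []
    for row in rows:
        if row["decided_at"] > cutoff:
            # Too recent: "no chargeback yet" is not the same as "not fraud".
            continue
        label = int(row.get("fraud_confirmed_at") is not None)
        matured.append({**row, "label": label})
    return matured
```

To avoid label leakage, the features joined to these rows would also need to be the values as of `decided_at`, not values recomputed after the outcome was known.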

Recommendations

  • Separate score generation from decision policy: the model predicts risk, while product and risk policy decide the action.
  • Track not only an aggregate quality metric but also segment-level drift across countries, products, channels, and device cohorts.
  • Maintain replay sets and a calibration pipeline because thresholds and score distributions age faster than teams expect.
  • Treat analyst review as part of the system design, not as a manual tail after the incident.
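
Segment-level drift tracking can be sketched with the Population Stability Index (PSI), one widely used drift measure; the chapter does not prescribe a specific metric, so this is a swapped-in example. It assumes non-empty lists of scores in [0, 1] and fixed-width bins, and would be called once per segment (country, product, channel, device cohort).

```python
import math


def psi(expected_scores, actual_scores, bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a current score sample."""

    def shares(scores):
        counts = [0] * bins
        for s in scores:
            idx = min(int(s * bins), bins - 1)  # a score of 1.0 goes to the last bin
            counts[idx] += 1
        total = len(scores)
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / total, eps) for c in counts]

    e, a = shares(expected_scores), shares(actual_scores)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [0.05] * 80 + [0.95] * 20
current = [0.05] * 50 + [0.95] * 50
print(round(psi(baseline, current), 3))  # 0.416
```

A frequently quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but the alerting level is a convention that each team would tune per segment.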

What to explain in an interview

  • How would you choose the threshold, and who should own the trade-off between false positives and false negatives?
  • How do you design the scoring path when labels arrive weeks after the transaction?
  • What happens when online features or an external risk provider become unavailable?
  • How do you explain to analysts and support staff why the system made a decision?
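
For the last question, it helps to show what gets stored with each decision. The record below is a hypothetical sketch of an audit entry: the `DecisionRecord` name, its fields, and the reason-code strings are assumptions chosen to cover the features, model, and threshold behind a decision.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class DecisionRecord:
    event_id: str
    model_version: str
    policy_version: str
    score: float
    threshold: float
    action: str
    features: dict  # feature values exactly as seen at scoring time
    reason_codes: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize with stable key order so records can be diffed and replayed."""
        return json.dumps(asdict(self), sort_keys=True)


record = DecisionRecord(
    event_id="evt-123",
    model_version="risk-model-v7",     # hypothetical version labels
    policy_version="policy-2026-04",
    score=0.87,
    threshold=0.80,
    action="manual_review",
    features={"txn_count_1h": 14, "new_device": True},
    reason_codes=["HIGH_VELOCITY", "NEW_DEVICE"],
)
print(record.to_json())
```

Analysts and support staff can then read the reason codes, while engineers can replay the stored features against a newer model or threshold.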
