System Design Space

Updated: March 24, 2026 at 5:22 PM

Example Troubleshooting Interview

medium

Public interview at DevOops 2023: a walkthrough of a fintech application's architecture, an incident involving a drop in payments, and diagnosis by a Lead + Junior pair.

Reading the troubleshooting rules helps, but the real meaning of the format becomes clear only when you watch an investigation unfold step by step.

This chapter shows how a representative incident develops: which signals the interviewer introduces, how the candidate sets priorities, where the reasoning branches, and which moves genuinely push the investigation forward.

For preparation, it is especially useful as rehearsal material: you can calibrate pacing, question quality, and hypothesis depth against something concrete, while companies can reuse it to onboard and calibrate new interviewers.

Practical value of this chapter

Forensic sequence

Practice investigation order: symptom, hypothesis, test, confirmation, and corrective action.

Root-cause isolation

Separate primary cause from secondary effects so mitigation does not hide systemic failure.

Mitigation design

Design actions by horizon: immediate stabilization, mid-term fix, and long-term guardrail.

Postmortem articulation

Explain clearly what changes after the incident and how success will be measured.

Source

Public interview on DevOops

Article by Alexander Polomodov about the public Troubleshooting interview

tellmeabout.tech

After the theoretical analysis of the Troubleshooting Interview format, it is useful to see the format in action. A public interview held at the DevOops 2023 conference demonstrates the entire process from start to finish, from describing the architecture to resolving the incident.

Interview participants

  • Interviewer: Alexander Polomodov
  • Candidate: Salikh Fakhrutdinov, Senior SRE at Tinkoff Origination Platform

Interview legend

According to the legend, the candidate and the interviewer work together on an SRE team. The candidate plays the Lead, and the interviewer plays the Junior. The Lead leaves for a conference while the Junior stays on call. When an incident occurs, the Junior calls the Lead (our candidate) and asks to work through the incident together.

This role-play setup creates a realistic atmosphere and makes it possible to assess the candidate's communication skills: how they guide a less experienced colleague through the diagnostic process.

Theory

Troubleshooting Interview

9-step incident diagnosis framework

Read the theory

System architecture

Before the incident begins, the participants discuss the architecture of the fintech application Yellow:

Scale

~1 million DAU (Daily Active Users)

Functionality

Debit/credit cards, payments
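
The scale figure can be turned into a rough load estimate, which helps judge whether an observed request rate is normal. The sketch below is a back-of-envelope calculation only: the 1 million DAU comes from the description above, while the requests per user and the peak factor are hypothetical assumptions chosen for illustration.

```python
DAU = 1_000_000  # from the architecture description

# Hypothetical assumptions, not stated in the interview:
REQUESTS_PER_USER_PER_DAY = 20
PEAK_TO_AVERAGE = 3

SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(dau, requests_per_user):
    """Average requests per second if load were spread evenly over the day."""
    return dau * requests_per_user / SECONDS_PER_DAY


def peak_rps(dau, requests_per_user, peak_factor):
    """Peak estimate: the average scaled by an assumed peak-to-average ratio."""
    return average_rps(dau, requests_per_user) * peak_factor


avg = average_rps(DAU, REQUESTS_PER_USER_PER_DAY)
peak = peak_rps(DAU, REQUESTS_PER_USER_PER_DAY, PEAK_TO_AVERAGE)
print(f"average: {avg:.0f} RPS, peak: {peak:.0f} RPS")
# prints "average: 231 RPS, peak: 694 RPS"
```

With different assumptions the numbers change proportionally, so the useful part is the order of magnitude rather than the exact value.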

Architecture diagram

(Interactive in the original: the diagram switches between the initialization path and the main data flow, starting from app launch, when the user opens the web or mobile app.)

  • Clients: Users (Desktop, Mobile)
  • Main path: Frontend LBs, Frontend App, Backend LBs, Auth Service, Auth DB, Card Service, Card DB, Payment Service, Payment DB
  • Initialization path: CDN, Config LBs, Config Service, Config DB
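
One way to use the architecture during diagnosis is to model the components as a call graph and list everything on the path of a failing request. The sketch below is illustrative only: the component names come from the diagram, but the edges between them are assumptions inferred from those names, and `request_path` is a hypothetical helper.

```python
from collections import deque

# Edges are assumptions inferred from the component names;
# the source lists the components but not how they are wired.
CALLS = {
    "Users": ["Frontend LBs"],
    "Frontend LBs": ["Frontend App"],
    "Frontend App": ["Backend LBs"],
    "Backend LBs": ["Auth Service", "Card Service", "Payment Service"],
    "Auth Service": ["Auth DB"],
    "Card Service": ["Card DB"],
    "Payment Service": ["Payment DB"],
}


def request_path(calls, start, target):
    """Breadth-first search for the shortest call path from start to target."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in calls.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no call path exists


path = request_path(CALLS, "Users", "Payment DB")
print(" -> ".join(path))
# prints "Users -> Frontend LBs -> Frontend App -> Backend LBs -> Payment Service -> Payment DB"
```

Each component on the returned path is a candidate for a hypothesis when payments drop, which gives the investigation a checklist instead of a random search.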

User path

  • First screen: product list showing Card #1 (Debit • ****4521) and Card #2 (Credit • ****8832), with Products and Payments tabs
  • Second screen: Payments, with a payment form for money transfers and the same Products and Payments tabs

Once the candidate has asked clarifying questions about the architecture, the interview moves to the diagnostic phase. The Junior reports a symptom, an alert about a drop in payments, and the two begin investigating the cause together.
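
To make the symptom concrete, here is a minimal sketch of how an alert on a drop in payments might be evaluated. It is not the alerting rule from the interview: the function names, the 30% threshold, and the sample counts are all hypothetical.

```python
from statistics import median


def payment_drop_ratio(baseline_counts, current_count):
    """Fraction by which the current count is below the baseline median.

    Returns 0.0 when the current count is at or above the baseline.
    """
    baseline = median(baseline_counts)
    if baseline <= 0:
        return 0.0
    return max(0.0, (baseline - current_count) / baseline)


def should_alert(baseline_counts, current_count, threshold=0.3):
    """Alert when payments fall at least `threshold` below the baseline."""
    return payment_drop_ratio(baseline_counts, current_count) >= threshold


# Made-up successful payments per minute for the preceding window.
baseline = [118, 122, 120, 125, 119]

print(should_alert(baseline, 70))   # True: about 42% below the median of 120
print(should_alert(baseline, 115))  # False: about 4% below the median
```

Using the median rather than the mean keeps a single unusual minute in the baseline from shifting the comparison.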

What is assessed in the process

  • Diagnostic methodology: a systematic approach versus a chaotic search
  • Formulating and testing hypotheses
  • Using the RED and USE methods for analysis
  • Communicating with and guiding a less experienced colleague
  • Balancing a quick workaround against a full fix
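
For reference, RED stands for Rate, Errors, Duration and is applied to request-driven services, while USE stands for Utilization, Saturation, Errors and is applied to resources such as CPU or disks. The sketch below shows one way to compute a RED summary from request records. The `Request` type, the field names, and the sample data are hypothetical, and a real setup would normally read these values from a metrics system rather than raw records.

```python
from dataclasses import dataclass


@dataclass
class Request:
    service: str
    duration_ms: float
    ok: bool


def red_summary(requests, window_seconds):
    """Rate, error ratio, and p95 duration for requests seen in one window."""
    total = len(requests)
    if total == 0:
        return {"rate_rps": 0.0, "error_ratio": 0.0, "p95_ms": 0.0}
    errors = sum(1 for r in requests if not r.ok)
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank percentile: rank = ceil(0.95 * total), in integer arithmetic.
    rank = (95 * total + 99) // 100
    return {
        "rate_rps": total / window_seconds,
        "error_ratio": errors / total,
        "p95_ms": durations[rank - 1],
    }


# Made-up sample: 20 requests in a 10-second window, every fourth one failing.
sample = [
    Request("payment", duration_ms=10.0 * i, ok=(i % 4 != 0))
    for i in range(1, 21)
]
summary = red_summary(sample, window_seconds=10)
print(summary)
# prints "{'rate_rps': 2.0, 'error_ratio': 0.25, 'p95_ms': 190.0}"
```

Comparing such a summary per service before and during the incident quickly shows whether the drop comes with more errors, slower responses, or simply fewer incoming requests.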

Key Findings

Realistic format

The "Lead + Junior" role-play creates the atmosphere of a real incident and makes it possible to evaluate communication skills as well as technical ones.

Architectural context

The interview begins with a detailed walkthrough of the system architecture, which gives the candidate the context needed to formulate hypotheses.

Practice vs theory

Watching a real interview complements theoretical knowledge of the format and shows how to apply the methodology in practice.
