Source
Public interview on DevOops
Article by Alexander Polomodov about the public Troubleshooting interview
After the theoretical analysis of the Troubleshooting Interview format, it is useful to see it in action. At the DevOops 2023 conference, a public interview was held that demonstrates the entire process from start to finish, from describing the architecture to resolving the incident.
Interview participants
- Interviewer: Alexander Polomodov
- Candidate: Salikh Fakhrutdinov, Senior SRE at Tinkoff Origination Platform
Interview legend
According to the legend, the candidate and the interviewer work on the same SRE team. The candidate plays the Lead, and the interviewer plays the Junior. The Lead leaves for a conference, and the Junior remains on duty. When an incident occurs, the Junior calls the Lead (our candidate) and asks him to work through the incident together.
This role-play creates a realistic atmosphere and makes it possible to assess the candidate's communication skills: how he guides a less experienced colleague through the diagnostic process.
Theory
Troubleshooting Interview
9-step incident diagnosis framework
System architecture
Before the incident begins, the architecture of the fintech application Yellow is discussed:
Scale
~1 million DAU (Daily Active Users)
Functionality
Debit/credit cards, payments
Interactive architecture diagram (not reproduced here), with Initialization, Incident, and Custom path views. The diagram shows app launch (the user opens the web or mobile app), the product list (Card #1, debit; Card #2, credit), and payments (payment form, money transfer).
After the candidate has asked clarifying questions about the architecture, the interview moves on to the diagnostic phase. The Junior reports a symptom, an alert about a drop in payments, and together with the Lead begins investigating the cause.
What is assessed in the process
- Diagnostic methodology - systematic approach vs chaotic search
- Formulation of hypotheses and their testing
- Using RED/USE methods for analysis
- Communication and direction of a less experienced colleague
- Balance between workaround and full-fledged fix
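To make the RED/USE item more concrete: the RED method looks at the Rate, Errors, and Duration of requests to a service, while the USE method looks at the Utilization, Saturation, and Errors of resources. The sketch below is only an illustration of a RED-style check, not something shown in the interview. The `Request` record, the `red_metrics` and `rate_dropped` helpers, and the 50% threshold are all hypothetical names and values chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request to a service (hypothetical record shape)."""
    duration_ms: float
    ok: bool


def red_metrics(requests: list[Request], window_s: float) -> dict[str, float]:
    """Compute Rate, Errors, and Duration (p95) for one observation window."""
    if not requests:
        return {"rate_rps": 0.0, "error_ratio": 0.0, "p95_ms": 0.0}
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank 95th percentile: the smallest value with >= 95% of samples at or below it.
    p95_index = max(0, math.ceil(0.95 * len(durations)) - 1)
    return {
        "rate_rps": len(requests) / window_s,
        "error_ratio": sum(not r.ok for r in requests) / len(requests),
        "p95_ms": durations[p95_index],
    }


def rate_dropped(current_rps: float, baseline_rps: float, threshold: float = 0.5) -> bool:
    """Return True when the current rate is below `threshold` of the baseline (assumed rule)."""
    return current_rps < baseline_rps * threshold


# Ten sample requests over a 5-second window, two of them failed.
sample = [Request(duration_ms=10.0 * (i + 1), ok=(i not in (3, 7))) for i in range(10)]
metrics = red_metrics(sample, window_s=5.0)
```

A symptom such as a drop in payments would show up in the rate first. Comparing `metrics["rate_rps"]` with a known baseline via `rate_dropped` gives a starting hypothesis, and the error ratio and latency then help narrow down where to look next.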
Key Findings
Realistic format
The "Lead + Junior" role-play creates the atmosphere of a real incident and makes it possible to evaluate not only technical skills but also communication skills.
Architectural context
The interview begins with a detailed analysis of the system architecture - this gives the candidate the necessary context to formulate hypotheses.
Practice vs theory
Watching a real interview complements theoretical knowledge about the format and helps to understand how to apply the methodology in practice.
