System Design Space

Updated: February 21, 2026 at 11:59 PM

Engineering Reliable Mobile Applications (short summary)

mid

Source

Brief overview in Russian

My book review on Tell Me About Tech


Engineering Reliable Mobile Applications

Authors: Kristine Chen, Venkat Patnala, Devin Carraway, Pranjal Deo
Publisher: O'Reilly Media, 2019
Length: 35 pages

Mobile SRE from Google: staged rollout, feature flags, client telemetry and impact on the backend.


Features of mobile applications for SRE

Scale: billions of devices, thousands of models
Control: limited control over devices
Monitoring: metrics are collected within device restrictions
Change management: inability to roll back

SRE Book

Site Reliability Engineering

Basics of SLI/SLO/SLA and error budgets


Measuring indicators

For mobile apps, availability is hard to measure. Server logs alone are not enough: you need client telemetry to get visibility. Even without extensive telemetry, you can fall back on crash statistics.

SLI (Service Level Indicators)

An SLI defines what we measure and how. The client is instrumented to send the necessary events to the backend, where the indicators are calculated. A single event can feed into several SLIs.
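
As an illustration of one event stream feeding several SLIs, here is a minimal Python sketch. The event fields (`ok`, `latency_ms`), the 500 ms threshold and the sample data are assumptions made for this example, not something the book prescribes.

```python
from typing import Optional

# Hypothetical client events as they might arrive at the backend.
# Field names and values are illustrative only.
EVENTS = [
    {"action": "open_inbox", "ok": True, "latency_ms": 120},
    {"action": "open_inbox", "ok": True, "latency_ms": 480},
    {"action": "open_inbox", "ok": True, "latency_ms": 900},
    {"action": "open_inbox", "ok": False, "latency_ms": 700},
]


def availability_sli(events) -> Optional[float]:
    """Share of events that succeeded; None when there is no data."""
    if not events:
        return None
    return sum(1 for e in events if e["ok"]) / len(events)


def latency_sli(events, threshold_ms: int = 500) -> Optional[float]:
    """Share of events that finished within threshold_ms; None when there is no data."""
    if not events:
        return None
    return sum(1 for e in events if e["latency_ms"] <= threshold_ms) / len(events)


# The same events are used for both indicators.
availability = availability_sli(EVENTS)
fast_enough = latency_sli(EVENTS)
```

Both functions read the same list, so adding a new indicator does not require a new kind of event from the client.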

SLO (Service Level Objectives)

With good SLIs in place, we can set the SLO targets we aim for. The targets must take the specifics of mobile devices into account.

Real-time monitoring

SRE teams love real-time monitoring. In the mobile world, however, time to resolution is longer because clients pick up changes by polling. Client metrics may also take time to stabilize after a change is pushed.

Low-latency error ratios

Design metrics with high-confidence denominators to control for normal traffic fluctuations. This lets you watch the effect of a change immediately after it is pushed.
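
A minimal Python sketch of such a ratio, under assumptions that are not from the book: the denominator is the number of operations clients started in a time window, the numerator is how many of those failed, and `MIN_ATTEMPTS` is a hypothetical threshold below which the ratio is not reported at all.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: below this many attempts the ratio is too noisy.
MIN_ATTEMPTS = 1000


@dataclass
class WindowCounts:
    attempts: int  # operations started by clients in the window (denominator)
    errors: int    # operations from the same window that failed (numerator)


def error_ratio(counts: WindowCounts, min_attempts: int = MIN_ATTEMPTS) -> Optional[float]:
    """Return errors / attempts, or None when the denominator is too small to trust."""
    if counts.attempts < min_attempts:
        return None
    return counts.errors / counts.attempts


busy_window = error_ratio(WindowCounts(attempts=10_000, errors=50))
quiet_window = error_ratio(WindowCounts(attempts=10, errors=5))
```

Because the value is a ratio over the traffic actually observed, a normal dip or spike in traffic moves the numerator and denominator together, while a bad change shows up as a shift in the ratio.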

Configuration state as dimension

Telemetry metrics should include configuration state as a dimension. This lets you filter for telemetry from devices that have already received the desired fix.
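
The idea can be sketched in Python as follows. The `config_version` and `status` fields and the version labels are hypothetical; the point is only that every event carries the configuration it was produced under, so the ratio can be split by it.

```python
from collections import defaultdict


def error_ratio_by_config(events):
    """Group events by config_version and compute the error ratio of each group."""
    totals = defaultdict(lambda: [0, 0])  # config_version -> [errors, attempts]
    for event in events:
        bucket = totals[event["config_version"]]
        bucket[1] += 1
        if event["status"] == "error":
            bucket[0] += 1
    return {cfg: errors / attempts for cfg, (errors, attempts) in totals.items()}


# Hypothetical telemetry: "v42" is the configuration that carries the fix.
events = [
    {"config_version": "v41", "status": "error"},
    {"config_version": "v41", "status": "ok"},
    {"config_version": "v42", "status": "ok"},
    {"config_version": "v42", "status": "ok"},
]
ratios = error_ratio_by_config(events)
```

Comparing `ratios["v42"]` with `ratios["v41"]` shows whether the fix works without waiting for every device to poll the new configuration.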

White-Box Monitoring

Metrics that publish data about the internal workings of an application. Requires code instrumentation.

Black-Box Monitoring

Checking the external, visible behavior of the application, for example with periodic probes.

The two approaches are complementary: only together do they give a reasonably reliable picture of the application's state.

CI/CD

Grokking Continuous Delivery

Continuous Delivery Practices


Change management

Using change management best practices is critical: rollback is almost impossible, and some problems found in production are fatal (for example, “bricked” devices).

Staged Rollout / Phased Releases

1%: Internal (internal testers and dogfooding)
5%: Early Adopters
20%: Expansion
50%: Broad
100%: Full Rollout

Unlike server-side deployments, mobile releases can only roll forward: a "rollback" means shipping a new version.
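
The stage percentages above can be modeled with deterministic bucketing. This is a generic Python sketch, not how any particular app store implements phased releases; the `salt` value and the user ID format are assumptions for the example.

```python
import hashlib

# Stage percentages from the list above.
STAGES = [1, 5, 20, 50, 100]


def bucket(user_id: str, salt: str = "release-2.0") -> int:
    """Map a user to a stable bucket in [0, 100) using a hash of salt and ID."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(user_id: str, percent: int, salt: str = "release-2.0") -> bool:
    """True if the user falls inside the first `percent` buckets."""
    return bucket(user_id, salt) < percent


# Count how many of 1000 hypothetical users each stage would reach.
users = [f"user-{i}" for i in range(1000)]
reach = {percent: sum(in_rollout(u, percent) for u in users) for percent in STAGES}
```

Because a user's bucket never changes for a given salt, anyone included at 5% is still included at 20%, so widening a stage only adds users and never flips someone back to the old version.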

Case

A/B platform design

Experimentation system architecture for web and mobile applications


Feature Flags and A/B testing

Mobile applications run in a very diverse ecosystem, where CPU, memory and network bandwidth differ from device to device. If you look only at metrics collected immediately after a release, the data can be skewed, because new versions are installed first by enthusiasts with powerful devices.

Google Recommendation

Separate the release of a new application version from the launch of new behavior. Trigger behavioral changes through A/B tests driven by feature flags.

It is important to test that rolling a flag back does not break the application.

An upgrade can itself have side effects that cannot be eliminated. To keep the experiment fair, you can give the old application a “placebo” update so that both groups go through an upgrade.
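
A minimal Python sketch of a flag-gated code path and the rollback check mentioned above. The `FeatureFlags` class, the `new_checkout` flag and the `checkout` function are hypothetical names invented for this example.

```python
# Compiled-in defaults: new behavior ships dark. The flag name is hypothetical.
DEFAULTS = {"new_checkout": False}


class FeatureFlags:
    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._remote = {}

    def update(self, remote: dict) -> None:
        """Apply a freshly fetched config; flags unknown to this binary are ignored."""
        self._remote = {k: bool(v) for k, v in remote.items() if k in self._defaults}

    def is_enabled(self, name: str) -> bool:
        """Remote value if present, otherwise the compiled-in default."""
        return self._remote.get(name, self._defaults[name])


def checkout(flags: FeatureFlags) -> str:
    """Both code paths stay in the binary; the flag picks one at runtime."""
    return "new flow" if flags.is_enabled("new_checkout") else "old flow"


# Rollback check: enable the flag, then turn it off again.
flags = FeatureFlags(DEFAULTS)
before = checkout(flags)
flags.update({"new_checkout": True})
during = checkout(flags)
flags.update({"new_checkout": False})
after = checkout(flags)
```

In a real application the rollback check would also cover any state the new path wrote while the flag was on, since that is where a flag rollback usually breaks.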

Support for older versions

A large number of releases leads to a long tail of old versions on customer devices. A clear support policy is required.

Support horizon

Support for older versions should have a clear horizon - for example, one or two years. Otherwise, maintaining the entire zoo of old versions will be too expensive and ineffective.
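
A support horizon reduces to a simple date check. In this Python sketch the one-year horizon matches the lower bound mentioned above, while the function name and the dates are made up for illustration.

```python
from datetime import date, timedelta

# Assumed policy for the example: versions are supported for one year.
SUPPORT_HORIZON = timedelta(days=365)


def is_supported(release_date: date, today: date,
                 horizon: timedelta = SUPPORT_HORIZON) -> bool:
    """True if the version was released within the support horizon."""
    return today - release_date <= horizon


today = date(2026, 2, 21)
recent_ok = is_supported(date(2025, 6, 1), today)
old_ok = is_supported(date(2024, 1, 1), today)
```

A backend can use a check like this to decide when to ask a client to upgrade instead of continuing to serve an API version that is past the horizon.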

Sustainability

Release It!

Protection patterns against cascade failures


Impact on backend services

Changes to client code can have significant consequences on the server side. For example, changing the caching policy can increase the number of requests by an order of magnitude, which can lead to denial of service to backend systems.

It is important to understand how a client-side change alters the usage pattern of the services it depends on, and to verify before release that the change will not be fatal for them.
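
A rough back-of-the-envelope model of the caching example, written in Python. It assumes each active client refetches once per cache lifetime and ignores bursts; the client count and TTL values are hypothetical.

```python
def steady_state_qps(active_clients: int, cache_ttl_seconds: float) -> float:
    """Average requests per second if every client refetches once per TTL."""
    return active_clients / cache_ttl_seconds


# Hypothetical numbers: one million active clients, TTL cut from 1 hour to 5 minutes.
clients = 1_000_000
qps_before = steady_state_qps(clients, cache_ttl_seconds=3600)
qps_after = steady_state_qps(clients, cache_ttl_seconds=300)
increase = qps_after / qps_before
```

Under these assumptions the backend load grows by the ratio of the two TTLs, here about 12 times, which is the order-of-magnitude jump described above.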

SRE: Hope Is Not a Mobile Strategy

The authors highlight the following best practices from Google's experience:

Design

Design mobile applications to be robust to unexpected input, to recover from management errors, and to roll out changes in a controlled, metric-driven way.

Monitor

Monitor the application in production, measuring critical user interactions and key health metrics (responsiveness, data freshness, crashes). Success criteria should be directly related to user expectations.

Release

Roll out changes carefully via feature flags, so that they can be evaluated through experimentation and rolled back independently of binary releases.

Understand

Understand and prepare for the impact of the application on servers. Prevent known problematic patterns (e.g., thundering herd). Establish development and release practices that avoid problematic feedback loops between applications and services.
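
One common client-side guard against a thundering herd is exponential backoff with jitter. The Python sketch below uses the "full jitter" variant; it is a general technique rather than something the summary attributes to the book, and the `base` and `cap` values are assumptions.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                  rng=random) -> float:
    """Full-jitter backoff: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    upper = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, upper)


# Delays a single client might wait before retries 0..5 (seeded for repeatability).
rng = random.Random(0)
delays = [backoff_delay(attempt, rng=rng) for attempt in range(6)]
```

Because each client draws its own random delay, retries after an outage are spread over time instead of arriving at the backend in synchronized waves.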


Main conclusions

Mobile SRE requires adaptation of backend practices to platform limitations
Rollback is not possible - only roll forward through new versions
Feature flags are critical for controlled changes
Client telemetry is required for visibility
Staged rollout protects against massive problems
Changes on the client affect the backend load

Where to find the book

O'Reilly
Original in English



© 2026 Alexander Polomodov