
Updated: March 25, 2026 at 3:00 AM

Data platforms: How to build them in 2025 - interview with Nikolay Golov


Research Insights Made Simple #6: centralization vs federation, data mesh in practice, OLTP/MPP limitations, and the evolution of data platforms.

This episode is useful because it grounds the data-platform conversation in real questions about centralization, federation, mesh, and the boundaries between OLTP, MPP, and platform services instead of hype alone.

In day-to-day engineering work, it helps translate 2025 market ideas into capability maps and concrete architecture decisions rather than another tool list the team will spend years supporting.

In interviews, reviews, and architecture conversations, it is especially strong when you need to surface the risk of tool sprawl, vendor lock-in, and platform cost growing faster than real value.

Practical value of this chapter

Design in practice

Translates 2025 market trends into actionable architecture choices for data-platform teams.

Decision quality

Evaluates platform maturity through capability maps rather than tool checklists.

Interview articulation

Adds current context for discussing modern data stack, governance, and cost control.

Risk and trade-offs

Highlights tool sprawl, vendor lock-in, and uncontrolled platform-cost risks.

Data platforms: how to build them in 2025

An interview on practical architecture of modern data platforms: operating models, platform-as-a-product, OLTP/MPP limits, and a realistic implementation path without organizational chaos.

Year: 2025
Production: Research Insights Made Simple
Focus: data platform operating model + engineering execution

Source

Telegram: book_cube

Post with a short summary and key points from the episode.


About the episode

The core question in this interview is practical: how to build a data platform that accelerates product teams instead of slowing them down. The discussion centers on the balance between domain autonomy, shared platform standards, and cost control.

A key takeaway is that architecture success in 2025 depends less on "the perfect stack" and more on operating model, ownership, and quality of platform capabilities.

Guest and context

Nikolay Golov

  • Head of Data Engineering at ManyChat.
  • Former Head of Data Platform at Avito.
  • Data platform practitioner and database educator.

The value of this episode is in connecting organizational decisions with architecture trade-offs, not treating them as separate concerns.

What changed in 2025

  • Teams are no longer debating only storage engines; they are redesigning the operating model of the platform.
  • Growth of data products increased demand for self-service and also raised the cost of weak governance.
  • Storage/compute decoupling and open table formats became a baseline expectation rather than an experiment.
  • LLM and realtime use cases increased freshness and observability requirements across data pipelines.
  • Data platform economics (FinOps for data) became a first-class architectural constraint alongside latency and reliability.

Related chapter

T-Bank data platform overview

A practical case of data platform evolution from DWH to Lakehouse.


Operating models for the platform

Centralized platform team

Best fit: Early-stage companies or organizations with a limited number of domains.

Strengths

  • Consistent standards for quality, security, and tooling.
  • Fast rollout of core platform capabilities.

Risks

  • The platform team becomes a bottleneck for domain delivery.
  • Weak ownership of source data quality in product teams.

Hybrid model (platform + domain squads)

Best fit: Mid-size and large companies with multiple products and different data SLAs.

Strengths

  • Platform team builds shared capabilities while domains own data products.
  • Lower time-to-data without sacrificing standards or cost control.

Risks

  • Requires explicit ownership boundaries and interface contracts.
  • Without governance, quality and semantics diverge quickly.

Federated model (mature data mesh)

Best fit: Very large organizations with highly autonomous domain units.

Strengths

  • Maximum domain velocity and local accountability.
  • Scalable ownership model without centralizing every decision.

Risks

  • High risk of a distributed data mess if shared platform capabilities are weak.
  • Coordination cost grows fast when platform investment is insufficient.

Reference architecture for a data platform

Ingestion and capture layer

Focus: CDC, event buses, batch connectors, and contract-based ingestion.

The main goal is predictable delivery into the platform with schema and freshness control.
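As a sketch of what contract-based ingestion can look like at the platform edge, the snippet below validates incoming events against a declared schema and freshness window. The contract shape and field names are illustrative assumptions, not something from the episode:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical ingestion contract: required fields plus a freshness window.
@dataclass(frozen=True)
class IngestionContract:
    required_fields: frozenset
    max_staleness: timedelta

def validate_event(event: dict, contract: IngestionContract) -> list:
    """Return a list of contract violations for one incoming event."""
    violations = []
    missing = contract.required_fields - event.keys()
    if missing:
        violations.append(f"missing fields: {sorted(missing)}")
    emitted_at = event.get("emitted_at")
    if emitted_at is not None:
        age = datetime.now(timezone.utc) - emitted_at
        if age > contract.max_staleness:
            violations.append(f"stale by {age - contract.max_staleness}")
    return violations

orders_contract = IngestionContract(
    required_fields=frozenset({"order_id", "amount", "emitted_at"}),
    max_staleness=timedelta(hours=1),
)

fresh = {"order_id": 1, "amount": 9.5,
         "emitted_at": datetime.now(timezone.utc)}
broken = {"order_id": 2}

print(validate_event(fresh, orders_contract))   # []
print(validate_event(broken, orders_contract))  # missing fields reported
```

Rejecting (or quarantining) events at this boundary is what makes downstream delivery predictable: the platform never has to guess what a producer meant.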

Storage and table format layer

Focus: Object storage + open table formats (Iceberg/Delta/Hudi), partitioning, compaction.

This layer provides durability, schema evolution, and compute isolation from physical storage.

Compute and transform layer

Focus: Batch/stream processing, dbt/SQL transforms, orchestration, and retries.

This is where domain-ready datasets, freshness SLAs, and repeatable pipelines are formed.
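A minimal illustration of the retry behavior this layer needs: a flaky transform step retried with jittered exponential backoff. Step names and delay values are invented for the sketch:

```python
import random
import time

def run_with_retries(step, max_attempts=3, base_delay=0.01):
    """Run one pipeline step, retrying transient failures with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise
            # Jittered exponential backoff before the next attempt.
            time.sleep(base_delay * (2 ** attempt) * random.random())

# A flaky step that fails twice before succeeding.
attempts = {"n": 0}
def flaky_transform():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "dataset ready"

print(run_with_retries(flaky_transform))  # dataset ready
```

In practice an orchestrator (Airflow, Dagster, and similar) provides this; the point is that retries are a platform capability, not per-pipeline ad hoc code.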

Serving and consumption layer

Focus: BI, reverse ETL, feature stores, API access to data products, MPP serving.

Consumption must stay self-service while still controlled by cost and access policies.

Governance and reliability layer

Focus: Catalog, lineage, data contracts, quality checks, observability, incident playbooks.

Without this layer, hidden debt accumulates and trust in the platform degrades.
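The quality-check part of this layer can start as simple threshold checks over column null rates. The sketch below is a hypothetical minimal runner, not a stand-in for a full data-quality tool:

```python
def null_rate(rows, column):
    """Share of rows where the given column is missing or None."""
    if not rows:
        return 0.0
    nulls = sum(1 for r in rows if r.get(column) is None)
    return nulls / len(rows)

def check_quality(rows, thresholds):
    """Return failed checks as {column: observed_null_rate}."""
    return {
        col: rate
        for col, limit in thresholds.items()
        if (rate := null_rate(rows, col)) > limit
    }

rows = [
    {"user_id": 1, "country": "DE"},
    {"user_id": 2, "country": None},
    {"user_id": None, "country": "FR"},
    {"user_id": 4, "country": "US"},
]
# Allow up to 10% missing user_id and up to 30% missing country.
failed = check_quality(rows, {"user_id": 0.10, "country": 0.30})
print(failed)  # {'user_id': 0.25}
```

Wiring the failed-check output into alerting and incident playbooks is what turns checks into the trust the paragraph above describes.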

Common anti-patterns

Data mesh without a real platform

Problem: Teams are declared autonomous but do not receive shared capabilities for contracts, quality, and discoverability.

Fix: Build platform layer + governance minimum first, then scale federation.

One MPP as the universal answer

Problem: MPP may cover part of OLAP serving, but not ingestion reliability, ownership, and contract management.

Fix: Treat MPP as one serving component inside a broader data platform architecture.


Raw data without product ownership

Problem: No one owns semantic quality, so downstream teams build conflicting metrics and business logic.

Fix: Assign data product owners and publish explicit quality/freshness SLOs.

Technology-first, outcome-last

Problem: Stack modernization happens, but time-to-data for product teams does not improve.

Fix: Tie every platform initiative to business metrics: lead time, reliability, cost-per-query.

Patterns that consistently work

  • Data contracts as mandatory interfaces between producer and consumer teams.
  • Unified catalog + lineage for dependency discovery and impact analysis.
  • Standardized medallion/semantic layering for predictable data flow.
  • Product-oriented ownership with SLOs for freshness, completeness, and schema stability.
  • FinOps loop for data platform: compute budgeting, storage optimization, chargeback/showback.
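Schema stability, one of the SLOs above, can be enforced mechanically at contract-change time. A toy compatibility gate, assuming schemas are plain field-to-type maps (an illustrative simplification of real schema-registry checks):

```python
def is_backward_compatible(old_schema, new_schema):
    """A producer change is backward compatible for consumers if no
    existing field is removed or retyped; purely additive changes pass."""
    for field, ftype in old_schema.items():
        if field not in new_schema:
            return False, f"removed field: {field}"
        if new_schema[field] != ftype:
            return False, f"retyped field: {field}"
    return True, "ok"

v1 = {"order_id": "int", "amount": "float"}
v2 = {"order_id": "int", "amount": "float", "currency": "str"}  # additive
v3 = {"order_id": "int", "amount": "str"}                       # retyped

print(is_backward_compatible(v1, v2))  # (True, 'ok')
print(is_backward_compatible(v1, v3))  # (False, 'retyped field: amount')
```

Running such a gate in CI for producer repositories makes the data contract a mandatory interface rather than a document nobody reads.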

Implementation roadmap (0-180 days)

0-30 days

Baseline and constraint mapping

Inventory sources, critical pipelines, current SLAs, and incidents. Build an explicit architectural baseline and focus on constraints that block product delivery.

30-60 days

Governance minimum + shared interfaces

Introduce data contracts, catalog, and basic quality checks for critical domains. Formalize schema evolution rules and ownership boundaries.

60-120 days

Self-service capabilities

Launch reusable ingestion/transform templates, observability standards, and repeatable pipeline setup to reduce time-to-first-dataset.

120-180 days

Economics and scaling the model

Enable showback/chargeback, optimize compute/storage, and expand domain ownership. At this stage, the hybrid operating model usually becomes the default.
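At its core, showback is just attributing metered usage to owning teams at known unit prices. A minimal sketch, with resource names and prices invented for illustration:

```python
from collections import defaultdict

def showback(usage_events, unit_costs):
    """Aggregate platform spend per owning team from raw usage events.

    unit_costs maps a resource kind to an assumed price per unit."""
    spend = defaultdict(float)
    for e in usage_events:
        spend[e["team"]] += e["units"] * unit_costs[e["resource"]]
    return dict(spend)

unit_costs = {"compute_hour": 0.50, "gb_stored": 0.02}
events = [
    {"team": "payments", "resource": "compute_hour", "units": 100},
    {"team": "payments", "resource": "gb_stored", "units": 500},
    {"team": "search", "resource": "compute_hour", "units": 40},
]
print(showback(events, unit_costs))  # {'payments': 60.0, 'search': 20.0}
```

Showback only reports these numbers to teams; chargeback goes one step further and bills them, which is why showback is usually the safer first step.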

Related chapter

Data Governance & Compliance

Quality, lineage, and access-control practices for data platforms.


Platform maturity metrics

Time-to-first-dataset

Target: < 1 week

How quickly new business use cases can get production-ready data.

Pipeline reliability

Target: 99.9%+

Share of successful critical pipeline runs without manual intervention.

Freshness SLO compliance

Target: >= 95%

How often data meets declared freshness windows.
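Computing this metric from observed pipeline lags is straightforward; a small sketch with invented numbers:

```python
def freshness_compliance(observed_lags_minutes, slo_minutes):
    """Share of pipeline runs whose data landed within the declared
    freshness window."""
    if not observed_lags_minutes:
        return 1.0
    ok = sum(1 for lag in observed_lags_minutes if lag <= slo_minutes)
    return ok / len(observed_lags_minutes)

# 20 runs, SLO: data no older than 60 minutes at publish time.
lags = [35, 40, 58, 61, 45] * 4  # one breach (61 min) in every five runs
print(f"{freshness_compliance(lags, 60):.0%}")  # 80%
```

At 80% compliance this hypothetical pipeline would miss the >= 95% target above, which is exactly the kind of gap the metric is meant to surface.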

Schema incident rate

Target: Quarter-over-quarter reduction

Frequency of incidents caused by incompatible schema changes.

Cost per successful insight

Target: Controlled downward trend

Platform economics linked to actual business outcomes.

Reusability ratio

Target: > 60%

Share of reusable data products and standardized pipelines.
