This episode is useful because it grounds the data-platform conversation in real questions about centralization, federation, Data Mesh, and the boundaries between OLTP, MPP, and platform services rather than in hype.
In day-to-day engineering work, it helps translate 2025 market ideas into a capability map and concrete architecture decisions rather than another tool list the team will spend years supporting.
In interviews and architecture discussions, it is especially strong when you need to surface tool sprawl, vendor lock-in, and platform cost growing faster than real value.
Practical value of this chapter
Design in practice
Translates 2025 market trends into actionable architecture choices for data-platform teams.
Decision quality
Evaluates platform maturity through capability maps rather than tool checklists.
Interview articulation
Adds current context for discussing the modern data stack, governance, and cost control.
Risk and trade-offs
Highlights tool sprawl, vendor lock-in, and uncontrolled platform-cost risk.
Data platforms in 2025: interview with Nikolay Golov
A practical interview about modern data platforms: operating models, platform as a product, OLTP/MPP limits, and a realistic adoption path without organizational chaos.
Source
Telegram: Book Cube
Post with a short summary and key points from the episode.
About the episode
The core question in this interview is practical: how to build a data platform that accelerates product teams instead of slowing them down. The discussion centers on the balance between domain autonomy, shared platform standards, and cost control.
A key takeaway is that architecture success in 2025 depends less on "the perfect stack" and more on operating model, ownership, and quality of platform capabilities.
Guest and context
Nikolay Golov
- Head of Data Engineering at ManyChat.
- Former Head of Data Platform at Avito.
- Data platform practitioner and database educator.
The value of this episode is in connecting organizational decisions with architecture trade-offs, not treating them as separate concerns.
What changed in 2025
- Teams are no longer debating storage engines alone; they are redesigning the platform operating model.
- The growth of data products increased demand for self-service and made weak governance more expensive.
- Separating storage from compute and using open table formats became baseline expectations, not experiments.
- LLM and low-latency use cases raised the bar for data freshness and pipeline observability.
- Data-platform economics, including FinOps, became an architectural constraint alongside latency and reliability.
Related chapter
T-Bank data platform overview
A practical case of data-platform evolution from DWH to lakehouse architecture.
Data-platform operating models
Centralized platform team
Best fit: Early-stage companies or organizations with a limited number of domains.
Strengths
- Consistent standards for quality, security, and tooling.
- Fast rollout of core platform capabilities.
Risks
- The platform team becomes a bottleneck for domain delivery.
- Weak ownership of source data quality in product teams.
Hybrid model (platform + domain squads)
Best fit: Mid-size and large companies with multiple products and different data SLAs.
Strengths
- Platform team builds shared capabilities while domains own data products.
- Lower time-to-data without sacrificing standards or cost control.
Risks
- Requires explicit ownership boundaries and interface contracts.
- Without governance, quality and semantics diverge quickly.
Federated model (mature data mesh)
Best fit: Very large organizations with highly autonomous domain units.
Strengths
- Maximum domain velocity and local accountability.
- Scalable ownership model without centralizing every decision.
Risks
- High risk of fragmented local solutions if shared platform capabilities are weak.
- Coordination cost grows fast when platform investment is insufficient.
Reference architecture for a data platform
Ingestion and capture layer
Focus: CDC, event buses, batch connectors, and contract-based ingestion.
The main goal is predictable delivery into the platform with schema and freshness control.
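As a rough illustration of schema and freshness control at the ingestion boundary, the sketch below checks one batch against a contract before it is accepted. The `IngestionContract` class, the `check_batch` function, and the `event_time` field name are hypothetical and were not discussed in the episode; a real platform would more likely rely on a schema registry or a data-quality framework.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class IngestionContract:
    """Hypothetical contract for one source feed (all names are illustrative)."""
    required_fields: dict[str, type]   # field name -> expected Python type
    max_staleness: timedelta           # freshness window for the newest event
    timestamp_field: str = "event_time"


def check_batch(records: list[dict], contract: IngestionContract,
                now: datetime) -> list[str]:
    """Return human-readable violations; an empty list means the batch passes."""
    violations: list[str] = []

    # Schema control: every record must carry each required field with the expected type.
    for i, record in enumerate(records):
        for name, expected in contract.required_fields.items():
            if name not in record:
                violations.append(f"record {i}: missing field '{name}'")
            elif not isinstance(record[name], expected):
                violations.append(f"record {i}: field '{name}' is not {expected.__name__}")

    # Freshness control: the newest event must fall inside the declared window.
    timestamps = [r[contract.timestamp_field] for r in records
                  if isinstance(r.get(contract.timestamp_field), datetime)]
    if not timestamps:
        violations.append("batch has no usable timestamps")
    elif now - max(timestamps) > contract.max_staleness:
        violations.append("batch is stale")
    return violations


# Usage: a batch that is five minutes old and matches the contract passes.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
contract = IngestionContract({"order_id": int, "event_time": datetime}, timedelta(hours=1))
print(check_batch([{"order_id": 1, "event_time": now - timedelta(minutes=5)}], contract, now))
```

Returning a list of violations instead of raising on the first one lets the caller decide whether to reject the batch, quarantine it, or only alert.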
Storage and table format layer
Focus: Object storage + open table formats (Iceberg/Delta/Hudi), partitioning, compaction.
This layer provides durability, schema evolution, and decoupling of compute from physical storage.
Compute and transform layer
Focus: Batch/stream processing, dbt/SQL transforms, orchestration, and retries.
This is where domain-ready data marts, freshness targets, and repeatable pipelines are formed.
Serving and consumption layer
Focus: BI, reverse ETL, feature stores, API access to data products, MPP serving.
Consumption must stay self-service while still controlled by cost and access policies.
Governance and reliability layer
Focus: Catalog, lineage, data contracts, quality checks, observability, incident playbooks.
Without this layer, hidden debt accumulates and business trust in the platform degrades.
Common anti-patterns
Data mesh without a real platform
Problem: Teams are declared autonomous but do not receive shared capabilities for contracts, quality, and discoverability.
Fix: Build the platform layer and a minimum level of governance first, then scale federation.
One MPP as the universal answer
Problem: MPP may cover part of OLAP serving, but not ingestion reliability, ownership, or contract management.
Fix: Treat MPP as one serving component inside a broader data platform architecture.
Raw data without product ownership
Problem: No one owns semantic quality, so downstream teams build conflicting metrics and business logic.
Fix: Assign data product owners and publish explicit quality/freshness SLOs.
Technology-first, outcome-last
Problem: Stack modernization happens, but time-to-data for product teams does not improve.
Fix: Tie every platform initiative to business metrics: lead time, reliability, or cost per query.
Patterns that consistently work
- Data contracts as mandatory interfaces between producer and consumer teams.
- Unified catalog + lineage for dependency discovery and impact analysis.
- Standardized Bronze/Silver/Gold layers plus a semantic layer for predictable data flow.
- Product-oriented ownership with SLOs for freshness, completeness, and schema stability.
- FinOps loop for data platform: compute budgeting, storage optimization, chargeback/showback.
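To make the "lineage for impact analysis" pattern from the list above concrete, here is a minimal sketch that walks a lineage graph to find everything downstream of a changed dataset. The `downstream_impact` function and the dataset names are hypothetical; a real catalog would expose lineage through its own API rather than a plain dictionary.

```python
from collections import deque


def downstream_impact(lineage: dict[str, list[str]], changed: str) -> list[str]:
    """Return every dataset reachable from `changed`, in breadth-first order."""
    seen = {changed}
    order: list[str] = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:       # guard against cycles and diamond dependencies
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Illustrative lineage: each key feeds the datasets listed as its value.
lineage = {
    "bronze.orders": ["silver.orders"],
    "silver.orders": ["gold.revenue", "gold.retention"],
    "gold.revenue": ["dashboard.finance"],
}

print(downstream_impact(lineage, "bronze.orders"))
# ['silver.orders', 'gold.revenue', 'gold.retention', 'dashboard.finance']
```

Breadth-first order lists the nearest consumers first, which is usually the order in which owners need to be notified.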
180-day implementation roadmap
Baseline and constraint map
Inventory sources, critical pipelines, current SLAs, and incidents. Build an explicit architectural baseline and focus on constraints that block product delivery.
Minimum governance and shared interfaces
Introduce data contracts, catalog, and basic quality checks for critical domains. Formalize schema evolution rules and ownership boundaries.
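One way to formalize schema evolution rules is an automated compatibility check between two contract versions. The sketch below is an assumption, not a rule set from the episode: it takes the consumer's point of view and treats a removed field, a changed type, and a required field becoming optional as breaking, while added fields are allowed. The `breaking_changes` function and the `(type name, required)` schema shape are hypothetical.

```python
# A schema maps field name -> (type name, required flag). This shape is illustrative.
Schema = dict[str, tuple[str, bool]]


def breaking_changes(old: Schema, new: Schema) -> list[str]:
    """List changes in `new` that could break consumers written against `old`."""
    problems: list[str] = []
    for name, (old_type, old_required) in old.items():
        if name not in new:
            problems.append(f"removed field '{name}'")
            continue
        new_type, new_required = new[name]
        if new_type != old_type:
            problems.append(f"type of '{name}' changed from {old_type} to {new_type}")
        if old_required and not new_required:
            problems.append(f"field '{name}' became optional")
    # Fields that exist only in `new` are additive and are not reported.
    return problems


old = {"id": ("int", True), "email": ("string", True)}
additive = {"id": ("int", True), "email": ("string", True), "plan": ("string", False)}
print(breaking_changes(old, additive))  # []
```

A check like this can run in CI for the producing team, so that an incompatible change is caught before it reaches downstream pipelines.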
Self-service capabilities
Launch reusable ingestion and transformation templates, observability standards, and repeatable pipeline setup to reduce time-to-first-dataset.
Economics and scaling the model
Enable showback/chargeback, optimize compute/storage, and expand domain ownership. At this stage, the hybrid operating model usually becomes the default.
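Showback can start as a simple proportional split of a shared bill. The sketch below assumes usage is measured in a single unit, such as compute-hours per domain. The `showback` function and the domain names are hypothetical, and real allocation models often add fixed platform costs or per-workload rates.

```python
def showback(total_cost: float, usage_by_domain: dict[str, float]) -> dict[str, float]:
    """Split a shared platform bill across domains in proportion to their usage."""
    total_usage = sum(usage_by_domain.values())
    if total_usage <= 0:
        raise ValueError("no usage recorded, nothing to allocate")
    # Rounding to cents means the parts may differ from the total by a few cents.
    return {
        domain: round(total_cost * usage / total_usage, 2)
        for domain, usage in usage_by_domain.items()
    }


# Illustrative month: 1000 currency units of compute, usage in compute-hours.
print(showback(1000.0, {"growth": 30.0, "payments": 50.0, "support": 20.0}))
# {'growth': 300.0, 'payments': 500.0, 'support': 200.0}
```

The same numbers serve showback (reporting only) and chargeback (actual budget transfer); the difference is organizational rather than computational.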
Related chapter
Data Governance & Compliance
Quality, lineage, and access-control practices for data platforms.
Platform maturity metrics
Time-to-first-dataset
Target: < 1 week
How quickly new business use cases can get production-ready data.
Pipeline reliability
Target: 99.9%+
Share of successful critical pipeline runs without manual intervention.
Freshness SLO compliance
Target: >= 95%
How often data meets declared freshness windows.
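A minimal sketch of how this metric could be computed, assuming each freshness check records the lag between the latest available data and the check time. The `freshness_compliance` function and the sample lags are hypothetical.

```python
from datetime import timedelta


def freshness_compliance(lags: list[timedelta], window: timedelta) -> float:
    """Percentage of freshness checks whose observed lag fits the declared window."""
    if not lags:
        raise ValueError("no freshness checks recorded")
    within = sum(1 for lag in lags if lag <= window)
    return 100.0 * within / len(lags)


# Illustrative sample: 19 checks at 10 minutes of lag, one at 90 minutes, 1-hour window.
lags = [timedelta(minutes=10)] * 19 + [timedelta(minutes=90)]
print(freshness_compliance(lags, timedelta(hours=1)))  # 95.0
```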
Schema incident rate
Target: Quarter-over-quarter reduction
Frequency of incidents caused by incompatible schema changes.
Cost per successful insight
Target: Controlled downward trend
Platform economics linked to actual business outcomes.
Reusability ratio
Target: > 60%
Share of reusable data products and standardized pipelines.
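The numeric targets above can be tracked with a small scorecard. The sketch below is only an illustration: the metric keys are hypothetical, "< 1 week" is interpreted as fewer than 7 days, and the two trend-based metrics (schema incident rate and cost per successful insight) are left out because they compare periods rather than thresholds.

```python
import operator

# Thresholds are taken from the targets listed above; key names are illustrative.
TARGETS = {
    "time_to_first_dataset_days": (operator.lt, 7),     # "< 1 week", read as 7 days
    "pipeline_reliability_pct": (operator.ge, 99.9),    # "99.9%+"
    "freshness_slo_pct": (operator.ge, 95),             # ">= 95%"
    "reusability_pct": (operator.gt, 60),               # "> 60%"
}


def scorecard(measured: dict[str, float]) -> dict[str, bool]:
    """Return True for each metric that meets its target."""
    return {
        name: compare(measured[name], threshold)
        for name, (compare, threshold) in TARGETS.items()
    }


measured = {
    "time_to_first_dataset_days": 5,
    "pipeline_reliability_pct": 99.95,
    "freshness_slo_pct": 93.0,
    "reusability_pct": 62,
}
print(scorecard(measured))
# {'time_to_first_dataset_days': True, 'pipeline_reliability_pct': True,
#  'freshness_slo_pct': False, 'reusability_pct': True}
```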
Related materials and references
- YouTube: Data platforms 2025 - Full interview recording.
- Telegram: Book Cube - Short summary of the episode with key points.
- Yandex Music - Audio version of the podcast.
- Podster.fm - Alternative audio platform.
- T-Bank data platform overview - Practical case of platform evolution and architecture decisions.
- Data Mesh in Action - Domain ownership, self-service platform, and federated governance.
- Data Pipeline / ETL / ELT Architecture - How to design ingestion, transform, and serving layers.
- Data Governance & Compliance - Data quality, lineage, and compliance practices.
- Apache Iceberg architecture - Open table formats and the transactional lakehouse layer.
- ClickHouse - Fast analytical serving layer for data products.

