System Design Space

Updated: February 21, 2026 at 11:59 PM

The history of Google TPUs and their evolution


How Google went from TPU v1 for inference to Ironwood: architectural decisions, economics of AI infrastructure and comparison with the GPU approach.

Primary source

TPU series

A two-part analysis of how TPUs emerged and how the architecture evolved across generations.


This chapter is compiled from the author's posts and official Google materials: how TPUs emerged from an infrastructure bottleneck, why Google needed the ASIC approach, and how the architecture evolved from an inference chip to a large-scale training platform and back to inference-first in the GenAI era.

Why did TPUs appear in the first place?

  • Deliver a multi-fold price/performance gain for ML inference compared to the available CPUs and GPUs.
  • Make the decision quickly and bring the chip into production in a short time.
  • Maintain cost efficiency as ML workloads across Google products kept growing.

Evolution of TPU by generation

2015

TPU v1

Inference
  • Development in ~15 months from start to deployment.
  • 28 nm process technology, 700 MHz, ~40 W.
  • Peak throughput: 92 TOPS INT8, with a marked jump in energy efficiency.
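
The v1 figures can be sanity-checked with simple arithmetic. A minimal sketch using only the numbers quoted above; the result is performance per watt, the metric behind the energy-efficiency claim:

```python
# Back-of-envelope perf-per-watt from the TPU v1 figures quoted above:
# 92 TOPS INT8 peak at roughly 40 W of measured power.

tpu_v1_tops = 92.0      # peak INT8 tera-operations per second
tpu_v1_watts = 40.0     # approximate measured power draw

tops_per_watt = tpu_v1_tops / tpu_v1_watts
print(f"TPU v1: ~{tops_per_watt:.1f} TOPS/W")  # -> TPU v1: ~2.3 TOPS/W
```

Contemporary CPUs and GPUs of the 28 nm era were an order of magnitude or more below this on INT8 inference per watt, which is what made the ASIC bet pay off.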
2017

TPU v2

Training + inference
  • The transition from an inference-only chip to a train+infer platform.
  • TPU Pods networking 256 chips together.
  • Order of magnitude: 180 TFLOPS and 64 GB HBM per four-chip board (according to chapter sources).
2018

TPU v3

Performance growth
  • Liquid cooling introduced.
  • Significantly increased compute and memory bandwidth.
  • Order of magnitude: up to 420 TFLOPS per four-chip board (according to chapter sources).
2021

TPU v4

Scaling pod networks
  • Optical circuit switching to speed up inter-chip communication.
  • Focus on distributed training of large-scale models.
  • Order of magnitude: 275 TFLOPS per chip (according to chapter sources).
2023

TPU v5e / v5p

Cost optimization
  • Emphasis on cost-effective training/inference.
  • Improved power efficiency and pod scaling.
  • Support for sparsity and more flexible workload profiles.
2024

TPU v6 Trillium

Performance leap
  • Up to 4.7x compute growth per chip vs TPU v5e (according to Google).
  • Double HBM capacity/throughput and interconnect bandwidth.
  • ~67% higher energy efficiency vs TPU v5e (according to Google).
2025

TPU v7 Ironwood

Inference for the GenAI era
  • A return to the inference-first idea, like TPU v1, but on a new scale.
  • Up to 9,216 chips in a liquid-cooled cluster.
  • Order of magnitude: 4,614 TFLOPS/chip, 192 GB HBM, 7.37 TB/s memory bandwidth.
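
The Ironwood figures above also imply the chip's "machine balance" (peak FLOPs per byte of HBM bandwidth), a key quantity when reasoning about inference rooflines. A minimal sketch using only the quoted numbers; whether a real workload approaches peak depends on precision, batch size, and kernel quality:

```python
# Compute-to-bandwidth ratio from the Ironwood figures quoted above.
peak_flops = 4614e12   # 4,614 TFLOPS per chip
hbm_bw = 7.37e12       # 7.37 TB/s memory bandwidth

machine_balance = peak_flops / hbm_bw  # FLOPs per byte of HBM traffic
print(f"~{machine_balance:.0f} FLOPs/byte")  # -> ~626 FLOPs/byte
```

Workloads with lower arithmetic intensity than this ratio are memory-bound, which is the common case in LLM decoding; hence the emphasis on HBM capacity and bandwidth in an inference-first design.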

TPU vs GPU: How to Read the Comparisons

Compute and memory profile

GPUs are usually more versatile; TPUs are optimized for tensor workloads and tightly integrated with the Google Cloud stack.

Economics

In a number of training/inference benchmarks, TPUs show better cost per workload, but the estimates depend heavily on the model, batch size, and level of optimization.

Ecosystem

NVIDIA's CUDA ecosystem is broader; TPUs win in scenarios where the team is already building its pipeline on TensorFlow/JAX and GCP-managed infrastructure.

An important practical point: comparing FLOPS, tokens, or dollars without a common methodology easily leads to distorted conclusions. Look at the model, precision, batch size, interconnect, software stack, and operational constraints.
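
One way to anchor such a comparison is a single end-to-end metric like dollars per million tokens, measured under identical conditions on both platforms. A minimal sketch; all numbers below are hypothetical placeholders, not figures from any source:

```python
# Hypothetical cost-per-token comparison; plug in your own measurements
# taken with the same model, precision, and batch size on each platform.

def cost_per_million_tokens(tokens_per_second: float, price_per_hour: float) -> float:
    """Dollars per 1M generated tokens at a measured sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600.0
    return price_per_hour / tokens_per_hour * 1_000_000

# Placeholder figures for two accelerators serving the same model:
print(f"A: ${cost_per_million_tokens(2400, 4.20):.2f} per 1M tokens")
print(f"B: ${cost_per_million_tokens(1800, 3.10):.2f} per 1M tokens")
```

Note how, with these placeholder numbers, the platform with lower raw throughput still comes out slightly cheaper per token; that is exactly the kind of distortion that comparing FLOPS in isolation hides.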

Strengths and weaknesses of the TPU approach

Pros

  • Specialization in tensor operations and deep learning.
  • High energy efficiency and strong TCO economics in a number of AI scenarios.
  • Deep integration with Google Cloud, TensorFlow and JAX.
  • Good scalability via the TPU Pod approach.

Limitations

  • Availability is primarily via Google Cloud.
  • Less versatility for atypical computing workloads.
  • The tool ecosystem as a whole is narrower than that around CUDA.
  • Risks of vendor lock-in when the architecture is deeply tied to TPU specifics.

References

All numerical comparisons in this chapter are provided as guidelines from the specified sources and require validation for a specific workload.


© 2026 Alexander Polomodov