A URL shortener (TinyURL, bit.ly) is a classic system design interview problem. It suits beginners because it combines a simple concept with interesting architectural decisions: generating unique IDs, scaling the database, and handling high read loads.
Reference: Alex Xu, System Design Interview, Chapter 8: URL Shortener (detailed analysis)
Why do you need a URL Shortener?
Convenience
Short links are easier to remember, share on social networks and use in SMS
Analytics
Tracking clicks, user geography and traffic sources
Control
Ability to disable a link, set an expiration date or password
Requirements
Functional
- FR1: Creating a short link from a long URL
- FR2: Redirect via short link to original URL
- FR3: Optional TTL (link lifetime)
- FR4: Custom alias (optional)
Non-functional
- NFR1: 100M new URLs per day (write)
- NFR2: 10:1 read/write ratio → 1B redirects/day
- NFR3: Latency < 100 ms for redirect
- NFR4: 99.9% availability
Back of the Envelope
Traffic
- Write: 100M/day ÷ 86,400 s ≈ 1,160 QPS
- Read: 1B/day ÷ 86,400 s ≈ 11,600 QPS
- Peak: ~23,000 read QPS (2x average)
Storage
- Avg URL size: 500 bytes
- 100M × 500B = 50GB/day
- 5 years: 50GB × 365 × 5 ≈ 91TB
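The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative, and the 2x peak factor and 500-byte record size are the same assumptions the text makes.

```python
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400

writes_per_day = 100_000_000          # NFR1
read_write_ratio = 10                 # NFR2
avg_url_bytes = 500                   # assumed average record size
retention_years = 5

write_qps = writes_per_day / SECONDS_PER_DAY
read_qps = write_qps * read_write_ratio
peak_read_qps = read_qps * 2          # peak assumed to be 2x the average

storage_per_day_gb = writes_per_day * avg_url_bytes / 1e9
storage_total_tb = storage_per_day_gb * 365 * retention_years / 1e3

print(f"write QPS     ~ {write_qps:,.0f}")
print(f"read QPS      ~ {read_qps:,.0f}")
print(f"peak read QPS ~ {peak_read_qps:,.0f}")
print(f"storage/day   ~ {storage_per_day_gb:,.0f} GB")
print(f"storage total ~ {storage_total_tb:,.2f} TB")
```

The unrounded results are about 1,157 write QPS and 91.25 TB over five years, which the bullet points round for convenience.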
Short URL length
How many characters are needed for a unique identifier? We use base62 (a-z, A-Z, 0-9):
| Length | Combinations | Enough for 5 years (~182.5B URLs)? |
|---|---|---|
| 6 characters | 62⁶ ≈ 56.8B | ✗ Not enough |
| 7 characters | 62⁷ ≈ 3.5T | ✓ Enough |
| 8 characters | 62⁸ ≈ 218T | ✓ With reserve |
Conclusion: 7 base62 characters give about 3.5 trillion combinations. At 100M URLs/day, that lasts roughly 96 years.
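The length calculation can be checked directly. The sketch below finds the smallest base62 length whose keyspace covers five years of writes; `min_base62_length` is a hypothetical helper name, not part of any library.

```python
def min_base62_length(required_ids: int) -> int:
    """Smallest code length whose base62 keyspace holds required_ids values."""
    length = 1
    while 62 ** length < required_ids:
        length += 1
    return length


urls_per_day = 100_000_000
urls_in_5_years = urls_per_day * 365 * 5            # 182.5 billion

length = min_base62_length(urls_in_5_years)
years_until_exhausted = 62 ** 7 / (urls_per_day * 365)

print(f"minimum length: {length}")
print(f"7 characters last ~{years_until_exhausted:.1f} years")
```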
ID generation strategies
1. Hash + Collision Resolution
MD5/SHA256 from URL → take first 7 characters → check for collision
- ✓ Deterministic (same URL = same hash)
- ✓ No central point of failure
- ✗ Collisions require retry + DB lookup
- ✗ Difficulty with custom aliases
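The hash approach above can be sketched as follows. Several details are assumptions made for illustration: the digest is base62-encoded before truncation (7 hex characters alone would give only 16⁷ ≈ 268M combinations), a plain `dict` stands in for the database, and a collision is retried by appending a counter to the input. `short_code_from_hash` is a hypothetical name.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
DIGEST_WIDTH = 43  # 62**43 > 2**256, so every SHA-256 value fits in 43 digits


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def short_code_from_hash(long_url: str, store: dict, length: int = 7,
                         max_attempts: int = 10) -> str:
    """Hash the URL, keep the first `length` base62 characters, retry on collision."""
    candidate = long_url
    for attempt in range(max_attempts):
        digest = hashlib.sha256(candidate.encode("utf-8")).digest()
        encoded = base62_encode(int.from_bytes(digest, "big"))
        code = encoded.rjust(DIGEST_WIDTH, ALPHABET[0])[:length]
        existing = store.get(code)          # the DB lookup the text mentions
        if existing is None:
            store[code] = long_url
            return code
        if existing == long_url:            # same URL was shortened before
            return code
        candidate = f"{long_url}#{attempt}"  # collision: perturb input and retry
    raise RuntimeError("could not find a free short code")


store = {}
code = short_code_from_hash("https://example.com/some/long/path", store)
print(code, len(code))
```

Every creation request pays for at least one lookup, which is the cost the list above attributes to this strategy.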
2. Unique ID Generator + Base62 (Recommended)
Get unique numeric ID → convert to base62
- ✓ Guaranteed unique (no collisions)
- ✓ Simple logic
- ✓ Easy to support custom aliases
- ✗ Needs an ID generator (a potential single point of failure)
- ✗ Identical URLs can give different short URLs
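The conversion step is a plain change of base. Here is a minimal sketch; the character order (digits, then lowercase, then uppercase) is an assumption, since any fixed ordering of the 62 characters works as long as encoding and decoding agree.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Convert a non-negative integer ID to a base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    """Convert a base62 string back to the integer ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


print(base62_encode(1_000_000))   # 4c92
print(base62_decode("4c92"))      # 1000000
```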
ID Generator options
Auto-increment DB
Simple solution with auto_increment primary key.
⚠️ Single point of failure, does not scale
Multi-master DB
Two servers: one generates even IDs, the other generates odd ones.
✓ Easy scaling, but limited by number of masters
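The even/odd split generalizes to N masters: each server starts at its own offset and steps by the number of servers. The sketch below only simulates that numbering scheme in memory; `id_stream` is a hypothetical helper. In MySQL the same effect is usually configured with the `auto_increment_offset` and `auto_increment_increment` system variables.

```python
import itertools


def id_stream(server_index: int, server_count: int):
    """Yield IDs server_index, server_index + server_count, ... forever."""
    if not 0 <= server_index < server_count:
        raise ValueError("server_index must be in [0, server_count)")
    return itertools.count(start=server_index, step=server_count)


even_ids = id_stream(0, 2)   # first master
odd_ids = id_stream(1, 2)    # second master

print([next(even_ids) for _ in range(4)])
print([next(odd_ids) for _ in range(4)])
```

Because the step is fixed when the streams are created, adding a third master later means renumbering, which is the scaling limit noted above.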
UUID
128-bit unique identifier generated on the client.
⚠️ Too long (36 characters), bad for URL
Snowflake ID (Recommended)
64-bit ID: timestamp + datacenter + machine + sequence.
✓ Distributed, time-sorted, compact
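A Snowflake-style generator can be sketched in a few dozen lines. The bit layout below (41-bit millisecond timestamp, 5-bit datacenter, 5-bit machine, 12-bit sequence) and the epoch value follow the commonly described Twitter layout, but treat them as assumptions; the class name and its error handling are illustrative only.

```python
import threading
import time


class SnowflakeGenerator:
    EPOCH_MS = 1_288_834_974_657      # assumed custom epoch (Nov 2010)
    DATACENTER_BITS = 5
    MACHINE_BITS = 5
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
    MACHINE_SHIFT = SEQUENCE_BITS
    DATACENTER_SHIFT = SEQUENCE_BITS + MACHINE_BITS
    TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS + DATACENTER_BITS

    def __init__(self, datacenter_id: int, machine_id: int) -> None:
        if not 0 <= datacenter_id < (1 << self.DATACENTER_BITS):
            raise ValueError("datacenter_id out of range")
        if not 0 <= machine_id < (1 << self.MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the sequence counter.
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << self.TIMESTAMP_SHIFT)
                | (self.datacenter_id << self.DATACENTER_SHIFT)
                | (self.machine_id << self.MACHINE_SHIFT)
                | self._sequence
            )


generator = SnowflakeGenerator(datacenter_id=1, machine_id=7)
print(generator.next_id())
```

One caveat worth raising in an interview: a full 64-bit ID can need up to 11 base62 characters, so a strict 7-character code implies a smaller ID space (for example a counter-based generator) or a wider key column.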
Related: Twitter/X Snowflake ID, a detailed analysis of the ID generation algorithm
High-Level Architecture
Architecture Map
[Diagram] Components: Browser / App, Edge routing, Stateless API, Redis, Snowflake, PostgreSQL / Cassandra. A cache miss is drawn as a dashed line to the database.
Write Path
1. The client sends a long URL
2. ID Generator produces a unique ID
3. Convert ID to base62 → short URL
4. Save mapping in DB
5. Return short URL to the client
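The write path above can be sketched end to end with in-memory stand-ins. Everything named here is hypothetical: a `dict` plays the database, an `itertools.count` plays the ID generator, and `https://short.example/` is a placeholder domain.

```python
import itertools
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def create_short_url(long_url: str, db: dict, next_id,
                     base_url: str = "https://short.example/") -> str:
    new_id = next_id()               # step 2: unique numeric ID
    code = base62_encode(new_id)     # step 3: ID -> base62 code
    db[code] = long_url              # step 4: persist the mapping
    return base_url + code           # step 5: return the short URL


counter = itertools.count(1_000_000)  # stand-in for a real ID generator
db = {}

short = create_short_url("https://example.com/a/very/long/path", db,
                         lambda: next(counter))
print(short)
```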
Read Path
1. The client requests a short URL
2. Check Cache (Redis)
3. Cache miss → query to DB
4. Update Cache
5. HTTP 301/302 Redirect
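The lookup part of the read path (steps 2 to 4) is the cache-aside pattern. In this sketch two dictionaries stand in for Redis and the database, and `resolve` is a hypothetical function name.

```python
from typing import Optional


def resolve(code: str, cache: dict, db: dict) -> Optional[str]:
    """Return the original URL for a short code, or None if it is unknown."""
    long_url = cache.get(code)       # step 2: check the cache
    if long_url is not None:
        return long_url
    long_url = db.get(code)          # step 3: cache miss, query the DB
    if long_url is None:
        return None                  # unknown code: the caller can answer 404
    cache[code] = long_url           # step 4: populate the cache
    return long_url


cache = {}
db = {"4c92": "https://example.com/a/very/long/path"}

print(resolve("4c92", cache, db))    # served from the DB, then cached
print(resolve("4c92", cache, db))    # served from the cache
print(resolve("nope", cache, db))    # None
```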
301 vs 302 Redirect
301 Moved Permanently
The browser caches the redirect. Subsequent requests from that browser go directly to the target URL.
✓ Less load on the server
✗ Repeat clicks from the same browser cannot be tracked
302 Found (Recommended)
The browser does not cache the redirect by default. Every click goes through the server.
✓ Full click analytics
✓ You can change the target URL
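The choice between the two status codes can be reduced to a single flag. This is a sketch under stated assumptions: `build_redirect` is a hypothetical helper that returns a status code and headers rather than tying itself to a web framework, and the `Cache-Control: no-store` header on the 302 is an optional extra, not something the text requires.

```python
from http import HTTPStatus


def build_redirect(target_url: str, track_clicks: bool = True):
    """Return (status_code, headers) for a redirect response."""
    if track_clicks:
        status = HTTPStatus.FOUND               # 302: every click hits the server
    else:
        status = HTTPStatus.MOVED_PERMANENTLY   # 301: browser may cache it
    headers = {"Location": target_url}
    if track_clicks:
        # Optional: explicitly forbid caches from storing the redirect.
        headers["Cache-Control"] = "no-store"
    return int(status), headers


print(build_redirect("https://example.com/landing"))
print(build_redirect("https://example.com/landing", track_clicks=False))
```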
Data Model
urls table
| Column | Type | Description |
|---|---|---|
| short_url | VARCHAR(7) | Primary key, base62 encoded |
| original_url | TEXT | Original long URL |
| user_id | BIGINT | Link creator (optional) |
| created_at | TIMESTAMP | Creation date |
| expires_at | TIMESTAMP | Expiration time (NULL = never expires) |
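The table above maps directly onto a DDL statement. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely so the example is runnable; the `NOT NULL` and `DEFAULT` constraints are assumptions, and SQLite does not enforce the `VARCHAR(7)` length the way PostgreSQL would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE urls (
        short_url    VARCHAR(7) PRIMARY KEY,
        original_url TEXT NOT NULL,
        user_id      BIGINT,
        created_at   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        expires_at   TIMESTAMP            -- NULL means the link never expires
    )
    """
)

conn.execute(
    "INSERT INTO urls (short_url, original_url) VALUES (?, ?)",
    ("4c92", "https://example.com/a/very/long/path"),
)

# Redirect lookup: ignore links whose expiration time has passed.
row = conn.execute(
    """
    SELECT original_url FROM urls
    WHERE short_url = ?
      AND (expires_at IS NULL OR expires_at > CURRENT_TIMESTAMP)
    """,
    ("4c92",),
).fetchone()
print(row)
```

Making `short_url` the primary key gives the redirect query an index lookup and lets the database reject duplicate codes.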
Deep dive: Database Internals (indexes, B-Trees and optimization for read-heavy workloads)
Caching Strategy
A Read/Write ratio of 10:1 makes caching critical. We use Redis to store hot URLs.
Strategy
- Cache-aside: read from the cache; on a miss, read from the DB and populate the cache
- LRU eviction: evict the least recently used URLs
- Write-through: when a link is created, write it to the cache immediately
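LRU eviction and the write-on-create step can be illustrated with a small in-process cache. This is only a sketch of the policy: in the design above Redis would play this role, and `LRUCache` and `create_link` are hypothetical names.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # drop the least recently used

    def __contains__(self, key) -> bool:
        return key in self._items


def create_link(code: str, long_url: str, db: dict, cache: LRUCache) -> None:
    """Write to the database and to the cache in the same operation."""
    db[code] = long_url
    cache.put(code, long_url)


db = {}
cache = LRUCache(capacity=2)
create_link("a", "https://example.com/a", db, cache)
create_link("b", "https://example.com/b", db, cache)
cache.get("a")                                  # "a" is now the most recent
create_link("c", "https://example.com/c", db, cache)
print("b" in cache, "a" in cache, "c" in cache)
```

After the third insert the cache holds `a` and `c`; `b` was the least recently used entry and is evicted, while the database still contains all three links.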
Cache Size
20% of daily reads × avg URL size
= 200M × 500 B = 100 GB
→ Redis cluster with replication
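The cache sizing is the same kind of arithmetic as the storage estimate. In this sketch the 20% hot fraction and the 500-byte entry size are the assumptions taken from the text; treating every cached read as a distinct URL makes the result an upper bound.

```python
daily_reads = 1_000_000_000      # NFR2: 1B redirects/day
hot_percent = 20                 # assumed share of daily reads worth caching
avg_url_bytes = 500              # assumed average entry size

cached_entries = daily_reads * hot_percent // 100
cache_gb = cached_entries * avg_url_bytes / 1e9

print(f"cached entries: {cached_entries:,}")
print(f"cache size:     {cache_gb:,.0f} GB")
```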
Related: CDN (Content Delivery Network), geo-distributed caching for global systems
Selecting a Database
PostgreSQL
- ✓ ACID guarantees
- ✓ Easy to use
- ✓ Good for medium loads
- ✗ Horizontal scaling is more difficult
Cassandra / DynamoDB (For scale)
- ✓ Linear horizontal scaling
- ✓ High availability (no single point)
- ✓ Optimized for write-heavy workloads
- ✗ Eventually consistent
Key takeaways for the interview
Show understanding
• Base62 encoding and URL length calculation
• Trade-offs between hash and ID generator
• 301 vs 302 for analytics
• Read-heavy system → focus on caching
Frequent follow-up questions
• How to handle duplicate URLs?
• How to implement custom aliases?
• How to remove expired URLs?
• How to protect yourself from abuse?
