Last Updated: December 31, 2025
Spotify is a music streaming platform that allows users to search, discover, and listen to millions of songs and podcasts on demand. It provides personalized recommendations, curated playlists, offline downloads, and features like shuffle, repeat, and cross-device syncing.
Users can follow artists, share playlists, and explore trending or new releases.
With over 600 million monthly active users (MAU) and 200 million paid users, Spotify is the most popular music streaming platform in the world.
Other Popular Examples: Apple Music, Amazon Music, YouTube Music
This system design problem touches on several fundamental concepts: content delivery at global scale, real-time streaming protocols, search infrastructure, recommendation systems, and high availability architecture.
In this chapter, we will walk through the high-level design of a music streaming platform like Spotify.
Let’s begin by clarifying the requirements.
Before jumping into architecture diagrams, we need to understand what we are building.
Music streaming might seem straightforward, but the requirements can vary significantly. Are we designing for millions or hundreds of millions of users? Do we need to support offline playback? How sophisticated should the recommendation engine be? These questions shape our design decisions.
Here is how a requirements discussion might unfold in an interview:
Candidate: "How many users should the system support, and what does daily usage look like?"
Interviewer: "Let's design for 500 million total users, with about 200 million daily active users. On an average day, we see around 1 billion streams."
Candidate: "That's substantial scale. A billion streams per day means we are dealing with serious CDN and bandwidth requirements. Should I focus primarily on music, or do we need to handle podcasts and other audio formats as well?"
Interviewer: "Focus on music streaming for now. Podcasts have different characteristics like longer duration and less repeat listening, so you can mention them but don't need to design for them in detail."
Candidate: "Understood. What about the user experience around playback? I am thinking about latency requirements and whether users can download content for offline listening."
Interviewer: "Latency is critical. Users expect to hear audio within 200ms of pressing play. As for offline playback, yes, that's an important feature for premium subscribers, especially for people with limited data plans or unreliable connectivity."
Candidate: "Personalization is a big part of what makes these services sticky. Should I include the recommendation system in the design, or treat it as out of scope?"
Interviewer: "Cover the high-level approach to recommendations, though you don't need to dive deep into the ML algorithms."
Candidate: "One more question: should I design the content ingestion pipeline, meaning how artists upload music to the platform?"
Interviewer: "That's a separate system entirely. Focus on the consumer-facing streaming experience."
This conversation reveals several important constraints that will influence our design. Let's formalize these into functional and non-functional requirements.
Based on our discussion, here are the core features our system must support:

- Search for songs, artists, albums, and playlists
- Stream songs on demand with fast startup
- Create and manage playlists that sync across devices
- Download songs for offline playback (premium users)
- Receive personalized recommendations

Beyond features, we need to consider the qualities that make the system production-ready:

- Low latency: playback starts within 200ms, search returns within 100ms
- High availability: playback must keep working even when other components degrade
- Scalability: 500 million users, 200 million DAU, 1 billion streams per day
- Consistency: playlist changes made on one device appear quickly on others
Before diving into the design, let's run some quick calculations to understand the scale we are dealing with. These numbers will guide our architectural decisions, particularly around CDN infrastructure, storage, and database selection.
Starting with the numbers from our requirements discussion: 500 million total users, 200 million daily active users, and 1 billion streams per day.

We expect 1 billion streams per day. Converting to queries per second (QPS): 1,000,000,000 / 86,400 seconds ≈ 11,600 stream requests per second on average.

Traffic is rarely uniform throughout the day. Music streaming sees significant spikes during morning and evening commutes, lunch breaks, and weekends. During peak hours, we might see 3x the average load, or roughly 35,000 stream requests per second.

Every stream request triggers additional metadata lookups: song info, artist details, album art URLs, lyrics. Users also browse, search, and load playlists. This metadata traffic is typically 5-10x higher than the stream traffic, putting peak metadata load in the range of 175,000-350,000 requests per second.
These numbers are significant. At peak, we are handling over 300,000 requests per second across all services. This is why we need aggressive caching at multiple layers.
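As a quick sanity check, here is the same arithmetic as a short script (the 3x peak factor and the 5-10x metadata multiplier are the assumptions stated above):

```python
# Back-of-envelope traffic estimation.
STREAMS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 86_400

avg_stream_qps = STREAMS_PER_DAY / SECONDS_PER_DAY   # ~11,600
peak_stream_qps = avg_stream_qps * 3                 # ~34,700 at the 3x peak

# Metadata traffic runs 5-10x the stream traffic.
peak_metadata_qps = (peak_stream_qps * 5, peak_stream_qps * 10)

print(f"Average stream QPS: {avg_stream_qps:,.0f}")
print(f"Peak stream QPS:    {peak_stream_qps:,.0f}")
print(f"Peak metadata QPS:  {peak_metadata_qps[0]:,.0f} - {peak_metadata_qps[1]:,.0f}")
```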
Music streaming has an interesting storage profile: a large catalog of audio files that rarely changes, combined with user data that grows continuously.
The audio catalog is where most of the storage goes. With 100 million songs averaging roughly 10 MB each (a 4-minute song at 320 kbps), the catalog comes to about 1 PB; the lower-bitrate variants add to this, but the order of magnitude holds.
One petabyte is substantial but manageable with object storage services like S3. The key insight is that this data is mostly static. Songs don't change once uploaded, which makes caching highly effective.
Song metadata is relatively small but needs to be fast:
| Data Type | Estimate | Notes |
|---|---|---|
| Song metadata | 100M × 2 KB = 200 GB | Title, artist, album, duration, genre |
| Artist profiles | 10M × 5 KB = 50 GB | Bio, images, discography |
| Album data | 20M × 3 KB = 60 GB | Track lists, artwork, release info |
This comfortably fits in memory for caching, which is important for the response times we need.
User data grows continuously and needs different handling:
| Data Type | Estimate | Notes |
|---|---|---|
| User profiles | 500M × 2 KB = 1 TB | Preferences, settings, subscription |
| Playlists | 500M × 5 KB = 2.5 TB | Average user has 5-10 playlists |
| Listening history | 500M × 8 KB = 4 TB | Last 6 months of plays |
Bandwidth is where music streaming gets expensive. Each stream transfers significant data: at roughly 5 MB per stream (a 4-minute song at 160 kbps), 1 billion streams per day comes to about 5 PB of egress daily.
Five petabytes per day of bandwidth is massive. At typical cloud egress rates ($0.05-0.09 per GB), this would cost millions per month without optimization. This is exactly why CDN infrastructure is not optional for music streaming. It is essential for both performance and cost control.
These estimates reveal several important design implications: a CDN is non-negotiable for audio delivery, the song metadata is small enough to cache entirely in memory, the workload is overwhelmingly read-heavy, and listening history needs a write-optimized store.
With our requirements and scale understood, let's define the API contract. A music streaming service needs APIs for search, playback, playlist management, and recommendations. Let's walk through each one.
GET /search

This is the entry point for discovery. Users type a query and expect relevant songs, artists, albums, and playlists back within milliseconds.
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The search term (e.g., "bohemian rhapsody", "queen") |
| type | string | No | Filter by type: "song", "artist", "album", "playlist". Default: all types |
| limit | integer | No | Number of results per type. Default: 20 |
| offset | integer | No | Pagination offset for results |
| Status Code | Meaning | When It Occurs |
|---|---|---|
| 400 Bad Request | Invalid input | Empty query or malformed parameters |
| 429 Too Many Requests | Rate limited | Too many searches in a short period |
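Here is what a request and an abbreviated response might look like. The response shape is illustrative, not a fixed contract:

```
GET /search?query=bohemian+rhapsody&type=song&limit=1

200 OK
{
  "songs": [
    {
      "song_id": "song_42",
      "title": "Bohemian Rhapsody",
      "artist": "Queen",
      "album": "A Night at the Opera",
      "duration_ms": 354000
    }
  ],
  "total": 1,
  "offset": 0
}
```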
GET /songs/{song_id}/stream

This is the critical path for playback. Instead of streaming audio directly through our API servers, we return a signed URL that points to a CDN edge server. This approach offloads bandwidth from our application tier and ensures users get audio from a server close to them.
| Parameter | Type | Description |
|---|---|---|
| song_id | string | The unique identifier for the song |
| Parameter | Type | Required | Description |
|---|---|---|---|
| quality | string | No | Preferred quality: "low" (96kbps), "medium" (160kbps), "high" (320kbps) |
The signed URL includes an authentication token and expiration time. This prevents URL sharing (the token is tied to the user) and limits the window for unauthorized access.
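An illustrative response (field names assumed for this sketch):

```json
{
  "stream_url": "https://cdn.example.com/songs/song_42/320/manifest.m3u8?expires=1735693200&user=user_123&token=3f2a...",
  "quality": "high",
  "expires_in_seconds": 1800
}
```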
| Status Code | Meaning | When It Occurs |
|---|---|---|
| 401 Unauthorized | Not authenticated | Missing or invalid auth token |
| 403 Forbidden | Access denied | Song unavailable in user's region, or subscription does not allow this quality |
| 404 Not Found | Song not found | Invalid song_id |
POST /playlists

Creates a new playlist for the authenticated user.
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Playlist name (max 100 characters) |
| description | string | No | Playlist description |
| is_public | boolean | No | Whether the playlist is publicly visible. Default: true |
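An example exchange (illustrative):

```
POST /playlists
{
  "name": "Road Trip 2025",
  "description": "Songs for the drive",
  "is_public": false
}

201 Created
{
  "playlist_id": "pl_789",
  "name": "Road Trip 2025",
  "created_at": "2025-06-01T12:00:00Z"
}
```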
POST /playlists/{playlist_id}/songs

Adds one or more songs to an existing playlist. Songs are added at the end of the playlist by default.
The optional position parameter allows inserting songs at a specific index. If omitted, songs are appended.
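For example, inserting two songs at the top of a playlist (request shape is illustrative):

```
POST /playlists/pl_789/songs
{
  "song_ids": ["song_42", "song_77"],
  "position": 0
}
```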
| Status Code | Meaning | When It Occurs |
|---|---|---|
| 403 Forbidden | Not owner | User does not own the playlist |
| 404 Not Found | Not found | Playlist or one of the songs does not exist |
GET /recommendations

Returns personalized song recommendations based on the user's listening history and optional seed inputs.
| Parameter | Type | Required | Description |
|---|---|---|---|
| seed_songs | string | No | Comma-separated song IDs to base recommendations on |
| seed_artists | string | No | Comma-separated artist IDs |
| limit | integer | No | Number of recommendations (1-100). Default: 20 |
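An abbreviated response might look like this (shape illustrative):

```
GET /recommendations?seed_artists=artist_7&limit=2

200 OK
{
  "recommendations": [
    {
      "song_id": "song_88",
      "title": "Don't Stop Me Now",
      "artist": "Queen",
      "reason": "Because you listened to Queen"
    },
    {
      "song_id": "song_91",
      "title": "The Show Must Go On",
      "artist": "Queen",
      "reason": "Similar to songs you recently liked"
    }
  ]
}
```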
The reason field provides transparency about why each song was recommended, which helps users understand and trust the recommendations.
Now we get to the interesting part: designing the system architecture. Rather than presenting a complex diagram upfront, we will build the design incrementally, starting with the core user journey (streaming music) and adding components as we address each requirement. This mirrors how you would approach the problem in an interview.
Our system needs to handle four core operations: streaming audio with low latency, searching a catalog of 100 million songs, managing user playlists, and serving personalized recommendations.
The system is heavily read-intensive. For every song added to the catalog (which happens through a separate ingestion pipeline), there are millions of streams. This read-to-write asymmetry means we should optimize aggressively for read performance, and caching will be essential at every layer.
There are two primary paths through our system: the browse path, where the client calls our APIs for search, playlists, and recommendations, and the stream path, where the client gets a signed URL from our API and then pulls audio directly from the CDN.
Notice how the stream path separates the control plane (getting a signed URL) from the data plane (actual audio delivery via CDN). This separation is key to handling billions of streams without our API servers becoming a bottleneck.
Let's build this architecture step by step.
This is the core functionality. When a user taps play, audio must start within 200ms. Once playing, buffering should be rare even on variable network conditions. Let's design for this critical path first.
When a user clicks play on a song, several things need to happen: we must authenticate the user, verify they are allowed to play this song, generate a URL for the audio at the right quality, and deliver the audio from a server close to them.
Let's introduce the components we need to make this work.
Every request enters through the API Gateway. Think of it as the front door to our system, handling concerns that are common across all requests.
The gateway terminates SSL connections, validates request format, enforces rate limits, authenticates users via tokens, and routes requests to the appropriate backend service. By handling these cross-cutting concerns at the edge, we keep our application services focused on business logic.
This service handles the critical decision: should this user be allowed to play this song? It checks that the user's subscription is active, that the song is licensed in the user's region, and that the requested quality is allowed for the user's tier.
If all checks pass, the service generates a signed URL that grants temporary access to the audio file on the CDN. The URL includes an authentication token and expiration time, typically 15-30 minutes. This prevents URL sharing and limits the window for unauthorized access.
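Here is a minimal sketch of HMAC-based URL signing, assuming a secret shared between the Playback Service and the CDN. Real CDNs (CloudFront, Cloudflare) have their own signed-URL formats, but the idea is the same:

```python
import hashlib
import hmac
import time

SIGNING_KEY = b"secret-shared-with-cdn"  # assumption: provisioned on both sides

def generate_signed_url(song_id: str, user_id: str, quality: str,
                        ttl_seconds: int = 1800) -> str:
    """Return a CDN URL valid for ttl_seconds and tied to this user."""
    expires = int(time.time()) + ttl_seconds
    path = f"/songs/{song_id}/{quality}/manifest.m3u8"
    # Sign path + expiry + user so the URL cannot be shared or extended.
    payload = f"{path}:{expires}:{user_id}".encode()
    token = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return f"https://cdn.example.com{path}?expires={expires}&user={user_id}&token={token}"

# The CDN edge recomputes the HMAC and rejects expired or tampered URLs.
print(generate_signed_url("song_42", "user_123", "320"))
```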
This is where the magic happens for low-latency playback. A CDN is a network of edge servers distributed globally. When a user in Tokyo requests audio, instead of fetching from our origin in Virginia, they get it from a nearby edge server.
For popular songs (think Taylor Swift's latest release), the audio is already cached at edge locations worldwide. The CDN can serve millions of concurrent streams without touching our origin servers. This is essential given our bandwidth estimates of 5 PB per day.
The actual audio files live in object storage (S3, Google Cloud Storage, or Azure Blob). Files are organized by song ID and quality level:
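```
songs/{song_id}/320/chunk_0001.m4a    # illustrative layout, not a real scheme
songs/{song_id}/320/chunk_0002.m4a
songs/{song_id}/160/chunk_0001.m4a
songs/{song_id}/96/chunk_0001.m4a
songs/{song_id}/manifest.json         # lists available qualities and chunks
```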
Each song exists in multiple quality levels to support adaptive bitrate streaming and different subscription tiers.
Let's trace through this step by step: the client requests a stream URL, the API Gateway authenticates the request, the Playback Service authorizes the play and returns a signed CDN URL, and the client fetches audio chunks directly from the nearest edge server, which pulls from object storage only on a cache miss.
Why this design works: By separating the control plane (authorization and URL generation) from the data plane (audio delivery), we can handle billions of streams. Our API servers handle lightweight authorization requests while the heavy lifting of audio delivery is handled by CDN infrastructure designed for exactly this purpose.
Users need to find any song instantly. They type "bohemian" and expect to see Queen's Bohemian Rhapsody appear within milliseconds. Search must be fast (under 100ms), tolerant of typos ("bohemain") and partial matches, and return results ranked by relevance.
This service handles all search queries. It does not query the primary database directly. Instead, it queries a dedicated search index optimized for text matching.
The Search Service parses queries, handles autocomplete requests, applies filters, and ranks results based on relevance and popularity. It also caches frequent queries since many users search for the same popular songs and artists.
We use Elasticsearch (or a similar search engine like OpenSearch) for full-text search. Elasticsearch is purpose-built for this use case: inverted indexes make text lookups fast, fuzzy matching handles typos, relevance scoring ranks results, and the cluster scales horizontally by adding nodes.
The search index contains denormalized data: song titles, artist names, album names, genres, and popularity scores. When a new song is added to the catalog, an indexing pipeline updates Elasticsearch.
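A denormalized document in the index might look like this (field names are illustrative):

```json
{
  "song_id": "song_42",
  "title": "Bohemian Rhapsody",
  "artist_name": "Queen",
  "album_name": "A Night at the Opera",
  "genres": ["rock", "classic rock"],
  "popularity_score": 98,
  "release_year": 1975
}
```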
Why cache search results? Popular searches are repeated constantly. "Taylor Swift", "Drake", and "Bohemian Rhapsody" are searched thousands of times per minute. Caching these results for even a few minutes dramatically reduces load on Elasticsearch.
Users create playlists to organize their music. A playlist can contain anywhere from a few songs to thousands. Users expect to add songs instantly and have changes sync across all their devices.
Manages all playlist CRUD operations. Key responsibilities: creating, renaming, and deleting playlists; adding, removing, and reordering songs; and invalidating caches so changes sync across devices.

We need a database that handles high read throughput keyed by user, a steady stream of small writes, and ordered song lists within each playlist.
We will discuss the specific database choice in the Database Design section.
The cache invalidation is important: when a user adds a song on their phone, the change must appear immediately when they open the app on their laptop. By invalidating the cache, we ensure the next request fetches fresh data.
This is what makes modern streaming services sticky. Users expect the app to know their taste and surface music they will love but haven't discovered yet. Features like "Discover Weekly" and "Daily Mix" drive significant engagement.
Recommendations combine two fundamentally different patterns: batch processing (generating recommendations offline) and real-time serving (delivering them instantly when requested).
This batch processing system runs periodically (daily or weekly) to process listening history at scale, train the recommendation models, generate personalized playlists for every user, and write the results where the serving layer can reach them.
The pipeline produces recommendations that are stored and ready to serve. This is how features like "Discover Weekly" work: they are generated once per week and cached.
Stores computed features about users and songs: user taste profiles derived from listening history, song characteristics such as genre and popularity, and aggregate statistics the models need at serving time.
Serves recommendations in real-time. For most requests, it simply fetches pre-computed recommendations from cache. For seed-based recommendations ("songs similar to X"), it may compute results on the fly using the feature store.
The key insight is that most recommendations are pre-computed. "Discover Weekly" is generated once per week for each user. When a user opens the app, we simply fetch their pre-computed playlist from cache. This approach allows us to use sophisticated ML models that would be too slow to run in real-time.
Now that we have designed each requirement, let's step back and see the complete architecture.
The architecture follows a layered approach, with each layer having a specific responsibility:
Client Layer: Users access the service through mobile apps, web browsers, or desktop applications. From our perspective, they all look the same: HTTP requests with authentication tokens.
Edge Layer: The CDN handles audio delivery at the edge, close to users geographically. The load balancer distributes API traffic across gateway instances.
Gateway Layer: The API Gateway handles authentication, rate limiting, and request routing. It is the single entry point for all API traffic.
Application Layer: Stateless services implement the business logic. Each service is independently deployable and horizontally scalable.
Cache Layer: Redis provides low-latency access to frequently requested data: song metadata, user sessions, search results, and pre-computed recommendations.
Data Layer: Different databases for different access patterns: Elasticsearch for search, relational databases for metadata and playlists, and potentially Cassandra for high-write-volume data like listening history.
ML Infrastructure: Offline pipelines train models and generate recommendations. The feature store provides fast access to user and song features.
| Component | Primary Responsibility | Scaling Strategy |
|---|---|---|
| CDN | Audio delivery, edge caching | Managed service (auto-scales) |
| Load Balancer | Traffic distribution | Managed service |
| API Gateway | Auth, rate limiting, routing | Horizontal (stateless) |
| Playback Service | Stream authorization, URL signing | Horizontal (stateless) |
| Search Service | Query parsing, result ranking | Horizontal (stateless) |
| Playlist Service | Playlist CRUD operations | Horizontal (stateless) |
| Recommendation Service | Serve personalized content | Horizontal (stateless) |
| Elasticsearch | Full-text search | Add nodes to cluster |
| Redis Cluster | Hot data caching | Add nodes, shard by key |
| Databases | Persistent storage | Read replicas, sharding |
| Object Storage | Audio files | Managed (infinite scale) |
This architecture handles our requirements well: the CDN absorbs audio streaming traffic, the cache layer handles most read traffic, and the stateless services scale horizontally for everything else.
With the high-level architecture in place, let's zoom into the data layer. Choosing the right database and designing an efficient schema are critical decisions that affect performance, scalability, and operational complexity.
Music streaming has diverse data with very different access patterns. Song metadata needs complex queries and relationships. Playlists need fast reads by user. Listening history is append-heavy with billions of writes. No single database excels at all of these.
This is where polyglot persistence shines: using different databases for different data types, each optimized for its specific access pattern.
Let's think through why each data type maps to a specific database:
Song, artist, and album data has natural relationships: songs belong to albums, albums belong to artists, artists can collaborate on songs. A relational database handles these relationships well with foreign keys and joins.
We need complex queries: find all songs by this artist, find albums released in this year, get the top 50 songs in this genre. PostgreSQL's rich query language handles this naturally.
The catalog (100 million songs) changes slowly and fits comfortably in a single database with read replicas. No need for the complexity of sharding.
We already discussed this: full-text search, fuzzy matching, relevance scoring. Elasticsearch is purpose-built for these requirements. It sits alongside PostgreSQL, receiving updates when the catalog changes.
Playlist access patterns are user-centric: show me all my playlists, load this specific playlist, add a song to this playlist. These queries always include a user_id, making it a perfect partition key.
Cassandra excels here because data partitions naturally by user_id, writes scale linearly as nodes are added, and built-in replication keeps playlists available even when nodes fail.
Session data (is this user logged in?) and pre-computed recommendations need sub-millisecond reads. Redis keeps this data in memory and handles millions of reads per second.
Redis also supports TTLs natively, which is perfect for session expiration and recommendation refresh cycles.
This is the highest-write-volume data in the system. With 1 billion streams per day, we are writing 11,500+ events per second. Cassandra handles this write throughput while still supporting efficient reads by user (show me my recent listening history).
The partition key is user_id, and the clustering key is listened_at timestamp (descending). This means we can efficiently query "last 100 songs this user played" without scanning the entire table.
| Data Type | Database | Key Reasoning |
|---|---|---|
| Song Metadata | PostgreSQL | Complex relationships, rich queries, stable catalog |
| Search Index | Elasticsearch | Full-text search, fuzzy matching, relevance scoring |
| Playlists | Cassandra | User-partitioned, high read/write throughput |
| User Sessions | Redis | Sub-millisecond reads, TTL support |
| Listening History | Cassandra | High write volume, time-series access pattern |
| Recommendations | Redis | Pre-computed, needs instant reads |
Now let's design the schema for each data store.
This is the heart of our catalog. Each row represents one song.
| Field | Type | Description |
|---|---|---|
| song_id | UUID (PK) | Unique identifier for the song |
| title | VARCHAR(255) | Song title |
| artist_id | UUID (FK) | Reference to artists table |
| album_id | UUID (FK) | Reference to albums table |
| duration_ms | INTEGER | Song duration in milliseconds |
| release_date | DATE | When the song was released |
| genres | VARCHAR[] | Array of genre tags |
| audio_path | VARCHAR(500) | Path to audio in object storage (e.g., "songs/abc123/") |
| play_count | BIGINT | Total play count (updated periodically) |
| created_at | TIMESTAMP | When the record was created |
Indexes: B-tree indexes on artist_id and album_id for catalog lookups, a GIN index on the genres array for genre filtering, and an index on release_date for new-release queries.
| Field | Type | Description |
|---|---|---|
| artist_id | UUID (PK) | Unique identifier for the artist |
| name | VARCHAR(255) | Artist name |
| bio | TEXT | Artist biography |
| image_url | VARCHAR(500) | Profile image URL on CDN |
| monthly_listeners | INTEGER | Current monthly listener count (updated daily) |
| verified | BOOLEAN | Whether artist is verified |
| created_at | TIMESTAMP | When the record was created |
Cassandra tables are designed around query patterns. Our primary queries are: fetch all playlists for a user, fetch all songs in a playlist in order, and fetch a user's recent listening history.
The partition key is user_id, so all of a user's playlists are on the same node. The clustering key orders playlists by creation date (newest first).
The partition key is playlist_id, and songs are ordered by position. This makes loading a playlist efficient: one query returns all songs in order.
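Sketched as CQL, minimal versions of these two tables plus the listening history table might look like this (column names assumed):

```sql
-- All of a user's playlists on one partition, newest first.
CREATE TABLE playlists (
    user_id     uuid,
    created_at  timestamp,
    playlist_id uuid,
    name        text,
    PRIMARY KEY ((user_id), created_at, playlist_id)
) WITH CLUSTERING ORDER BY (created_at DESC, playlist_id ASC);

-- One partition per playlist, songs in position order.
CREATE TABLE playlist_songs (
    playlist_id uuid,
    position    int,
    song_id     uuid,
    added_at    timestamp,
    PRIMARY KEY ((playlist_id), position)
);

-- One partition per user, most recent plays first.
-- song_id in the clustering key disambiguates same-timestamp plays.
CREATE TABLE listening_history (
    user_id            uuid,
    listened_at        timestamp,
    song_id            uuid,
    duration_played_ms int,
    PRIMARY KEY ((user_id), listened_at, song_id)
) WITH CLUSTERING ORDER BY (listened_at DESC, song_id ASC);
```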
With this schema, we can efficiently query "last 50 songs user X played" since data is partitioned by user and ordered by timestamp.
Why Cassandra handles the write volume: With 200 million DAU and an average of 5 songs per session, we are writing 1 billion events per day. Cassandra distributes this across nodes and handles writes without locks, achieving the throughput we need.
The high-level architecture gives us a solid foundation, but system design interviews often go deeper into specific components. In this section, we will explore the trickiest parts of our design: audio streaming architecture, search infrastructure, personalization, scaling strategies, and offline playback.
These are the topics that distinguish a good system design answer from a great one.
Delivering audio with low latency and high quality is the core technical challenge. A user clicks play and expects music within 200ms. Once playing, there should be no buffering even when network conditions change. Let's explore how to achieve this at scale.
Before a song can be streamed, it goes through an ingestion pipeline that prepares it for delivery: transcoding into multiple quality levels, chunking for adaptive delivery, encryption for DRM, and upload to object storage.
The original high-quality audio (often WAV or FLAC from the label) is converted to multiple quality levels:
| Quality | Bitrate | File Size (4 min song) | Target Users |
|---|---|---|---|
| Very High | 320 kbps | ~10 MB | Premium users on WiFi |
| High | 160 kbps | ~5 MB | Standard quality |
| Normal | 96 kbps | ~3 MB | Low bandwidth/data saver |
| Low | 24 kbps | ~720 KB | Extreme data saver |
Multiple quality levels enable adaptive streaming: if network conditions degrade, the client can switch to a lower quality mid-song rather than buffering.
Each quality variant is split into small chunks, typically 5-10 seconds each. A 4-minute song becomes roughly 24-48 chunks per quality level.
Why chunk? Several reasons: playback can start after downloading just the first chunk, the client can switch quality at chunk boundaries mid-song, seeking downloads only the chunks needed, and small uniform objects cache efficiently on the CDN.
Each chunk is encrypted for digital rights management. The client decrypts on playback using a license key tied to the user's subscription. This prevents users from extracting and redistributing audio files.
There are several ways to deliver audio to clients. Let's compare the approaches:
The simplest approach: the client requests the complete audio file and downloads it sequentially.
Pros: simple to implement, works with any HTTP server, and easy to cache.

Cons: no mid-song quality switching, wasted bandwidth when users skip partway through, and no graceful response to changing network conditions.
The modern standard for adaptive streaming. Audio is pre-chunked, and a manifest file describes available qualities and chunk URLs.
How it works: the client first downloads a manifest that lists the available qualities and chunk URLs, then fetches chunks over plain HTTP, continuously measuring throughput and switching quality between chunks as network conditions change.

Pros: adaptive bitrate, CDN-friendly (everything is plain HTTP), and broad client support.

Cons: the manifest round trip adds startup latency, and the client logic is more complex than a simple download.
Spotify uses a proprietary streaming protocol optimized specifically for music. Key optimizations reportedly include prefetching the first seconds of likely next tracks and caching recently played audio on the device, so playback often starts before any network round trip completes.
Use HLS/DASH as your answer. It's well understood, handles the core requirements (adaptive streaming, CDN compatibility), and is what most new streaming services would choose. Mention that mature services like Spotify have evolved custom protocols for specialized optimizations.
With 5 PB of daily bandwidth, CDN caching is essential for both cost and latency. But not all songs are equal: some are played millions of times per day, while others might be played once per year.
Music consumption follows an extreme power law: a small fraction of the catalog accounts for the overwhelming majority of streams, while the long tail is played rarely.
This distribution makes caching highly effective. Taylor Swift's latest single is requested millions of times per day and should be cached at every edge location. An obscure jazz recording from the 1960s might be requested once per month and can stay at origin.
For audio chunks, we set long cache times since content never changes:
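```
Cache-Control: public, max-age=31536000, immutable
```

The max-age of 31,536,000 seconds is one year.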
The immutable directive tells browsers and CDNs that the content will never change, eliminating conditional requests.
Search is how users find music. They type a few letters and expect to see relevant results instantly, even if they misspell the artist name or only remember part of the song title. Search must be fast (under 100ms), forgiving of errors, and smart about ranking.
We do not query the primary database for search. Instead, we maintain a dedicated search index using Elasticsearch (or OpenSearch). This separation allows us to optimize each system for its specific workload.
| Field | Source | Notes |
|---|---|---|
| Song titles | Song metadata | Primary search target |
| Artist names | Artist table | Includes aliases and collaborations |
| Album names | Album table | Including compilation names |
| Lyrics | Lyrics service | For "search by lyric" feature |
| Genres and moods | Tagging system | "upbeat pop", "chill electronic" |
Elasticsearch uses inverted indexes, mapping terms to the documents containing them:
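For example (illustrative):

```
"bohemian" → [song_42, song_1187, song_5531]
"rhapsody" → [song_42, song_5531]
"queen"    → [artist_7, song_42, album_19]
```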
When a user searches "bohemian rhapsody", Elasticsearch finds documents matching both terms and ranks them by relevance.
Users frequently misspell artist and song names. "Ariana Grandi", "Tylor Swift", "bohemain rapsody" should all return correct results.
Elasticsearch provides several mechanisms: fuzzy queries that match within an edit distance, n-gram analyzers for partial matches, and suggesters that power "did you mean" corrections.
Raw text matching is not enough. When searching "queen", users probably want the legendary rock band, not every song with "queen" in the title. We rank results using multiple signals:
| Signal | Weight | Description |
|---|---|---|
| Text relevance | High | Exact matches rank higher than partial matches |
| Popularity | High | Monthly listeners, total streams, trending velocity |
| Personalization | Medium | Boost artists the user has listened to before |
| Recency | Low | New releases get a small boost |
The weights are tuned based on user behavior data. If users consistently click the 3rd result, something is wrong with the ranking.
As users type, we show suggestions in real-time. This requires extremely low latency (under 50ms) because users are actively typing and waiting.
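One common approach is to precompute, for every prefix, the top few suggestions by popularity and keep the map in memory (Redis or a dedicated completion index). A toy sketch of the precomputation, with names assumed:

```python
from collections import defaultdict

MAX_SUGGESTIONS = 5

def build_prefix_index(titles_by_popularity: list[str]) -> dict[str, list[str]]:
    """Map every prefix to its top suggestions; input is sorted by popularity."""
    index: dict[str, list[str]] = defaultdict(list)
    for title in titles_by_popularity:
        lowered = title.lower()
        for end in range(1, len(lowered) + 1):
            prefix = lowered[:end]
            if len(index[prefix]) < MAX_SUGGESTIONS:
                index[prefix].append(title)
    return index

index = build_prefix_index(["Bohemian Rhapsody", "Bohemian Like You"])
print(index["bohem"])  # ['Bohemian Rhapsody', 'Bohemian Like You']
```

Lookup is then a single in-memory read, comfortably under the 50ms budget.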
Personalization is what makes Spotify addictive. When you open the app and "Discover Weekly" has exactly the kind of music you love but have never heard, it feels like magic. That magic is the result of sophisticated recommendation systems running at massive scale.
The recommendation system learns user preferences from multiple signals, each providing different information:
| Signal | Type | Strength | Notes |
|---|---|---|---|
| Liked songs | Explicit | Very strong | User clicked heart, clear preference |
| Added to playlist | Explicit | Strong | Deliberate curation |
| Listening history | Implicit | Medium | What they play repeatedly |
| Skip behavior | Implicit | Medium | Skipping within 30 seconds indicates dislike |
| Listen duration | Implicit | Weak | Full song = engagement, partial = maybe skip |
| Time of day | Context | Weak | Morning playlists differ from night |
There are several approaches to generating recommendations, each with trade-offs:
The core idea: users with similar taste like similar songs. If you and I both love the same 50 songs, and you love a song I have not heard, I will probably like it too.
How it works: build a user-song interaction matrix from plays and likes, learn similarity between users (or songs) from that matrix, typically via matrix factorization, and recommend songs that similar users loved but this user has not heard (sketched after the trade-offs below).

Pros: discovers non-obvious connections across genres, and needs no understanding of the audio itself.

Cons: cold start (new users and new songs have no interaction data to learn from), and a bias toward already-popular content.
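A minimal sketch of user-based collaborative filtering using cosine similarity on binary like-vectors (real systems use matrix factorization over billions of interactions, but the intuition is the same):

```python
import math

# Toy interaction data: user -> set of liked song IDs.
likes = {
    "alice": {"s1", "s2", "s3", "s4"},
    "bob":   {"s1", "s2", "s3", "s5"},
    "carol": {"s7", "s8"},
}

def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity between two binary preference vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend(user: str, k: int = 5) -> list[str]:
    """Score unheard songs by the similarity of the users who liked them."""
    scores: dict[str, float] = {}
    for other, their_likes in likes.items():
        sim = cosine(likes[user], their_likes)
        if other == user or sim == 0.0:
            continue
        for song in their_likes - likes[user]:
            scores[song] = scores.get(song, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("alice"))  # ['s5'] -- bob has similar taste and likes s5
```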
The idea: recommend songs similar to what the user already likes, based on the songs' characteristics.
How it works: extract characteristics from each song (tempo, energy, genre, acoustic features), represent every song as a feature vector, and recommend songs whose vectors sit close to the songs the user already likes.

Pros: works immediately for brand-new songs with no play history, and recommendations are easy to explain.

Cons: tends to over-specialize (more of the same), and depends on the quality of feature extraction.
Real systems combine multiple approaches: collaborative filtering, content-based similarity, and popularity signals all feed into an ensemble layer.
The ensemble layer learns how to weight each model's predictions. For new users, content-based gets more weight. For users with rich history, collaborative filtering dominates.
Different recommendation features have different latency requirements:
| Feature | Compute Time | Refresh Rate | Infrastructure |
|---|---|---|---|
| Discover Weekly | Hours | Weekly | Spark batch jobs |
| Daily Mix | Hours | Daily | Spark batch jobs |
| "Because you listened to..." | <100ms | Per request | Model serving + feature store |
| Radio (similar to current song) | <100ms | Per request | Nearest neighbor search |
Batch recommendations (Discover Weekly) run offline using full model training. They are pre-computed for every user and stored in Redis for instant serving.
Real-time recommendations use pre-trained models with real-time features. When you finish a song, we query the model with your current session context to suggest the next song.
With 200 million daily active users and 1 billion streams per day, the system must be designed for extreme scale. Equally important, it must stay up. Users expect music to be available 24/7, and outages are highly visible on social media.
The key to horizontal scaling is keeping services stateless. When a service stores no state locally, you can add or remove instances without coordination.
All our application services are stateless: sessions live in Redis, persistent data lives in the databases, and no instance keeps anything locally, so any instance can serve any request.
We use Horizontal Pod Autoscaler (HPA) to automatically add capacity:
| Metric | Target | Action |
|---|---|---|
| CPU utilization | 70% | Add pods when exceeded |
| Request latency (p99) | 100ms | Add pods if latency spikes |
| Request queue depth | 100 | Add pods if queue builds up |
During peak hours (evening commute, weekends), the system automatically scales up. During low-traffic periods (3 AM), it scales down to save costs.
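A sketch of the CPU-based rule from the table as a Kubernetes HPA manifest (the service name is illustrative; the latency and queue-depth rules require custom metrics and are omitted here):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: playback-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: playback-service
  minReplicas: 10
  maxReplicas: 200
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```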
Different databases scale differently:
Cassandra scales horizontally by adding nodes. Data is automatically rebalanced across the cluster using consistent hashing.
Each user's data lives on a subset of nodes (replication factor = 3). Reads and writes for a user hit the same nodes, making operations efficient.
For the song catalog, we use read replicas rather than sharding: all writes go to a single primary, reads spread across the replicas, and because the catalog is small and slow-changing, replication lag is rarely a problem.
Elasticsearch scales by adding nodes to the cluster. Indexes are automatically sharded across nodes.
Caching is essential at every layer to handle our traffic:
| Layer | What's Cached | TTL | Hit Rate |
|---|---|---|---|
| Client | Recently played, user preferences | 1 hour | Very high |
| CDN | Audio files, album art, static assets | 24+ hours | 80-90% |
| Redis | Song metadata, recommendations, sessions | 5-15 min | 90%+ |
| Database | Query results in buffer pool | Automatic | 95%+ |
With these cache layers, only a tiny fraction of requests reach the primary databases.
We deploy in multiple geographic regions with active-active configuration: every region runs the full stack, users are routed to the nearest healthy region, and if a region fails, traffic shifts to the others.
When components fail, the system should degrade gracefully rather than crash entirely:
| Component | If It Fails | Fallback |
|---|---|---|
| Recommendation Service | Home page shows generic playlists | Serve cached "top hits" |
| Search Service | Search is temporarily unavailable | Show recently played |
| Playlist Service | Cannot modify playlists | Read from cache (read-only mode) |
| CDN | Slower audio delivery | Fall back to origin |
The most important invariant: playback should always work. Users can tolerate degraded recommendations, but not being able to play music is a critical failure.
Protection against abuse and accidental overload:
| Endpoint | Limit | Notes |
|---|---|---|
| Stream requests | 100/min per user | Prevents bot abuse |
| Search requests | 30/min per user | Prevents scraping |
| Playlist writes | 60/min per user | Prevents spam |
| API (unauthenticated) | 10/min per IP | Prevents anonymous abuse |
Rate limits are enforced at the API Gateway using Redis-backed counters with sliding window algorithm.
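A sketch of the sliding-window check using a Redis sorted set and the standard redis-py client (production limiters usually wrap this in a Lua script so the steps execute atomically):

```python
import time
import uuid

import redis

r = redis.Redis()  # assumes a reachable Redis instance

def allow_request(user_id: str, endpoint: str, limit: int, window_s: int = 60) -> bool:
    """Allow at most `limit` requests per `window_s` seconds, per user and endpoint."""
    key = f"ratelimit:{endpoint}:{user_id}"
    now = time.time()
    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window_s)   # evict events outside the window
    pipe.zadd(key, {str(uuid.uuid4()): now})        # record this request (unique member)
    pipe.zcard(key)                                 # count events in the window
    pipe.expire(key, window_s)                      # let idle keys expire
    _, _, count, _ = pipe.execute()
    return count <= limit

# Search requests: 30 per minute per user, per the table above.
if not allow_request("user_123", "search", limit=30):
    print("429 Too Many Requests")
```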
Premium users can download songs for offline listening. This is essential for users on flights, subways, or in areas with poor connectivity. But it introduces unique challenges: we are giving users permanent copies of copyrighted content that we need to protect and eventually revoke.
The download flow is similar to streaming, but the client stores the content locally instead of playing it immediately: the client requests a download, receives signed URLs for the encrypted audio chunks, downloads and stores them, and obtains a license key that allows offline decryption.
Downloaded files must be protected from piracy. If users could extract and share downloaded MP3s, the entire licensing model would collapse.
Audio files are encrypted using industry-standard DRM systems: Widevine on Android and Chrome, FairPlay on Apple platforms, and PlayReady on Windows.
The decryption keys are stored in secure hardware (when available) and never exposed to the application layer.
Downloaded content does not last forever. Licenses expire after 30 days of offline use. When the device comes online, it must check with the License Server to renew. If the user's subscription has lapsed, renewal fails and downloaded content becomes unplayable.
| Limit | Value | Reason |
|---|---|---|
| Max devices per account | 5 | Prevent account sharing abuse |
| Max downloads per device | 10,000 songs | Storage sanity |
| Offline validity | 30 days | Periodic subscription verification |
The client is responsible for managing downloaded content: enforcing storage limits, renewing licenses when the device comes online, and deleting content whose license has expired or whose subscription has lapsed.
Every stream must be tracked accurately for royalty payments to artists and labels. This is not just a feature; it's a legal and financial requirement. Billions of dollars flow from Spotify to rights holders based on stream counts.
Not every play counts as a stream for royalty purposes:
| Condition | Counts as Stream? | Why |
|---|---|---|
| Played 30+ seconds | Yes | Industry standard threshold |
| Played < 30 seconds | No | Prevents gaming with short plays |
| Automated/bot playback | No | Detected and filtered |
| User actively listening | Yes | Normal usage |
| Background audio with no engagement | Maybe | Complex rules apply |
The 30-second threshold is an industry standard. Playing 29 seconds and skipping does not count. Playing 30 seconds and then skipping does count.
We need to capture every play event, process it at scale, and aggregate for royalty calculation:
- `play_start`: user pressed play
- `play_pause`: user paused
- `play_seek`: user jumped to a different position
- `play_complete`: song finished naturally
- `play_skip`: user skipped before completion

Stream manipulation is a real problem. Bad actors try to inflate stream counts to earn royalties fraudulently. We detect and filter these:
| Signal | Description | Indicates |
|---|---|---|
| Loop behavior | Same song played 100+ times | Bot or manipulation |
| Geographic impossibility | User in Tokyo, then NYC 5 minutes later | Account compromise |
| Device fingerprint | Virtual machine, emulator, automation | Bot farm |
| No diversity | Only plays one artist, no browsing | Targeted manipulation |
| Payment anomalies | Many streams but subscription always fails | Stolen credentials |
Detected fraud is flagged, excluded from royalty calculations, and investigated. Repeat offenders have accounts suspended.
Stream data feeds into the royalty system that determines artist payments. The calculation is complex and varies by contract, but the basic model is pro-rata: each rights holder's payout is their share of total eligible streams multiplied by the royalty pool.
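A toy worked example of the pro-rata split (all numbers invented for illustration):

```python
royalty_pool = 1_000_000.00  # dollars to distribute this month

eligible_streams = {         # fraud-filtered, 30-second-plus streams per rights holder
    "label_a": 600_000_000,
    "label_b": 300_000_000,
    "indie_c": 100_000_000,
}

total = sum(eligible_streams.values())
for holder, streams in eligible_streams.items():
    payout = royalty_pool * streams / total
    print(f"{holder}: {streams / total:.0%} of streams -> ${payout:,.2f}")
# label_a: 60% of streams -> $600,000.00
```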
This happens monthly, processing billions of stream records to determine millions of payments.