Before writing your first SQL query, let’s settle two things: where you’ll run SQL, and what data you’ll use. Both are already handled for you.

Every chapter in this course includes an embedded SQL editor, so you can run queries directly in your browser with no installation, setup, or signup required.

Click "Run" on this query or edit and run to confirm the environment is working:

SQL

The StreamFlow database (described below) is preloaded, which means every query in the course can be edited, re-run, and experimented with right where you read it.

PostgreSQL

Throughout this course, we’ll primarily use PostgreSQL as the default SQL dialect.

PostgreSQL is one of the most widely used relational databases in the world. It is powerful, standards-compliant, feature-rich, and an excellent choice for learning SQL because it balances clean syntax with advanced real-world capabilities.

All examples, exercises, and query outputs in this course are written with PostgreSQL in mind by default.

That said, the core ideas you learn here apply across most relational databases. When syntax or behavior differs between PostgreSQL, MySQL, and SQL Server, the chapter will call it out explicitly so you understand both the PostgreSQL version and the cross-database differences.

The goal is not just to teach one database, but to help you build SQL skills that transfer anywhere.

The StreamFlow Dataset

StreamFlow is a fictional music streaming platform inspired by services like Spotify. Users sign up, choose subscription plans, follow artists, create playlists, and stream tracks. Behind the scenes, the platform also runs marketing campaigns, processes payments, serves ads, and tracks user activity.

The dataset is designed to feel like a real production system. It is realistic enough to model real business problems, small enough to understand quickly, and rich enough to demonstrate every major SQL concept in this course.

It also includes the kinds of messy details real data often has:

NULL values for optional or missing information
Hierarchical genres such as Rock → Alternative Rock
Self-referential relationships like users following other users
JSON columns for flexible metadata
Time-series streaming events for analytics and reporting

Every example, exercise, and interview-style problem in this course uses the same dataset.

That means once you understand the schema, you can focus on writing better SQL instead of constantly relearning new tables and relationships.

Sample Data

Most chapters include their own focused sample data through INSERT queries.

The browser environment already has all tables pre-created, so you can start querying immediately. If a chapter needs additional or specially shaped data, such as overlapping subscriptions, intentional duplicates, or sparse time-series records, it includes its own INSERT block at the top of the page.

This keeps every chapter self-contained and ensures the examples behave exactly as intended.

Schema at a Glance

Five clusters worth recognizing:

People: users, subscriptions, payments. Who is on the platform, what they pay for.
Catalog: artists, albums, tracks, genres. What's available to listen to.
Activity: streams. Every play event. This is the largest table and drives most analytical questions.
Social: playlists, playlist_tracks, follows, artist_follows. How users curate and connect.
Marketing: campaigns, ad_impressions. What ads ran and who saw them.

The Tables

All 14 tables are pre-created in the browser environment. The full schema is below for reference. You don't need to run any of these CREATE TABLE statements; they're listed so you know what columns exist when you write queries.

SQL

Table Reference

Table	Purpose	Key Columns
`users`	Account holders	`user_id`, `country`, `subscription_tier`, `signup_date`
`subscriptions`	Plan history per user, including renewals and cancellations	`user_id`, `plan`, `start_date`, `end_date`
`genres`	Music genres, with optional parent for hierarchy	`genre_id`, `parent_genre_id`
`artists`	Performing artists, optionally managed by another artist	`artist_id`, `verified`, `manager_artist_id`
`albums`	Albums released by artists	`album_id`, `artist_id`, `genre_id`
`tracks`	Individual songs on albums; carries JSONB `metadata`	`track_id`, `album_id`, `duration_seconds`
`streams`	One row per play event. The largest table, used heavily for analytics	`user_id`, `track_id`, `started_at`, `duration_listened`
`playlists`	User-created playlists, optionally forked from another	`creator_user_id`, `forked_from_playlist_id`
`playlist_tracks`	Many-to-many between playlists and tracks, with ordering	`playlist_id`, `track_id`, `position`
`follows`	User-to-user follow graph	`follower_user_id`, `followed_user_id`
`artist_follows`	User-to-artist follow graph	`user_id`, `artist_id`
`payments`	Payment transactions, including refunds	`user_id`, `amount`, `status`
`campaigns`	Marketing campaigns across channels	`campaign_id`, `channel`, `start_date`, `end_date`
`ad_impressions`	Ad views and clicks tied to a campaign	`user_id`, `campaign_id`, `clicked`

Local Setup (Optional)

If you want to follow along outside the browser, you can run StreamFlow locally on PostgreSQL, MySQL, or SQL Server. Postgres is the recommended default; the syntax in the course matches what Postgres expects.

PostgreSQL:

SQL

Other databases: A few SERIAL/BIGSERIAL and JSONB types need adjustment. The mappings:

PostgreSQL	MySQL	SQL Server
`SERIAL`	`INT AUTO_INCREMENT`	`INT IDENTITY(1,1)`
`BIGSERIAL`	`BIGINT AUTO_INCREMENT`	`BIGINT IDENTITY(1,1)`
`JSONB`	`JSON`	`NVARCHAR(MAX)` (or `JSON` in Azure SQL)
`TIMESTAMP WITH TIME ZONE`	`DATETIME`	`DATETIMEOFFSET`
`BOOLEAN`	`TINYINT(1)`	`BIT`

Re-running the chapter INSERTs locally hits a duplicate-key error if the data is already there. Two ways around it:

SQL

The browser environment handles this for you automatically on Reset.

Course Setup