AlgoMaster Logo

Course Setup

Last Updated: May 3, 2026

3 min read

Before writing your first SQL query, let’s settle two things: where you’ll run SQL, and what data you’ll use. Both are already handled for you.

Every chapter in this course includes an embedded SQL editor, so you can run queries directly in your browser with no installation, setup, or signup required.

Click "Run" on this query or edit and run to confirm the environment is working:

SQL

The StreamFlow database (described below) is preloaded, which means every query in the course can be edited, re-run, and experimented with right where you read it.

PostgreSQL

Throughout this course, we’ll primarily use PostgreSQL as the default SQL dialect.

PostgreSQL is one of the most widely used relational databases in the world. It is powerful, standards-compliant, feature-rich, and an excellent choice for learning SQL because it balances clean syntax with advanced real-world capabilities.

All examples, exercises, and query outputs in this course are written with PostgreSQL in mind by default.

That said, the core ideas you learn here apply across most relational databases. When syntax or behavior differs between PostgreSQL, MySQL, and SQL Server, the chapter will call it out explicitly so you understand both the PostgreSQL version and the cross-database differences.

The goal is not just to teach one database, but to help you build SQL skills that transfer anywhere.

The StreamFlow Dataset

StreamFlow is a fictional music streaming platform inspired by services like Spotify. Users sign up, choose subscription plans, follow artists, create playlists, and stream tracks. Behind the scenes, the platform also runs marketing campaigns, processes payments, serves ads, and tracks user activity.

The dataset is designed to feel like a real production system. It is realistic enough to model real business problems, small enough to understand quickly, and rich enough to demonstrate every major SQL concept in this course.

It also includes the kinds of messy details real data often has:

  • NULL values for optional or missing information
  • Hierarchical genres such as Rock → Alternative Rock
  • Self-referential relationships like users following other users
  • JSON columns for flexible metadata
  • Time-series streaming events for analytics and reporting

Every example, exercise, and interview-style problem in this course uses the same dataset.

That means once you understand the schema, you can focus on writing better SQL instead of constantly relearning new tables and relationships.

Sample Data

Most chapters include their own focused sample data through INSERT queries.

The browser environment already has all tables pre-created, so you can start querying immediately. If a chapter needs additional or specially shaped data, such as overlapping subscriptions, intentional duplicates, or sparse time-series records, it includes its own INSERT block at the top of the page.

This keeps every chapter self-contained and ensures the examples behave exactly as intended.

Schema at a Glance

Five clusters worth recognizing:

  • People: users, subscriptions, payments. Who is on the platform, what they pay for.
  • Catalog: artists, albums, tracks, genres. What's available to listen to.
  • Activity: streams. Every play event. This is the largest table and drives most analytical questions.
  • Social: playlists, playlist_tracks, follows, artist_follows. How users curate and connect.
  • Marketing: campaigns, ad_impressions. What ads ran and who saw them.

The Tables

All 14 tables are pre-created in the browser environment. The full schema is below for reference. You don't need to run any of these CREATE TABLE statements; they're listed so you know what columns exist when you write queries.

SQL

Table Reference

TablePurposeKey Columns
usersAccount holdersuser_id, country, subscription_tier, signup_date
subscriptionsPlan history per user, including renewals and cancellationsuser_id, plan, start_date, end_date
genresMusic genres, with optional parent for hierarchygenre_id, parent_genre_id
artistsPerforming artists, optionally managed by another artistartist_id, verified, manager_artist_id
albumsAlbums released by artistsalbum_id, artist_id, genre_id
tracksIndividual songs on albums; carries JSONB metadatatrack_id, album_id, duration_seconds
streamsOne row per play event. The largest table, used heavily for analyticsuser_id, track_id, started_at, duration_listened
playlistsUser-created playlists, optionally forked from anothercreator_user_id, forked_from_playlist_id
playlist_tracksMany-to-many between playlists and tracks, with orderingplaylist_id, track_id, position
followsUser-to-user follow graphfollower_user_id, followed_user_id
artist_followsUser-to-artist follow graphuser_id, artist_id
paymentsPayment transactions, including refundsuser_id, amount, status
campaignsMarketing campaigns across channelscampaign_id, channel, start_date, end_date
ad_impressionsAd views and clicks tied to a campaignuser_id, campaign_id, clicked

Local Setup (Optional)

If you want to follow along outside the browser, you can run StreamFlow locally on PostgreSQL, MySQL, or SQL Server. Postgres is the recommended default; the syntax in the course matches what Postgres expects.

PostgreSQL:

SQL

Other databases: A few SERIAL/BIGSERIAL and JSONB types need adjustment. The mappings:

PostgreSQLMySQLSQL Server
SERIALINT AUTO_INCREMENTINT IDENTITY(1,1)
BIGSERIALBIGINT AUTO_INCREMENTBIGINT IDENTITY(1,1)
JSONBJSONNVARCHAR(MAX) (or JSON in Azure SQL)
TIMESTAMP WITH TIME ZONEDATETIMEDATETIMEOFFSET
BOOLEANTINYINT(1)BIT

Re-running the chapter INSERTs locally hits a duplicate-key error if the data is already there. Two ways around it:

SQL

The browser environment handles this for you automatically on Reset.