Data Engineering Roadmap

Foundations
  What is Data Engineering & Role of a Data Engineer
  Data Types & Schemas
  ETL vs ELT
  Basics of Databases & Data Modeling
  Batch vs Stream Processing

Databases & Data Warehousing
  Relational Databases (PostgreSQL, MySQL, SQL Server)
  NoSQL Databases (MongoDB, Cassandra, DynamoDB)
  OLTP vs OLAP
  Popular Warehouses (Snowflake, BigQuery, Redshift)
  SQL Mastery (Joins, Window Functions, CTEs)
  Data Warehousing Concepts
  Data Lakes vs Data Warehouses

Data Pipelines & ETL Tools
  ETL Concepts
  Streaming Pipelines
  dbt (Data Build Tool)
  Batch Pipelines
  Apache Airflow
  Informatica / Talend / Fivetran

Big Data Ecosystem
  Hadoop Ecosystem (HDFS, MapReduce, YARN)
  Spark Streaming & Structured Streaming
  Parquet, ORC & Avro File Formats
  Apache Spark (RDDs, DataFrames, SparkSQL)
  Hive & Presto

Streaming & Messaging Systems
  Message Queues (RabbitMQ, ActiveMQ)
  Pulsar vs Kafka
  Event-driven Architectures
  Apache Kafka
  Real-time Stream Processing (Flink, Storm)

Cloud Platforms & Orchestration
  AWS for Data Engineering (S3, Glue, Redshift, EMR, Kinesis)
  Azure for Data Engineering (Synapse, Data Lake, Event Hubs)
  Orchestration & Workflow Management
  GCP for Data Engineering (BigQuery, Dataflow, Pub/Sub)
  Containerization (Docker, Kubernetes)

Data Governance & Quality
  Data Validation & Quality Checks
  Data Catalogs
  Observability & Monitoring in Data Pipelines
  Data Lineage & Metadata Management
  GDPR & Compliance

Advanced Topics
  Lakehouse Architecture (Delta Lake, Iceberg, Hudi)
  Streaming Joins & State Management
  Cost Optimization Strategies
  Data Mesh
  Real-time Analytics (ClickHouse, Druid)
  Data Security & Encryption

Projects
  Build a Data Pipeline with Airflow + Spark
  Data Warehouse for E-commerce Analytics
  Log Processing System
  Real-time Streaming Pipeline with Kafka + Flink
  ETL Pipeline with dbt + Snowflake
  IoT Data Streaming Project

Interview Preparation
  SQL Query Challenges
  Data Modeling Scenarios
  Big Data Case Studies (Netflix, Uber, Airbnb)
  ETL Design Questions
  System Design for Data Pipelines