Advanced Data Engineering (edX)

Become an expert in scaling data systems. Master Celery, Airflow, and graph databases. Build real-world solutions for massive datasets and complex workflows. Optimize performance at enterprise scale.

Master Scalable Data Engineering with Cutting-Edge Tools

  • Learn to handle massive datasets efficiently with this advanced course
  • Gain practical expertise in scaling data systems using modern technologies
  • Ideal for data scientists, engineers & professionals with data handling experience

Course Highlights:

  • Leverage Celery & RabbitMQ for scalable data consumption
  • Optimize workflows with Apache Airflow for efficient management
  • Use vector & graph databases for robust data management at scale
  • Hands-on projects for real-world experience in solving data challenges
  • Create scalable systems & analyze performance for optimum results

Upskill to design, build & optimize data engineering pipelines that can handle complex, large-scale datasets. Prepare for demanding data roles by mastering advanced techniques with this comprehensive training.

What you'll learn

  • Create and manage data pipelines and their lifecycle
  • Connect and work with message queues to manage data processing
  • Use vector, graph, and key/value databases for data storage at scale

Syllabus

Module 1: Queues and Databases-RabbitMQ and MySQL (6 hours)

  • Video: Meet your instructor: Alfredo Deza (1 minute, Preview module)
  • Video: About this course (2 minutes)
  • Reading: Connect with your instructor (10 minutes)
  • Reading: Meet your instructor: Noah Gift (10 minutes)
  • Reading: Course structure and discussion etiquette (10 minutes)
  • Video: Introduction (1 minute)
  • Video: Overview of Queues (5 minutes)
  • Video: What is Celery? (3 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: Introduction to Celery (10 minutes)
  • Video: Use cases for RabbitMQ (3 minutes)
  • Reading: Using RabbitMQ with Docker (10 minutes)
  • Reading: External lab: Start RabbitMQ in a development environment (10 minutes)
  • Video: Overview of a Flask and Celery application (3 minutes)
  • Video: Summary (1 minute)
  • Quiz: Introduction to RabbitMQ and Flask (30 minutes)
  • Video: Introduction (0 minutes)
  • Video: Configuring Celery with Flask (4 minutes)
  • Video: Connecting Celery with RabbitMQ (5 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: Build a web app by using Python and Flask (10 minutes)
  • Reading: Background tasks with Celery (10 minutes)
  • Video: Defining a Celery task in Flask (3 minutes)
  • Video: Fire and forget task in Flask (2 minutes)
  • Video: Retrieve values from asynchronous tasks (3 minutes)
  • Reading: External lab: Add a new Celery task for RabbitMQ (10 minutes)
  • Video: Summary (1 minute)
  • Quiz: RabbitMQ with Celery and Flask (30 minutes)
  • Video: MySQL Overview (2 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: Getting Started with MySQL (10 minutes)
  • Video: MySQL from Terminal (3 minutes)
  • Video: Archive and Drop Database (5 minutes)
  • Video: Import external database Sakila (7 minutes)
  • Video: Modify database Sakila (4 minutes)
  • Video: Bash pipelines with MySQL (5 minutes)
  • Video: MySQL to Python Standard Library Web Server (4 minutes)
  • Ungraded Lab: Linux Hacking with MySQL (60 minutes)
  • Quiz: Quiz-MySQL for Data Engineering (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Discussion Prompt: Meet and greet (optional) (10 minutes)
  • Quiz: Queues and Databases - Final week quiz (30 minutes)
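
In the Flask/Celery setup this module covers, a web app publishes tasks to a broker (RabbitMQ) and a Celery worker consumes them. The sketch below is a minimal, stdlib-only analogue under stated assumptions, not Celery's API: an in-process queue.Queue stands in for the broker, a thread stands in for the worker, and TaskResult is a hypothetical, simplified result handle. It shows the two patterns named in the syllabus: fire-and-forget (ignore the handle) and retrieving a value from an asynchronous task (block on the handle).

```python
import queue
import threading
from typing import Optional

# Assumption: an in-process queue stands in for a broker such as RabbitMQ.
task_queue = queue.Queue()


class TaskResult:
    """Hypothetical, simplified handle for the outcome of an asynchronous task."""

    def __init__(self) -> None:
        self._done = threading.Event()
        self._value = None
        self._error: Optional[BaseException] = None

    def _finish(self, value=None, error: Optional[BaseException] = None) -> None:
        self._value, self._error = value, error
        self._done.set()

    def get(self, timeout: Optional[float] = None):
        """Block until the task finishes; re-raise its exception if it failed."""
        if not self._done.wait(timeout):
            raise TimeoutError("task did not finish in time")
        if self._error is not None:
            raise self._error
        return self._value


def worker() -> None:
    """Consume tasks until a None sentinel arrives (a stand-in for a worker process)."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        func, args, result = item
        try:
            result._finish(value=func(*args))
        except Exception as exc:  # store failures on the handle instead of crashing
            result._finish(error=exc)


def enqueue(func, *args) -> TaskResult:
    """Publish a task and return its handle; callers may simply ignore the handle."""
    result = TaskResult()
    task_queue.put((func, args, result))
    return result


def add(x: int, y: int) -> int:
    return x + y


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()

enqueue(print, "fire and forget: the caller never waits on this task")
handle = enqueue(add, 2, 3)
print(handle.get(timeout=5))  # prints 5 once the worker has run add(2, 3)

task_queue.put(None)  # sentinel asks the worker loop to exit
consumer.join(timeout=5)
```

The caller stays responsive because publishing only puts a message on the queue; the slow work happens in the consumer, which is the same separation a broker-backed worker gives a web request handler.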

Module 2: Optimizing Workflow Management at Scale with Apache Airflow (5 hours)

  • Video: Introduction (1 minute, Preview module)
  • Video: What is Apache Airflow? (6 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: What is Apache Airflow (10 minutes)
  • Video: Installing Apache Airflow from PyPI (5 minutes)
  • Video: Using Apache Airflow with Docker (6 minutes)
  • Reading: Exploring the Airflow User Interface (10 minutes)
  • Reading: External lab: Install Apache Airflow (10 minutes)
  • Video: Exploring the Airflow UI (6 minutes)
  • Quiz: Quiz-Installing Apache Airflow (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Video: Introduction (0 minutes)
  • Video: Exploring directed acyclic graphs (DAG) (10 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: External lab: Create a DAG (10 minutes)
  • Video: Creating a DAG (7 minutes)
  • Video: Running a backfill (4 minutes)
  • Reading: Architecture overview (10 minutes)
  • Video: Testing and validation (7 minutes)
  • Video: Summary (0 minutes)
  • Quiz: Quiz-Apache Airflow Fundamentals (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Video: Introduction (1 minute)
  • Video: Identifying a task to build a DAG (4 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: External Lab: Build a data pipeline for census data (10 minutes)
  • Video: Retrieving remote data (4 minutes)
  • Video: Cleaning and normalizing data (4 minutes)
  • Video: Inspecting the UI for results (4 minutes)
  • Reading: Build Data Pipelines with Apache Airflow (10 minutes)
  • Video: Summary (1 minute)
  • Reading: Lesson Reflection (10 minutes)
  • Quiz: Quiz-Creating a pipeline (30 minutes)
  • Quiz: Final Week Quiz-Optimizing Workflow Management at Scale with Apache Airflow (30 minutes)
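
Airflow models a pipeline as a directed acyclic graph (DAG) of tasks and, by default, runs a task only after its upstream tasks succeed. The stdlib sketch below is not Airflow code: it uses Python's graphlib.TopologicalSorter (Python 3.9+) to show why the graph must be acyclic and how tasks whose dependencies are all met can run in the same stage. The task names are hypothetical, loosely echoing the census-data lab's retrieve, clean, and normalize steps.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each task maps to the set of upstream tasks it depends on.
pipeline: Dict[str, Set[str]] = {
    "retrieve_remote_data": set(),
    "clean_data": {"retrieve_remote_data"},
    "normalize_data": {"clean_data"},
    "persist_results": {"normalize_data"},
    "publish_report": {"normalize_data"},
}


def parallel_stages(dag: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into stages; every task in a stage has all its upstream tasks done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished so downstream tasks unlock
    return stages


def is_valid_dag(dag: Dict[str, Set[str]]) -> bool:
    """Return False when the dependency graph contains a cycle."""
    try:
        parallel_stages(dag)
    except CycleError:
        return False
    return True


for stage in parallel_stages(pipeline):
    print(stage)
print(is_valid_dag({"a": {"b"}, "b": {"a"}}))  # prints False: a cycle is not a DAG
```

The last stage holds two tasks that depend only on normalize_data, so a scheduler is free to run them concurrently. In an actual Airflow DAG file the same dependencies would be declared between operators (for example with the >> operator), and the scheduler handles the ordering.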

Module 3: Achieving Scalability with Vector, Graph, and Key/Value Databases (5 hours)

  • Video: Picking the proper database (3 minutes, Preview module)
  • Video: What are vector databases and how they work (2 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: What is a Vector Database? (10 minutes)
  • Video: Implementing Semantic search (4 minutes)
  • Video: Quickstart Qdrant (3 minutes)
  • Reading: External Lab: Run Quickstart of Qdrant (10 minutes)
  • Video: Qdrant Rust Client (3 minutes)
  • Reading: External Lab: Extend Semantic Search (10 minutes)
  • Video: Vector Database Architectures (2 minutes)
  • Video: Hands-on lab: Enhance Semantic Search (3 minutes)
  • Reading: Jaccard index (10 minutes)
  • Quiz: Quiz-Introduction to Vector Databases (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Video: Graph data models and database concepts (2 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: Rust CLI with Clap (10 minutes)
  • Video: Introduction to Amazon Neptune (2 minutes)
  • Reading: External Lab: Rust Graph CLI Tool (10 minutes)
  • Video: Graph algorithms: UFC graph centrality in Rust (4 minutes)
  • Video: Kosaraju Community Detection in Graphs (4 minutes)
  • Video: Shortest Path with Graphs (3 minutes)
  • Reading: Amazon Neptune (10 minutes)
  • Video: Key Components of Rust CLI Tool (1 minute)
  • Video: Lab Walkthrough: Building a Rust Graph CLI Tool (2 minutes)
  • Quiz: Quiz-Introduction to Graph Databases (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Quiz: Final Quiz-Achieving Scalability with Vector, Graph, and Key/Value Databases (30 minutes)
  • Ungraded Lab: Social Media Recommender (60 minutes)
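
Module 3 includes a reading on the Jaccard index, J(A, B) = |A ∩ B| / |A ∪ B|, a similarity score between 0 and 1. The sketch below is an illustration rather than the course's lab code: it ranks a few made-up documents against a query by the Jaccard index of their lowercase word sets. Unlike the embedding-based similarity a vector database such as Qdrant computes, this measures only literal word overlap; treating two empty sets as identical (1.0) is a convention chosen here.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical (1.0) by convention."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tokens(text: str) -> Set[str]:
    """Lowercase whitespace tokenization; a deliberately naive assumption."""
    return set(text.lower().split())


def rank(query: str, docs: Dict[str, str]) -> List[str]:
    """Return document ids ordered from most to least similar to the query."""
    q = tokens(query)
    return sorted(docs, key=lambda doc_id: jaccard(q, tokens(docs[doc_id])), reverse=True)


# Made-up example documents.
docs = {
    "d1": "graph databases store nodes and edges",
    "d2": "vector databases store embeddings",
    "d3": "key value stores map keys to values",
}
print(rank("vector databases", docs))  # prints ['d2', 'd1', 'd3']
print(jaccard({"graph", "database"}, {"vector", "database"}))  # 1/3, about 0.333
```

Here d2 shares both query words out of four distinct words (0.5), d1 shares one of seven (about 0.14), and d3 shares none, which is why "stores" in d3 does not match "store": exact tokens only.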

Module 4: Real-world Advanced Data Engineering Projects (5 hours)

  • Video: Learn AWS CloudShell for Dynamo Development (4 minutes, Preview module)
  • Video: Learn AWS CodeCatalyst for Dynamo Development (5 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: Amazon CodeCatalyst (10 minutes)
  • Video: Leveraging AWS CodeWhisperer for Dynamo Development (4 minutes)
  • Video: Create a Table with CLI (1 minute)
  • Video: Populate a Table With Batching Records (1 minute)
  • Video: Query a Table with Records (2 minutes)
  • Reading: External Lab: Extended DynamoDB (10 minutes)
  • Video: Project Walkthrough (2 minutes)
  • Quiz: Quiz-Building a solution with DynamoDB with the AWS CLI (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Video: Introduction (1 minute)
  • Video: Overview of a pipeline requirements (3 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: Quick start for SQLAlchemy (10 minutes)
  • Video: Using SQLAlchemy with Pandas (6 minutes)
  • Reading: Explore and analyze data with Python (10 minutes)
  • Video: Persisting data in a task (6 minutes)
  • Video: Reviewing the results (4 minutes)
  • Video: Summary (1 minute)
  • Quiz: Quiz-Persisting data through a multi-task DAG with Pandas (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Reading: Recommended Next Steps (10 minutes)
  • Quiz: Final Quiz-Advanced Data Engineering (30 minutes)
  • Ungraded Lab: Jupyter Sandbox (60 minutes)
  • Ungraded Lab: VS Code Sandbox (60 minutes)
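
Module 4 populates a DynamoDB table by batching records. DynamoDB's BatchWriteItem operation accepts at most 25 put or delete requests per call, so records are usually split into chunks first. The stdlib sketch below shows only that chunking step plus the low-level JSON shape that the AWS CLI's `aws dynamodb batch-write-item --request-items` option expects; the table name Scores and the id/score attributes are hypothetical, and no AWS call is made.

```python
import json
from typing import Dict, Iterable, Iterator, List

# DynamoDB's BatchWriteItem accepts at most 25 put/delete requests per call.
MAX_BATCH = 25


def batches(items: Iterable[dict], size: int = MAX_BATCH) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_request_items(table: str, batch: List[dict]) -> Dict[str, list]:
    """Build a BatchWriteItem payload in DynamoDB's typed attribute-value format."""
    return {
        table: [
            # "S" marks a string attribute, "N" a number (numbers are sent as strings).
            {"PutRequest": {"Item": {"id": {"S": r["id"]}, "score": {"N": str(r["score"])}}}}
            for r in batch
        ]
    }


# Hypothetical records for a hypothetical "Scores" table.
records = [{"id": str(i), "score": i} for i in range(60)]
payloads = [to_request_items("Scores", b) for b in batches(records)]
print([len(p["Scores"]) for p in payloads])  # prints [25, 25, 10]

# Each payload could be saved to a file and passed via --request-items file://<path>.
first_call_json = json.dumps(payloads[0], indent=2)
```

From Python, boto3's Table.batch_writer() can handle this chunking automatically; writing it out by hand makes the per-call limit visible when working from the CLI.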

Related Courses

Python Project for Data Engineering (Coursera)
IBM

This mini-course lets you apply foundational Python skills by implementing different techniques to collect and work with data. Assume the role of a Data Engineer: extract data from multiple file formats, transform it into specific data types, and then load it into a single source for analysis. Then test your knowledge by implementing web scraping and extracting data with APIs, with the help of multiple hands-on labs. After completing this course, you will have the confidence to collect large datasets from multiple sources and transform them into one primary source, or to use web scraping to gain valuable business insights, all with Python.

Jun 8th 2026
1 Week
Building Database Applications in PHP (Coursera)
University of Michigan

In this course, we'll look at the object-oriented patterns available in PHP. You'll learn how to connect to a MySQL database using the PHP Data Objects (PDO) library and issue SQL commands in the PHP language. We'll also look at how PHP uses cookies and manages session data. You'll learn how to avoid double-posting data, how flash messages are implemented, and how to use a session to log users in to web applications.

Jun 8th 2026
5-12 Weeks
Introduction to RISC-V (edX)
Linux Foundation, LinuxFoundationX

Discover RISC-V from several angles: its technical aspects, its specifications, and its community ecosystem. RISC-V is a free and open instruction set architecture (ISA) enabling a new era of processor innovation through open standard collaboration. This course guides you through the RISC-V community ecosystem, the RISC-V specifications, and some technical aspects of working with RISC-V.

Self Paced
Self-Paced
Scripting with Python and SQL for Data Engineering (Coursera)
Duke University

In this third course of the Python, Bash and SQL Essentials for Data Engineering Specialization, you will explore techniques to work effectively with Python and SQL. We will go through useful data structures in Python scripting and connect to databases like MySQL. Additionally, you will learn how to use a modern text editor to connect and run SQL queries against a real database, performing operations to load and extract data.

Jun 8th 2026
4 Weeks