Advanced Data Engineering (edX)

Become an expert in scaling data systems. Master Celery, Airflow, and graph databases. Build real-world solutions for massive datasets and complex workflows. Optimize performance at enterprise scale.

Master Scalable Data Engineering with Cutting-Edge Tools

  • Learn to handle massive datasets efficiently with this advanced course
  • Gain practical expertise in scaling data systems using modern technologies
  • Ideal for data scientists, engineers & professionals with data handling experience

Course Highlights:

  • Leverage Celery & RabbitMQ for scalable data consumption
  • Optimize workflows with Apache Airflow for efficient management
  • Utilize Vector & Graph databases for robust data management at scale
  • Hands-on projects for real-world experience in solving data challenges
  • Create scalable systems & analyze performance for optimum results

Upskill to design, build & optimize data engineering pipelines that can handle complex, large-scale datasets. Prepare for demanding data roles by mastering advanced techniques with this comprehensive training.

What you'll learn

  • Create and manage data pipelines and their lifecycle
  • Connect and work with message queues to manage data processing
  • Use vector, graph, and key/value databases for data storage at scale

Syllabus

Module 1: Queues and Databases-RabbitMQ and MySQL (6 hours)

  • Video: Meet your instructor: Alfredo Deza (1 minute, Preview module)
  • Video: About this course (2 minutes)
  • Reading: Connect with your instructor (10 minutes)
  • Reading: Meet your instructor: Noah Gift (10 minutes)
  • Reading: Course structure and discussion etiquette (10 minutes)
  • Video: Introduction (1 minute)
  • Video: Overview of Queues (5 minutes)
  • Video: What is Celery? (3 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: Introduction to Celery (10 minutes)
  • Video: Use cases for RabbitMQ (3 minutes)
  • Reading: Using RabbitMQ with Docker (10 minutes)
  • Reading: External lab: Start RabbitMQ in a development environment (10 minutes)
  • Video: Overview of a Flask and Celery application (3 minutes)
  • Video: Summary (1 minute)
  • Quiz: Introduction to RabbitMQ and Flask (30 minutes)
  • Video: Introduction (0 minutes)
  • Video: Configuring Celery with Flask (4 minutes)
  • Video: Connecting Celery with RabbitMQ (5 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: Build a web app by using Python and Flask (10 minutes)
  • Reading: Background tasks with Celery (10 minutes)
  • Video: Defining a Celery task in Flask (3 minutes)
  • Video: Fire and forget task in Flask (2 minutes)
  • Video: Retrieve values from asynchronous tasks (3 minutes)
  • Reading: External lab: Add a new Celery task for RabbitMQ (10 minutes)
  • Video: Summary (1 minute)
  • Quiz: RabbitMQ with Celery and Flask (30 minutes)
  • Video: MySQL Overview (2 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: Getting Started with MySQL (10 minutes)
  • Video: MySQL from Terminal (3 minutes)
  • Video: Archive and Drop Database (5 minutes)
  • Video: Import external database Sakila (7 minutes)
  • Video: Modify database Sakila (4 minutes)
  • Video: Bash pipelines with MySQL (5 minutes)
  • Video: MySQL to Python Standard Library Web Server (4 minutes)
  • Ungraded Lab: Linux Hacking with MySQL (60 minutes)
  • Quiz: Quiz-MySQL for Data Engineering (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Discussion Prompt: Meet and greet (optional) (10 minutes)
  • Quiz: Queues and Databases - Final week quiz (30 minutes)

Module 2: Optimizing Workflow Management at Scale with Apache Airflow (5 hours)

  • Video: Introduction (1 minute, Preview module)
  • Video: What is Apache Airflow? (6 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: What is Apache Airflow (10 minutes)
  • Video: Installing Apache Airflow from PyPI (5 minutes)
  • Video: Using Apache Airflow with Docker (6 minutes)
  • Reading: Exploring the Airflow User Interface (10 minutes)
  • Reading: External lab: Install Apache Airflow (10 minutes)
  • Video: Exploring the Airflow UI (6 minutes)
  • Quiz: Quiz-Installing Apache Airflow (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Video: Introduction (0 minutes)
  • Video: Exploring directed acyclic graphs (DAG) (10 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: External lab: Create a DAG (10 minutes)
  • Video: Creating a DAG (7 minutes)
  • Video: Running a backfill (4 minutes)
  • Reading: Architecture overview (10 minutes)
  • Video: Testing and validation (7 minutes)
  • Video: Summary (0 minutes)
  • Quiz: Quiz-Apache Airflow Fundamentals (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Video: Introduction (1 minute)
  • Video: Identifying a task to build a DAG (4 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: External Lab: Build a data pipeline for census data (10 minutes)
  • Video: Retrieving remote data (4 minutes)
  • Video: Cleaning and normalizing data (4 minutes)
  • Video: Inspecting the UI for results (4 minutes)
  • Reading: Build Data Pipelines with Apache Airflow (10 minutes)
  • Video: Summary (1 minute)
  • Reading: Lesson Reflection (10 minutes)
  • Quiz: Quiz-Creating a pipeline (30 minutes)
  • Quiz: Final Week Quiz-Optimizing Workflow Management at Scale with Apache Airflow (30 minutes)

Module 3: Achieving Scalability with Vector, Graph, and Key/Value Databases (5 hours)

  • Video: Picking the proper database (3 minutes, Preview module)
  • Video: What are vector databases and how they work (2 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: What is a Vector Database? (10 minutes)
  • Video: Implementing Semantic search (4 minutes)
  • Video: Quickstart Qdrant (3 minutes)
  • Reading: External Lab: Run Quickstart of Qdrant (10 minutes)
  • Video: Qdrant Rust Client (3 minutes)
  • Reading: External Lab: Extend Semantic Search (10 minutes)
  • Video: Vector Database Architectures (2 minutes)
  • Video: Hands-on lab: Enhance Semantic Search (3 minutes)
  • Reading: Jaccard index (10 minutes)
  • Quiz: Quiz-Introduction to Vector Databases (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Video: Graph data models and database concepts (2 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: Rust CLI with Clap (10 minutes)
  • Video: Introduction to Amazon Neptune (2 minutes)
  • Reading: External Lab: Rust Graph CLI Tool (10 minutes)
  • Video: Graph algorithms: UFC graph centrality in Rust (4 minutes)
  • Video: Kosaraju Community Detection in Graphs (4 minutes)
  • Video: Shortest Path with Graphs (3 minutes)
  • Reading: Amazon Neptune (10 minutes)
  • Video: Key Components of Rust CLI Tool (1 minute)
  • Video: Lab Walkthrough: Building a Rust Graph CLI Tool (2 minutes)
  • Quiz: Quiz-Introduction to Graph Databases (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Quiz: Final Quiz-Achieving Scalability with Vector, Graph, and Key/Value Databases (30 minutes)
  • Ungraded Lab: Social Media Recommender (60 minutes)

Module 4: Real-world Advanced Data Engineering Projects (5 hours)

  • Video: Learn AWS CloudShell for Dynamo Development (4 minutes, Preview module)
  • Video: Learn AWS CodeCatalyst for Dynamo Development (5 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: Amazon CodeCatalyst (10 minutes)
  • Video: Leveraging AWS CodeWhisperer for Dynamo Development (4 minutes)
  • Video: Create a Table with CLI (1 minute)
  • Video: Populate a Table With Batching Records (1 minute)
  • Video: Query a Table with Records (2 minutes)
  • Reading: External Lab: Extended DynamoDB (10 minutes)
  • Video: Project Walkthrough (2 minutes)
  • Quiz: Quiz-Building a solution with DynamoDB with the AWS CLI (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Video: Introduction (1 minute)
  • Video: Overview of pipeline requirements (3 minutes)
  • Reading: Key Terms (10 minutes)
  • Reading: Quick start for SQLAlchemy (10 minutes)
  • Video: Using SQLAlchemy with Pandas (6 minutes)
  • Reading: Explore and analyze data with Python (10 minutes)
  • Video: Persisting data in a task (6 minutes)
  • Video: Reviewing the results (4 minutes)
  • Video: Summary (1 minute)
  • Quiz: Quiz-Persisting data through a multi-task DAG with Pandas (30 minutes)
  • Reading: Lesson Reflection (10 minutes)
  • Reading: Recommended Next Steps (10 minutes)
  • Quiz: Final Quiz-Advanced Data Engineering (30 minutes)
  • Ungraded Lab: Jupyter Sandbox (60 minutes)
  • Ungraded Lab: VS Code Sandbox (60 minutes)

Related Courses

Analytics in Python (edX) EdX
Columbia University,ColumbiaX

Learn the fundamentals of programming in Python and develop the ability to analyze data and make data-driven decisions. Data is the lifeblood of an organization. Competency in programming is an essential skill for successfully extracting information and knowledge from data. The goal of this course is to introduce learners to the basics of programming in Python and to give a working knowledge of how to use programs to deal with data.

This course is archived
5-12 Weeks
Cloud Data Engineering (Coursera) Coursera
Duke University

Welcome to the third course in the Building Cloud Computing Solutions at Scale Specialization! In this course, you will learn how to apply Data Engineering to real-world projects using the Cloud computing concepts introduced in the first two courses of this series. By the end of this course, you will be able to develop Data Engineering applications and use software development best practices to create data engineering applications.

Jun 22nd 2026
4 Weeks
Scripting with Python and SQL for Data Engineering (Coursera) Coursera
Duke University

In this third course of the Python, Bash and SQL Essentials for Data Engineering Specialization, you will explore techniques to work effectively with Python and SQL. We will go through useful data structures in Python scripting and connect to databases like MySQL. Additionally, you will learn how to use a modern text editor to connect and run SQL queries against a real database, performing operations to load and extract data.

Jun 22nd 2026
4 Weeks
Introduction to Structured Query Language (SQL) (Coursera) Coursera
University of Michigan

In this course, you'll walk through the steps for installing a text editor, installing MAMP or XAMPP (or equivalent), and creating a MySQL database. You'll learn about single table queries and the basic syntax of the SQL language, as well as database design with multiple tables, foreign keys, and the JOIN operation. Lastly, you'll learn to model many-to-many relationships like those needed to represent users, roles, and courses.

Jun 22nd 2026
4 Weeks
Building Database Applications in PHP (Coursera) Coursera
University of Michigan

In this course, we'll look at the object-oriented patterns available in PHP. You'll learn how to connect to a MySQL database using the PHP Data Objects (PDO) library and issue SQL commands in the PHP language. We'll also look at how PHP uses cookies and manages session data. You'll learn how PHP avoids double posting data, how flash messages are implemented, and how to use a session to log in users in web applications.

Jun 22nd 2026
5-12 Weeks