EdX

Data Engineering with Databricks (edX)

Offered by AI (Pragmatic AI Labs)

Become an expert in modern data engineering on Databricks' unified lakehouse platform. Master ETL pipelines, data transformations with Apache Spark, and Delta Lake for reliable data management.

Master Data Engineering on Databricks Lakehouse Platform

Learn Databricks architecture, cluster management & notebook analysis
Build reliable ETL pipelines with Delta Lake for data transformation
Implement advanced data processing techniques with Apache Spark

Course Highlights:

Create & scale Databricks clusters for workloads
Load data from diverse sources into notebooks
Explore, visualize & profile datasets with notebooks
Version control & share notebooks via Git integration
Read & ingest data in various file formats
Transform data with SQL & DataFrame operations
Handle complex data types like arrays, structs, timestamps
Deduplicate, join & flatten nested data structures
Identify & fix data quality issues with UDFs
Load cleansed data into Delta Lake for reliability
Build production-ready pipelines with Delta Live Tables
Schedule & monitor workloads using Databricks Jobs
Secure data access with Unity Catalog

Gain comprehensive skills in data engineering on Databricks through hands-on labs, real-world projects and best practices for the modern data lakehouse.
This course is part of the Large Language Model Operations (LLMOps) Professional Certificate.

What you'll learn

Use Databricks for data engineering and ML workloads
Create and design ML pipelines
Use Llamafile and other local LLMs like Mixtral

Syllabus

Module 1: Databricks Lakehouse Platform Fundamentals

Introduction to the Databricks Lakehouse Platform and its architecture
Creating, managing, and configuring clusters
Setting up and using Databricks with IntelliJ, RStudio, and the Databricks CLI
Introduction to notebooks, including execution, sharing, and multi-language support
Efficient data transformation with Spark SQL and the Catalog Explorer
Creating tables from files and querying external data sources
Reliable data pipelines with Delta Lake, ACID transactions, and Z-Ordering optimization

Module 2: Data Transformation and Pipelines

Automated pipelines with Delta Live Tables
Delta Live Tables components
Continuous vs triggered pipelines
Configuring Auto Loader
Querying pipeline events
End-to-end example of Delta Live Tables
Vacuum and garbage collection
Orchestrating workloads with Databricks Jobs
Multi-task workflows and task dependencies
Viewing job history
Using dashboards
Handling failures and configuring retries
Unified data access with Unity Catalog
Catalogs vs metastores
Unity Catalog quickstart in Python
Applying object security
Best practices for catalogs, connections, and business units

Related Courses

EdX

Google Cloud

Modernizing Data Lakes and Data Warehouses with Google Cloud (edX)

Computer Science

This course is intended for developers who are responsible for querying datasets, visualizing query results, and creating reports. Specific job roles include Data Engineer, Data Analyst, Database Administrator, and Big Data Architect.

Self-Paced

Data Warehouse, Google Cloud, Data Lake

EdX

AI (Pragmatic AI Labs)

Spark, Hadoop, and Snowflake for Data Engineering (edX)

Computer Science

Gain the skills for building efficient and scalable data pipelines. Explore essential data engineering platforms (Hadoop, Spark, and Snowflake) and learn how to optimize them using Python, PySpark, and MLflow.

Self-Paced

Python, Hadoop, Spark

Coursera

University of Colorado System

Data Warehouse Concepts, Design, and Data Integration (Coursera)

CS: Design & Product

This is the second course in the Data Warehousing for Business Intelligence specialization. Ideally, the courses should be taken in sequence. In this course, you will learn concepts and skills for designing data warehouses and creating data integration workflows. These are fundamental skills for data warehouse developers and administrators. You will gain hands-on experience with data warehouse design and use open source products for manipulating pivot tables and creating data integration workflows.

Jun 22nd 2026

5-12 Weeks

Business Intelligence, Data Warehousing, Data Warehouse

Coursera

École Polytechnique Fédérale de Lausanne

Big Data Analysis with Scala and Spark (Coursera)

CS: Theory, CS: Programming

Manipulating big data distributed over a cluster using functional concepts is rampant in industry, and is arguably one of the first widespread industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop, and most recently Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we'll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout.

Jun 22nd 2026

4 Weeks

Programming, Algorithms, SQL

EdX

AI (Pragmatic AI Labs)

Cloud Computing Foundations (edX)

Computer Science

Learn the foundations of cloud computing and build websites using serverless, PaaS, and IaaS technologies. Apply DevOps principles and create continuous delivery pipelines for efficient cloud infrastructure management.

Self-Paced

Machine Learning, Cloud Computing, Cloud Infrastructures

EdX

Microsoft

Data Science Essentials (edX)

Data Science

Explore data visualization and exploration concepts with experts from MIT and Microsoft, and get an introduction to machine learning. Demand for data science talent is exploding. Develop your career as a data scientist as you explore essential skills and principles. In this data science course, you will learn key concepts in data acquisition, preparation, exploration, and visualization. You will also look at examples of how to build a cloud data science solution using Azure Machine Learning, R, and Python.

Course Not Available

Machine Learning, Data Science, Data Visualization

EdX

AI (Pragmatic AI Labs)

Linux and Bash for Data Engineering (edX)

Computer Science

Master Linux and Bash essentials for data engineering. Learn to manipulate data, build pipelines, and automate tasks using shell scripting and powerful Linux tools.

Self-Paced

Linux, Bash, Data Engineering

EdX

Delft University of Technology, DelftX

AI Skills for Engineers: Data Engineering and Data Pipelines (edX)

Statistics & Data Analysis

Good data is central to effective AI applications. This course teaches the basics of data for AI, covering what data is needed, how to extract data from existing databases, and basic data skills, including setup of a Python notebook environment, basic data exploration, and simple data visualizations.

Self-Paced

Artificial Intelligence, Data Management, AI

EdX

IBM

Data Engineering Capstone Project (edX)

Computer Science

This Capstone Project is designed for you to apply and demonstrate your Data Engineering skills and knowledge in SQL, NoSQL, RDBMS, Bash, Python, ETL, Data Warehousing, BI tools and Big Data.

Self-Paced

Python, NoSQL, SQL

EdX

AI (Pragmatic AI Labs)

Open Source LLMOps (edX)

Computer Science

Unlock Open Source AI: Dive into LLM Architectures, Fine-Tuning, and Cutting-Edge Deployments.

Self-Paced

Artificial Intelligence, Open Source, AI

EdX

AI (Pragmatic AI Labs)

Applied Local Large Language Models (edX)

Computer Science

Unlock the power of large language models on your machine. Master setup and interaction with cutting-edge LLMs through intuitive web interfaces and APIs. Explore diverse tools, programming languages, and frameworks like Hugging Face and Mozilla for seamless LLM integration. Gain invaluable skills for efficient local LLM deployment.

Self-Paced

Hugging Face, Large Language Models, LLMs

EdX

AI (Pragmatic AI Labs)

Large Language Models with Azure (edX)

Computer Science

Harness Azure's AI Power: Master Large Language Models (LLMs), Optimize Deployments, and Build Cutting-Edge Applications.

Self-Paced

Artificial Intelligence, Azure Machine Learning, Scalability