Data Engineering with Databricks (edX)

Become an expert in modern data engineering on Databricks' unified lakehouse platform. Master ETL pipelines, data transformations with Apache Spark, and Delta Lake for reliable data management.

Master Data Engineering on the Databricks Lakehouse Platform

  • Learn Databricks architecture, cluster management & notebook analysis
  • Build reliable ETL pipelines with Delta Lake for data transformation
  • Implement advanced data processing techniques with Apache Spark

Course Highlights:

  • Create & scale Databricks clusters for workloads
  • Load data from diverse sources into notebooks
  • Explore, visualize & profile datasets with notebooks
  • Version control & share notebooks via Git integration
  • Read & ingest data in various file formats
  • Transform data with SQL & DataFrame operations
  • Handle complex data types like arrays, structs, timestamps
  • Deduplicate, join & flatten nested data structures
  • Identify & fix data quality issues with UDFs
  • Load cleansed data into Delta Lake for reliability
  • Build production-ready pipelines with Delta Live Tables
  • Schedule & monitor workloads using Databricks Jobs
  • Secure data access with Unity Catalog

Gain comprehensive skills in data engineering on Databricks through hands-on labs, real-world projects and best practices for the modern data lakehouse.

This course is part of the Large Language Model Operations (LLMOps) Professional Certificate.

What you'll learn

  • Use Databricks for data engineering and ML workloads
  • Create and design ML pipelines
  • Use Llamafile and other local LLMs like Mixtral

Syllabus

Module 1: Databricks Lakehouse Platform Fundamentals

  • Introduction to the Databricks Lakehouse Platform and its architecture
  • Creating, managing, and configuring clusters
  • Setting up and using Databricks with IntelliJ, RStudio, and the Databricks CLI
  • Introduction to notebooks, including execution, sharing, and multi-language support
  • Efficient data transformation with Spark SQL and the Catalog Explorer
  • Creating tables from files and querying external data sources
  • Reliable data pipelines with Delta Lake, ACID transactions, and Z-Ordering optimization

Module 2: Data Transformation and Pipelines

  • Automated pipelines with Delta Live Tables
  • Delta Live Tables components
  • Continuous vs triggered pipelines
  • Configuring Auto Loader
  • Querying pipeline events
  • End-to-end example of Delta Live Tables
  • Vacuum and garbage collection
  • Orchestrating workloads with Databricks Jobs
  • Multi-task workflows and task dependencies
  • Viewing job history
  • Using dashboards
  • Handling failures and configuring retries
  • Unified data access with Unity Catalog
  • Catalogs vs metastores
  • Unity Catalog quickstart in Python
  • Applying object security
  • Best practices for catalogs, connections, and business units

Related Courses

Data Warehouse Concepts, Design, and Data Integration (Coursera)
University of Colorado System

This is the second course in the Data Warehousing for Business Intelligence specialization. Ideally, the courses should be taken in sequence. In this course, you will learn exciting concepts and skills for designing data warehouses and creating data integration workflows. These are fundamental skills for data warehouse developers and administrators. You will gain hands-on experience in data warehouse design and use open source products for manipulating pivot tables and creating data integration workflows.

Jun 22nd 2026
5-12 Weeks

Big Data Analysis with Scala and Spark (Coursera)
École Polytechnique Fédérale de Lausanne

Manipulating big data distributed over a cluster using functional concepts is rampant in industry, and is arguably one of the first widespread industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop, and most recently Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we'll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout.

Jun 22nd 2026
4 Weeks

Data Science Essentials (edX)
Microsoft

Explore data visualization and exploration concepts with experts from MIT and Microsoft, and get an introduction to machine learning. Demand for data science talent is exploding. Develop your career as a data scientist, as you explore essential skills and principles with experts from MIT and Microsoft. In this data science course, you will learn key concepts in data acquisition, preparation, exploration, and visualization. Plus, look at examples of how to build a cloud data science solution using Azure Machine Learning, R, and Python.

Course Not Available

AI Skills for Engineers: Data Engineering and Data Pipelines (edX)
Delft University of Technology, DelftX

Good data is central to effective AI applications. This course teaches the basics of data for AI, covering what data is needed, how to extract data from existing databases and basic data skills including setup of a Python notebook environment, basic data exploration and simple data visualizations.

Self-Paced