Data Engineering and Machine Learning using Spark (Coursera)

Offered by IBM

Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to unstructured data such as tweets, posts, pictures, audio files, videos, sensor data, satellite imagery, and more to identify the behaviors and preferences of prospects, clients, competitors, and others. In this short course, you'll gain practical skills as you learn to work with Apache Spark for Data Engineering and Machine Learning (ML) applications. You will work hands-on with Spark MLlib, Spark Structured Streaming, and more to perform extract, transform, and load (ETL) tasks as well as regression, classification, and clustering.

The course culminates in a project where you will apply your Spark skills to an ETL for ML workflow use-case.
NOTE: This course requires that you have foundational skills for working with Apache Spark and Jupyter Notebooks. The Introduction to Big Data with Spark and Hadoop course from IBM will equip you with these skills and it is recommended that you have completed that course or similar prior to starting this one.
This course can be applied to multiple Specializations or Professional Certificates programs. Completing this course will count towards your learning in any of the following programs:

What You Will Learn

  • Glean insights into how streaming data and Spark Structured Streaming empower machine learning and AI tasks.
  • Delve into graph theory and Apache Spark GraphFrames, used for motif finding in genetics and biological sciences, and learn to identify data suited to GraphFrames processing.
  • Discover how ETL processes work with Apache Spark and machine learning and extend that knowledge to Spark MLlib capabilities and related benefits.
  • Explore supervised learning and unsupervised learning, clustering, and learn how to use the k-means clustering algorithm with Spark MLlib.

Syllabus

WEEK 1
Spark for Data Engineering
In this first of two modules, learn what streaming data is and get the essential knowledge to use Spark Structured Streaming. Learn about data sources, streaming output modes, and supported data destinations. Learn about data operations considerations and discover how Spark Structured Streaming listeners and checkpointing benefit streaming data processing. Discover how graph theory works with streaming data. You’ll gain insights into the advantages that Apache Spark GraphFrames offers and learn what qualities make data suitable for GraphFrames processing. Then, explore ETL and learn how to use Apache Spark for data extraction, transformation, and loading. Finally, put your newfound knowledge into practice and gain practical, real-world skills in the ETL for Machine Learning Pipelines hands-on lab.

WEEK 2
SparkML
This module demystifies the concepts and practices related to machine learning using SparkML and the Spark machine learning library. Explore both supervised and unsupervised machine learning. Explore classification and regression tasks and learn how SparkML supports them. Gain insights into unsupervised learning, with a focus on clustering, and discover how to apply the k-means clustering algorithm using Spark MLlib. Complete the module with a lab that solidifies your learning and gives you real-world experience with Spark ML.
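As a rough illustration of the k-means idea mentioned above, here is a minimal sketch in plain Python. It is not Spark MLlib code and is not taken from the course; the function names (`kmeans`, `squared_distance`), the `seed` parameter, and the sample points are all assumptions chosen for illustration. The sketch alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: List[Point], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Illustrative k-means (hypothetical helper, not an MLlib API).

    Returns the final centroids and, for each input point, the index
    of the centroid it was assigned to.
    """
    rng = random.Random(seed)
    # Initialize centroids by picking k distinct input points at random.
    centroids: List[Point] = rng.sample(points, k)
    assignments: List[int] = []

    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]

        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])

        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, assignments
```

Calling `kmeans(points, k=2)` on a small list of 2-D tuples returns the two centroids and a cluster index per point. A distributed library such as Spark MLlib applies the same assign-then-update loop across partitions of a much larger dataset.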

WEEK 3
Final Project
This final project provides real-world experience as you create your own Apache Spark application. You will build the application as an end-to-end use case that follows the extract, transform, and load (ETL) process, including data acquisition, transformation, model training, and deployment using IBM Watson Machine Learning.

Related Courses

Machine Learning Foundations: A Case Study Approach (Coursera)
University of Washington

Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning from a series of practical case-studies.

Jun 22nd 2026
5-12 Weeks
Regression Models (Coursera)
Johns Hopkins University

Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares, and inference using regression models.

Jun 22nd 2026
4 Weeks
Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud (Coursera)
University of Illinois at Urbana-Champaign

Welcome to the Cloud Computing Applications course, the second part of a two-course series designed to give you a comprehensive view of the world of Cloud Computing and Big Data! In this second course, we continue Cloud Computing Applications by exploring how the Cloud opens up analytics on huge volumes of data that are static or streamed at high velocity and represent an enormous variety of information. Cloud applications and data analytics represent a disruptive change in the ways that society is informed by, and uses, information.

Jun 22nd 2026
4 Weeks
Preparing for the Google Cloud Professional Data Engineer Exam (Coursera)
Google Cloud

From the course: "The best way to prepare for the exam is to be competent in the skills required of the job." This course uses a top-down approach to recognize knowledge and skills already known, and to surface information and skill areas for additional preparation. You can use this course to help create your own custom preparation plan. It helps you distinguish what you know from what you don't know. And it helps you develop and practice skills required of practitioners who perform this job.

Jun 27th 2026
5-12 Weeks
Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning (Coursera)
DeepLearning.AI

If you are a software developer who wants to build scalable AI-powered algorithms, you need to understand how to use the tools to build them. This course is part of the upcoming Machine Learning in TensorFlow Specialization and will teach you best practices for using TensorFlow, a popular open-source framework for machine learning.

Jun 22nd 2026
4 Weeks
Fundamentals of Reinforcement Learning (Coursera)
University of Alberta, Alberta Machine Intelligence Institute

Reinforcement Learning is a subfield of Machine Learning, but is also a general purpose formalism for automated decision-making and AI. This course introduces you to statistical learning techniques where an agent explicitly takes actions and interacts with the world. Understanding the importance and challenges of learning agents that make decisions is of vital importance today, with more and more companies interested in interactive agents and intelligent decision-making.

Jun 22nd 2026
4 Weeks
Structuring Machine Learning Projects (Coursera)
DeepLearning.AI

You will learn how to build a successful machine learning project. If you aspire to be a technical leader in AI, and know how to set direction for your team's work, this course will show you how. Much of this content has never been taught elsewhere, and is drawn from my experience building and shipping many deep learning products. This course also has two "flight simulators" that let you practice decision-making as a machine learning project leader. This provides "industry experience" that you might otherwise get only after years of ML work experience.

Jun 22nd 2026
2 Weeks
Device-based Models with TensorFlow Lite (Coursera)
DeepLearning.AI

Bringing a machine learning model into the real world involves a lot more than just modeling. This Specialization will teach you how to navigate various deployment scenarios and use data more effectively to train your model. This second course teaches you how to run your machine learning models in mobile applications. You’ll learn how to prepare models for lower-powered, battery-operated devices, then execute models on both Android and iOS platforms. Finally, you’ll explore how to deploy on embedded systems using TensorFlow on Raspberry Pi and microcontrollers.

Jun 22nd 2026
4 Weeks
Data Manipulation at Scale: Systems and Algorithms (Coursera)
University of Washington

Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making: we are drowning in data. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources, but also the programming abstractions to use them effectively. The abstractions that emerged in the last decade blend ideas from parallel databases, distributed systems, and programming languages to create a new class of scalable data analytics platforms that form the foundation for data science at realistic scales.

Jun 22nd 2026
4 Weeks
Sequence Models (Coursera)
DeepLearning.AI

This course will teach you how to build models for natural language, audio, and other sequence data. Thanks to deep learning, sequence algorithms are working far better than just two years ago, and this is enabling numerous exciting applications in speech recognition, music synthesis, chatbots, machine translation, natural language understanding, and many others.

Jun 22nd 2026
3 Weeks
Machine Learning: Regression (Coursera)
University of Washington

Case Study - Predicting Housing Prices. In our first case study, predicting house prices, you will create models that predict a continuous value (price) from input features (square footage, number of bedrooms and bathrooms, and so on). This is just one of the many places where regression can be applied. Other applications range from predicting health outcomes in medicine, stock prices in finance, and power usage in high-performance computing, to analyzing which regulators are important for gene expression.

Jun 22nd 2026
5-12 Weeks