Machine Learning with Apache Spark (Coursera)

Offered by IBM,
Machine Learning with Apache Spark (Coursera)

Explore the exciting world of machine learning with this IBM course. Start by learning ML fundamentals before unlocking the power of Apache Spark to build and deploy ML models for data engineering applications. Dive into supervised and unsupervised learning techniques and discover the revolutionary possibilities of Generative AI through instructional readings and videos.

Class Deals by MOOC List - Click here and see Coursera's Active Discounts, Deals, and Promo Codes.

Gain hands-on experience with Spark structured streaming, develop an understanding of data engineering and ML pipelines, and become proficient in evaluating ML models using SparkML.
In practical labs, you'll utilize SparkML for regression, classification, and clustering, enabling you to construct prediction and classification models. Connect to Spark clusters, analyze SparkSQL datasets, perform ETL activities, and create ML models using Spark ML and sci-kit learn. Finally, demonstrate your acquired skills through a final assignment.
This intermediate course is suitable for aspiring and experienced data engineers, as well as working professionals in data analysis and machine learning. Prior knowledge in Big Data, Hadoop, Spark, Python, and ETL is highly recommended for this course.

What you'll learn

  • Describe ML, explain its role in data engineering, summarize generative AI, discuss Spark's uses, and analyze ML pipelines and model persistence.
  • Evaluate ML models, distinguish between regression, classification, and clustering models, and compare data engineering pipelines with ML pipelines.
  • Construct the data analysis processes using Spark SQL, and perform regression, classification, and clustering using SparkML.
  • Demonstrate connecting to Spark clusters, build ML pipelines, perform feature extraction and transformation, and model persistence.

Syllabus

Get Started with Machine Learning
Module 1
In this module, you will gain knowledge of machine learning techniques that enable computers to perform tasks without explicit programming. You will explore the lifecycle of machine learning models and understand the crucial role of data engineering in machine learning projects. The module covers supervised and unsupervised learning techniques, including classification, regression, and clustering. Furthermore, you will acquire valuable insights into Generative AI and its potential to revolutionize multiple industries, enhance people's lives, and generate newer and previously unimaginable data and experiences.

Machine Learning with Apache Spark
Module 2
This module will introduce you to Spark and provide an overview of its key features and applications in the field of data engineering. You will discover the process of connecting to a Spark cluster using SN labs and delve into various topics such as regression, mileage prediction, classification, diabetic classification, clustering, and clustering load data using SparkML. Additionally, you will gain insights into how to construct these models using Spark ML. Moreover, this module will cover GraphFrames on Apache Spark and guide you in hands-on labs.

Data Engineering for Machine Learning using Apache Spark
Module 3
This module begins with Apache Spark Structured Streaming and its role in processing streaming data with Spark SQL. You will acquire knowledge about key terms associated with Structured Streaming. The module then covers the Extract-Transform-Load process and provides hands-on experience in transferring data from one source to another destination with varying data formats or structures. Additionally, you will gain a practical understanding of feature extraction and transformation using Spark extract and transform features. The module also delves into machine learning pipelines using Spark, demonstrating the process and benefits involved. Lastly, you will grasp the concept of model persistence and its significant role in Machine Learning.

Final Project
Module 4
In this module, you will apply the data engineering skills and techniques you have acquired throughout the course. The course concludes with a final project and assignments that allow you to demonstrate your proficiency in these areas. You will step into the role of a data engineer working at a renowned aeronautics consulting company recognized for its adeptness in handling large datasets. Your role as a data engineer is crucial as the data scientists rely on your expertise to carry out ETL (Extract, Transform, Load) tasks and establish machine learning pipelines. While data scientists possess expertise in machine learning, they depend on your specialized knowledge to handle various algorithms and data formats. Your contribution plays a vital role in ensuring the smooth execution of their tasks.

Go to Class
MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Related Courses

Machine Learning With Big Data (Coursera) Coursera
University of California, San Diego

Machine Learning With Big Data (Coursera)

Want to make sense of the volumes of data you have collected? Need to incorporate data-driven decisions into your process? This course provides an overview of machine learning techniques to explore, analyze, and leverage data. You will be introduced to tools and algorithms you can use to create machine learning models that learn from data, and to scale those models up to big data problems.

Jun 22nd 2026
5-12 Weeks
Text Retrieval and Search Engines (Coursera) Coursera
University of Illinois at Urbana-Champaign

Text Retrieval and Search Engines (Coursera)

Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than a computer system or sensors, and are thus especially valuable for discovering knowledge about people’s opinions and preferences, in addition to many other kinds of knowledge that we encode in text.

Jun 22nd 2026
5-12 Weeks
Application of AI, InsurTech, and Real Estate Technology (Coursera) Coursera
University of Pennsylvania

Application of AI, InsurTech, and Real Estate Technology (Coursera)

In this course, you’ll learn about the emerging technologies in Artificial Intelligence and Machine Learning that are utilized in InsurTech and Real Estate Tech. Professor Chris Geczy of the Wharton School has designed this course to help you navigate the complex world of insurance and real estate tech, and understand how FinTech plays a role in the future of the industry. Through study and analysis of Artificial Intelligence and Machine Learning, you’ll learn how InsurTech is redefining the insurance industry.

Jun 22nd 2026
4 Weeks
Getting started with TensorFlow 2 (Coursera) Coursera
Imperial College London

Getting started with TensorFlow 2 (Coursera)

Welcome to this course on Getting started with TensorFlow 2! In this course you will learn a complete end-to-end workflow for developing deep learning models with Tensorflow, from building, training, evaluating and predicting with models using the Sequential API, validating your models and including regularisation, implementing callbacks, and saving and loading models.

Jun 22nd 2026
5-12 Weeks
Machine Learning: Classification (Coursera) Coursera
University of Washington

Machine Learning: Classification (Coursera)

Case Studies: Analyzing Sentiment & Loan Default Prediction. In our case study on analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information,...). In our second case study for this course, loan default prediction, you will tackle financial data, and predict when a loan is likely to be risky or safe for the bank.

Jun 22nd 2026
5-12 Weeks
Machine Learning: Clustering & Retrieval (Coursera) Coursera
University of Washington

Machine Learning: Clustering & Retrieval (Coursera)

Case Studies: Finding Similar Documents. A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to a retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover?

Jun 22nd 2026
5-12 Weeks
Sequence Models (Coursera) Coursera
DeepLearning.AI

Sequence Models (Coursera)

This course will teach you how to build models for natural language, audio, and other sequence data. Thanks to deep learning, sequence algorithms are working far better than just two years ago, and this is enabling numerous exciting applications in speech recognition, music synthesis, chatbots, machine translation, natural language understanding, and many others.

Jun 22nd 2026
3 Weeks
Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud (Coursera) Coursera
University of Illinois at Urbana-Champaign

Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud (Coursera)

Welcome to the Cloud Computing Applications course, the second part of a two-course series designed to give you a comprehensive view on the world of Cloud Computing and Big Data! In this second course we continue Cloud Computing Applications by exploring how the Cloud opens up data analytics of huge volumes of data that are static or streamed at high velocity and represent an enormous variety of information. Cloud applications and data analytics represent a disruptive change in the ways that society is informed by, and uses information.

Jun 22nd 2026
4 Weeks
Structuring Machine Learning Projects (Coursera) Coursera
DeepLearning.AI

Structuring Machine Learning Projects (Coursera)

You will learn how to build a successful machine learning project. If you aspire to be a technical leader in AI, and know how to set direction for your team's work, this course will show you how. Much of this content has never been taught elsewhere, and is drawn from my experience building and shipping many deep learning products. This course also has two "flight simulators" that let you practice decision-making as a machine learning project leader. This provides "industry experience" that you might otherwise get only after years of ML work experience.

Jun 22nd 2026
2 Weeks
Neural Networks and Deep Learning (Coursera) Coursera
DeepLearning.AI

Neural Networks and Deep Learning (Coursera)

If you want to break into cutting-edge AI, this course will help you do so. Deep learning engineers are highly sought after, and mastering deep learning will give you numerous new career opportunities. Deep learning is also a new "superpower" that will let you build AI systems that just weren't possible a few years ago. In this course, you will learn the foundations of deep learning.

Jun 22nd 2026
4 Weeks