Apache Spark for Data Engineering and Machine Learning (edX)

Offered by IBM

This short course introduces you to the fundamentals of Data Engineering and Machine Learning with Apache Spark, including Spark Structured Streaming, ETL for Machine Learning (ML) Pipelines, and Spark ML. By the end of the course, you will have hands-on experience applying Spark skills to ETL and ML workflows.

Apache® Spark™ is a fast, flexible, and developer-friendly open-source platform for large-scale SQL, batch processing, stream processing, and machine learning. Users can take advantage of its open-source ecosystem, speed, ease of use, and analytic capabilities to work with Big Data in new ways.
In this short course, you'll explore concepts and gain hands-on skills for using Spark in data engineering and machine learning applications. You'll learn about Spark Structured Streaming, including data sources, output modes, and operations. Then, explore how graph theory works and discover how GraphFrames supports Spark DataFrames and popular algorithms.
Organizations can acquire data from structured and unstructured sources and deliver the data to users in formats they can use. Learn how to use Spark to extract, transform, and load (ETL) data. Then, you'll hone your newly acquired skills during your "ETL for Machine Learning Pipelines" lab.
Next, discover why machine learning practitioners prefer Spark. You'll learn how to create pipelines and quickly implement feature extraction, selection, and transformation on structured data sets. Discover how to perform classification and regression using Spark. You'll be able to define and identify both supervised and unsupervised learning. Learn about clustering and how to apply the k-means clustering algorithm using Spark MLlib. You'll reinforce your knowledge with focused, hands-on labs and a final project where you will apply Spark to a real-world inspired problem.
Prior to taking this course, please ensure you have foundational Spark knowledge and skills, for example, by first completing the IBM course titled "Big Data, Hadoop and Spark Basics."
This course is part of the NoSQL, Big Data and Spark Fundamentals Professional Certificate.

What you'll learn

  • Describe the features, benefits, limitations, and application of Apache Spark Structured Streaming
  • Describe graph theory and explain how GraphFrames benefits developers
  • Explain how developers can apply extract, transform and load (ETL) processes using Spark
  • Describe how Spark ML supports machine learning development
  • Apply Spark ML for regression and classification
  • Differentiate between supervised and unsupervised machine learning
  • Explain how Spark ML uses clustering
  • Demonstrate hands-on working knowledge of using Spark for ETL processes

Related Courses

Data Science Essentials (edX)
Microsoft

Explore data visualization and exploration concepts with experts from MIT and Microsoft, and get an introduction to machine learning. Demand for data science talent is exploding. Develop your career as a data scientist, as you explore essential skills and principles with experts from MIT and Microsoft. In this data science course, you will learn key concepts in data acquisition, preparation, exploration, and visualization. Plus, look at examples of how to build a cloud data science solution using Azure Machine Learning, R, and Python.

Course Not Available
Data Science: Machine Learning (edX)
HarvardX, Harvard University

Build a movie recommendation system and learn the science behind one of the most popular and successful data science techniques. Perhaps the most popular data science methodologies come from machine learning. What distinguishes machine learning from other computer-guided decision processes is that it builds prediction algorithms using data.

Self-Paced
Applied Quantum Computing III: Algorithm and Software (edX)
Purdue University, PurdueX

Learn domain-specific quantum algorithms and how to run them on present-day quantum hardware. This course is part III of a series of quantum computing courses that covers everything from fundamentals to present-day hardware platforms to quantum software and programming. The goal of part III is to discuss some of the key domain-specific algorithms that are developed by exploiting the fundamental quantum phenomena (e.g., entanglement) and computing models discussed in part I.

Mar 25th 2024
5-12 Weeks
Machine Learning with Python: from Linear Models to Deep Learning (edX)
MIT, MITx

An in-depth introduction to the field of machine learning, from linear models to deep learning and reinforcement learning, through hands-on Python projects. Machine learning methods are commonly used across engineering and sciences, from computer systems to physics. Moreover, commercial sites such as search engines, recommender systems (e.g., Netflix, Amazon), advertisers, and financial institutions employ machine learning algorithms for content recommendation, predicting customer behavior, compliance, or risk.

May 27th 2024
13-24 Weeks
Big Data Analytics Using Spark (edX)
University of California, San Diego, UC San DiegoX

Learn how to analyze large datasets using Jupyter notebooks, MapReduce and Spark as a platform. In data science, data is called “big” if it cannot fit into the memory of a single standard laptop or workstation. The analysis of big datasets requires using a cluster of tens, hundreds or thousands of computers. Effectively using such clusters requires the use of distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce and Spark.

Dec 5th 2023
5-12 Weeks
Probability and Statistics in Data Science using Python (edX)
University of California, San Diego, UC San DiegoX

Using Python, learn statistical and probabilistic approaches to understand and gain insights from data. The job of a data scientist is to glean knowledge from complex and noisy datasets. Reasoning about uncertainty is inherent in the analysis of noisy data. Probability and Statistics provide the mathematical foundation for such reasoning.

Self-Paced
Data Science and Machine Learning Capstone Project (edX)
IBM

Create a project that you can use to showcase your data science skills to prospective employers. Apply various data science and machine learning techniques to analyze and visualize a data set involving a real-life business scenario and build a predictive model. Now that you've taken several courses on data science and machine learning, it's time to put your learning to work on a data problem involving a real-life scenario. Employers care about how well you can apply your knowledge and skills to solve real-world problems, and the work you do in this capstone project will help you stand out in the job market.

Self-Paced