Distributed Machine Learning with Apache Spark (edX)

Learn the underlying principles required to develop scalable machine learning pipelines and gain hands-on experience using Apache Spark. Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability and optimization.

Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to cutting-edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.

This course is part of the Data Science and Engineering with Spark XSeries Program.

What you'll learn:

  • The underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines
  • Exploratory data analysis, feature extraction, supervised learning, and model evaluation
  • Application of these principles using Spark
  • How to implement distributed algorithms for fundamental statistical models

Prerequisites:

  • Python programming background
  • Experience with PySpark equivalent to CS105x: Introduction to Spark
  • Comfort with mathematical and algorithmic reasoning
  • Familiarity with basic machine learning concepts
  • Exposure to algorithms, probability, linear algebra and calculus

Related Courses

Computing for Data Analysis (edX)
Georgia Institute of Technology, GTx

A hands-on introduction to basic programming principles and practice relevant to modern data analysis, data mining, and machine learning. The modern data analysis pipeline involves collection, preprocessing, storage, analysis, and interactive visualization of data. In the course, you’ll see how computing and mathematics come together.

Aug 19th 2024
13-24 Weeks

Platform-Based Analytics (edX)
Indiana University, IUx

Gain hands-on experience extracting, preparing, exploring, and analyzing data statistically and visually using features and tools native to Microsoft Excel. In an ever-growing digital world, the need for strong data analysis skills is at the forefront of every business function, along with the ability to accurately describe and interpret analytical findings.

Nov 7th 2023
5-12 Weeks

Recommender Systems: Behind the Screen (edX)
Université de Montréal, UMontrealX

How are items recommended when you’re browsing for movies, jobs or clothing online? Discover the fundamental concepts and methods that produce the most relevant item suggestions for users, from e-commerce to online advertising. In this course, you will explore and learn the best methods and practices in recommender systems, which are an essential component of the online ecosystem. This course was developed by IVADO and HEC Montréal as part of a workshop that took place in Montreal.

Sep 26th 2023
5-12 Weeks

Aplicaciones de la Teoría de Grafos a la vida real II (edX)
Universitat Politècnica de València, UPValenciaX

We will learn to model real-world problems by representing them as graphs and to solve them using the associated algorithms. This course approaches graph theory from a modeling standpoint, which will later allow us to solve many problems of different kinds. We will present examples of the various problems in a real-world context, analyze how they are represented as graphs, and look at the algorithms needed to solve them.

Self-Paced

Introduction to Java Programming: Fundamental Data Structures and Algorithms (edX)
Universidad Carlos III de Madrid - UC3M, UC3Mx

Learn to enhance your code by using fundamental data structures and powerful algorithms in Java. In this introductory course, you will learn programming with Java in an easy and interactive way. You will learn about fundamental data structures, such as lists, stacks, queues and trees, and about algorithms for efficiently inserting, deleting, searching and sorting information in these data structures.

Self-Paced

Machine Learning with Python: from Linear Models to Deep Learning (edX)
MIT, MITx

An in-depth introduction to the field of machine learning, from linear models to deep learning and reinforcement learning, through hands-on Python projects. Machine learning methods are commonly used across engineering and sciences, from computer systems to physics. Moreover, commercial sites such as search engines, recommender systems (e.g., Netflix, Amazon), advertisers, and financial institutions employ machine learning algorithms for content recommendation, predicting customer behavior, compliance, or risk.

May 27th 2024
13-24 Weeks

Applied Quantum Computing III: Algorithm and Software (edX)
Purdue University, PurdueX

Learn domain-specific quantum algorithms and how to run them on present-day quantum hardware. This course is part III of a series of quantum computing courses that covers aspects from fundamentals to present-day hardware platforms to quantum software and programming. The goal of part III is to discuss some of the key domain-specific algorithms that are developed by exploiting the fundamental quantum phenomena (e.g. entanglement) and computing models discussed in part I.

Mar 25th 2024
5-12 Weeks

The Beauty and Joy of Computing - AP® CS Principles Part 2 (edX)
University of California, Berkeley, BerkeleyX

A computer science principles course for anyone who wants to learn how to translate ideas into code. Discover the big ideas and thinking practices in computer science, and learn how to code using one of the friendliest programming languages, Snap! (based on Scratch).

No sessions available
13-24 Weeks