Big Data Analytics Using Spark (edX)

Learn how to analyze large datasets using Jupyter notebooks, MapReduce and Spark as a platform. In data science, data is called "big" if it cannot fit into the memory of a single standard laptop or workstation. Analyzing big datasets requires a cluster of tens, hundreds or thousands of computers. Using such clusters effectively requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop MapReduce and Spark.

In this course, part of the Data Science MicroMasters program, you will learn where the bottlenecks lie in massively parallel computation and how to use Spark to minimize them.
You will learn how to perform supervised and unsupervised machine learning on massive datasets using Spark's Machine Learning Library (MLlib).
As in the other courses in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter notebook environment.

What you'll learn

  • Programming Spark using PySpark
  • Identifying the computational tradeoffs in a Spark application
  • Performing data loading and cleaning using Spark and Parquet
  • Modeling data through statistical and machine learning methods

Prerequisites
The previous courses in the MicroMasters program: DSE200x - Python for Data Science, DSE210x - Probability and Statistics in Data Science using Python, and DSE220x - Machine Learning Fundamentals.


Related Courses

Biostatistics for Big Data Applications (edX)
University of Texas Medical Branch

Learn data analysis basics for working with biomedical big data with practical hands-on examples using R. This course provides a broad foundation of statistical terms and concepts as well as an introduction to the R statistical software package. The topics covered are fundamental components of biostatistical methods used in both omics and population health research.

No sessions available
5-12 Weeks

Understanding the World Through Data (edX)
MIT, MITx

Become a data explorer – learn how to leverage data and basic machine learning algorithms to understand the world. Speech recognition, drones, and self-driving cars – things that once seemed like pure science fiction – are now widely available technologies, and just a few examples of how humans have taught machines to analyze data and make decisions. In this hands-on, introductory course, you will examine all the forms in which data exists, learn tools that uncover relationships between data, and leverage basic algorithms to understand the world from a new perspective.

Mar 13th 2024
5-12 Weeks

Recommender Systems: Behind the Screen (edX)
Université de Montréal, UMontrealX

How are items recommended when you're browsing for movies, jobs or clothing online? Register here and you'll discover the fundamental concepts and methods that make it possible to suggest the most relevant items to users, from e-commerce to online advertising. In this course, you will explore and learn the best methods and practices in recommender systems, which are an essential component of the online ecosystem. This course was developed by IVADO and HEC Montréal as part of a workshop that took place in Montreal.

Sep 26th 2023
5-12 Weeks

Probability and Statistics in Data Science using Python (edX)
University of California, San Diego, UC San DiegoX

Using Python, learn statistical and probabilistic approaches to understand and gain insights from data. The job of a data scientist is to glean knowledge from complex and noisy datasets. Reasoning about uncertainty is inherent in the analysis of noisy data. Probability and Statistics provide the mathematical foundation for such reasoning.

Self-Paced

Knowledge Management and Big Data in Business (edX)
The Hong Kong Polytechnic University, HKPolyUx

Learn why and how knowledge management and Big Data are vital to the new business era. The business landscape is changing so rapidly that traditional management, business and computing courses do not meet the needs for the next generation of workers in the business world. Most traditional methods are of a repetitive, rule-based nature and will be gradually replaced by Artificial Intelligence.

Self-Paced

Computer Applications of Artificial Intelligence and e-Construction (edX)
Purdue University, PurdueX

Learn the fundamentals of artificial intelligence, machine learning, natural language processing and their applications in e-Construction. This course is the third in a sequence of interrelated courses on current computer applications in the construction industry. It emphasizes advanced computational tools, including artificial intelligence, machine learning and natural language processing, and their applications in e-Construction.

Mar 28th 2022
5-12 Weeks

Introduction to Apache Spark (edX)
University of California, Berkeley

Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals. Spark is rapidly becoming the compute engine of choice for big data. Spark programs are more concise and often run 10-100 times faster than Hadoop MapReduce jobs. As companies realize this, Spark developers are becoming increasingly valued.

Course not available

Robotics: Vision Intelligence and Machine Learning (edX)
University of Pennsylvania, PennX

Learn how to design robot vision systems that avoid collisions, safely work with humans and understand their environment. How do robots “see”, respond to and learn from their interactions with the world around them? This is the fascinating field of visual intelligence and machine learning. Visual intelligence allows a robot to “sense” and “recognize” the surrounding environment. It also enables a robot to “learn” from the memory of past experiences by extracting patterns in visual signals.

No sessions available
5-12 Weeks

Data Science: R Basics (edX)
HarvardX, Harvard University

Build a foundation in R and learn how to wrangle, analyze, and visualize data. This course will introduce you to the basics of R programming. You can better retain R when you learn it to solve a specific problem, so you’ll use a real-world dataset about crime in the United States. You will learn the R skills needed to answer essential questions about differences in crime across the different states.

Self-Paced