Spark (Udacity)

Offered by Udacity, Insight

Learn how to work with big data and build machine learning models at scale using Spark! In this course, you’ll learn how to wrangle and model massive datasets with PySpark, the Python library for interacting with Spark. In the first lesson, you will learn about big data and how Spark fits into the big data ecosystem. In lesson two, you will practice processing and cleaning datasets to get comfortable with Spark’s SQL and DataFrame APIs. In the third lesson, you will debug and optimize your Spark code when running on a cluster. In lesson four, you will use Spark’s Machine Learning Library to train machine learning models at scale.

Spark is a leading open-source project used by the largest companies and startups around the world to efficiently analyze messy data sets.

What You Will Learn

Lesson 1
The Power of Spark
Understand the big data ecosystem
Understand when to use Spark and when not to use it

Lesson 2
Data Wrangling with Spark
Manipulate data with Spark SQL and Spark DataFrames
Use Spark for wrangling massive datasets

Lesson 3
Debugging and Optimization
Troubleshoot common errors and optimize your code using the Spark WebUI

Lesson 4
Machine Learning with Spark
Use Spark’s Machine Learning Library to train machine learning models at scale

Related Courses

Data Visualization and D3.js (Udacity)
Udacity, Zipfian Academy

Communicating with Data. Learn the fundamentals of data visualization and practice communicating with data. This course covers how to apply design principles, human perception, color theory, and effective storytelling to data visualization. If you present data to others, aspire to be an analyst or data scientist, or if you’d like to become more technical with visualization tools, then you can grow your skills with this course.

Self-Paced

Segmentation and Clustering (Udacity)
Udacity

Use machine learning to create segments. The Segmentation and Clustering course provides students with the foundational knowledge to build and apply clustering models to develop more sophisticated segmentation in business contexts. In this course, you'll learn how to use an advanced analytical method called clustering to create useful segments for business contexts, whether it's stores, customers, geographies, etc. You'll learn this by improving your fluency in Alteryx, a data analytics tool that enables you to prepare, blend, and analyze data quickly.

Self-Paced

Machine Learning (Udacity)
Georgia Institute of Technology, Udacity

Supervised, Unsupervised & Reinforcement. Machine Learning is a graduate-level course covering the area of Artificial Intelligence concerned with computer programs that modify and improve their performance through experience. The first part of the course covers Supervised Learning, a machine learning task that makes it possible for your phone to recognize your voice, your email to filter spam, and for computers to learn a bunch of other cool stuff. In part two, you will learn about Unsupervised Learning. Ever wonder how Netflix can predict what movies you'll like? Or how Amazon knows what you want to buy before you do? Such answers can be found in this section!

Self-Paced

Model Building and Validation (Udacity)
Udacity

Advanced Techniques for Analyzing Data. This course will teach you how to start from scratch in answering questions about the real world using data. Machine learning happens to be a small part of this process. The model building process involves setting up ways of collecting data, understanding and paying attention to what is important in the data to answer the questions you are asking, and finding a statistical, mathematical, or simulation model to gain understanding and make predictions.

Self-Paced

Machine Learning Interview Preparation (Udacity)
Udacity

Prove your qualifications in your machine learning interviews. In this course, you’ll learn exactly what to expect during a machine learning interview. You’ll cover all the common questions and technical strategies, and review a range of important topics, from machine learning algorithms to image categorization. You’ll also learn best practices for data structure questions and whiteboard problems, and at the end of the course, you’ll get unlimited access to mock interviews on Pramp.

Self-Paced

Data Wrangling with MongoDB (Udacity)
Udacity, MongoDB University

In this course, we will explore how to wrangle data from diverse sources and shape it to enable data-driven applications. Some data scientists spend the bulk of their time doing this! Students will learn how to gather and extract data from widely used data formats. They will learn how to assess the quality of data and explore best practices for data cleaning. We will also introduce students to MongoDB, covering the essentials of storing data and the MongoDB query language together with exploratory analysis using the MongoDB aggregation framework.

Self-Paced

Big Data Analytics in Healthcare (Udacity)
Georgia Institute of Technology, Udacity

Data science plays an important role in many industries. Faced with massive amounts of heterogeneous data, data scientists find scalable machine learning and data mining algorithms and systems extremely important. The growth in the volume, complexity, and speed of data drives the need for scalable data analytics algorithms and systems. In this course, we study such algorithms and systems in the context of healthcare applications.

Self-Paced

Introduction to Machine Learning Course (Udacity)
Udacity

This class will teach you the end-to-end process of investigating data through a machine learning lens. Learn online, with Udacity. Machine Learning is a first-class ticket to the most exciting careers in data analysis today. As data sources proliferate along with the computing power to process them, going straight to the data is one of the most straightforward ways to quickly gain insights and make predictions.

Self-Paced

Web Tooling & Automation (Udacity)
Udacity, Google

Gulp, Sass, and BabelJS, Oh My! In this course, you’ll learn how to set up your development environment, get super productive during daily work and iteration, protect yourself and your site from disasters, and save a lot of time and effort with automatic optimization and automation. Finally, you’ll learn how to do all this while being confident your code runs on a multitude of devices in the real world.

Self-Paced

Intro to Relational Databases (Udacity)
Udacity

SQL, DB-API, and More! This course is a quick, fun introduction to using a relational database from your code, using examples in Python. You'll learn the basics of SQL (the Structured Query Language) and database design, as well as the Python API for connecting Python code to a database. You'll also learn a bit about protecting your database-backed web apps from common security problems. After taking this course, you'll be able to write code using a database as a backend to store application data reliably and safely.

Self-Paced

Machine Learning: Unsupervised Learning (Udacity)
Georgia Institute of Technology, Udacity

Conversations on Analyzing Data. Ever wonder how Netflix can predict what movies you'll like? Or how Amazon knows what you want to buy before you do? The answer can be found in Unsupervised Learning! Closely related to pattern recognition, Unsupervised Learning is about analyzing data and looking for patterns. It is an extremely powerful tool for identifying structure in data. This course focuses on how you can use Unsupervised Learning approaches -- including randomized optimization, clustering, and feature selection and transformation -- to find structure in unlabeled data.

Self-Paced