Spark (Udacity)

Offered by Udacity, Insight

Master how to work with big data and build machine learning models at scale using Spark! In this course, you’ll learn how to wrangle and model massive datasets with PySpark, the Python library for interacting with Spark. In the first lesson, you will learn about big data and how Spark fits into the big data ecosystem. In lesson two, you will practice processing and cleaning datasets to get comfortable with Spark’s SQL and DataFrame APIs. In the third lesson, you will debug and optimize your Spark code when running on a cluster. In lesson four, you will use Spark’s Machine Learning Library to train machine learning models at scale.

Spark is a widely used open-source project that companies around the world, from startups to the largest enterprises, rely on to analyze messy datasets efficiently.

What You Will Learn

Lesson 1
The Power of Spark
Understand the big data ecosystem
Understand when to use Spark and when not to use it

Lesson 2
Data Wrangling with Spark
Manipulate data with Spark SQL and Spark DataFrames
Use Spark for wrangling massive datasets

Lesson 3
Debugging and Optimization
Troubleshoot common errors and optimize your code using the Spark Web UI

Lesson 4
Machine Learning with Spark
Use Spark’s Machine Learning Library to train machine learning models at scale

Related Courses

Udacity

Introduction to Machine Learning Course (Udacity)

Data Science

This class will teach you the end-to-end process of investigating data through a machine learning lens. Learn online, with Udacity. Machine Learning is a first-class ticket to the most exciting careers in data analysis today. As data sources proliferate along with the computing power to process them, going straight to the data is one of the most straightforward ways to quickly gain insights and make predictions.

Self-Paced

Statistics Machine Learning Clustering

Georgia Institute of Technology, Udacity

Artificial Intelligence (Udacity)

Robotics & Computer Vision

Learn about the fundamentals of Artificial Intelligence in this introductory graduate-level course. It provides a survey of various topics in the field along with in-depth discussion of foundational concepts such as classical search, probability, machine learning, logic and planning.

Self-Paced

Planning Artificial Intelligence Probability

Udacity, MongoDB University

Data Wrangling with MongoDB (Udacity)

CS: Software Engineering

In this course, we will explore how to wrangle data from diverse sources and shape it to enable data-driven applications. Some data scientists spend the bulk of their time doing this! Students will learn how to gather and extract data from widely used data formats. They will learn how to assess the quality of data and explore best practices for data cleaning. We will also introduce students to MongoDB, covering the essentials of storing data and the MongoDB query language together with exploratory analysis using the MongoDB aggregation framework.

Self-Paced

Programming MongoDB Databases

Udacity

Deploying a Hadoop Cluster (Udacity)

Statistics & Data Analysis

Analyze Data with Hadoop and MapReduce. Learn how to tackle big data problems with your own Hadoop clusters! In this course, you’ll deploy Hadoop clusters in the cloud and use them to gain insights from large datasets.

Self-Paced

Data Analysis Hadoop MapReduce

Udacity, AWS

Full Stack Foundations (Udacity)

CS: Software Engineering Computer Science

Build a data-driven web app with Python. In this course you will learn the fundamentals of back-end web development! You will create your own web application that queries a database for items on restaurant menus and then dynamically generates complete menus in the form of web pages and API endpoints.

Self-Paced

Python Web Development Ruby on Rails

Udacity

Data Science Interview Prep (Udacity)

Personal and Professional Development Data Science

Confidently take on the tech interview. Data science job interviews can be daunting. Technical interviewers often ask you to design an experiment or model. You may need to solve problems using Python and SQL. You will likely need to show how you connect data skills to business decisions and strategy. In this course, you'll review the common questions asked in data science, data analyst, and machine learning interviews.

Self-Paced

Data Structures Machine Learning Interview

Udacity

Segmentation and Clustering (Udacity)

Statistics & Data Analysis Data Science

Use machine learning to create segments. The Segmentation and Clustering course provides students with the foundational knowledge to build and apply clustering models to develop more sophisticated segmentation in business contexts. In this course, you'll learn how to use an advanced analytical method called clustering to create useful segments of stores, customers, geographies, and more. You'll learn this by improving your fluency in Alteryx, a data analytics tool that enables you to prepare, blend, and analyze data quickly.

Self-Paced

Machine Learning Clustering Segmentation

Udacity, Google

Web Tooling & Automation (Udacity)

CS: Software Engineering CS: Information & Technology

Gulp, Sass, and BabelJS, Oh My! In this course, you’ll learn how to set up your development environment, get super productive during daily work and iteration, protect yourself and your site from disasters, and save a lot of time and effort with automatic optimization and automation. Finally, you’ll learn how to do all this while being confident your code runs on a multitude of devices in the real world.

Self-Paced

Web Javascript Optimization

Georgia Institute of Technology, Udacity

Machine Learning (Udacity)

Data Science

Supervised, Unsupervised & Reinforcement. Machine Learning is a graduate-level course covering the area of Artificial Intelligence concerned with computer programs that modify and improve their performance through experience. The first part of the course covers Supervised Learning, a machine learning task that makes it possible for your phone to recognize your voice, your email to filter spam, and for computers to learn a bunch of other cool stuff. In part two, you will learn about Unsupervised Learning. Ever wonder how Netflix can predict what movies you'll like? Or how Amazon knows what you want to buy before you do? Such answers can be found in this section!

Self-Paced

Machine Learning Information Theory Game Theory

Georgia Institute of Technology, Udacity

Machine Learning: Unsupervised Learning (Udacity)

Robotics & Computer Vision

Conversations on Analyzing Data. Ever wonder how Netflix can predict what movies you'll like? Or how Amazon knows what you want to buy before you do? The answer can be found in Unsupervised Learning! Closely related to pattern recognition, Unsupervised Learning is about analyzing data and looking for patterns. It is an extremely powerful tool for identifying structure in data. This course focuses on how you can use Unsupervised Learning approaches -- including randomized optimization, clustering, and feature selection and transformation -- to find structure in unlabeled data.

Self-Paced

Machine Learning Clustering Information Theory

Georgia Institute of Technology, Udacity

Data Analysis and Visualization (Udacity)

Statistics & Data Analysis Data Science

Data and visual analytics is an emerging field concerned with analyzing, modeling, and visualizing complex, high-dimensional data. This course will introduce students to the field by covering state-of-the-art modeling, analysis, and visualization techniques. It will emphasize practical challenges involving complex real-world data and include several case studies and hands-on work with the R programming language.

Self-Paced

Data Structures Regression Data Analysis

Udacity

Intro to Relational Databases (Udacity)

Computer Science

SQL, DB-API, and More! This course is a quick, fun introduction to using a relational database from your code, using examples in Python. You'll learn the basics of SQL (the Structured Query Language) and database design, as well as the Python API for connecting Python code to a database. You'll also learn a bit about protecting your database-backed web apps from common security problems. After taking this course, you'll be able to write code using a database as a backend to store application data reliably and safely.

Self-Paced

Python Databases SQL