Udacity

Deploying a Hadoop Cluster (Udacity)

Offered by Udacity

Analyze Data with Hadoop and MapReduce. Learn how to tackle big data problems with your own Hadoop clusters! In this course, you’ll deploy Hadoop clusters in the cloud and use them to gain insights from large datasets.

Using massive datasets to guide decisions is becoming more and more important for modern businesses. Hadoop and MapReduce are fundamental tools for working with big data. By knowing how to deploy your own Hadoop clusters, you’ll be able to start exploring big data on your own.

What You Will Learn

Lesson 1
Deploying a Hadoop cluster on Amazon EC2
Learn how to deploy a small Hadoop cluster on Amazon EC2 instances.

Lesson 2
Deploy a Hadoop cluster with Ambari
Use Apache Ambari to automatically deploy a larger, more powerful Hadoop cluster.

Lesson 3
On-demand Hadoop clusters
Use Amazon’s Elastic MapReduce to deploy a Hadoop cluster on demand.

Lesson 4
Analyzing a big dataset with Hadoop and MapReduce
Use Hadoop and MapReduce to analyze a 150 GB dataset of Wikipedia page views.
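
As a rough illustration of the kind of analysis Lesson 4 describes, the sketch below sums page views per article with a mapper and a reducer in plain Python. It is not taken from the course materials: the function names (map_page_views, reduce_page_views, run_local) are hypothetical, and the input line format "project page_title view_count bytes" is an assumption about how the page view records are laid out.

```python
from itertools import groupby
from operator import itemgetter


def map_page_views(lines):
    """Mapper: emit (page_title, views) for each well-formed input line.

    Assumed line format (not confirmed by the course listing):
    "<project> <page_title> <view_count> <bytes>"
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip malformed records
        _project, title, views, _size = fields
        if not views.isdigit():
            continue  # skip records whose view count is not a number
        yield title, int(views)


def reduce_page_views(pairs):
    """Reducer: sum the counts for each key; expects pairs sorted by key."""
    for title, group in groupby(pairs, key=itemgetter(0)):
        yield title, sum(count for _title, count in group)


def run_local(lines):
    """Imitate the shuffle locally: sort mapper output by key, then reduce."""
    shuffled = sorted(map_page_views(lines), key=itemgetter(0))
    return dict(reduce_page_views(shuffled))


if __name__ == "__main__":
    # Made-up sample records, only for trying the logic out.
    sample = [
        "en Main_Page 10 2048",
        "en Hadoop 3 512",
        "en Main_Page 5 1024",
        "garbled line",
    ]
    print(run_local(sample))
```

Under Hadoop Streaming, the mapper and reducer would typically be separate scripts that read lines from standard input and write tab-separated key/value pairs to standard output. Here run_local only imitates the sort-and-group step, so the logic can be checked on a single machine before running it on a cluster.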

Prerequisites and Requirements
This course is intended for students with some experience with Hadoop and MapReduce, Python, and bash commands. You’ll need to be able to work with HDFS and write MapReduce programs; you can learn about these in our Intro to Hadoop and MapReduce course. The MapReduce programs in the course are written in Python. It is possible to use Java and other languages, but we suggest using Python at the level of our Intro to Computer Science course.
You’ll also be using remote cloud machines, so you’ll need to know these bash commands: ssh, scp, cat, and head/tail. You’ll also need to be able to work in an editor such as vim or nano. You can learn about these in our Linux Command Line Basics course.

Related Courses

Coursera

Johns Hopkins University

The Data Scientist's Toolbox (Coursera)

Statistics & Data Analysis Data Science

In this course you will get an introduction to the main tools and ideas in the data scientist's toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program like version control, markdown, git, GitHub, R, and RStudio.

Jun 22nd 2026

4 Weeks

Data Github Data Analysis

Coursera

University of Washington

Practical Predictive Analytics: Models and Methods (Coursera)

Statistics & Data Analysis Data Science

Statistical experiment design and analytics are at the heart of data science. In this course you will design statistical experiments and analyze the results using modern methods. You will also explore the common pitfalls in interpreting statistical arguments, especially those associated with big data. Collectively, this course will help you internalize a core set of practical and effective machine learning methods and concepts, and apply them to solve some real world problems.

Jun 22nd 2026

4 Weeks

Machine Learning Models Methods

Udacity

Intro to Data Science (Udacity)

Statistics & Data Analysis Data Science

Learn what it takes to become a data scientist. The Introduction to Data Science class will survey the foundational topics in data science, namely: Data Manipulation; Data Analysis with Statistics and Machine Learning; Data Communication with Information Visualization; Data at Scale -- Working with Big Data.

Self-Paced

Statistics Machine Learning Big Data

Udacity

Udacity, Twitter

Real-Time Analytics with Apache Storm (Udacity)

Statistics & Data Analysis Data Science

The world is trending in real time! Learn from Twitter to scalably process tweets, or any big data stream, in real time to drive d3 visualizations using Apache Storm, the "Hadoop of Real Time." Storm is free, open source, and fun to use! Learn from Karthik Ramasamy about the distributed, fault-tolerant, and flexible technology used to power Twitter’s real-time data flow pipeline. Twitter open sourced Storm in 2011, and it graduated to a top-level Apache project in September 2014.

Self-Paced

Data Analysis Hadoop Data Science

Coursera

Johns Hopkins University

Python for Genomic Data Science (Coursera)

Statistics & Data Analysis Data Science

This class provides an introduction to the Python programming language and the IPython notebook. This is the third course in the Genomic Big Data Science Specialization from Johns Hopkins University.

Jun 22nd 2026

4 Weeks

Programming Python Big Data

Coursera

University of Washington

Data Manipulation at Scale: Systems and Algorithms (Coursera)

Statistics & Data Analysis Data Science

Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making: we are drowning in data. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources, but also the programming abstractions to use them effectively. The abstractions that emerged in the last decade blend ideas from parallel databases, distributed systems, and programming languages to create a new class of scalable data analytics platforms that form the foundation for data science at realistic scales.

Jun 22nd 2026

4 Weeks

Algebra Algorithms Databases

Udacity

Intro to Statistics (Udacity)

Statistics & Data Analysis

Get ready to analyze, visualize, and interpret data! Thought-provoking examples and chances to combine statistics and programming will keep you engaged and challenged.

Self-Paced

Math Algebra Statistics

Coursera

University of Illinois at Urbana-Champaign

Text Retrieval and Search Engines (Coursera)

Statistics & Data Analysis Data Science

Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than a computer system or sensors, and are thus especially valuable for discovering knowledge about people’s opinions and preferences, in addition to many other kinds of knowledge that we encode in text.

Jun 22nd 2026

5-12 Weeks

Machine Learning Search Data Mining

Udacity

Introduction to Machine Learning Course (Udacity)

Data Science

This class will teach you the end-to-end process of investigating data through a machine learning lens. Learn online, with Udacity. Machine Learning is a first-class ticket to the most exciting careers in data analysis today. As data sources proliferate along with the computing power to process them, going straight to the data is one of the most straightforward ways to quickly gain insights and make predictions.

Self-Paced

Statistics Machine Learning Clustering

Coursera

Johns Hopkins University

Statistical Inference (Coursera)

Statistics & Data Analysis Data Science

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference, including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference.

Jun 22nd 2026

4 Weeks

Statistics Probability Data Analysis

Coursera

University of Minnesota

Interprofessional Healthcare Informatics (Coursera)

Statistics & Data Analysis Data Science

Interprofessional Healthcare Informatics is a graduate-level, hands-on interactive exploration of real informatics tools and techniques offered by the University of Minnesota and the University of Minnesota's National Center for Interprofessional Practice and Education. We will be incorporating technology-enabled educational innovations to bring the subject matter to life. Over the 10 modules, we will create a vital online learning community and a working healthcare informatics network.

Jun 22nd 2026

5-12 Weeks

Healthcare Informatics Telehealth

Coursera

University of Illinois at Urbana-Champaign

Cloud Computing Concepts, Part 1 (Coursera)

CS: Theory CS: Information & Technology

Cloud computing systems today, whether open-source or used inside companies, are built using a common set of core techniques, algorithms, and design philosophies, all centered around distributed systems. Learn about such fundamental distributed computing "concepts" for cloud computing. Some of these concepts include: clouds, MapReduce, key-value/NoSQL stores, classical distributed algorithms, widely used distributed algorithms, scalability, trending areas, and much, much more!

Jun 22nd 2026

5-12 Weeks

Programming Cloud Algorithms