Coursera

Machine Learning for Data Analysis (Coursera)

Offered by Wesleyan University.

Are you interested in predicting future outcomes using your data? This course helps you do just that! Machine learning is the process of developing, testing, and applying predictive algorithms to achieve this goal. Make sure to familiarize yourself with Course 3 of this specialization before diving into these machine learning concepts. Building on Course 3, which introduces students to core supervised machine learning concepts, this course provides an overview of many additional concepts, techniques, and algorithms in machine learning, from basic classification to decision trees and clustering.

By completing this course, you will learn how to apply, test, and interpret machine learning algorithms as alternative methods for addressing your research questions.

Course 4 of 5 in the Data Analysis and Interpretation Specialization.

Syllabus

WEEK 1
Decision Trees
In this session, you will learn about decision trees, a type of data mining algorithm that can select, from among a large number of variables, those variables and interactions that are most important in predicting the target (response) variable. Decision trees segment the data into subgroups by applying a series of simple rules or criteria over and over again, at each step choosing the combination of variables that best predicts the target variable.
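As a rough illustration of how a tree applies simple rules over and over again, the sketch below searches every variable for the threshold that best separates the classes and then repeats the search inside each subgroup. The toy data, the function names (`gini`, `best_split`, `build_tree`, `predict`), and the choice of Gini impurity as the split criterion are assumptions made for this example and are not taken from the course materials.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means one class only)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Find the rule `row[feature] <= threshold` with the lowest weighted impurity.
    Returns (feature, threshold, impurity), or None if no rule separates the rows."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or impurity < best[2]:
                best = (feature, threshold, impurity)
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    """Apply the best rule repeatedly until a node is pure or max_depth is reached."""
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels) if depth < max_depth and gini(labels) > 0 else None
    if split is None:
        return majority  # leaf: predict the most common class in this subgroup
    feature, threshold, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the rules from the root down to a leaf."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical toy data: two explanatory variables, only the second one matters.
rows = [(1, 2), (2, 8), (3, 3), (4, 9), (5, 1), (6, 7)]
labels = [0, 1, 0, 1, 0, 1]

tree = build_tree(rows, labels)
print(tree)  # {'feature': 1, 'threshold': 3, 'left': 0, 'right': 1}
print(predict(tree, (10, 0)), predict(tree, (0, 10)))  # 0 1
```

In this toy case the tree ignores the first variable entirely, which is the variable selection behavior described above. A full implementation would also need stopping and pruning rules.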

WEEK 2
Random Forests
In this session, you will learn about random forests, a type of data mining algorithm that can select, from among a large number of variables, those that are most important in determining the target (response) variable. Unlike the results of a single decision tree, the results of random forests generalize well to new data.
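The idea behind a random forest can be sketched as many small trees, each grown on a bootstrap sample of the rows and a random subset of the variables, whose predictions are combined by majority vote. The example below is a deliberately simplified sketch: each tree is a one-split "stump" rather than a full tree, and the toy data, function names, and parameters (`n_trees`, `max_features`, `seed`) are assumptions for illustration only.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows, labels, features):
    """Fit a one-split tree using only the given candidate feature indices."""
    majority = Counter(labels).most_common(1)[0][0]
    best = None  # (impurity, feature, threshold, left_label, right_label)
    for j in features:
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or impurity < best[0]:
                best = (impurity, j, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: always predict the majority class
        return {"feature": None, "label": majority}
    _, j, t, left_label, right_label = best
    return {"feature": j, "threshold": t, "left": left_label, "right": right_label}

def predict_stump(stump, row):
    if stump["feature"] is None:
        return stump["label"]
    return stump["left"] if row[stump["feature"]] <= stump["threshold"] else stump["right"]

def fit_forest(rows, labels, n_trees=25, max_features=2, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]     # sample rows with replacement
        features = rng.sample(range(p), max_features)  # random subset of variables
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

def split_counts(forest):
    """How often each variable was chosen for a split: a crude importance measure."""
    return Counter(s["feature"] for s in forest if s["feature"] is not None)

# Hypothetical toy data: variable 0 separates the classes, variables 1 and 2 are noise.
rows = [(1, 5, 3), (2, 1, 8), (3, 7, 2), (4, 2, 9), (6, 6, 1), (7, 3, 7), (8, 8, 4), (9, 4, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(split_counts(forest))
print(predict_forest(forest, (2, 9, 9)), predict_forest(forest, (8, 0, 0)))
```

Counting how often each variable is chosen is only a rough stand-in for the importance scores that a full random forest implementation reports.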

WEEK 3
Lasso Regression
Lasso regression analysis is a shrinkage and variable selection method for linear regression models. The goal of lasso regression is to obtain the subset of predictors that minimizes prediction error for a quantitative response variable. The lasso does this by imposing a constraint on the model parameters that causes regression coefficients for some variables to shrink toward zero. Variables with a regression coefficient equal to zero after the shrinkage process are excluded from the model. Variables with non-zero regression coefficients are most strongly associated with the response variable. Explanatory variables can be quantitative, categorical, or both. In this session, you will apply and interpret a lasso regression analysis. You will also develop experience using k-fold cross-validation to select the best fitting model and obtain a more accurate estimate of your model’s test error rate.
To test a lasso regression model, you will need to identify a quantitative response variable from your data set if you haven’t already done so, and choose a few additional quantitative and categorical predictor (i.e., explanatory) variables to develop a larger pool of predictors. Having a larger pool of predictors to test will maximize your experience with lasso regression analysis. Remember that lasso regression is a machine learning method, so your choice of additional predictors does not necessarily need to depend on a research hypothesis or theory. Take some chances, and try some new variables. The lasso regression analysis will help you determine which of your predictors are most important. Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets. The cross-validation method you apply is designed to eliminate the need to split your data when you have a limited number of observations.
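The shrinkage and k-fold cross-validation steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the course's own procedure: it minimizes (1/2n) * sum of squared errors + alpha * sum of absolute coefficients by coordinate descent, assumes the predictors and response are already centered (so there is no intercept), and assigns row i to fold i % k instead of shuffling. The toy data, the penalty grid `alphas`, and all function names are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero; anything within the penalty becomes exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_fit(X, y, alpha, n_sweeps=100):
    """Coordinate descent for the lasso on centered data (no intercept term)."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of predictor j with the residual that leaves predictor j out.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * coef[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            coef[j] = soft_threshold(rho, alpha) / scale
    return coef

def mse(X, y, coef):
    """Mean squared prediction error."""
    return sum(
        (yi - sum(x * b for x, b in zip(row, coef))) ** 2 for row, yi in zip(X, y)
    ) / len(y)

def cv_error(X, y, alpha, k=3):
    """Average test error over k folds; row i belongs to fold i % k."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        coef = lasso_fit([X[i] for i in train], [y[i] for i in train], alpha)
        errors.append(mse([X[i] for i in test], [y[i] for i in test], coef))
    return sum(errors) / k

# Hypothetical, noise-free toy data: the response depends strongly on the first
# predictor, weakly on the third, and not at all on the second.
block = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
X = block * 3
y = [3 * x1 + 0.2 * x3 for x1, x2, x3 in X]

alphas = [0.01, 0.1, 0.5, 1.0]
best_alpha = min(alphas, key=lambda a: cv_error(X, y, a))
coef = lasso_fit(X, y, best_alpha)
print(best_alpha, coef)
print(lasso_fit(X, y, 0.5))  # a larger penalty also removes the weak third predictor
```

Because this toy response contains no noise, cross-validation favors the smallest penalty in the grid; with real data a larger penalty often gives the lower test error. In both fits the second predictor receives a coefficient of exactly zero and is dropped from the model.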

WEEK 4
K-Means Cluster Analysis
Cluster analysis is an unsupervised machine learning method that partitions the observations in a data set into a smaller set of clusters where each observation belongs to only one cluster. The goal of cluster analysis is to group, or cluster, observations into subsets based on their similarity of responses on multiple variables. Clustering variables should be primarily quantitative variables, but binary variables may also be included. In this session, we will show you how to use k-means cluster analysis to identify clusters of observations in your data set. You will gain experience in interpreting cluster analysis results by using graphing methods to help you determine the number of clusters to interpret, and examining clustering variable means to evaluate the cluster profiles. Finally, you will get the opportunity to validate your cluster solution by examining differences between clusters on a variable not included in your cluster analysis.
You can use the same variables that you have used in past weeks as clustering variables. If most or all of your previous explanatory variables are categorical, you should identify some additional quantitative clustering variables from your data set. Ideally, most of your clustering variables will be quantitative, although you may also include some binary variables. In addition, you will need to identify a quantitative or binary response variable from your data set that you will not include in your cluster analysis. You will use this variable to validate your clusters by evaluating whether your clusters differ significantly on this response variable using statistical methods, such as analysis of variance or chi-square analysis, which you learned about in Course 2 of the specialization (Data Analysis Tools). Note also that if you are working with a relatively small data set, you do not need to split your data into training and test data sets.
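The workflow above (cluster, choose the number of clusters, then validate on a variable left out of the clustering) might look like the following sketch. It is a simplified illustration under several assumptions: the first k observations are used as starting centroids, the within-cluster sum of squares stands in for the graphing step, and validation computes only the one-way analysis of variance F statistic (a p-value would additionally require the F distribution from a statistics library). The toy data and function names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, n_iter=100):
    """Basic k-means; the first k points are the starting centroids (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(n_iter):
        # Assign every observation to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the solution has converged
        assignments = new_assignments
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids

def within_ss(points, assignments, centroids):
    """Total within-cluster sum of squares, the quantity plotted in an elbow chart."""
    return sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))

def anova_f(values, groups):
    """One-way ANOVA F statistic for `values` split by `groups`."""
    grand_mean = sum(values) / len(values)
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in by_group.values())
    within = sum((x - sum(v) / len(v)) ** 2 for v in by_group.values() for x in v)
    df_between = len(by_group) - 1
    df_within = len(values) - len(by_group)
    return (between / df_between) / (within / df_within)

# Hypothetical toy data: two quantitative clustering variables per observation.
points = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
# A response variable that is NOT used for clustering, kept for validation.
response = [2.0, 7.0, 3.0, 8.0, 2.5, 7.5]

# Within-cluster sum of squares for 1 to 3 clusters.
wss = {k: within_ss(points, *k_means(points, k)) for k in (1, 2, 3)}
print(wss)

labels, centroids = k_means(points, 2)
print(labels)     # [0, 1, 0, 1, 0, 1]
print(centroids)  # clustering variable means, i.e. the cluster profiles
f_stat = anova_f(response, labels)
print(f_stat)
```

Here the sum of squares drops sharply from one to two clusters and only slightly from two to three, which is the kind of bend that suggests a two-cluster solution.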

Related Courses

Coursera

Google

Crash Course on Python (Coursera)

CS: Information & Technology Computer Science

This course is designed to teach you the foundations in order to write simple programs in Python using the most common structures. No previous exposure to programming is needed. By the end of this course, you'll understand the benefits of programming in IT roles; be able to write simple programs using Python; figure out how the building blocks of programming fit together; and combine all of this knowledge to solve a complex programming problem.

Jun 23rd 2026

5-12 Weeks

Python Data Structures Object-Oriented Programming

Coursera

Johns Hopkins University

Bioconductor for Genomic Data Science (Coursera)

Statistics & Data Analysis Data Science

Learn to use tools from the Bioconductor project to perform analysis of genomic data. This is the fifth course in the Genomic Big Data Science Specialization from Johns Hopkins University.

Jun 22nd 2026

4 Weeks

Bioinformatics Data Analysis Data Science

Coursera

Johns Hopkins University

Mathematical Biostatistics Boot Camp 2 (Coursera)

Statistics & Data Analysis Data Science

Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

Jun 22nd 2026

4 Weeks

Math Statistics Probability

Coursera

Johns Hopkins University

R Programming (Coursera)

Statistics & Data Analysis Data Science

In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language.

Jun 22nd 2026

4 Weeks

Programming Debugging Data Analysis

Coursera

University of Toronto

Learn to Program: The Fundamentals (Coursera)

CS: Programming

Behind every mouse click and touch-screen tap, there is a computer program that makes things happen. This course introduces the fundamental building blocks of programming and teaches you how to write fun and useful programs using the Python language.

Jun 22nd 2026

5-12 Weeks

Programming Python Semantics

Coursera

Johns Hopkins University

Statistics for Genomic Data Science (Coursera)

Statistics & Data Analysis Data Science

An introduction to the statistics behind the most popular genomic data science projects. This is the sixth course in the Genomic Big Data Science Specialization from Johns Hopkins University.

Jun 22nd 2026

4 Weeks

Statistics Biostatistics Data Analysis

Coursera

Duke University

Introduction to Machine Learning (Coursera)

Data Science

This course will provide you with a foundational understanding of machine learning models (logistic regression, multilayer perceptrons, convolutional neural networks, natural language processing, etc.) as well as demonstrate how these models can solve complex problems in a variety of industries, from medical diagnostics to image recognition to text prediction.

Jun 26th 2026

5-12 Weeks

ML NLP Machine Learning

Coursera

Johns Hopkins University

Exploratory Data Analysis (Coursera)

Statistics & Data Analysis Data Science

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data.

Jun 22nd 2026

4 Weeks

Statistics Data Analysis Data Science

Coursera

University of California, San Diego

Comparing Genes, Proteins, and Genomes (Bioinformatics III) (Coursera)

Sci: Biology & Life Sciences Health & Society

Once we have sequenced genomes in the previous course, we would like to compare them to determine how species have evolved and what makes them different. In the first half of the course, we will compare two short biological sequences, such as genes (i.e., short sequences of DNA) or proteins. We will encounter a powerful algorithmic tool called dynamic programming that will help us determine the number of mutations that have separated the two genes/proteins.

Jun 22nd 2026

5-12 Weeks

Genes DNA Genomes

Coursera

Johns Hopkins University

Python for Genomic Data Science (Coursera)

Statistics & Data Analysis Data Science

This class provides an introduction to the Python programming language and the IPython notebook. This is the third course in the Genomic Big Data Science Specialization from Johns Hopkins University.

Jun 22nd 2026

4 Weeks

Programming Python Big Data

Coursera

Johns Hopkins University

Practical Machine Learning (Coursera)

Statistics & Data Analysis Data Science

Among the most common tasks performed by data scientists and data analysts are prediction and machine learning. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates.

Jun 22nd 2026

4 Weeks

Algorithms Machine Learning Regression

Coursera

Johns Hopkins University

Mathematical Biostatistics Boot Camp 1 (Coursera)

Sci: Mathematics

This class presents the fundamental probability and statistical concepts used in elementary data analysis. It will be taught at an introductory level for students with junior or senior college-level mathematical training including a working knowledge of calculus. A small amount of linear algebra and programming are useful for the class, but not required.

Jun 22nd 2026

4 Weeks

Math Statistics Probability