Modeling Data in the Tidyverse (Coursera)

Developing insights about your organization, business, or research project depends on effective modeling and analysis of the data you collect. Building effective models requires understanding the different types of questions you can ask and how to map those questions to your data. Different modeling approaches can be chosen to detect interesting patterns in the data and identify hidden relationships. This course covers the types of questions you can ask of data and the various modeling approaches that you can apply.

Topics covered include hypothesis testing, linear regression, nonlinear modeling, and machine learning. With this collection of tools at your disposal, as well as the techniques learned in the other courses in this specialization, you will be able to make key discoveries from your data for improving decision-making throughout your organization.
This specialization assumes familiarity with the R programming language. If you are not yet familiar with R, we suggest completing R Programming before taking this course.
Course 5 of 5 in the Tidyverse Skills for Data Science in R Specialization.

What You Will Learn

  • Describe different types of data analytic questions
  • Conduct hypothesis tests of your data
  • Apply linear modeling techniques to answer multivariable questions
  • Apply machine learning workflows to detect complex patterns in your data

Syllabus

WEEK 1
Modeling Data Basics
Developing insights about your organization, business, or research project depends on effective modeling and analysis of the data you collect. Building effective models requires understanding the different types of questions you can ask and how to map those questions to your data. Different modeling approaches can be chosen to detect interesting patterns in the data and identify hidden relationships.

WEEK 2
Inference
Inferential analysis is what analysts carry out after they have described and explored their dataset. Having understood the data better, they often try to infer something from it, typically by using statistical tests. This week looks at what it means to use models for inference and for prediction.

WEEK 3
Linear Modeling
Linear models are the most commonly used models in data analysis because of their computational efficiency and their ease of interpretation. Having a solid understanding of linear models and how they work is critical for any work in data science. The tidyverse provides a set of tools for making linear modeling more efficient and streamlined.

WEEK 4
Multiple Linear Regression
Multiple linear regression is needed when you want to include confounding factors or other predictors in your model for the response. R provides a straightforward way to do this via the formula interface to the lm() function.
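
As a rough illustration of what a multiple regression fit computes, here is a minimal sketch in plain Python rather than R. It is not the course's code: the function name `fit_linear_model` and the data are made up for this example, and it solves the least-squares normal equations directly, which is roughly what a call such as `lm(y ~ x1 + x2)` estimates.

```python
def fit_linear_model(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: list of predictor tuples; y: list of responses.
    Returns [intercept, coef_1, coef_2, ...].
    Hypothetical helper for illustration; it fails with ZeroDivisionError
    when the predictors are perfectly collinear.
    """
    X = [[1.0, *r] for r in rows]  # prepend a column of ones for the intercept
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)]
         + [sum(xi[j] * yi for xi, yi in zip(X, y))]
         for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[j][p] / A[j][j] for j in range(p)]


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
predictors = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
response = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]

coefs = fit_linear_model(predictors, response)
print([round(c, 6) for c in coefs])  # intercept, then one coefficient per predictor
```

Because the made-up response has no noise, the fitted coefficients should recover the intercept and the two slopes used to generate it, up to floating-point error. Each coefficient is read as the change in the response per unit change in that predictor with the other predictor held fixed.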

WEEK 5
Beyond Linear Regression
Although this course has focused on linear regression for inference, it is not the only analytical approach available, though it is arguably the most commonly used. Many statistical tests and approaches are slight variations on linear regression, so a solid foundation in linear regression makes them much simpler to understand. For example, what if you did not want to measure the linear relationship between two variables, but instead wanted to know whether the observed average differs from expectation?
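
The question of whether an observed average differs from an expected value is usually handled with a one-sample t-test. Here is a minimal sketch of the test statistic in plain Python; the function name `one_sample_t` and the measurements are made up for this example, and in R the same comparison would typically be run with `t.test(x, mu = ...)`.

```python
import math
import statistics


def one_sample_t(sample, expected_mean):
    """Return (t statistic, degrees of freedom) for H0: mean == expected_mean.

    Hypothetical helper for illustration only.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (mean - expected_mean) / (sd / math.sqrt(n))
    return t, n - 1


# Made-up measurements compared against an expected average of 5.0.
measurements = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 5.3, 5.7]
t_stat, df = one_sample_t(measurements, expected_mean=5.0)
print(f"t = {t_stat:.3f}, df = {df}")
```

The statistic measures how many standard errors the sample mean lies from the expected value. Turning it into a p-value requires the t distribution with `df` degrees of freedom, which this sketch leaves out.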

WEEK 6
Hypothesis Testing
Hypothesis testing describes a family of statistical techniques for determining whether the data you collect provides evidence for the value of an unknown parameter of interest. The goal of hypothesis tests is to make inferences while accounting for variability in the data that can lead to spurious results.

WEEK 7
Prediction Modeling
Prediction modeling is an essential activity in data science and involves building systems that make predictions based on previously observed data. These models are typically very flexible (much more so than linear models) and can capture a range of different relationships.

WEEK 8
The tidymodels Ecosystem
There are hundreds of different machine learning algorithms, and R offers very helpful packages for working with them thanks to the work of RStudio. The tidymodels R packages bring many of these algorithms together in a single framework, allowing you to use many different machine learning models easily.

WEEK 9
Case Studies
This case study demonstrates an approach to building a model that predicts outdoor air pollution concentrations in the United States.

WEEK 10
Summary of tidymodels
The tidymodels collection of packages can be overwhelming at first glance. Here, we provide a quick summary chart to help navigate all of the packages and when they should be used.

WEEK 11
Project: Modeling Data in the Tidyverse
In this project, you will practice building models with the tidyverse for classifying consumer complaints data from the Consumer Financial Protection Bureau (CFPB). This project includes both a Peer Review step, in which you'll upload R Markdown and knitted HTML files, and a Quiz step, in which you'll answer questions about the predictions made by your classification algorithm.

