Sample-based Learning Methods (Coursera)

In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, learning from the agent’s own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal difference learning methods including Q-learning. We will wrap up this course by investigating how we can get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) and temporal difference updates to radically accelerate learning.

By the end of this course you will be able to:

  • Understand Temporal-Difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
  • Understand the importance of exploration when using sampled experience rather than dynamic programming sweeps within a model
  • Understand the connections between Monte Carlo, Dynamic Programming, and TD
  • Implement and apply the TD algorithm for estimating value functions
  • Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
  • Understand the difference between on-policy and off-policy control
  • Understand planning with simulated experience (as opposed to classic planning strategies)
  • Implement a model-based approach to RL, called Dyna, which uses simulated experience
  • Conduct an empirical study to see the improvements in sample efficiency when using Dyna
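Exploration, as listed in the objectives above, is commonly handled with an ε-greedy policy: act greedily most of the time, but pick a uniformly random action with probability ε. A minimal sketch, assuming action values are stored as a plain Python list indexed by action; the function name is illustrative and not taken from the course materials.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index for one state.

    With probability epsilon, explore by choosing uniformly at random;
    otherwise exploit the highest-valued action, breaking ties at random.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    best_actions = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(best_actions)


print(epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.0))  # greedy: always action 1
```

Breaking ties at random matters early in learning, when many actions share the same initial value and a fixed tie-break would keep picking the first one.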

Course 2 of 4 in the Reinforcement Learning Specialization.

Syllabus

WEEK 1
Welcome to the Course!
Welcome to the second course in the Reinforcement Learning Specialization: Sample-Based Learning Methods, brought to you by the University of Alberta, Onlea, and Coursera. In this pre-course module, you'll be introduced to your instructors, and get a flavour of what the course has in store for you. Make sure to introduce yourself to your classmates in the "Meet and Greet" section!
Monte Carlo Methods for Prediction & Control
This week you will learn how to estimate value functions and optimal policies using only sampled experience from the environment. This module represents our first step toward incremental learning methods that learn from the agent’s own interaction with the world, rather than from a model of the world. You will learn about on-policy and off-policy methods for prediction and control using Monte Carlo methods, that is, methods that learn from sampled returns. You will also be reintroduced to the exploration problem, this time in the general RL setting rather than only in bandits.
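Estimating values from sampled returns can be sketched as first-visit Monte Carlo prediction: for each state, average the returns that follow its first visit in each episode. A minimal sketch, assuming each episode is recorded as a list of (state, reward) pairs where the reward is the one received after leaving that state; the function name and episode format are illustrative, not the course's notebook API.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate V(s) as the average of first-visit returns.

    Each episode is a list of (state, reward) pairs, where reward is
    R_{t+1}, the reward received after leaving state S_t.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Time step of the first visit to each state in this episode.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        g = 0.0
        # Walk backwards so that G_t = R_{t+1} + gamma * G_{t+1}.
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = reward + gamma * g
            if first_visit[state] == t:
                returns_sum[state] += g
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


episodes = [[("A", 0.0), ("B", 1.0)], [("A", 2.0)]]
print(first_visit_mc_prediction(episodes))  # {'B': 1.0, 'A': 1.5}
```

A state's estimate changes only after its episode has finished, since the return is not known until then.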

WEEK 2
Temporal Difference Learning Methods for Prediction
This week, you will learn about one of the most fundamental concepts in reinforcement learning: temporal difference (TD) learning. TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods. TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world and do not require knowledge of the model. TD methods are similar to DP methods in that they bootstrap, and thus can learn online, without waiting until the end of an episode. You will see how TD can learn more efficiently than Monte Carlo due to bootstrapping. For this module, we first focus on TD for prediction, and discuss TD for control in the next module. This week, you will implement TD to estimate the value function for a fixed policy in a simulated domain.
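The TD(0) prediction update moves V(S_t) toward the bootstrapped target R_{t+1} + γ·V(S_{t+1}) after every single transition. A minimal sketch, assuming transitions are given as (state, reward, next_state, done) tuples and values are kept in a table; in a live agent the same update would run inside the interaction loop, one transition at a time. The function name and data format are illustrative assumptions.

```python
from collections import defaultdict


def td0_prediction(episodes, alpha=0.1, gamma=1.0):
    """Tabular TD(0) for a fixed policy.

    After each transition, V(S_t) moves a step of size alpha toward
    R_{t+1} + gamma * V(S_{t+1}); terminal states have value 0.
    """
    v = defaultdict(float)  # unseen states start at 0
    for episode in episodes:
        for state, reward, next_state, done in episode:
            target = reward if done else reward + gamma * v[next_state]
            v[state] += alpha * (target - v[state])
    return dict(v)


episode = [("A", 0.0, "B", False), ("B", 1.0, "T", True)]
print(td0_prediction([episode, episode], alpha=0.5))  # {'B': 0.75, 'A': 0.25}
```

Unlike the Monte Carlo estimate, V("A") is updated before the episode ends, using the current estimate of V("B") in place of the rest of the return.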

WEEK 3
Temporal Difference Learning Methods for Control
This week, you will learn about using temporal difference learning for control, as a generalized policy iteration strategy. You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa, Q-learning, and Expected Sarsa. You will see some of the differences between the methods for on-policy and off-policy control, and that Expected Sarsa is a unified algorithm for both. You will implement Expected Sarsa and Q-learning on Cliff World.
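Q-learning and Expected Sarsa differ only in how they bootstrap from the next state: Q-learning uses the maximum action value, while Expected Sarsa uses the expected action value under a policy. A minimal sketch, assuming the policy is ε-greedy and Q is a dict mapping each state to a list of action values; all function names here are illustrative.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy (ties share the greedy mass)."""
    n = len(q_values)
    best = max(q_values)
    greedy = [a for a, q in enumerate(q_values) if q == best]
    probs = [epsilon / n] * n
    for a in greedy:
        probs[a] += (1.0 - epsilon) / len(greedy)
    return probs


def q_learning_update(q, s, a, r, s_next, done, alpha, gamma):
    """Q-learning: bootstrap from the greedy (max) value of the next state."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def expected_sarsa_update(q, s, a, r, s_next, done, alpha, gamma, epsilon):
    """Expected Sarsa: bootstrap from the expected next value under an
    epsilon-greedy policy. With epsilon = 0 the target equals Q-learning's."""
    if done:
        target = r
    else:
        probs = epsilon_greedy_probs(q[s_next], epsilon)
        target = r + gamma * sum(p * v for p, v in zip(probs, q[s_next]))
    q[s][a] += alpha * (target - q[s][a])


q = {"s0": [0.0, 0.0], "s1": [1.0, 3.0]}
expected_sarsa_update(q, "s0", 0, 0.0, "s1", False, alpha=1.0, gamma=1.0, epsilon=0.5)
print(q["s0"][0])  # 0.25 * 1.0 + 0.75 * 3.0 = 2.5
```

When the policy being evaluated is the greedy policy (epsilon = 0), the Expected Sarsa target reduces to the Q-learning target, which is one way to see how a single algorithm can cover both on-policy and off-policy control.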

WEEK 4
Planning, Learning & Acting
Up until now, you might think that learning with and without a model are two distinct and, in some ways, competing strategies: planning with Dynamic Programming versus sample-based learning via TD methods. This week we unify these two strategies with the Dyna architecture. You will learn how to estimate the model from data and then use this model to generate hypothetical experience (a bit like dreaming) to dramatically improve sample efficiency compared to sample-based methods like Q-learning. In addition, you will learn how to design learning systems that are robust to inaccurate models.
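The Dyna loop (act in the environment, learn directly from the real transition, update the model, then replay simulated transitions from the model) can be sketched as tabular Dyna-Q. A minimal sketch, assuming a deterministic environment so the model can simply store the last observed outcome for each state-action pair; the `env_step` interface and the corridor environment are hypothetical, chosen only to make the example runnable.

```python
import random
from collections import defaultdict


def dyna_q(env_step, start_state, actions, episodes, n_planning,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch. env_step(state, action) -> (reward, next_state, done)
    is a hypothetical interface assumed for this example."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(actions))  # Q[state][action_index]
    model = {}  # (state, action_index) -> last observed (reward, next_state, done)

    def choose(state):
        # epsilon-greedy behaviour policy with random tie-breaking
        if rng.random() < epsilon:
            return rng.randrange(len(actions))
        best = max(q[state])
        return rng.choice([i for i, v in enumerate(q[state]) if v == best])

    def q_update(s, a, r, s_next, done):
        # one-step Q-learning update, used for both real and simulated experience
        target = r if done else r + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        state, done = start_state, False
        while not done:
            a = choose(state)
            r, s_next, done = env_step(state, actions[a])
            q_update(state, a, r, s_next, done)    # direct RL from real experience
            model[(state, a)] = (r, s_next, done)  # model learning
            for _ in range(n_planning):            # planning with simulated experience
                (ps, pa), (pr, pn, pd) = rng.choice(list(model.items()))
                q_update(ps, pa, pr, pn, pd)
            state = s_next
    return q


def corridor_step(state, action):
    """Hypothetical 6-state corridor (0..5): reaching state 5 ends the
    episode with reward 1; every other step gives reward 0."""
    next_state = min(max(state + action, 0), 5)
    done = next_state == 5
    return (1.0 if done else 0.0), next_state, done


q_table = dyna_q(corridor_step, start_state=0, actions=[-1, +1],
                 episodes=20, n_planning=10)
print({s: [round(v, 3) for v in q_table[s]] for s in sorted(q_table)})
```

Setting `n_planning=0` reduces this sketch to plain Q-learning, which gives a natural baseline for the kind of sample-efficiency comparison described in the course objectives.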

Related Courses

Machine Learning: Classification (Coursera)
University of Washington

Case Studies: Analyzing Sentiment & Loan Default Prediction. In our case study on analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information,...). In our second case study for this course, loan default prediction, you will tackle financial data, and predict when a loan is likely to be risky or safe for the bank.

Jun 15th 2026
5-12 Weeks
Learning How to Learn: Powerful mental tools to help you master tough subjects (Coursera)
University of California, San Diego, McMaster University

This course gives you easy access to the invaluable learning techniques used by experts in art, music, literature, math, science, sports, and many other disciplines. We’ll learn about how the brain uses two very different learning modes and how it encapsulates (“chunks”) information. We’ll also cover illusions of learning, memory techniques, dealing with procrastination, and best practices shown by research to be most effective in helping you master tough subjects.

Jun 15th 2026
4 Weeks
e-Learning Ecologies: Innovative Approaches to Teaching and Learning for the Digital Age (Coursera)
University of Illinois at Urbana-Champaign

For three decades and longer we have heard educators and technologists making a case for the transformative power of technology in learning. However, despite the rhetoric, in many ways and at most institutional sites, education is still relatively untouched by technology. Even when technologies are introduced, the changes sometimes seem insignificant and the results seem disappointing. If the print textbook is replaced by an e-book, do the social relations of knowledge and learning necessarily change at all or for the better?

Jun 15th 2026
4 Weeks
Engaging ELLs and Their Families in the School and Community (Coursera)
Arizona State University

In this course, you will learn how to engage your ELL(s) and their families more successfully in the school and community. You will learn how to engage your ELL student in the classroom setting as well as in various aspects of the school, including extracurricular activities and the inner workings of the school and education system. You will also be introduced to strategies for engaging the families of your ELL students in the school community and the wider community of your city and state.

Jun 15th 2026
5-12 Weeks
Machine Learning: Clustering & Retrieval (Coursera)
University of Washington

Case Studies: Finding Similar Documents. A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover?

Jun 15th 2026
5-12 Weeks
Introduction to Computer Vision with Watson and OpenCV (Coursera)
IBM

Computer Vision is one of the most exciting fields in Machine Learning and AI. It has applications in many industries, such as self-driving cars, robotics, augmented reality, and face detection in law enforcement. In this beginner-friendly course, you will learn what computer vision is and explore its various applications across many industries.

Jun 15th 2026
4 Weeks
The Economics of AI (Coursera)
University of Virginia

The course introduces you to cutting-edge research in the economics of AI and the implications for economic growth and labor markets. We start by analyzing the nature of intelligence and information theory. Then we connect our analysis to modeling production and technological change in economics, and how these processes are affected by AI. Next we turn to how technological change drives aggregate economic growth, covering a range of scenarios including a potential growth singularity.

Jun 16th 2026
5-12 Weeks
Studying at Japanese Universities (Coursera)
The University of Tokyo

Are you interested in studying at Japanese universities? Do you want to learn about Japan’s university application and enrollment processes, as well as the types of programs on offer? This course will help you to both discover great programs offered by different Japanese universities and prepare a study plan through project-based learning. We introduce a number of options to match a variety of goals, from full degree to non-degree programs, programs taught in English, as well as short-term programs in Japan. During the course, international students at UTokyo will provide you with useful information and advice to start you on the path to studying in Japan.

Jun 15th 2026
4 Weeks
Nearest Neighbor Collaborative Filtering (Coursera)
University of Minnesota

In this course, you will learn the fundamental techniques for making personalized recommendations through nearest-neighbor techniques. First you will learn user-user collaborative filtering, an algorithm that identifies other people with similar tastes to a target user and combines their ratings to make recommendations for that user.

Jun 15th 2026
4 Weeks
Google Cloud Platform Fundamentals: Core Infrastructure (Coursera)
Google

This course introduces you to important concepts and terminology for working with Google Cloud Platform (GCP). You learn about, and compare, many of the computing and storage services available in Google Cloud Platform, including Google App Engine, Google Compute Engine, Google Kubernetes Engine, Google Cloud Storage, Google Cloud SQL, and BigQuery. You learn about important resource and policy management tools, such as the Google Cloud Resource Manager hierarchy and Google Cloud Identity and Access Management. Hands-on labs give you foundational skills for working with GCP.

Jun 15th 2026
1 Week
Assessing Achievement with the ELL in Mind (Coursera)
Arizona State University

In this course, you will learn how to design assessments around the needs of your ELL students and their language level. You will learn how to incorporate language and content requirements for both formative and summative assessment types. You will learn to assess your ELL students through the use of project and task-based assignments.

Jun 15th 2026
5-12 Weeks