Decision Making and Reinforcement Learning (Coursera)

Offered by Columbia University.

This course is an introduction to sequential decision making and reinforcement learning. We start with a discussion of utility theory to learn how preferences can be represented and modeled for decision making. We then model simple decision problems as multi-armed bandit problems and discuss several approaches to evaluating feedback. Next, we model decision problems as finite Markov decision processes (MDPs) and discuss their solutions via dynamic programming algorithms. We touch on the notion of partial observability in real problems, modeled by partially observable MDPs (POMDPs) and solved by online planning methods.

Finally, we introduce the reinforcement learning problem and discuss two paradigms: Monte Carlo methods and temporal difference learning. We conclude the course by noting how the two paradigms lie on a spectrum of n-step temporal difference methods. An emphasis on algorithms and examples will be a key part of this course.

What You Will Learn

  • Map between qualitative preferences and appropriate quantitative utilities.
  • Model non-associative and associative sequential decision problems with multi-armed bandit problems and Markov decision processes, respectively.
  • Implement dynamic programming algorithms to find optimal policies.
  • Implement basic reinforcement learning algorithms using Monte Carlo and temporal difference methods.

Syllabus

WEEK 1
Decision Making and Utility Theory
Welcome to Decision Making and Reinforcement Learning! During this week, Professor Tony Dear provides an overview of the course. You will also view guidelines to support your learning journey towards modeling sequential decision problems and implementing reinforcement learning algorithms.

WEEK 2
Bandit Problems
Welcome to week 2! This week, we will learn about multi-armed bandit problems, a type of optimization problem in which the algorithm balances exploration and exploitation to maximize rewards. Topics include action values and sample averaging estimation, ε-greedy action selection, and the upper confidence bound. You could post in the discussion forum if you need assistance on the quiz and assignment.
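As a rough illustration of the week 2 topics, the sketch below combines ε-greedy action selection with incremental sample-average estimates on a small Gaussian bandit. It is not taken from the course materials: the function name `run_epsilon_greedy`, the arm means, the unit reward noise, and the step count are all assumptions chosen for the example.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play a Gaussian bandit (hypothetical setup) with epsilon-greedy selection."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # sample-average estimate of each arm's value
    n = [0] * k    # number of times each arm has been pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: uniform random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: current best estimate
        reward = rng.gauss(true_means[a], 1.0)     # noisy reward, unit variance assumed
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]             # incremental sample average
    return q, n


q, n = run_epsilon_greedy([0.2, 0.5, 1.0])
print("estimates:", [round(v, 2) for v in q])
print("pull counts:", n)
```

With a small ε, most pulls should end up on the arm with the highest true mean, while the occasional random pulls keep the other estimates from going stale.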

WEEK 3
Markov Decision Processes
Welcome to week 3! This week, we will focus on the basics of the Markov decision process, including rewards, utilities, discounting, policies, value functions, and Bellman equations. You will model sequential decision problems, understand the impact of rewards and discount factors on outcomes, define policies and value functions, and write Bellman equations for optimal solutions. You could post in the discussion forum if you need assistance on the quiz and assignment.
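For reference, the Bellman equations mentioned above are commonly written as follows. The notation is an assumption and may differ from the course's own: P is the transition model, R the reward function, γ the discount factor, and π a deterministic policy.

```latex
% Bellman expectation equation for a fixed policy \pi
V^{\pi}(s) = \sum_{s'} P(s' \mid s, \pi(s)) \left[ R(s, \pi(s), s') + \gamma V^{\pi}(s') \right]

% Bellman optimality equation
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^{*}(s') \right]
```

The first equation is linear in the unknown values, while the max in the second makes it nonlinear, which is why iterative methods are typically used to solve it.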

WEEK 4
Dynamic Programming
Welcome to week 4! This week, we will cover dynamic programming algorithms for solving Markov decision processes (MDPs). Topics include value iteration and policy iteration, nonlinear Bellman equations, complexity and convergence, and a comparison of the two approaches. You could post in the discussion forum if you need assistance on the quiz and assignment.
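A minimal sketch of value iteration on a toy MDP may help make the idea concrete. The three-state MDP, the dictionary layout `P[s][a]`, and the function name `value_iteration` are hypothetical and not part of the course; the sketch uses in-place updates and stops when the largest change in a sweep falls below a tolerance.

```python
def action_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s under values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (probability, next_state, reward) triples.

    States with no actions are treated as terminal with value 0.
    """
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:
                continue  # terminal state
            best = max(action_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values
    policy = {
        s: max(P[s], key=lambda a: action_value(P, V, s, a, gamma))
        for s in P if P[s]
    }
    return V, policy


# Hypothetical MDP: quitting from A pays 1 now; going on to B pays 10 one step later.
mdp = {
    "A": {"go": [(1.0, "B", 0.0)], "quit": [(1.0, "end", 1.0)]},
    "B": {"go": [(1.0, "end", 10.0)]},
    "end": {},
}
V, policy = value_iteration(mdp)
print(V, policy)
```

In this example the discounted delayed reward (0.9 × 10) beats the immediate reward of 1, so the greedy policy in state A is "go".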

WEEK 5
Partially Observable Markov Decision Processes
Welcome to week 5! This week, we will go through topics on partial observability and POMDPs, belief states, representation as belief MDPs, and online planning in MDPs and POMDPs. You will also apply your knowledge to update the belief state and employ a belief transition function to calculate state values. You could post in the discussion forum if you need assistance on the quiz and assignment.
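The belief-state update can be sketched as a Bayes filter: predict the next state distribution with the transition model, weight it by the likelihood of the observation, then normalize. The two-state "listening" example, the table layouts `T[s][a][s2]` and `O[s2][o]`, and the function name `update_belief` are assumptions for illustration; in particular, the observation model here is assumed to depend only on the resulting state.

```python
def update_belief(belief, action, observation, T, O):
    """Return the posterior belief after taking `action` and seeing `observation`.

    T[s][a][s2] = P(s2 | s, a); O[s2][o] = P(o | s2).
    """
    unnormalized = {}
    for s2 in belief:
        # Prediction step: probability of landing in s2 before observing
        predicted = sum(T[s][action][s2] * belief[s] for s in belief)
        # Correction step: weight by the observation likelihood
        unnormalized[s2] = O[s2][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: v / total for s2, v in unnormalized.items()}


# Hypothetical two-state problem: listening does not change the hidden state.
T = {
    "left": {"listen": {"left": 1.0, "right": 0.0}},
    "right": {"listen": {"left": 0.0, "right": 1.0}},
}
O = {
    "left": {"hear_left": 0.85, "hear_right": 0.15},
    "right": {"hear_left": 0.15, "hear_right": 0.85},
}
b0 = {"left": 0.5, "right": 0.5}
b1 = update_belief(b0, "listen", "hear_left", T, O)
print(b1)
```

Starting from a uniform belief, a single "hear_left" observation shifts the belief toward "left" in proportion to the sensor accuracy.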

WEEK 6
Monte Carlo Methods
Welcome to week 6! This week, we will introduce Monte Carlo methods, and cover topics related to state value estimation using sample averaging and Monte Carlo prediction, state-action values and epsilon-greedy policies, and importance sampling for off-policy vs on-policy Monte Carlo control. You will learn to estimate state values and state-action values, use importance sampling, and implement off-policy Monte Carlo control for optimal policy learning. You could post in the discussion forum if you need assistance on the quiz and assignment.
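As one possible illustration of Monte Carlo prediction, the sketch below estimates state values by averaging first-visit returns over recorded episodes. The episode format (a list of `(state, reward)` pairs, with the reward received on leaving that state) and the function name `first_visit_mc` are assumptions made for this example.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) as the average return following the first visit to s."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        G = 0.0
        first_return = {}
        # Walk backwards so G accumulates the discounted return from each step.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            # Earlier time steps overwrite later ones, so the value left
            # at the end belongs to the first visit of the state.
            first_return[state] = G
        for state, g in first_return.items():
            returns_sum[state] += g
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Hypothetical episodes; the second one visits A twice.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 1.0), ("A", 1.0)],
]
values = first_visit_mc(episodes)
print(values)
```

In the second episode only the return from the first visit to A (1 + 1 = 2) is counted, which is what distinguishes first-visit from every-visit averaging.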

WEEK 7
Temporal-Difference Learning
Welcome to week 7! This week, we will cover topics related to temporal difference learning for prediction, TD batch methods, SARSA for on-policy control, and Q-learning for off-policy control. You will learn to implement TD prediction, TD batch and offline methods, SARSA and Q-learning, and compare on-policy vs off-policy TD learning. You will then apply your knowledge in solving a Tic-tac-toe programming assignment. You could post in the discussion forum if you need assistance on the quiz and assignment.
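The difference between on-policy and off-policy TD control comes down to the bootstrap target. The sketch below shows the standard one-step SARSA and Q-learning updates side by side; the nested-dictionary Q-table, the state and action names, and the step-size and discount values are assumptions for illustration, and the sketch assumes terminal states are stored with all-zero action values.

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action a2 actually chosen in s2."""
    target = r + gamma * Q[s2][a2]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy action in s2, whatever is taken next."""
    target = r + gamma * max(Q[s2].values())
    Q[s][a] += alpha * (target - Q[s][a])


def make_q():
    # Hypothetical Q-table: state "t" already has some learned values.
    return {"s": {"l": 0.0, "r": 0.0}, "t": {"l": 1.0, "r": 2.0}}


Q_sarsa = make_q()
sarsa_update(Q_sarsa, "s", "l", 0.0, "t", "l")   # next action happens to be "l"

Q_qlearn = make_q()
q_learning_update(Q_qlearn, "s", "l", 0.0, "t")  # uses max over actions in "t"

print(Q_sarsa["s"]["l"], Q_qlearn["s"]["l"])
```

Given the same transition, SARSA moves toward the value of the action the behavior policy picked, while Q-learning moves toward the value of the best action in the next state.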

WEEK 8
Reinforcement Learning - Generalization
Welcome to week 8! This module covers n-step temporal difference prediction, n-step SARSA (on-policy and off-policy), model-based RL with Dyna-Q, and function approximation. You will be prepared to implement n-step TD learning, n-step SARSA, Dyna-Q for model-based learning, and use function approximation for reinforcement learning. You will apply your knowledge in the Frozen Lake programming environment. You could post in the discussion forum if you need assistance on the quiz and assignment.
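To see how n-step methods sit between one-step TD and Monte Carlo, the sketch below computes the n-step return for a recorded episode. The indexing convention (`rewards[k]` is the reward received after step k, `values[k]` is the current estimate for the state at step k, with a terminal value of 0 at the end) and the function name `n_step_return` are assumptions for this example.

```python
def n_step_return(rewards, values, t, n, gamma=0.9):
    """Sum n discounted rewards from step t, then bootstrap from the value estimate.

    If the episode ends within n steps, no bootstrap term is added.
    """
    T = len(rewards)               # episode length
    end = min(t + n, T)
    G = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    if end < T:
        G += gamma ** n * values[end]  # bootstrap from the state reached after n steps
    return G


# Hypothetical three-step episode with constant rewards and value estimates.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]      # last entry is the terminal state

g1 = n_step_return(rewards, values, t=0, n=1, gamma=0.5)  # one-step TD target
g2 = n_step_return(rewards, values, t=0, n=2, gamma=0.5)
g3 = n_step_return(rewards, values, t=0, n=3, gamma=0.5)  # full Monte Carlo return
print(g1, g2, g3)
```

With n = 1 the result is the usual TD target, and once n reaches the episode length it becomes the Monte Carlo return, with intermediate n trading bias against variance.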

