Finding Mutations in DNA and Proteins (Bioinformatics VI) (Coursera)

In previous courses in the Specialization, we have discussed how to sequence and compare genomes. This course will cover advanced topics in finding mutations lurking within DNA and proteins. In the first half of the course, we would like to ask how an individual's genome differs from the "reference genome" of the species.

Our goal is to take small fragments of DNA from the individual and "map" them to the reference genome. We will see that the combinatorial pattern matching algorithms solving this problem are elegant and extremely efficient, requiring a surprisingly small amount of runtime and memory.
In the second half of the course, we will learn how to identify the function of a protein even if it has been bombarded by so many mutations compared to similar proteins with known functions that it has become barely recognizable. This is the case, for example, in HIV studies, since the virus often mutates so quickly that researchers can struggle to study it. The approach we will use is based on a powerful machine learning tool called a hidden Markov model.
Finally, you will learn how to use popular bioinformatics software tools that apply hidden Markov models to compare a protein against a related family of proteins.
Course 6 of 7 in the Bioinformatics Specialization.

Syllabus

WEEK 1
Introduction to Read Mapping
In this class, we will consider the following two central biological questions (the computational approaches needed to solve them are shown in parentheses): How Do We Locate Disease-Causing Mutations? (Combinatorial Pattern Matching) and Why Have Biologists Still Not Developed an HIV Vaccine? (Hidden Markov Models)

WEEK 2
The Burrows-Wheeler Transform
This week, we will introduce a paradigm called the Burrows-Wheeler transform; after seeing how it can be used in string compression, we will demonstrate that it is also the foundation of modern read-mapping algorithms.
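As an illustration of the transform described above, here is a minimal sketch in Python. It assumes the text ends with a unique `$` sentinel and builds every rotation explicitly, which is fine for short strings but far too slow and memory-hungry for real genomes. The function names `bwt` and `inverse_bwt` are illustrative and not taken from the course materials.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform of a text that ends with the sentinel '$'."""
    if not text.endswith("$"):
        raise ValueError("text must end with the sentinel '$'")
    # Sort all cyclic rotations and read off the last column.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str) -> str:
    """Naive inversion: repeatedly prepend the last column and re-sort."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original text is the rotation that ends with the sentinel.
    return next(row for row in table if row.endswith("$"))


if __name__ == "__main__":
    transformed = bwt("banana$")
    print(transformed)
    print(inverse_bwt(transformed))
```

Repeated substrings in the text tend to produce runs of identical symbols in the transformed string, which is what makes it attractive for compression, and the transform can be inverted without any extra information.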

WEEK 3
Speeding Up Burrows-Wheeler Read Mapping
Last week, we saw how the Burrows-Wheeler transform could be applied to multiple pattern matching. This week, we will speed up our algorithm and generalize it to the case in which patterns contain errors, which models the biological problem of mapping reads with errors to a reference genome.
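To make the idea of pattern matching on the transformed string concrete, here is a hedged sketch of exact-match counting by backward search. It precomputes a first-occurrence table and prefix counts, which is one common way to avoid rescanning the string at every step; the course's own speed-ups may differ. Matching with errors is not shown, and the names `count_matches`, `first_occurrence` and `occ` are illustrative.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform; text must end with '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_matches(last_column: str, pattern: str) -> int:
    """Count exact occurrences of pattern in the text whose BWT is last_column."""
    counts = Counter(last_column)

    # first_occurrence[c]: row of the first rotation that starts with c.
    first_occurrence, total = {}, 0
    for symbol in sorted(counts):
        first_occurrence[symbol] = total
        total += counts[symbol]

    # occ[c][i]: number of times c appears in last_column[:i].
    occ = {symbol: [0] * (len(last_column) + 1) for symbol in counts}
    for i, current in enumerate(last_column):
        for symbol in counts:
            occ[symbol][i + 1] = occ[symbol][i] + (1 if symbol == current else 0)

    # Narrow the range of matching rows, reading the pattern right to left.
    top, bottom = 0, len(last_column) - 1
    for symbol in reversed(pattern):
        if symbol not in counts:
            return 0
        top = first_occurrence[symbol] + occ[symbol][top]
        bottom = first_occurrence[symbol] + occ[symbol][bottom + 1] - 1
        if top > bottom:
            return 0
    return bottom - top + 1


if __name__ == "__main__":
    print(count_matches(bwt("panamabananas$"), "ana"))
```

Each pattern symbol costs a constant number of table lookups, so the search time depends on the pattern length rather than the text length once the tables are built.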

WEEK 4
Introduction to Hidden Markov Models
This week, we will start examining the case of aligning sequences with many mutations -- such as related genes from different HIV strains -- and see that our problem formulation for sequence alignment is not adequate for highly diverged sequences. To improve our algorithms, we will introduce a machine-learning paradigm called a hidden Markov model and see how dynamic programming helps us answer questions about these models.
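One question dynamic programming can answer about a hidden Markov model is which sequence of hidden states most likely produced an observed sequence. The sketch below is a small Viterbi implementation under stated assumptions: the two-state "fair/biased coin" model and all of its probabilities are hypothetical illustration values, and every probability is assumed to be nonzero so that logarithms are defined.

```python
import math

# Hypothetical model: a fair coin (F) and a biased coin (B).
STATES = ["F", "B"]
START = {"F": 0.5, "B": 0.5}
TRANSITION = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
EMISSION = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.9, "T": 0.1}}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path (assumes nonzero probabilities)."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    backpointers = []
    for symbol in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + math.log(trans_p[r][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        best.append(scores)
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))


if __name__ == "__main__":
    print("".join(viterbi("HHHHH", STATES, START, TRANSITION, EMISSION)))
```

Working in log space avoids numerical underflow on long observation sequences, at the cost of requiring that no probability be exactly zero.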

WEEK 5
Profile HMMs for Sequence Alignment
Last week, we introduced hidden Markov models. This week, we will see how hidden Markov models can be applied to sequence alignment with a profile HMM. We will then consider some advanced topics in this area, which are related to the advanced clustering methods that we considered in a previous course.

WEEK 6
Bioinformatics Application Challenge
This week brings our Application Challenge, in which we apply the HMM sequence alignment algorithms that we have developed.

Suggested Readings:
Bioinformatics Algorithms: An Active Learning Approach

Related Courses

The Little Stuff: Energy, Cells, and Genetics (Coursera)
University of Colorado Boulder

In this course, we will explore the smaller side of biology: molecular biology. We’ll cover basic topics including cell biology and how cells can go “rogue” and turn into cancer, how energy from the sun is transferred to fuel our bodies, basics of genetics and inheritance, and genetic technologies. At the end of this course, we will discuss ethical and moral implications of several exciting and new genetic technologies.

Jun 15th 2026
4 Weeks
Hacking COVID-19 — Course 1: Identifying a Deadly Pathogen (Coursera)
University of California, San Diego

In this course, you will follow in the footsteps of the bioinformaticians investigating the COVID-19 outbreak by assembling the SARS-CoV-2 genome. Whether you’re new to the world of computational biology, or you’re a bioinformatics expert seeking to learn about its applications in the COVID-19 pandemic, or somewhere in between, this course is for you!

Jun 1st 2026
2 Weeks
Classical papers in molecular genetics (Coursera)
University of Geneva

You have all heard about the DNA double helix and genes. Many of you know that mutations occur randomly, that the DNA sequence is read by successive groups of three bases (the codons), that many genes encode enzymes, and that gene expression can be regulated. These concepts were proposed on the basis of astute genetic experiments, and often on biochemical results as well. The original articles where these concepts appeared are, however, not frequently part of the normal curriculum of biologists, biochemists and medical students.

Jun 1st 2026
5-12 Weeks
Genome Assembly Programming Challenge (Coursera)
University of California, San Diego, Higher School of Economics - HSE University

In Spring 2011, thousands of people in Germany were hospitalized with a deadly disease that started as food poisoning with bloody diarrhea and often led to kidney failure. It was the beginning of the deadliest outbreak in recent history, caused by a mysterious bacterial strain that we will refer to as E. coli X. Soon, German officials linked the outbreak to a restaurant in Lübeck, where nearly 20% of the patrons had developed bloody diarrhea in a single week. At this point, biologists knew that they were facing a previously unknown pathogen and that traditional methods would not suffice – computational biologists would be needed to assemble and analyze the genome of the newly emerged pathogen.

Jun 15th 2026
3 Weeks
Big Data, Genes, and Medicine (Coursera)
The State University of New York

This course distills for you expert knowledge and skills mastered by professionals in Health Big Data Science and Bioinformatics. You will learn exciting facts about the human body biology and chemistry, genetics, and medicine that will be intertwined with the science of Big Data and skills to harness the avalanche of data openly available at your fingertips and which we are just starting to make sense of.

Jun 22nd 2026
5-12 Weeks
DNA Decoded (Coursera)
McMaster University

Are you a living creature? Then, congratulations! You’ve got DNA. But how much do you really know about the microscopic molecules that make you unique? Why is DNA called the “blueprint of life”? What is a “DNA fingerprint”? How do scientists clone DNA? What can DNA teach you about your family history? Are Genetically Modified Organisms (GMOs) safe? Is it possible to revive dinosaurs by cloning their DNA?

Jun 15th 2026
4 Weeks
Bioinformatics: Introduction and Methods 生物信息学: 导论与方法 (Coursera)
Peking University

A big welcome to “Bioinformatics: Introduction and Methods” from Peking University! In this MOOC you will become familiar with the concepts and computational methods in the exciting interdisciplinary field of bioinformatics and their applications in biology. The knowledge and skills in bioinformatics that you acquire will help you in your future study and research.

Jun 22nd 2026
13-24 Weeks
Plant Bioinformatics (Coursera)
University of Toronto

The past 15 years have been exciting ones in plant biology. Hundreds of plant genomes have been sequenced, RNA-seq has enabled transcriptome-wide expression profiling, and a proliferation of "-seq"-based methods has permitted protein-protein and protein-DNA interactions to be determined cheaply and in a high-throughput manner. These data sets in turn allow us to generate hypotheses at the click of a mouse.

Jun 1st 2026
5-12 Weeks
Introduction to Forensic Science (Coursera)
Nanyang Technological University

We have all seen forensic scientists in TV shows, but how do they really work? What is the science behind their work? The course aims to explain the scientific principles and techniques behind the work of forensic scientists and will be illustrated with numerous case studies from Singapore and around the world.

May 4th 2026
5-12 Weeks
Contemporary Biology (Coursera)
University of North Texas

This course is an introduction to biology as it applies to our everyday life. Learners will explore the interplay between science and self through a personalized case study of themselves and their environment. By the end of the course, learners will be able to recognize the interactions among natural phenomena and the implications of the scientific principles behind the physical world and their experiences living in it.

Jun 22nd 2026
4 Weeks