Comparing Genes, Proteins, and Genomes (Bioinformatics III) (Coursera)

Once we have sequenced genomes in the previous course, we would like to compare them to determine how species have evolved and what makes them different. In the first half of the course, we will compare two short biological sequences, such as genes (i.e., short sequences of DNA) or proteins. We will encounter a powerful algorithmic tool called dynamic programming that will help us determine the number of mutations that have separated the two genes/proteins.
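To illustrate the dynamic programming idea, here is a minimal Python sketch that counts the fewest substitutions, insertions, and deletions separating two strings, the classic edit distance. The function name `edit_distance` is our own. Charging every mutation a cost of 1 is a simplifying assumption, and the course may use different scoring.

```python
def edit_distance(a: str, b: str) -> int:
    """Fewest substitutions, insertions, and deletions that turn a into b (unit costs assumed)."""
    n, m = len(a), len(b)
    # d[i][j] holds the edit distance between the prefixes a[:i] and b[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,             # deletion
                d[i][j - 1] + 1,             # insertion
                d[i - 1][j - 1] + mismatch,  # match or substitution
            )
    return d[n][m]


print(edit_distance("PLEASANTLY", "MEANLY"))  # 5
```

Each cell depends only on three previously filled neighbors, so the table is filled in O(len(a) * len(b)) time.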

In the second half of the course, we will "zoom out" to compare entire genomes, where we see large-scale mutations called genome rearrangements, seismic events that have heaved around large blocks of DNA over millions of years of evolution. Looking at the human and mouse genomes, we will ask ourselves: just as earthquakes are much more likely to occur along fault lines, are there locations in our genome that are "fragile" and more susceptible to breakage during genome rearrangements? We will see how combinatorial algorithms help us answer this question.
Finally, you will learn how to apply popular bioinformatics software tools to solve problems in sequence alignment, including BLAST.
Course 3 of 7 in the Bioinformatics Specialization.

Syllabus

WEEK 1
Introduction to Sequence Alignment
If you joined us in the previous course in this Specialization, then you became an expert at assembling genomes and sequencing antibiotics. The next natural question to ask is how to compare DNA and amino acid sequences. This question will motivate this week's discussion of sequence alignment, which is the first of two questions that we will ask in this class (the algorithmic methods used to answer them are shown in parentheses):
How Do We Compare DNA Sequences? (Dynamic Programming)
Are There Fragile Regions in the Human Genome? (Combinatorial Algorithms)

WEEK 2
From Finding a Longest Path to Aligning DNA Strings
Last week, we saw how touring around Manhattan and making change in a Roman shop help us find a longest common subsequence of two DNA or protein strings. This week, we will study how to find a highest scoring alignment of two strings. We will see that regardless of the underlying assumptions that we make regarding how the strings should be aligned, we will be able to phrase our alignment problem as an instance of finding the longest path in a directed acyclic graph.
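Here is a minimal sketch of global alignment in Python that treats alignment as a longest path. Each cell (i, j) is a node of the Manhattan-like grid, and the diagonal, down, and right moves are its incoming edges. Filling the table row by row visits the nodes in a topological order, which is why a single pass is enough. The function name and the scores (`match=1`, `mismatch=-1`, `indel=-2`) are illustrative assumptions, not the course's parameters.

```python
def global_alignment(v: str, w: str, match: int = 1, mismatch: int = -1, indel: int = -2):
    """Return (score, aligned_v, aligned_w) for a highest-scoring global alignment."""
    n, m = len(v), len(w)
    # s[i][j] = weight of the longest path from node (0, 0) to node (i, j).
    s = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0], back[i][0] = s[i - 1][0] + indel, "down"
    for j in range(1, m + 1):
        s[0][j], back[0][j] = s[0][j - 1] + indel, "right"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = s[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            down = s[i - 1][j] + indel   # gap in w
            right = s[i][j - 1] + indel  # gap in v
            best = max(diag, down, right)
            s[i][j] = best
            back[i][j] = "diag" if best == diag else ("down" if best == down else "right")
    # Walk the backtrack pointers from the sink (n, m) to the source (0, 0).
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = back[i][j]
        if step == "diag":
            a.append(v[i - 1]); b.append(w[j - 1]); i -= 1; j -= 1
        elif step == "down":
            a.append(v[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(w[j - 1]); j -= 1
    return s[n][m], "".join(reversed(a)), "".join(reversed(b))


print(global_alignment("AC", "A"))  # (-1, 'AC', 'A-')
```

Changing only the three scoring parameters changes which path is "longest", which reflects the point above that different alignment assumptions still reduce to the same longest-path problem.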

WEEK 3
Advanced Topics in Sequence Alignment
Last week, we saw how a variety of different applications of sequence alignment can all be reduced to finding the longest path in a Manhattan-like graph. This week, we will conclude the current chapter by considering a few advanced topics in sequence alignment. For example, if we need to align long strings, our current algorithm will consume a huge amount of memory. Can we find a more memory-efficient approach? And what should we do when we move from aligning just two strings at a time to aligning many strings?
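One standard way to save memory uses the fact that each row of the alignment table depends only on the row above it. The optimal score can therefore be computed while storing just two rows, so memory grows with the length of one string instead of the product of both lengths. The sketch below uses the same illustrative scoring assumptions as any simple match/mismatch/indel scheme, and its function name is hypothetical. It returns only the score. Recovering the alignment itself in linear space requires an additional divide-and-conquer step, as in Hirschberg's algorithm, which is not shown here.

```python
def alignment_score_linear_space(v: str, w: str, match: int = 1, mismatch: int = -1, indel: int = -2) -> int:
    """Global alignment score using O(len(w)) memory (score only, no backtracking)."""
    # prev holds row i-1 of the score table; row 0 is all gaps.
    prev = [j * indel for j in range(len(w) + 1)]
    for i in range(1, len(v) + 1):
        curr = [i * indel] + [0] * len(w)
        for j in range(1, len(w) + 1):
            diag = prev[j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + indel, curr[j - 1] + indel)
        prev = curr  # discard the older row
    return prev[len(w)]
```

With `match=0`, `mismatch=-1`, `indel=-1`, the returned score is the negated edit distance. For example, "PLEASANTLY" versus "MEANLY" gives -5.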

WEEK 4
Genome Rearrangements and Fragility
You now know how to compare two DNA (or protein) strings. But what if we wanted to compare entire genomes? When we "zoom out" to the genome level, we find that substitutions, insertions, and deletions don't tell the whole story of evolution: we need to model more dramatic evolutionary events known as genome rearrangements, which wrench apart chromosomes and put them back together in a new order. A natural question to ask is whether there are "fragile regions" hidden in your genome where chromosome breakage has occurred more often over millions of years. This week, we will begin addressing this question by asking how we can compute the number of rearrangements on the evolutionary path connecting two species.
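A simple starting point for counting rearrangements is to count breakpoints. This assumes a single chromosome written as a signed permutation of synteny blocks. After padding the permutation with 0 at the start and n + 1 at the end, a breakpoint is any pair of neighbors (p, q) where q - p is not 1. The sketch below counts breakpoints, and its function names are hypothetical. A reversal changes only two adjacencies, so it can remove at most two breakpoints. Half the breakpoint count, rounded up, is therefore a lower bound on the number of reversals.

```python
import math


def count_breakpoints(perm: list[int]) -> int:
    """Breakpoints in a signed permutation framed by 0 and n + 1."""
    framed = [0] + list(perm) + [len(perm) + 1]
    # An adjacency is a neighboring pair (p, q) with q - p == 1; everything else is a breakpoint.
    return sum(1 for p, q in zip(framed, framed[1:]) if q - p != 1)


def reversal_lower_bound(perm: list[int]) -> int:
    """Each reversal removes at most two breakpoints, so at least ceil(bp / 2) reversals are needed."""
    return math.ceil(count_breakpoints(perm) / 2)


example = [3, 4, 5, -12, -8, -7, -6, 1, 2, 10, 9, -11, 13, 14]
print(count_breakpoints(example))     # 8
print(reversal_lower_bound(example))  # 4
```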

WEEK 5
Applying Genome Rearrangement Analysis to Find Genome Fragility
Last week, we asked whether there are fragile regions in the human genome. Then, we took a lengthy detour to see how to compute the distance between the genomes of two species, a discussion that we will continue this week. It may still be unclear how computing the distance between two genomes can help us understand whether fragile regions exist. If so, please stay tuned: we will see that the connection between these two concepts will yield a surprising conclusion to the class.
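One way to turn this into an actual distance is the 2-break distance. Build the breakpoint graph from the colored (block-to-block) edges of both genomes, count its alternating cycles, and subtract that count from the number of synteny blocks. The Python sketch below assumes that chromosomes are circular and that both genomes contain the same blocks, each exactly once. It uses union-find to count cycles. The node-labelling convention (block x has tail 2x - 1 and head 2x) and all function names are our own assumptions.

```python
def chromosome_to_nodes(chromosome: list[int]) -> list[int]:
    """Label block ends: +x -> (2x - 1, 2x), -x -> (2x, 2x - 1)."""
    nodes = []
    for block in chromosome:
        x = abs(block)
        nodes += [2 * x - 1, 2 * x] if block > 0 else [2 * x, 2 * x - 1]
    return nodes


def colored_edges(genome: list[list[int]]) -> list[tuple[int, int]]:
    """Edges joining the end of each block to the start of the next (circular chromosomes)."""
    edges = []
    for chromosome in genome:
        nodes = chromosome_to_nodes(chromosome)
        for k in range(1, len(nodes), 2):
            edges.append((nodes[k], nodes[(k + 1) % len(nodes)]))  # wraps around at the end
    return edges


def two_break_distance(p: list[list[int]], q: list[list[int]]) -> int:
    """d(P, Q) = Blocks(P, Q) - Cycles(P, Q) in the breakpoint graph."""
    parent: dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in colored_edges(p) + colored_edges(q):
        parent[find(a)] = find(b)
    # Every node has one edge from each genome (degree 2), so each component is one cycle.
    cycles = len({find(x) for x in list(parent)})
    blocks = sum(len(c) for c in p)
    return blocks - cycles


print(two_break_distance([[1, 2, 3, 4, 5, 6]], [[1, -3, -6, -5], [2, -4]]))  # 3
```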

WEEK 6
Bioinformatics Application Challenge
In the sixth and final week of the course, we will apply sequence alignment algorithms to infer the non-ribosomal code.

Suggested Readings:
Bioinformatics Algorithms: An Active Learning Approach
