Genomic Data Science and Clustering (Bioinformatics V) (Coursera)

How do we infer which genes orchestrate various processes in the cell? How did humans migrate out of Africa and spread around the world? In this class, we will see that these two seemingly different questions can be addressed using similar algorithmic and machine learning techniques arising from the general problem of dividing data points into distinct clusters.

In the first half of the course, we will introduce algorithms for clustering a group of objects into a collection of clusters based on their similarity, a classic problem in data science, and see how these algorithms can be applied to gene expression data.
In the second half of the course, we will introduce another classic tool in data science, principal components analysis, which can be used to preprocess multidimensional data before clustering in order to greatly reduce the number of dimensions without losing much of the "signal" in the data.
Finally, you will learn how to apply popular bioinformatics software tools to solve a real problem in clustering.
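
To make the idea of clustering by similarity concrete, here is a minimal sketch of the Lloyd algorithm, which the syllabus below mentions as first-week material. The toy "expression vectors", the choice of initial centers, and all function names are assumptions made for illustration; they are not taken from the course.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, initial_centers, max_iters=100):
    """Alternate between assigning points to centers and recomputing centers."""
    centers = [tuple(c) for c in initial_centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iters):
        # Centers to clusters: put each point with its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Clusters to centers: move each center to the mean of its cluster.
        new_centers = []
        for center, cluster in zip(centers, clusters):
            if cluster:
                dim = len(center)
                mean = tuple(sum(p[d] for p in cluster) / len(cluster)
                             for d in range(dim))
                new_centers.append(mean)
            else:
                new_centers.append(center)  # leave an empty cluster's center alone
        if new_centers == centers:
            break  # no center moved, so the assignment is stable
        centers = new_centers
    return centers, clusters


# Hypothetical data: six genes measured under two conditions.
expression = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
              (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, clusters = lloyd(expression, [expression[0], expression[3]])
print(centers)
print(clusters)
```

The result depends on the initial centers, so in practice the algorithm is usually run from several starting points.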

Course 5 of 7 in the Bioinformatics Specialization.

Syllabus

WEEK 1
Introduction to Clustering Algorithms
At the beginning of the class, we will see how algorithms for clustering a set of data points help us determine how yeast became such good wine-makers. The Bioinformatics Cartoon for this chapter raises a few questions. How did the monkey lose a wine-drinking contest to a tiny mammal? Why have Pavel and Phillip become cavemen? And will flipping a coin help them escape their eternal boredom until they can return to the present? Start learning to find out!

WEEK 2
Advanced Clustering Techniques
This week, we will see how to move from a "hard" assignment of points to clusters toward a "soft" assignment that allows the boundaries of the clusters to blend. We will adapt the Lloyd algorithm that we encountered in the first week to produce an algorithm for soft clustering. We will also meet another clustering algorithm, "hierarchical clustering", which groups objects into larger and larger clusters.
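
One way to picture the move from hard to soft assignment is sketched below. It assumes an exponential weighting with a "stiffness" parameter beta, which is one common choice and not necessarily the formulation used in the course. All names and the toy data are hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def soft_assignments(points, centers, beta):
    """Return resp[i][j], the share of point j that belongs to center i."""
    resp = [[math.exp(-beta * distance(c, p)) for p in points] for c in centers]
    for j in range(len(points)):
        total = sum(resp[i][j] for i in range(len(centers)))
        for i in range(len(centers)):
            resp[i][j] /= total  # each point's shares now sum to 1
    return resp


def weighted_centers(points, resp):
    """Move each center to the responsibility-weighted mean of all points."""
    dim = len(points[0])
    centers = []
    for row in resp:
        weight = sum(row)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(row, points)) / weight
            for d in range(dim)))
    return centers


def soft_kmeans(points, centers, beta=1.0, iters=50):
    """Alternate soft assignment and weighted center updates a fixed number of times."""
    resp = soft_assignments(points, centers, beta)
    for _ in range(iters):
        centers = weighted_centers(points, resp)
        resp = soft_assignments(points, centers, beta)
    return centers, resp


# Hypothetical data: two loose groups of two-dimensional points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
        (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
soft_centers, resp = soft_kmeans(data, [(1.0, 1.0), (8.0, 8.0)], beta=1.0)
print(soft_centers)
print([round(r, 3) for r in resp[0]])
```

A large beta pushes the shares toward 0 or 1 and so approaches hard assignment, while a small beta lets every point contribute to every center.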

WEEK 3
Introductory Algorithms in Population Genetics

Related Courses

Statistical Inference (Coursera) Coursera
Johns Hopkins University

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference, including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference.

Jun 8th 2026
4 Weeks
Tools for Data Science (Coursera) Coursera
IBM

What are some of the most popular data science tools, how do you use them, and what are their features? In this course, you'll learn about Jupyter Notebooks, RStudio IDE, Apache Zeppelin and Data Science Experience. You will learn about what each tool is used for, what programming languages they can execute, their features and limitations. With the tools hosted in the cloud on Cognitive Class Labs, you will be able to test each tool and follow instructions to run simple code in Python, R or Scala.

Jun 8th 2026
4 Weeks
Machine Learning Foundations: A Case Study Approach (Coursera) Coursera
University of Washington

Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning from a series of practical case-studies.

Jun 8th 2026
5-12 Weeks
Machine Learning Using SAS Viya (Coursera) Coursera
SAS

This course covers the theoretical foundation for different techniques associated with supervised machine learning models. In addition, a business case study is defined to guide participants through all steps of the analytical life cycle, from problem understanding to model deployment, through data preparation, feature selection, model training and validation, and model assessment. A series of demonstrations and exercises is used to reinforce the concepts and the analytical approach to solving business problems.

Jun 8th 2026
5-12 Weeks
Big Data, Genes, and Medicine (Coursera) Coursera
The State University of New York

This course distills for you the expert knowledge and skills mastered by professionals in Health Big Data Science and Bioinformatics. You will learn exciting facts about human biology and chemistry, genetics, and medicine, intertwined with the science of Big Data, and gain the skills to harness the avalanche of openly available data that we are just starting to make sense of.

Jun 8th 2026
5-12 Weeks
Introduction to Machine Learning (Coursera) Coursera
Duke University

This course will provide you with a foundational understanding of machine learning models (logistic regression, multilayer perceptrons, convolutional neural networks, natural language processing, etc.) and demonstrate how these models can solve complex problems in a variety of industries, from medical diagnostics to image recognition to text prediction.

Jun 12th 2026
5-12 Weeks
Introducción a Data Science: Programación Estadística con R (Coursera) Coursera
Universidad Nacional Autónoma de México

This course will give you the foundations of the statistical programming language R, the lingua franca of statistics, which will let you write programs that read, manipulate, and analyze quantitative data. We will walk you through installing the language; you will also get an introduction to the base graphics systems and to the ggplot2 plotting package for visualizing these data. You will also learn to use RStudio, one of the most popular IDEs in the R user community.

Jun 8th 2026
4 Weeks
Optimizing Machine Learning Performance (Coursera) Coursera
Alberta Machine Intelligence Institute

This course synthesizes everything you have learned in the applied machine learning specialization. You will now walk through a complete machine learning project to prepare a machine learning maintenance roadmap. You will understand and analyze how to deal with changing data. You will also be able to identify and interpret potential unintended effects in your project. You will understand and define procedures to operationalize and maintain your applied machine learning model.

Jun 8th 2026
4 Weeks
Reproducible Research (Coursera) Coursera
Johns Hopkins University

This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations.

Jun 8th 2026
4 Weeks
Bioinformatic Methods II (Coursera) Coursera
University of Toronto

Large-scale biology projects such as the sequencing of the human genome and gene expression surveys using RNA-seq, microarrays and other technologies have created a wealth of data for biologists. However, the challenge facing scientists is analyzing and even accessing these data to extract useful information pertaining to the system being studied. This course focuses on employing existing bioinformatic resources – mainly web-based programs and databases – to access the wealth of data to answer questions relevant to the average biologist, and is highly hands-on.

Jun 8th 2026
5-12 Weeks
Getting started with TensorFlow 2 (Coursera) Coursera
Imperial College London

Welcome to this course on Getting started with TensorFlow 2! In this course you will learn a complete end-to-end workflow for developing deep learning models with TensorFlow: building, training, evaluating, and predicting with models using the Sequential API, validating your models and including regularisation, implementing callbacks, and saving and loading models.

Jun 8th 2026
5-12 Weeks