Big Data Science with the BD2K-LINCS Data Coordination and Integration Center (Coursera)

In this course we briefly introduce the DCIC and the various Centers that collect data for LINCS. We then cover metadata and how metadata is linked to ontologies, and present data processing and normalization methods to clean and harmonize LINCS data. This is followed by a discussion of how data is served through RESTful APIs. Most importantly, the course covers computational methods including data clustering, gene-set enrichment analysis, interactive data visualization, and supervised learning. Finally, we introduce crowdsourcing/citizen-science projects in which students work together in teams to extract expression signatures from public databases and then query these collections of signatures against LINCS data to predict small molecules as potential therapeutics.

The Library of Integrated Network-based Cellular Signatures (LINCS) is an NIH Common Fund program. The idea is to perturb different types of human cells with many different types of perturbations, such as drugs and other small molecules; genetic manipulations such as knockdown or overexpression of single genes; manipulation of the extracellular microenvironment, for example growing cells on different surfaces; and more. These perturbations are applied to various types of human cells, including induced pluripotent stem cells from patients that are differentiated into lineages such as neurons or cardiomyocytes. Then, to better understand the molecular networks affected by these perturbations, changes in the levels of many different variables are measured, including mRNAs, proteins, and metabolites, as well as cellular phenotypic changes such as changes in cell morphology. The BD2K-LINCS Data Coordination and Integration Center (DCIC) is commissioned to organize, analyze, visualize, and integrate this data with other relevant publicly available resources.

Syllabus

WEEK 1
The Library of Integrated Network-based Cellular Signatures (LINCS) Program Overview
This module provides an overview of the concept behind the LINCS program, along with tutorials on how to get started using the LINCS L1000 dataset.
Metadata and Ontologies
This module gives a broad, high-level description of the concepts behind metadata and ontologies and how they are applied to LINCS datasets.
Serving Data with APIs
In this module we explain the concept of accessing data through an application programming interface (API).

WEEK 2
Bioinformatics Pipelines
This module describes the important concept of a bioinformatics pipeline.
The Harmonizome
This module describes a project that integrates many resources that contain knowledge about genes and proteins.

WEEK 3
Data Normalization
This module describes the mathematical concepts behind data normalization.
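As an illustration only (not taken from the course materials), quantile normalization is one widely used normalization technique for expression data: it forces every sample to share the same distribution of values, namely the rank-by-rank mean of the sorted samples. This minimal sketch assumes each sample is a list of equal length with one value per gene; the function name `quantile_normalize` is hypothetical, and ties are simply broken by position.

```python
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize equal-length samples (e.g., one value per gene per sample).

    Simplification: tied values are ranked by their original position.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same length")
    # Sort each sample, then average across samples at each rank.
    sorted_cols = [sorted(s) for s in samples]
    rank_means = [sum(col[r] for col in sorted_cols) / len(samples) for r in range(n)]
    normalized = []
    for s in samples:
        # order[r] is the index of the r-th smallest value in this sample.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for r, i in enumerate(order):
            out[i] = rank_means[r]  # replace each value by the mean for its rank
        normalized.append(out)
    return normalized
```

After normalization, each sample keeps its own ordering of genes but all samples have identical value distributions, which makes them directly comparable.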
Data Clustering
This module describes the mathematical concepts behind data clustering, in other words unsupervised learning: the identification of patterns within data without considering the labels associated with the data.
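To make the idea of unsupervised learning concrete, here is a minimal sketch of k-means (Lloyd's algorithm), one common clustering method; the course may emphasize other methods, such as hierarchical clustering. The function name `kmeans` is hypothetical, and seeding the centroids with the first k points is a simplification chosen so the result is deterministic.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Cluster points into k groups with Lloyd's algorithm; returns (labels, centroids)."""
    # Simplification: seed centroids with the first k points (deterministic, not k-means++).
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        converged = new_labels == labels  # stop once assignments no longer change
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids
```

No labels are used anywhere: the groups emerge only from distances between the data points.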
Midterm Exam
The Midterm Exam consists of 45 multiple-choice questions that cover modules 1-7. Some of the questions may require you to apply the methods you learned in the course to analyze new datasets.

WEEK 4
Enrichment Analysis
This module introduces the important concept of performing gene set enrichment analyses. Enrichment analysis is the process of querying gene sets from genomics and proteomics studies against annotated gene sets collected from prior biological knowledge.
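A common way to score such a query is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), which asks how likely an overlap at least as large as the observed one would be by chance. The sketch below is illustrative rather than the course's own code; the function name `enrichment_pvalue` and the assumption that both gene sets are drawn from a background of `background_size` genes are ours.

```python
from math import comb
from typing import Set


def enrichment_pvalue(query: Set[str], annotated: Set[str], background_size: int) -> float:
    """One-sided hypergeometric p-value for the overlap between two gene sets.

    query: genes from an experiment (e.g., up-regulated genes in a signature).
    annotated: a gene set from prior knowledge (e.g., a pathway).
    background_size: number of genes that could appear in either set.
    """
    overlap = len(query & annotated)
    n, big_k, big_n = len(query), len(annotated), background_size
    # Sum P(X = i) for i >= overlap, where X ~ Hypergeometric(N=big_n, K=big_k, n=n).
    tail = sum(comb(big_k, i) * comb(big_n - big_k, n - i)
               for i in range(overlap, min(n, big_k) + 1))
    return tail / comb(big_n, n)
```

In practice, one query is tested against every annotated gene set in a library, and the resulting p-values are corrected for multiple testing.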
Machine Learning
This module describes the mathematical concepts of supervised machine learning: the process of making predictions from examples that associate observations (features or attributes) with one or more properties that we wish to learn or predict.
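For a small, concrete instance of learning from labeled examples, the sketch below implements a k-nearest-neighbors classifier, one of the simplest supervised methods. It is not presented as the course's method; the function name `knn_predict` and the choice of Euclidean distance are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Predict the label of `query` by majority vote among its k nearest training examples."""
    # Pair each training example's distance to the query with its label, nearest first.
    neighbors = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    # Majority vote; on a tie, Counter.most_common keeps first-seen (nearest-first) order.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Here the observations are the feature vectors in `train_x` and the property to predict is the label in `train_y`; a new observation inherits the most frequent label among its closest examples.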

WEEK 5
Benchmarking
This module discusses how bioinformatics pipelines can be compared and evaluated.
Interactive Data Visualization
This module provides programming examples on how to get started with creating interactive web-based data visualization elements/figures.

WEEK 6
Crowdsourcing Projects
This final module describes opportunities to work on LINCS-related projects that go beyond the course.

WEEK 7
Final Exam
The Final Exam consists of 60 multiple-choice questions that cover all of the modules of the course. Some of the questions may require you to apply the methods you learned in the course to analyze new datasets.

Related Courses

Cryptographic Hash and Integrity Protection (Coursera)
University of Colorado System

This course reviews cryptographic hash functions in general and their use in the forms of hash chain and hash tree (Merkle tree). Building on hash functions, the course describes message authentication focusing on message authentication code (MAC) based on symmetric keys. We then discuss digital signatures based on asymmetric cryptography, providing security properties such as non-repudiation which were unavailable in symmetric-cryptography-based message authentication.

Jun 22nd 2026
4 Weeks
Interfacing with the Raspberry Pi (Coursera)
University of California, Irvine

The Raspberry Pi uses a variety of input/output devices based on protocols such as HDMI, USB, and Ethernet to communicate with the outside world. In this class you will learn how to use these protocols with other external devices (sensors, motors, GPS, orientation, LCD screens, etc.) to get your IoT device to interact with the real world.

Jun 22nd 2026
4 Weeks
Framework for Data Collection and Analysis (Coursera)
University of Maryland, College Park

This course will provide you with an overview of existing data products and a good understanding of the data collection landscape. With the help of various examples, you will learn how to identify which data sources likely match your research question, how to turn your research question into measurable pieces, and how to think about an analysis plan.

Jun 22nd 2026
4 Weeks
Dino 101: Dinosaur Paleobiology (Coursera)
University of Alberta

Dino 101: Dinosaur Paleobiology is a 12-lesson course teaching a comprehensive overview of non-avian dinosaurs. Topics covered: anatomy, eating, locomotion, growth, environmental and behavioral adaptations, origins and extinction. Lessons are delivered from museums, fossil-preparation labs and dig sites. Estimated workload: 3-5 hrs/week.

Jun 27th 2026
5-12 Weeks
Ecology: Ecosystem Dynamics and Conservation (Coursera)
American Museum of Natural History, Howard Hughes Medical Institute

This course is an introduction to ecology and ecosystem dynamics using a systems thinking lens. Through a case study on Mozambique's Gorongosa National Park, learners will explore how scientists study ecosystems, and investigate the complex array of factors that inform management efforts.

Jun 22nd 2026
5-12 Weeks
Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud (Coursera)
University of Illinois at Urbana-Champaign

Welcome to the Cloud Computing Applications course, the second part of a two-course series designed to give you a comprehensive view of the world of Cloud Computing and Big Data! In this second course we continue Cloud Computing Applications by exploring how the Cloud opens up data analytics on huge volumes of data that are static or streamed at high velocity and that represent an enormous variety of information. Cloud applications and data analytics represent a disruptive change in the ways that society is informed by, and uses, information.

Jun 22nd 2026
4 Weeks
Reproducible Templates for Analysis and Dissemination (Coursera)
Emory University

This course will assist you with recreating work that a previous coworker completed, revisiting a project you abandoned some time ago, or simply reproducing a document with a consistent format and workflow. Incomplete information about how the work was done, where the files are, and which is the most recent version can give rise to many complications.

Jun 22nd 2026
5-12 Weeks
Machine Learning Foundations: A Case Study Approach (Coursera)
University of Washington

Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning from a series of practical case studies.

Jun 22nd 2026
5-12 Weeks
Finding Hidden Messages in DNA (Bioinformatics I) (Coursera)
University of California, San Diego

This course begins a series of classes illustrating the power of computing in modern biology. Please join us on the frontier of bioinformatics to look for hidden messages in DNA without ever needing to put on a lab coat. In the first half of the course, we investigate DNA replication, and ask the question, where in the genome does DNA replication begin? We will see that we can answer this question for many bacteria using only some straightforward algorithms to look for hidden messages in the genome.

Jun 22nd 2026
5-12 Weeks
Genome Sequencing (Bioinformatics II) (Coursera)
University of California, San Diego

You may have heard a lot about genome sequencing and its potential to usher in an era of personalized medicine, but what does it mean to sequence a genome? Biologists still cannot read the nucleotides of an entire genome as you would read a book from beginning to end. However, they can read short pieces of DNA. In this course, we will see how graph theory can be used to assemble genomes from these short pieces. We will further learn about brute force algorithms and apply them to sequencing mini-proteins called antibiotics.

Jun 22nd 2026
5-12 Weeks
Feeding the World (Coursera)
University of Pennsylvania

This course will explore the concepts driving current food production science (population growth, urbanization, emerging affluence, resource constraints, and underlying biological limits) with the main focus on livestock production. Each of the major food animal species (dairy, swine, beef, and poultry) will be covered in terms of their universal life cycles, constraints to production and emerging societal issues.

Jun 22nd 2026
5-12 Weeks