Data-driven Astronomy (Coursera)

Science is undergoing a data explosion, and astronomy is leading the way. Modern telescopes produce terabytes of data per observation, and the simulations required to model our observable Universe push supercomputers to their limits. To analyse this data, scientists need to think computationally. In this course you will investigate the challenges of working with large datasets: how to implement algorithms that work; how to use databases to manage your data; and how to learn from your data with machine learning tools. The focus is on practical skills: all the activities will be done in Python 3, a modern programming language used throughout astronomy.

Regardless of whether you’re already a scientist, studying to become one, or just interested in how modern astronomy works ‘under the bonnet’, this course will help you explore astronomy, from planets to pulsars to black holes.
Each week will also have an interview with a data-driven astronomy expert.
Note that some knowledge of Python is assumed, including variables, control structures, data structures, functions, and working with files.

Syllabus

Week 1
Thinking about data

  • Principles of computational thinking
  • Discovering pulsars in radio images

This module introduces the idea of computational thinking, and how big data can make simple problems quite challenging to solve. We use the example of calculating the median and mean stack of a set of radio astronomy images to illustrate some of the issues you encounter when working with large datasets.
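As a rough illustration of the stacking idea, the sketch below combines a few tiny made-up "images" pixel by pixel. It assumes each image is already loaded as a small 2D list of numbers; reading real radio images from disk is not shown, and the helper name `stack_images` is hypothetical rather than taken from the course.

```python
from statistics import mean, median


def stack_images(images, combine):
    """Combine equally sized 2D images pixel by pixel using `combine`."""
    n_rows = len(images[0])
    n_cols = len(images[0][0])
    return [
        [combine([img[r][c] for img in images]) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Three made-up 2x2 images; the third has one outlier pixel (100.0).
images = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 2.0], [3.0, 5.0]],
    [[3.0, 2.0], [3.0, 100.0]],
]

mean_stack = stack_images(images, mean)      # outlier pulls the mean up
median_stack = stack_images(images, median)  # median ignores the outlier
```

The mean can be accumulated as a running sum, one image at a time, whereas this median needs every value for a pixel in memory at once. That difference is one reason a simple statistic becomes harder as the number of images grows.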

Week 2
Big data makes things slow

  • How to work out the time complexity of algorithms
  • Exploring the black holes at the centres of massive galaxies

In this module we explore the idea of scaling your code. Some algorithms scale well as your dataset increases, but others become impossibly slow. We look at some of the reasons for this, and use the example of cross-matching astronomical catalogues to demonstrate what kind of improvements you can make.
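To make the scaling problem concrete, here is a minimal sketch of a brute-force cross-match. It assumes each catalogue is a list of (right ascension, declination) pairs in degrees; the function names, the sample coordinates, and the matching radius are all illustrative assumptions.

```python
from math import asin, cos, degrees, radians, sin, sqrt


def angular_distance(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(radians, (ra1, dec1, ra2, dec2))
    a = sin((dec2 - dec1) / 2) ** 2
    b = cos(dec1) * cos(dec2) * sin((ra2 - ra1) / 2) ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return degrees(2 * asin(min(1.0, sqrt(a + b))))


def crossmatch(cat1, cat2, max_radius):
    """For each object in cat1, find the closest cat2 object within max_radius."""
    matches = []
    for i, (ra1, dec1) in enumerate(cat1):
        best_j, best_dist = None, max_radius
        for j, (ra2, dec2) in enumerate(cat2):  # inner loop: every pair is tested
            dist = angular_distance(ra1, dec1, ra2, dec2)
            if dist <= best_dist:
                best_j, best_dist = j, dist
        if best_j is not None:
            matches.append((i, best_j, best_dist))
    return matches


# Made-up coordinates: only the first object has a counterpart nearby.
cat1 = [(10.0, 0.0), (200.0, 45.0)]
cat2 = [(10.0, 0.001), (50.0, -30.0)]
matches = crossmatch(cat1, cat2, max_radius=0.01)
```

Because every object in the first catalogue is compared with every object in the second, the running time grows with the product of the two catalogue sizes. One common improvement is to sort one catalogue by declination so that only nearby candidates need to be tested.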

Week 3
Querying data using SQL

  • How to use databases to analyse your data
  • Investigating exoplanets in other solar systems

Most large astronomy projects use databases to manage their data. In this module we introduce SQL - the language most commonly used to query databases. We use SQL to query the NASA Exoplanet database and investigate the habitability of planets in other solar systems.
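The kind of query described here can be sketched with Python's built-in `sqlite3` module. Everything about the data is an assumption for illustration: the `planet` table, its columns, the rows, and the radius and temperature cut-offs are made up and are not the schema of any real exoplanet archive or an actual habitability criterion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical table: radius in Earth radii, t_eq = equilibrium temperature in kelvin.
conn.execute("CREATE TABLE planet (name TEXT, radius REAL, t_eq REAL)")
conn.executemany(
    "INSERT INTO planet VALUES (?, ?, ?)",
    [
        ("planet-a", 1.1, 260.0),
        ("planet-b", 11.0, 1500.0),
        ("planet-c", 1.8, 310.0),
        ("planet-d", 0.9, 700.0),
    ],
)

# Select small planets in a temperate range (illustrative thresholds only).
rows = conn.execute(
    "SELECT name, radius FROM planet "
    "WHERE radius < 2.0 AND t_eq BETWEEN 200 AND 350 "
    "ORDER BY radius"
).fetchall()
conn.close()
```

Here the `WHERE` clause does the filtering inside the database, so only the matching rows are returned to Python.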

Week 4
Managing your data

  • How to set up databases to manage your data
  • Exploring the lifecycle of stars in our Galaxy

This module introduces the basic principles of setting up databases. We look at how to set up new tables, and then how to combine Python and SQL to get the best out of both approaches. We use these tools to explore the life of stars in a stellar cluster.
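A minimal sketch of setting up tables and then mixing SQL with Python might look like the following. It uses the standard-library `sqlite3` module as a stand-in for whatever database the course uses, and the `cluster` and `star` tables, their columns, and the values are all hypothetical.

```python
import sqlite3
from statistics import median

conn = sqlite3.connect(":memory:")

# Two related tables: each star row points at the cluster it belongs to.
conn.executescript("""
CREATE TABLE cluster (
    cluster_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE star (
    star_id    INTEGER PRIMARY KEY,
    cluster_id INTEGER NOT NULL REFERENCES cluster(cluster_id),
    t_eff      REAL NOT NULL  -- effective temperature in kelvin (made-up values)
);
""")
conn.execute("INSERT INTO cluster VALUES (1, 'demo-cluster')")
conn.executemany(
    "INSERT INTO star (cluster_id, t_eff) VALUES (?, ?)",
    [(1, 3500.0), (1, 5800.0), (1, 9000.0)],
)

# SQL handles the join and the filtering; Python handles the statistic.
temps = [
    row[0]
    for row in conn.execute(
        "SELECT s.t_eff FROM star AS s "
        "JOIN cluster AS c USING (cluster_id) "
        "WHERE c.name = ?",
        ("demo-cluster",),
    )
]
median_t_eff = median(temps)
conn.close()
```

The split shown here is one possible division of labour: the database selects and joins the rows, and Python computes a summary that is more convenient to express in code.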

Week 5
Learning from data: regression

  • Using machine learning tools to investigate your data
  • Calculating the redshifts of distant galaxies

This module introduces the idea of machine learning. We look at standard methodology for running machine learning experiments, and then apply this to calculating redshifts of distant galaxies using decision trees for regression.
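The experimental methodology can be sketched without any libraries. The example below is a deliberately tiny stand-in, not the course's method: it fits a depth-1 regression tree (a single split) to one made-up feature, holds out half of the data for testing, and reports the median absolute residual. The feature values, target values, and function names are all assumptions; a real experiment would use many features, a recursive tree from a machine learning library, and a randomised or k-fold split.

```python
from statistics import mean, median


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold, one mean value per side."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # candidate split halfway between neighbours
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


# Made-up data: one feature per galaxy and a made-up "redshift" target.
features = [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5]
targets = [0.05, 0.06, 0.04, 0.05, 0.05, 0.50, 0.52, 0.48, 0.50, 0.51]

# Hold-out split: alternate samples so the example stays deterministic.
train_x, test_x = features[0::2], features[1::2]
train_y, test_y = targets[0::2], targets[1::2]

stump = fit_stump(train_x, train_y)
residuals = [abs(predict(stump, x) - y) for x, y in zip(test_x, test_y)]
median_residual = median(residuals)
```

The key point is that the model is scored only on samples it never saw during fitting, so `median_residual` estimates how well it would do on new data.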

Week 6
Learning from data: classification

  • Using machine learning tools to classify your data
  • Investigating different types of galaxies

In this final module we explore the limitations of decision tree classifiers. We then look at ensemble classifiers, using the random forest algorithm to classify images of galaxies into different types.

Related Courses

Big Data Science with the BD2K-LINCS Data Coordination and Integration Center (Coursera)
Icahn School of Medicine at Mount Sinai

In this course we briefly introduce the DCIC and the various Centers that collect data for LINCS. We then cover metadata and how metadata is linked to ontologies. We then present data processing and normalization methods to clean and harmonize LINCS data. This is followed by a discussion of how data is served through RESTful APIs. Most importantly, the course covers computational methods including: data clustering, gene-set enrichment analysis, interactive data visualization, and supervised learning. Finally, we introduce crowdsourcing/citizen-science projects where students can work together in teams to extract expression signatures from public databases and then query such collections of signatures against LINCS data for predicting small molecules as potential therapeutics.

Jun 29th 2026
5-12 Weeks

Data Privacy Fundamentals (Coursera)
Northeastern University

This course is designed to introduce data privacy to a wide audience and help each participant see how data privacy has evolved as a compelling concern to public and private organizations as well as individuals. In this course, you will hear from legal and technical experts and practitioners who encounter data privacy issues daily.

Jul 1st 2026
3 Weeks

Computational Thinking for K-12 Educators: Abstraction, Methods, and Lists (Coursera)
University of California, San Diego

How do gamers cause things to happen when they hit buttons on their controller? How does the computer keep track of gamers' scores? This class teaches the concepts of nested loops, events, and variables. For each concept, we'll start by helping you connect real-world experiences you are already familiar with to the programming concept you are about to learn. Next, through a cognitively scaffolded process we'll engage you in developing your fluency with problem solving with nested loops, events, and variables in a way that keeps frustration at a minimum.

Jul 1st 2026
5-12 Weeks

Regression Models (Coursera)
Johns Hopkins University

Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares and inference using regression models.

Jun 29th 2026
4 Weeks

Database Management Essentials (Coursera)
University of Colorado System

Database Management Essentials provides the foundation you need for a career in database development, data warehousing, or business intelligence, as well as for the entire Data Warehousing for Business Intelligence specialization. In this course, you will create relational databases, write SQL statements to extract information to satisfy business reporting requests, create entity relationship diagrams (ERDs) to design databases, and analyze table designs for excessive redundancy.

Jun 29th 2026
5-12 Weeks

Foundations: Data, Data, Everywhere (Coursera)
Google

This is the first course in the Google Data Analytics Certificate. These courses will equip you with the skills you need to apply to introductory-level data analyst jobs. Organizations of all kinds need data analysts to help them improve their processes, identify opportunities and trends, launch new products, and make thoughtful decisions. In this course, you’ll be introduced to the world of data analytics through a hands-on curriculum developed by Google. The material shared covers plenty of key data analytics topics, and it’s designed to give you an overview of what’s to come in the Google Data Analytics Certificate. Current Google data analysts will instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources.

Jun 30th 2026
5-12 Weeks

Teaching Impacts of Technology: Data Collection, Use, and Privacy (Coursera)
University of California, San Diego

In this course you’ll focus on how constant data collection and big data analysis have impacted us, exploring the interplay between using your data and protecting it, as well as thinking about what it could do for you in the future. This will be done through a series of paired teaching sections, exploring a specific “Impact of Computing” in your typical day and the “Technologies and Computing Concepts” that enable that impact, all at a K12-appropriate level.

Jul 1st 2026
4 Weeks

Algorithmic Toolbox (Coursera)
University of California, San Diego, Higher School of Economics - HSE University

The course covers basic algorithmic techniques and ideas for computational problems arising frequently in practical applications: sorting and searching, divide and conquer, greedy algorithms, dynamic programming. We will learn a lot of theory: how to sort data and how it helps for searching; how to break a large problem into pieces and solve them recursively; when it makes sense to proceed greedily; how dynamic programming is used in genomic studies. You will practice solving computational problems, designing new algorithms, and implementing solutions efficiently (so that they run in less than a second).

Jun 29th 2026
5-12 Weeks

Introduction to Big Data (Coursera)
University of California, San Diego

Interested in increasing your knowledge of the Big Data landscape? This course is for those new to data science and interested in understanding why the Big Data Era has come to be. It is for those who want to become conversant with the terminology and the core concepts behind big data problems, applications, and systems. It is for those who want to start thinking about how Big Data might be useful in their business or career. It provides an introduction to one of the most common frameworks, Hadoop, that has made big data analysis easier and more accessible -- increasing the potential for data to transform our world!

Jun 29th 2026
3 Weeks