Data Engineering Capstone Project (Coursera)

Offered by IBM,
Data Engineering Capstone Project (Coursera)

In this course you will apply a variety of data engineering skills and techniques you have learned as part of the previous courses in the IBM Data Engineering Professional Certificate. You will assume the role of a Junior Data Engineer who has recently joined the organization and be presented with a real-world use case that requires a data engineering solution.

Class Deals by MOOC List - Click here and see Coursera's Active Discounts, Deals, and Promo Codes.

Course 13 of 13 in the IBM Data Engineering Professional Certificate.

Syllabus

WEEK 1
Data Platform Architecture and OLTP Database
In this module, you will design a data platform that uses MySQL as an OLTP database. You will be using MySQL to store the OLTP data.

WEEK 2
Querying Data in NoSQL Databases
In this module, you will design a data platform that uses MongoDB as a NoSQL database. You will use MongoDB to store the e-commerce catalog data.

WEEK 3
Build a Data Warehouse
In this module you will design and implement a data warehouse and you will then generate reports from the data in the data warehouse.

WEEK 4
Data Analytics
In this module, you will assume the role of a data engineer at an e-commerce company. Your company has finished setting up a data warehouse. Now you are assigned the responsibility to design a reporting dashboard that reflects the key metrics of the business.

WEEK 5
ETL & Data Pipelines
In this module, you will use the given python script to perform various ETL operations that move data from RDBMS to NoSQL, NoSQL to RDBMS, and from RDBMS, NoSQL to the data warehouse. You will write a pipeline that analyzes the web server log file, extracts the required lines and fields, transforms and loads data.

WEEK 6
Big Data Analytics with Spark
In this module, you will use the data from a webserver to analyse search terms. You will then load a pretrained sales forecasting model and predict the sales forecast for a future year.

WEEK 7
Final Submission and Peer Review
In this final module you will complete your submission of screenshots from the hands-on labs for your peers to review. Once you have completed your submission you will then review the submission of one of your peers and grade their submission.

Go to Class
MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Related Courses

Applying Data Analytics in Marketing (Coursera) Coursera
University of Illinois at Urbana-Champaign

Applying Data Analytics in Marketing (Coursera)

This course introduces students to the science of business analytics while casting a keen eye toward the artful use of numbers found in the digital space. The goal is to provide businesses and managers with the foundation needed to apply data analytics to real-world challenges they confront daily in their professional lives.

Jun 27th 2026
4 Weeks
Distributed Programming in Java (Coursera) Coursera
Rice University

Distributed Programming in Java (Coursera)

This course teaches learners (industry professionals and students) the fundamental concepts of Distributed Programming in the context of Java 8. Distributed programming enables developers to use multiple nodes in a data center to increase throughput and/or reduce latency of selected applications. By the end of this course, you will learn how to use popular distributed programming frameworks for Java programs, including Hadoop, Spark, Sockets, Remote Method Invocation (RMI), Multicast Sockets, Kafka, Message Passing Interface (MPI), as well as different approaches to combine distribution with multithreading.

Jun 22nd 2026
4 Weeks
Introduction to Business Analytics with R (Coursera) Coursera
University of Illinois at Urbana-Champaign

Introduction to Business Analytics with R (Coursera)

Nearly every aspect of business is affected by data analytics. There are many powerful tools that can quickly process large amounts of data. For businesses to capitalize on data analytics, they need leaders who understand the data analytic process. Even more valuable are leaders who know how to analyze big data. This course addresses the human skills gap by providing a foundational set of data analytic skills that can be applied to many business settings.

Jun 22nd 2026
4 Weeks
Machine Learning Foundations: A Case Study Approach (Coursera) Coursera
University of Washington

Machine Learning Foundations: A Case Study Approach (Coursera)

Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning from a series of practical case-studies.

Jun 22nd 2026
5-12 Weeks
Comparing Genes, Proteins, and Genomes (Bioinformatics III) (Coursera) Coursera
University of California, San Diego

Comparing Genes, Proteins, and Genomes (Bioinformatics III) (Coursera)

Once we have sequenced genomes in the previous course, we would like to compare them to determine how species have evolved and what makes them different. In the first half of the course, we will compare two short biological sequences, such as genes (i.e., short sequences of DNA) or proteins. We will encounter a powerful algorithmic tool called dynamic programming that will help us determine the number of mutations that have separated the two genes/proteins.

Jun 22nd 2026
5-12 Weeks
Machine Learning: Regression (Coursera) Coursera
University of Washington

Machine Learning: Regression (Coursera)

Case Study - Predicting Housing Prices. In our first case study, predicting house prices, you will create models that predict a continuous value (price) from input features (square footage, number of bedrooms and bathrooms,...). This is just one of the many places where regression can be applied. Other applications range from predicting health outcomes in medicine, stock prices in finance, and power usage in high-performance computing, to analyzing which regulators are important for gene expression.

Jun 22nd 2026
5-12 Weeks
Machine Learning: Classification (Coursera) Coursera
University of Washington

Machine Learning: Classification (Coursera)

Case Studies: Analyzing Sentiment & Loan Default Prediction. In our case study on analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information,...). In our second case study for this course, loan default prediction, you will tackle financial data, and predict when a loan is likely to be risky or safe for the bank.

Jun 22nd 2026
5-12 Weeks
Delivery Problem (Coursera) Coursera
University of California, San Diego,Higher School of Economics - HSE University

Delivery Problem (Coursera)

We’ll implement (in Python) together efficient programs for a problem needed by delivery companies all over the world millions times per day — the travelling salesman problem. The goal in this problem is to visit all the given places as quickly as possible. How to find an optimal solution to this problem quickly? We still don’t have provably efficient algorithms for this difficult computational problem and this is the essence of the P versus NP problem, the most important open question in Computer Science.

Jun 22nd 2026
3 Weeks
Crash Course on Python (Coursera) Coursera
Google

Crash Course on Python (Coursera)

This course is designed to teach you the foundations in order to write simple programs in Python using the most common structures. No previous exposure to programming is needed. By the end of this course, you'll understand the benefits of programming in IT roles; be able to write simple programs using Python; figure out how the building blocks of programming fit together; and combine all of this knowledge to solve a complex programming problem.

Jun 23rd 2026
5-12 Weeks