Data Engineering Capstone Project (edX)

Offered by IBM
This Capstone Project is designed for you to apply and demonstrate your Data Engineering skills and knowledge in SQL, NoSQL, RDBMS, Bash, Python, ETL, Data Warehousing, BI tools and Big Data.

In this Capstone you’ll demonstrate your ability to perform like a Data Engineer. Your mission is to design, implement, and manage a complete data and analytics platform consisting of relational and non-relational databases, data warehouses, data pipelines, big data processing engines, and Business Intelligence (BI) tools.
This Capstone project requires you to apply and sharpen the skills and knowledge you developed across the courses in the IBM Data Engineering Professional Certificate. You will use multiple tools and technologies to design databases, collect data from multiple sources, extract, transform, and load (ETL) data into a data warehouse, and use a cloud-based BI tool to create analytic reports and visualizations. You will also implement predictive analytics and machine learning models using big data tools and techniques.
This Capstone requires a significant amount of hands-on lab work throughout the course. You’ll demonstrate your knowledge and proficiency working with Python, Bash scripts, SQL, NoSQL, RDBMSes, ETL, MySQL, PostgreSQL, Db2, MongoDB, Apache Airflow, Apache Spark, and Cognos Analytics.
Upon successfully completing this Capstone, you should have the confidence and portfolio to take on real-world data engineering projects and showcase your abilities to perform as an entry-level data engineer.
This course is part of the Data Engineering Professional Certificate.

What you'll learn

  • Build a complete data and analytics platform.
  • Set up, manage, and query relational and NoSQL databases.
  • Create data pipelines and ETL processes using Apache Airflow.
  • Design and populate a star/snowflake schema data warehouse and query it using SQL.
  • Analyze warehouse data using the Business Intelligence (BI) tool Cognos Analytics to create reports and dashboards.
  • Deploy a big data machine learning model using Apache Spark.

Related Courses

Fundamentos TIC para profesionales de negocios: Programación (edX)
Universitat Politècnica de València, UPValenciaX

Do you have to work with information technology but lack the background? Learn the fundamentals of software programming. This course is part of a series of 5 introductory courses on the use of information systems in business that will introduce you to the exciting world of ICT.

Self-Paced
Computational Thinking and Big Data (edX)
University of Adelaide, AdelaideX

Learn the core concepts of computational thinking and how to collect, clean and consolidate large-scale datasets. Computational thinking is an invaluable skill that can be used across every industry, as it allows you to formulate a problem and express a solution in such a way that a computer can effectively carry it out.

Self-Paced
Programming for Data Science (edX)
University of Adelaide, AdelaideX

Learn how to apply fundamental programming concepts, computational thinking and data analysis techniques to solve real-world data science problems. There is a rising demand for people with the skills to work with Big Data sets and this course can start you on your journey through our Big Data MicroMasters program towards a recognised credential in this highly competitive area. Using practical activities you will learn how digital technologies work and will develop your coding skills through engaging and collaborative assignments.

Self-Paced
Introduction to Management Information Systems (MIS): A Survival Guide (edX)
Universidad Carlos III de Madrid - UC3M, UC3Mx

Gain the skills and knowledge needed to succeed in an MIS-dominated corporate world. This MIS course covers the supporting tech infrastructures (Cloud, Databases, Big Data), the MIS development/procurement process, and the main integrated systems (ERPs) such as SAP®, Oracle® or Microsoft Dynamics Navision®, as well as their relationship with Business Process Redesign.

Self-Paced
Using Python for Research (edX)
HarvardX, Harvard University

Take your introductory knowledge of Python programming to the next level and learn how to use Python 3 for your research. This course bridges the gap between introductory and advanced courses in Python. While there are many excellent introductory Python courses available, most typically do not go deep enough for you to apply your Python skills to research projects.

Self-Paced
Python for Data Science (edX)
University of California, San Diego, UC San DiegoX

Learn to use powerful, open-source Python tools, including Pandas, Git and Matplotlib, to manipulate, analyze, and visualize complex datasets. In the information age, data is all around us. Within this data are answers to compelling questions across many societal domains (politics, business, science, etc.). But if you had access to a large dataset, would you be able to find the answers you seek?

Self-Paced
Big Data Analytics Using Spark (edX)
University of California, San Diego, UC San DiegoX

Learn how to analyze large datasets using Jupyter notebooks, MapReduce and Spark as a platform. In data science, data is called “big” if it cannot fit into the memory of a single standard laptop or workstation. The analysis of big datasets requires using a cluster of tens, hundreds or thousands of computers. Effectively using such clusters requires the use of distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce and Spark.

Dec 5th 2023
5-12 Weeks