Understanding Data Engineering (DataCamp)

Offered by DataCamp,
Understanding Data Engineering (DataCamp)

Discover how data engineers lay the groundwork that makes data science possible. No coding involved! In 2019, the average salary for data engineers overtook data scientists. How did this happen? Companies wanting to find the gold within their data realized it wasn’t possible if they hadn’t yet built the mine. Data engineers lay the foundations that make data science possible.

Class Deals by MOOC List - Click here and see DataCamp's Active Discounts, Deals, and Promo Codes.

In this course, you’ll learn about a data engineer’s core responsibilities, how they differ from data scientists, and facilitate the flow of data through an organization. Through hands-on exercises you’ll follow Spotflix, a fictional music streaming company, to understand how their data engineers collect, clean, and catalog their data. By the end of the course, you’ll understand what your company's data engineers do, be ready to have a conversation with a data engineer, and have a solid foundation to start your own data engineer journey.

Chapter 1: What is data engineering?
In this chapter, you’ll learn what data engineering is and why demand for them is increasing. You’ll then discover where data engineering sits in relation to the data science lifecycle, how data engineers differ from data scientists, and have an introduction to your first complete data pipeline.

Chapter 2: Storing data
It’s time to talk about data storage—one of the main responsibilities for a data engineer. In this chapter, you’ll learn how data engineers manage different data structures, work in SQL—the programming language of choice for querying and storing data, and implement appropriate data storage solutions with data lakes and data warehouses.

Chapter 3: Moving and processing data
Data engineers make life easy for data scientists by preparing raw data for analysis using different processing techniques at different steps. These steps need to be combined to create pipelines, which is when automation comes into play. Finally, data engineers use parallel and cloud computing to keep pipelines flowing smoothly.

Go to Class
MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Related Courses

What is Data Science? (Coursera) Coursera
IBM

What is Data Science? (Coursera)

The art of uncovering the insights and trends in data has been around since ancient times. The ancient Egyptians used census data to increase efficiency in tax collection and they accurately predicted the flooding of the Nile river every year. Since then, people working in data science have carved out a unique and distinct field for the work they do. This field is data science. In this course, we will meet some data science practitioners and we will get an overview of what data science is today.

Jun 22nd 2026
3 Weeks
Communicating Data Science Results (Coursera) Coursera
University of Washington

Communicating Data Science Results (Coursera)

Making predictions is not enough! Effective data scientists know how to explain and interpret their results, and communicate findings accurately to stakeholders to inform business decisions. Visualization is the field of research in computer science that studies effective communication of quantitative results by linking perception, cognition, and algorithms to exploit the enormous bandwidth of the human visual cortex. In this course you will learn to recognize, design, and use effective visualizations.

Jun 22nd 2026
3 Weeks
Introduction to R (DataCamp) DataCamp
DataCamp

Introduction to R (DataCamp)

Master the basics of data analysis by manipulating common data structures such as vectors, matrices, and data frames. In this introduction to R, you will master the basics of this beautiful open source language, including factors, lists and data frames. With the knowledge gained in this course, you will be ready to undertake your first very own data analysis.

Self Paced
Self-Paced
Applied Text Mining in Python (Coursera) Coursera
University of Michigan

Applied Text Mining in Python (Coursera)

This course will introduce the learner to text mining and text manipulation basics. The course begins with an understanding of how text is handled by python, the structure of text both to the machine and to humans, and an overview of the nltk framework for manipulating text. The second week focuses on common manipulation needs, including regular expressions (searching for text), cleaning text, and preparing text for use by machine learning processes. The third week will apply basic natural language processing methods to text, and demonstrate how text classification is accomplished. The final week will explore more advanced methods for detecting the topics in documents and grouping them by similarity (topic modelling).

Jun 22nd 2026
4 Weeks
Practical Machine Learning (Coursera) Coursera
Johns Hopkins University

Practical Machine Learning (Coursera)

One of the most common tasks performed by data scientists and data analysts are prediction and machine learning. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and tests sets, overfitting, and error rates.

Jun 22nd 2026
4 Weeks
Introducción a Data Science: Programación Estadística con R (Coursera) Coursera
Universidad Nacional Autónoma de México

Introducción a Data Science: Programación Estadística con R (Coursera)

Este curso te proporcionará las bases del lenguaje de programación estadística R, la lengua franca de la estadística, el cual te permitirá escribir programas que lean, manipulen y analicen datos cuantitativos. Te explicaremos la instalación del lenguaje; también verás una introducción a los sistemas base de gráficos y al paquete para graficar ggplot2, para visualizar estos datos. Además también abordarás la utilización de uno de los IDEs más populares entre la comunidad de usuarios de R, llamado RStudio.

Jun 22nd 2026
4 Weeks
Introduction to Artificial Intelligence (AI) (Coursera) Coursera
IBM

Introduction to Artificial Intelligence (AI) (Coursera)

In this course you will learn what Artificial Intelligence (AI) is, explore use cases and applications of AI, understand AI concepts and terms like machine learning, deep learning and neural networks. You will be exposed to various issues and concerns surrounding AI such as ethics and bias, & jobs, and get advice from experts about learning and starting a career in AI. You will also demonstrate AI in action with a mini project.

Jun 22nd 2026
4 Weeks
Bayesian Statistics: From Concept to Data Analysis (Coursera) Coursera
University of California, Santa Cruz

Bayesian Statistics: From Concept to Data Analysis (Coursera)

This course introduces the Bayesian approach to statistics, starting with the concept of probability and moving to the analysis of data. We will learn about the philosophy of the Bayesian approach as well as how to implement it for common types of data. We will compare the Bayesian approach to the more commonly-taught Frequentist approach, and see some of the benefits of the Bayesian approach.

Jun 22nd 2026
4 Weeks
Regression Models (Coursera) Coursera
Johns Hopkins University

Regression Models (Coursera)

Linear models, as their name implies, relates an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares and inference using regression models.

Jun 22nd 2026
4 Weeks
Data Manipulation at Scale: Systems and Algorithms (Coursera) Coursera
University of Washington

Data Manipulation at Scale: Systems and Algorithms (Coursera)

Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making --- we are drowning in it. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources, but the programming abstractions to use them effectively. The abstractions that emerged in the last decade blend ideas from parallel databases, distributed systems, and programming languages to create a new class of scalable data analytics platforms that form the foundation for data science at realistic scales.

Jun 22nd 2026
4 Weeks