EdX

Serverless Data Processing with Dataflow: Foundations (edX)

Offered by Google Cloud

This course is part 1 of a 3-course series on Serverless Data Processing with Dataflow. In this first course, we start with a refresher of what Apache Beam is and its relationship with Dataflow.

Next, we talk about the Apache Beam vision and the benefits of the Beam Portability framework. The Beam Portability framework achieves the vision that a developer can use their favorite programming language with their preferred execution backend. We then show you how Dataflow allows you to separate compute and storage while saving money, and how identity, access, and management tools interact with your Dataflow pipelines. Lastly, we look at how to implement the right security model for your use case on Dataflow.
This course is part of the Google Cloud Data Engineer Learning Path Professional Certificate.

What you'll learn

Demonstrate how Apache Beam and Cloud Dataflow work together to fulfill your organization’s data processing needs
Summarize the benefits of the Beam Portability Framework and enable it for your Dataflow pipelines
Enable Shuffle and Streaming Engine for batch and streaming pipelines, respectively, for maximum performance
Enable Flexible Resource Scheduling for more cost-efficient performance
Select the right combination of IAM permissions for your Dataflow job
Implement best practices for a secure data processing environment

Syllabus

Introduction

This module covers the course outline and gives a quick refresher on the Apache Beam programming model and Google’s Dataflow managed service.

Beam Portability

This module has four sections: Beam Portability, Runner v2, Container Environments, and Cross-Language Transforms.

Separating Compute and Storage with Dataflow

In this module, we discuss how to separate compute and storage with Dataflow. The module contains four sections: Dataflow, Dataflow Shuffle Service, Dataflow Streaming Engine, and Flexible Resource Scheduling.

IAM, Quotas, and Permissions

In this module, we talk about the different IAM roles, quotas, and permissions required to run Dataflow.

Security

In this module, we will look at how to implement the right security model for your use case on Dataflow.

Summary

In this course, we started with a refresher of what Apache Beam is and its relationship with Dataflow.

Related Courses

Coursera

Google Cloud

Building Resilient Streaming Systems on GCP em Português Brasileiro (Coursera)

Statistics & Data Analysis

This accelerated on-demand course lasts one week and is based on Google Cloud Platform Big Data and Machine Learning Fundamentals. Through video lectures, demonstrations, and hands-on labs, participants will learn to build streaming data pipelines using Google Cloud Pub/Sub and Dataflow for real-time decision making. You will also learn to build dashboards that render tailored responses for different stakeholder audiences.

Jun 15th 2026

1 Week

GCP Analytics Google Cloud Platform

EdX

IBM

Microservices and Serverless (edX)

Computer Science

Design, develop, deploy, manage, and secure applications and solutions on public, private, or hybrid cloud platforms. This course will introduce you to 12-factor apps and microservices, concepts that emerged to help organizations work better and faster in a cloud-native manner. You’ll then learn about serverless computing: how it works, what value it brings, and which specific serverless technologies are available. You’ll get hands-on with IBM Cloud Functions, a serverless platform on IBM Cloud that lets you develop serverless apps with ease. Finally, you will learn to build and deploy applications using container images on Code Engine.

Self-Paced

Cloud Computing REST APIs GraphQL

Hacking PostgreSQL: Data Access Methods (edX)

EdX

Ural Federal University, UrFUx

CS: Software Engineering Statistics & Data Analysis

Learn the science, engineering practices, and hacking techniques of data access, a core aspect of information processing in a database. This course is about data storage and data processing technologies, with examples from PostgreSQL. It is geared toward database core developers, operating systems developers, system architects, and all those who want to understand databases in more detail.

No sessions available

13-24 Weeks

Algorithms Hacking Data Access

Coursera

MathWorks

Predictive Modeling and Machine Learning with MATLAB (Coursera)

Data Science

In this course, you will build on the skills learned in Exploratory Data Analysis with MATLAB and Data Processing and Feature Engineering with MATLAB to increase your ability to harness the power of MATLAB to analyze data relevant to the work you do. These skills are valuable for those who have domain knowledge and some exposure to computational tools, but no programming background.

Jun 22nd 2026

4 Weeks

ML MATLAB Machine Learning

Introduction to Serverless on Kubernetes (edX)

EdX

Linux Foundation, LinuxFoundationX

Computer Science

Learn how to build serverless functions that can be run on any cloud, without being restricted by limits on the execution duration, languages available, or the size of your code. With the advent of systems like AWS Lambda, the term serverless gained much popularity. However, many people are still unsure what it is for, and how it can help them build applications faster than traditional approaches. Other potential users are turned off by the arbitrary limits and lock-in of cloud-based serverless products.

Self-Paced

Python Kubernetes Serverless

Data Processing and Analysis with Excel (edX)

EdX

Rochester Institute of Technology, RITx

Statistics & Data Analysis

Learn to use Excel to organize and clean data so it can be manipulated and analyzed. In this course, you will learn how to organize your data within Microsoft Excel. Once the data is organized, we will discuss data cleaning. You will learn how to identify outliers and anomalies in the data, and how to identify and change data types. Together we will develop a data analysis plan, after which we will apply analysis methods and tools, including exploratory analysis, evaluation of results, and comparison with other findings.

Self-Paced

Excel Data Analysis Microsoft Excel

How Computers Work: Demystifying Computation (edX)

EdX

Raspberry Pi Foundation

Computer Science

Explore the fundamentals of computing: computer architecture, binary logic, data processing, circuits, and more. In this course, you’ll gain an understanding of how computers work at a fundamental level.

Self-Paced

Computing Logic Von Neumann Architecture

Coursera

University of California, San Diego

Basic Data Processing and Visualization (Coursera)

Statistics & Data Analysis Data Science

This is the first course in the four-course specialization Python Data Products for Predictive Analytics, introducing the basics of reading and manipulating datasets in Python. In this course, you will learn what a data product is and go through several Python libraries to perform data retrieval, processing, and visualization.

Jun 22nd 2026

5-12 Weeks

Python Data Visualization Data Processing

Coursera

University of Michigan

Data Collection and Processing with Python (Coursera)

CS: Software Engineering CS: Programming

This course teaches you to fetch and process data from services on the Internet. It covers Python list comprehensions and provides opportunities to practice extracting from and processing deeply nested data. You'll also learn how to use the Python requests module to interact with REST APIs and what to look for in the documentation of those APIs. For the final project, you will construct a “tag recommender” for the Flickr photo sharing site.

Jun 15th 2026

3 Weeks

Python APIs Data Collection

Serverless Data Processing with Dataflow: Operations (edX)

EdX

Google Cloud

Computer Science

In the last installment of the Dataflow course series, we will introduce the components of the Dataflow operational model. We will examine tools and techniques for troubleshooting and optimizing pipeline performance.

Self-Paced

Testing Performance Monitoring

Coursera

Google Cloud

Serverless Data Processing with Dataflow: Operations (Coursera)

Statistics & Data Analysis

In the last installment of the Dataflow course series, we will introduce the components of the Dataflow operational model. We will examine tools and techniques for troubleshooting and optimizing pipeline performance. We will then review testing, deployment, and reliability best practices for Dataflow pipelines.

Jun 22nd 2026

2 Weeks

Testing Monitoring Debug

Serverless Data Processing with Dataflow: Develop Pipelines (edX)

EdX

Google Cloud

Computer Science

In this second installment of the Dataflow course series, we dive deeper into developing pipelines using the Beam SDK. We start with a review of Apache Beam concepts.

Self-Paced

Data Processing Data Pipelines Dataflow