
Serverless Data Processing with Dataflow: Operations (edX)

Offered by Google Cloud

In the last installment of the Dataflow course series, we will introduce the components of the Dataflow operational model. We will examine tools and techniques for troubleshooting and optimizing pipeline performance.


We will then review testing, deployment, and reliability best practices for Dataflow pipelines. We will conclude with a review of Templates, which make it easy to scale Dataflow pipelines to organizations with hundreds of users. These lessons will help ensure that your data platform is stable and resilient to unanticipated circumstances.
This course is part of the Google Cloud Data Engineer Learning Path Professional Certificate.

What you'll learn

  • Perform monitoring, troubleshooting, testing and CI/CD on Dataflow pipelines.
  • Deploy Dataflow pipelines with reliability in mind to maximize stability for your data processing platform.

Syllabus

  1. Introduction

This module covers the course outline.

  2. Monitoring

In this module, we learn how to use the Jobs List page to filter for jobs that we want to monitor or investigate. We look at how the Job Graph, Job Info, and Job Metrics tabs collectively provide a comprehensive summary of your Dataflow job. Lastly, we learn how we can use Dataflow’s integration with Metrics Explorer to create alerting policies for Dataflow metrics.

  3. Logging and Error Reporting

In this module, we learn how to use the Log panel at the bottom of both the Job Graph and Job Metrics pages, and learn about the centralized Error Reporting page.

  4. Troubleshooting and Debug

In this module, we learn how to troubleshoot and debug Dataflow pipelines. We will also review the four common modes of failure for Dataflow: failure to build the pipeline, failure to start the pipeline on Dataflow, failure during pipeline execution, and performance issues.

  5. Performance

In this module, we will discuss performance considerations we should be aware of while developing batch and streaming pipelines in Dataflow.

  6. Testing and CI/CD

This module will discuss unit testing your Dataflow pipelines. We also introduce frameworks and features available to streamline your CI/CD workflow for Dataflow pipelines.

  7. Reliability

In this module, we will discuss methods for building systems that are resilient to corrupted data and data center outages.

  8. Flex Templates

This module covers Flex Templates, a feature that helps data engineering teams standardize and reuse Dataflow pipeline code. Many operational challenges can be solved with Flex Templates.

  9. Summary

This module reviews the topics covered in the course.


Related Courses

Machine Learning Operations 2 (MLOps2-GCP): Data Pipeline Automation & Optimization using Google Cloud Platform (GCP) (edX)
Statistics.comX, Statistics.com

Most data science projects fail. There are various reasons why, but one of the primary reasons is the challenge of deployment. One piece of the deployment puzzle is understanding how to automate your pipeline’s functions and continuously optimize its performance, which is why we developed this course, MLOps2: Data Pipeline Automation & Optimization using Google Cloud Platform (GCP).

Self-Paced
Automated Software Testing: Model and State-based Testing (edX)
Delft University of Technology, DelftX

Learn the advanced software testing techniques, tools, and best practices required to deliver high-quality software. Software testing gets a bad rap for being difficult, time-consuming, redundant, and, above all, boring. But in fact, it is a proven way to ensure that your software will work flawlessly and can meet release schedules.

Self-Paced
Hacking PostgreSQL: Data Access Methods (edX)
Ural Federal University, UrFUx

Learn the science, engineering practices, and hacking techniques of data access, which are core aspects of information processing in a database. This course is about data storage and data processing technologies, with examples from PostgreSQL. It is geared toward database core developers, operating system developers, system architects, and all those who want to understand databases in more detail.

No sessions available
13-24 Weeks
Fundamentos y Herramientas de DevOps (edX)
Universidad Anáhuac, AnahuacX

Become a crucial part of your company by learning the pillars of DevOps that ensure continuous integration and continuous delivery of software. Learn to use the Linux/Unix commands that are essential for managing applications effectively from the command line, as well as the basics of source code management with Git and GitHub.

Self-Paced
Microservices and Serverless (edX)
IBM

Design, develop, deploy, manage, and secure applications and solutions on public, private, or hybrid cloud platforms. This course will introduce you to 12-factor apps and microservices, concepts that emerged to help organizations work better and faster in a cloud-native manner. You’ll then learn about serverless computing: how it works, what value it brings, and which specific serverless technologies exist. You’ll get hands-on with IBM Cloud Functions, a serverless platform on IBM Cloud that lets you develop serverless apps with ease. Finally, you will learn to build and deploy applications using container images on IBM Cloud Code Engine.

Self-Paced
Introduction to Serverless on Kubernetes (edX)
Linux Foundation, LinuxFoundationX

Learn how to build serverless functions that can run on any cloud, without being restricted by limits on execution duration, available languages, or the size of your code. With the advent of systems like AWS Lambda, the term serverless gained popularity. However, many people are still unsure what it is for and how it can help them build applications faster than traditional approaches. Other potential users are turned off by the arbitrary limits and lock-in of cloud-based serverless products.

Self-Paced
Synthetic Aperture Radar: Ecosystems (edX)
University of Alaska Fairbanks, AlaskaX

This course will introduce the contributions of Synthetic Aperture Radar (SAR) remote sensing to the monitoring of Earth’s ecosystems. Learn how the weather independence of SAR, combined with its ability to penetrate vegetation canopies, makes SAR an excellent information source for characterizing vegetation structure, measuring above-ground biomass, and analyzing vegetation change over the long term and throughout the seasons.

Self-Paced
Building Microservice Platforms with TARS (edX)
Linux Foundation, LinuxFoundationX

Are you interested in microservices? Don’t miss out on TARS! Get an in-depth primer on the powerful TARS framework for building your microservice platform. This course is an introduction to microservices and the TARS framework for beginners. TARS is a new-generation framework for distributed microservice applications. It supports multiple programming languages, including C++, Golang, Java, Node.js, PHP, and Python, and allows developers and enterprises to quickly build stable, reliable applications that run at scale.

Self-Paced