EdX

Introduction to Designing Data Lakes on AWS (edX)

Offered by AWS

In this class, we will help you understand how to create and operate a data lake in a secure and scalable way, even if you have no previous knowledge of data science! Designing a data lake is challenging because of the scale and growth of data. This course is for professionals (Architects, System Administrators and DevOps) who need to design and build an architecture for secure and scalable Data Lake components. Students will learn about the use cases for a Data Lake and contrast them with a traditional infrastructure of servers and storage.

Developers need to understand best practices to avoid common mistakes that could be hard to rectify. In this course we will cover the foundations of what a Data Lake is, how to ingest and organize data into the Data Lake, and dive into the data processing that can be done to optimize performance and costs when consuming the data at scale.
This course is part of the Cloud Solutions Architecture Professional Certificate.

What you'll learn

  • Where to start with a Data Lake?
  • How to build a secure and scalable Data Lake?
  • What are the common components of a Data Lake?
  • Why do you need a Data Lake, and what is its value?

Syllabus

Week 1: Hello World, I mean, Hello Data Lakes!
Video: Meet the Instructors
Video: Introduction to Week 1
Video: Why Data Lakes?
Video: Characteristics of a Data Lake
Video: Data Lake Components
Reading: Data Lake Characteristics and Components
Video: Comparison of a Data Lake to a Data Warehouse
Reading: Data Lakes and Data Warehouses
Video: Discussing sample Data Lake Architectures
Quiz/Assessment: Week 1 quiz

Week 2: AWS data related services
Video: Introduction to Week 2
Video: AWS Data Lake related services
Video: Amazon S3
Video: AWS Glue Data Catalog
Reading: S3 and Glue Data Catalog
Video: AWS Services used for data movement
Reading: Kinesis, API Gateway, etc.
Video: AWS Services for Data processing
Video: AWS Services for Analytics
Video: AWS Services used for Predictive Analytics and Machine Learning
Reading: EMR, Glue Jobs, Lambda, Kinesis Analytics, Redshift
Video: Introduction to AWS Lake Formation
Reading: Lake Formation
Lab: Get familiar with AWS Services and create your first simple data lake

Week 3: Ingesting the rivers
Video: Introduction to Week 3
Video: Use the right tool for the job
Video: Understanding Data Structure and when to process data
Video: Data Streaming ingestion with Amazon Kinesis Services
Video: Diving Deep on Amazon Kinesis
Demo: Batch Data Ingestion with AWS Transfer Family
Reading: Batch Data Ingestion with AWS Services
Video: Data Cataloging
Demo: Using Glue Crawlers
Reading: The importance of data cataloging
Video: Reviewing the ingestion part of some Data Lake architectures
Lab: Ingesting Web Logs

Week 4: Processing and Analyzing data that sits in the Data Lake
Video: Introduction to Week 4
Video: Data prep and AWS Glue jobs
Video: File optimizations
Demo: Using S3, Glue and Athena to get insights about NYC Taxi data
Reading: Glue Jobs, Data Prep, Athena, Columnar Data Formats and Amazon Athena Optimizations
Video: Introduction to Data Lake security
Reading: Security and compliance
Video: The power of data visualization
Video: Introduction to Amazon QuickSight
Demo: Amazon QuickSight
Reading: Data visualization, Amazon QuickSight
Video: Registry of Open Data on AWS
Lab: Create an end-to-end Data Lake with AWS Services
Video: Course wrap-up!

Related Courses

Architecting Solutions on AWS (edX)
Offered by AWS

Are you looking to get more technical? Are you looking to begin working in the cloud, but don’t know where to go next? Are you puzzled about how to match a customer’s requirements with the right AWS services and solutions? If so, you are in the right place! You’ll learn how to plan, think, and act like a Solutions Architect in a real-life customer scenario.

Self-Paced

The Power of Data (edX)
Offered by Rolls Royce

‘The power of data’ is an interactive introduction to data and why it matters, with a focus on data analytics. In under an hour, you'll discover what data is and how it improves our world, through a series of animations, case studies and tips. You’ll learn that to get the most from your data, you need to go on a journey, gaining insight along the way. This course highlights key factors in collecting, sharing, analysing and generating value from data.

No sessions available
2 Weeks

Foundations of Data Analytics (edX)
Offered by The Hong Kong University of Science and Technology (HKUST), HKUSTx

Learn the fundamental techniques for data analytics and prepare to learn and apply more advanced big data technologies. This course covers data collection, data extraction, data integration, data cleansing, and basic machine learning techniques.

Self-Paced

Introduction to AWS Identity and Access Management (edX)
Offered by AWS

This course focuses on one of the key security services, AWS Identity and Access Management (IAM). It provides learners with an introduction to AWS IAM along with some deeper-level content. Security should be your first priority when developing cloud-native applications. The goal of this course is to provide you with foundational knowledge and skills that will enable you to grow in your use of both AWS IAM and the rest of the AWS ecosystem.

Self-Paced

Data Analytics in Accounting and Finance (edX)
Offered by The Hong Kong Polytechnic University, HKPolyUx

Understand key concepts in the data analytics process, along with the practical application of data sets in the accounting and finance context. This data analytics course takes an interdisciplinary approach to describing the data analytics process in the context of accounting and finance. The growing volume of both structured and unstructured data has pushed forward a more data-driven form of decision making in accounting and finance. To keep up with these advancements, accountants and finance professionals need an analytics mindset to excel in their jobs.

Self-Paced

Building Modern Node.js Applications on AWS (edX)
Offered by AWS

In this course, we will cover how to build a modern, greenfield serverless backend on AWS. In modern cloud-native application development, the goal is often to build serverless architectures that are scalable, highly available, and fully managed. This means less operational overhead for you and your business, and more focus on the applications and business-specific projects that differentiate you in your marketplace.

Self-Paced

Machine Learning Operations 2 (MLOps2-AWS): Data Pipeline Automation & Optimization using Amazon Web Services (AWS) (edX)
Offered by Statistics.comX, Statistics.com

Most data science projects fail. There are various reasons why, but one of the primary ones is the challenge of deployment. One piece of the deployment puzzle is understanding how to automate your pipeline’s functions and continuously optimize its performance, which is why we developed this course: MLOps2: Data Pipeline Automation & Optimization using Amazon Web Services (AWS).

Self-Paced