Introduction to Designing Data Lakes on AWS (Coursera)

Offered by AWS,
Introduction to Designing Data Lakes on AWS (Coursera)

In this class, Introduction to Designing Data Lakes on AWS, we will help you understand how to create and operate a data lake in a secure and scalable way, without previous knowledge of data science! Starting with the "WHY" you may want a data lake, we will look at the Data-Lake value proposition, characteristics and components.

Class Deals by MOOC List - Click here and see Coursera's Active Discounts, Deals, and Promo Codes.

Designing a data lake is challenging because of the scale and growth of data. Developers need to understand best practices to avoid common mistakes that could be hard to rectify. In this course we will cover the foundations of what a Data Lake is, how to ingest and organize data into the Data Lake, and dive into the data processing that can be done to optimize performance and costs when consuming the data at scale. This course is for professionals (Architects, System Administrators and DevOps) who need to design and build an architecture for secure and scalable Data Lake components. Students will learn about the use cases for a Data Lake and, contrast that with a traditional infrastructure of servers and storage.
Course 3 of 4 in the AWS Cloud Solutions Architect Professional Certificate.

What You Will Learn

  • Where to start with a Data Lake?
  • How to build a secure and scalable Data Lake?
  • What are the common components of a Data Lake?
  • Why do you need a Data Lake and what it's value?

Syllabus

WEEK 1
Welcome to the course! In Week 1, you'll discover why you may want a Data Lake, its characteristics and components, and how it compares to other data data scenarios, such as databases and data warehouses.

WEEK 2
In Week 2, you'll build on your knowledge of what data lakes are and why they may be a solution for your needs. You'll explore AWS services that can be used in data lake architectures, like Amazon S3, AWS Glue, Amazon Athena, Amazon Elasticsearch Service, LakeFormation, Amazon Rekognition, API Gateway and other services used for data movement, processing and visualization.

WEEK 3
In Week 3, you'll explore specifics of data cataloging and ingestion, and learn about services like AWS Transfer Family, Amazon Kinesis Data Streams, Kinesis Firehose, Kinesis Analytics, AWS Snow Family, AWS Glue Crawlers, and others. You'll also discover when is the right time to process data--before, after, or while data is being ingested. Given scenarios, you'll be able to easily identify when to process data and match the most appropriate AWS services to each scenario.

WEEK 4
In Week 4, you are going to dive deeper into data optimization and data processing. Demos around best practices will show you how to optimize your dataset for performance and cost--just by using the right tool for the job! You will also discover data security, data visualization tools, and AWS datasets you can use to experiment and get started.

Go to Class
MOOC List is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Related Courses

Applying Data Analytics in Marketing (Coursera) Coursera
University of Illinois at Urbana-Champaign

Applying Data Analytics in Marketing (Coursera)

This course introduces students to the science of business analytics while casting a keen eye toward the artful use of numbers found in the digital space. The goal is to provide businesses and managers with the foundation needed to apply data analytics to real-world challenges they confront daily in their professional lives.

Jun 27th 2026
4 Weeks
The Data Scientist's Toolbox (Coursera) Coursera
Johns Hopkins University

The Data Scientist's Toolbox (Coursera)

In this course you will get an introduction to the main tools and ideas in the data scientist's toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program like version control, markdown, git, GitHub, R, and RStudio.

Jun 22nd 2026
4 Weeks
Machine Learning With Big Data (Coursera) Coursera
University of California, San Diego

Machine Learning With Big Data (Coursera)

Want to make sense of the volumes of data you have collected? Need to incorporate data-driven decisions into your process? This course provides an overview of machine learning techniques to explore, analyze, and leverage data. You will be introduced to tools and algorithms you can use to create machine learning models that learn from data, and to scale those models up to big data problems.

Jun 22nd 2026
5-12 Weeks
Framework for Data Collection and Analysis (Coursera) Coursera
University of Maryland, College Park

Framework for Data Collection and Analysis (Coursera)

This course will provide you with an overview over existing data products and a good understanding of the data collection landscape. With the help of various examples you will learn how to identify which data sources likely matches your research question, how to turn your research question into measurable pieces, and how to think about an analysis plan.

Jun 22nd 2026
4 Weeks
Big Data, Genes, and Medicine (Coursera) Coursera
The State University of New York

Big Data, Genes, and Medicine (Coursera)

This course distills for you expert knowledge and skills mastered by professionals in Health Big Data Science and Bioinformatics. You will learn exciting facts about the human body biology and chemistry, genetics, and medicine that will be intertwined with the science of Big Data and skills to harness the avalanche of data openly available at your fingertips and which we are just starting to make sense of.

Jun 22nd 2026
5-12 Weeks
Introduction to Machine Learning (Coursera) Coursera
Duke University

Introduction to Machine Learning (Coursera)

This course will provide you a foundational understanding of machine learning models (logistic regression, multilayer perceptrons, convolutional neural networks, natural language processing, etc.) as well as demonstrate how these models can solve complex problems in a variety of industries, from medical diagnostics to image recognition to text prediction.

Jun 26th 2026
5-12 Weeks
Communicating Data Science Results (Coursera) Coursera
University of Washington

Communicating Data Science Results (Coursera)

Making predictions is not enough! Effective data scientists know how to explain and interpret their results, and communicate findings accurately to stakeholders to inform business decisions. Visualization is the field of research in computer science that studies effective communication of quantitative results by linking perception, cognition, and algorithms to exploit the enormous bandwidth of the human visual cortex. In this course you will learn to recognize, design, and use effective visualizations.

Jun 22nd 2026
3 Weeks
Bayesian Statistics: From Concept to Data Analysis (Coursera) Coursera
University of California, Santa Cruz

Bayesian Statistics: From Concept to Data Analysis (Coursera)

This course introduces the Bayesian approach to statistics, starting with the concept of probability and moving to the analysis of data. We will learn about the philosophy of the Bayesian approach as well as how to implement it for common types of data. We will compare the Bayesian approach to the more commonly-taught Frequentist approach, and see some of the benefits of the Bayesian approach.

Jun 22nd 2026
4 Weeks
Exploratory Data Analysis (Coursera) Coursera
Johns Hopkins University

Exploratory Data Analysis (Coursera)

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data.

Jun 22nd 2026
4 Weeks
Statistical Inference (Coursera) Coursera
Johns Hopkins University

Statistical Inference (Coursera)

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference including statistical modeling, data oriented strategies and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentists, Bayesian, likelihood, design based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference.

Jun 22nd 2026
4 Weeks