Python and Pandas for Data Engineering (edX)

Master Python essentials and Pandas for data engineering. Learn to set up development environments, manipulate data, and efficiently solve real-world problems.

In this course, you'll gain the Python and Pandas skills essential for data engineering:

  • Set up version-controlled Python environments with necessary libraries
  • Write Python programs using key language features and data structures
  • Manipulate and analyze data using the powerful Pandas library
  • Explore alternative data structures like NumPy arrays and PySpark DataFrames
  • Utilize Vim, Visual Studio Code, and Git for productive development

Whether you're a beginner or have some programming experience, you'll learn to harness Python and Pandas to tackle data engineering challenges. Hands-on exercises reinforce your learning each step of the way.
This course is part of the Data Engineering Foundations Professional Certificate.

What you'll learn

  • Python environment setup and package management
  • Core Python syntax and data structures
  • Pandas DataFrames for data manipulation
  • Alternatives to Pandas for big data
  • Development with Vim, VS Code, and Git

Syllabus

Module 1: Getting Started with Python (14 hours)

  • Overview of Python, Bash and SQL Essentials for Data Engineering (video, 7 minutes)
  • Meet your Course Instructor: Kennedy Behrman (video, 0 minutes)
  • Overview of Key Concepts (video, 5 minutes)
  • Introduction to Setting Up Your Python Environment (video, 0 minutes)
  • Installing Packages with pip in Python (video, 6 minutes)
  • Saving Requirements File in Python (video, 3 minutes)
  • Creating and Using a Python Virtual Environment (video, 5 minutes)
  • Expression Statements in Python (video, 3 minutes)
  • Assignment Statements in Python (video, 5 minutes)
  • Import Statements in Python (video, 4 minutes)
  • Other Simple Statements in Python (video, 5 minutes)
  • Compound Statements in Python (video, 5 minutes)
  • If Statements in Python (video, 6 minutes)
  • While Loops in Python (video, 4 minutes)
  • Functions in Python (video, 7 minutes)
  • Key Terms (reading, 10 minutes)
  • Key Terms (reading, 10 minutes)
  • Meet your Supporting Instructors: Alfredo Deza and Noah Gift (reading, 10 minutes)
  • Course Structure and Discussion Etiquette (reading, 10 minutes)
  • Getting Started and Best Practices (reading, 10 minutes)
  • Key Terms (reading, 10 minutes)
  • Lesson Reflection (reading, 10 minutes)
  • Key Terms (reading, 10 minutes)
  • Lesson Reflection (reading, 10 minutes)
  • Key Terms (reading, 10 minutes)
  • Evaluating to True or False (reading, 10 minutes)
  • Lesson Reflection (reading, 10 minutes)
  • Python Statements (quiz, 30 minutes)
  • Assignment Statements (quiz, 30 minutes)
  • Import Statements (quiz, 30 minutes)
  • If Statements (quiz, 30 minutes)
  • While Loops (quiz, 30 minutes)
  • Quiz-Setting Up Your Python Environment (assignment, 180 minutes)
  • Meet and Greet (optional) (discussion prompt, 10 minutes)
  • Install a Package with the pip Command (ungraded lab, 60 minutes)
  • Export a Requirements File (ungraded lab, 60 minutes)
  • Create a Virtual Environment (ungraded lab, 60 minutes)
  • Practicing with Expression Statements (ungraded lab, 60 minutes)
  • Decorator Functions (ungraded lab, 60 minutes)
  • Setting up a Python Environment (ungraded lab, 60 minutes)
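
For a sense of the statement types Module 1 names, here is a minimal sketch. It is not taken from the course materials; the names `describe`, `logged`, `add`, and `calls` are hypothetical and chosen only for illustration.

```python
import functools  # import statement: makes a module's names available
import math

radius = 2.0                   # assignment statement
area = math.pi * radius ** 2   # the right-hand side is an expression

def describe(value):
    """Label a value by how it evaluates in an if statement."""
    if value:                  # empty containers, 0, and None evaluate to False
        return "truthy"
    return "falsy"

# While loop: count how many times 40 can be halved evenly.
n = 40
count = 0
while n % 2 == 0:
    n //= 2
    count += 1

calls = []  # hypothetical log used by the decorator below

def logged(func):
    """Decorator function: record each call to func, then run it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper

@logged
def add(a, b):
    return a + b

result = add(2, 3)
print(describe([]), count, result, calls)
```

A decorator such as `logged` is an ordinary function that takes a function and returns a replacement, which is why it fits alongside the functions material.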

Module 2: Essential Python (11 hours)

  • Introduction to Python Essentials (video, 0 minutes)
  • Sequences in Python (video, 8 minutes)
  • Lists and Tuples in Python (video, 5 minutes)
  • Strings in Python (video, 10 minutes)
  • Creating Range Objects in Python (video, 2 minutes)
  • Creating Dictionaries in Python (video, 4 minutes)
  • Accessing Dictionary Data in Python (video, 3 minutes)
  • Dictionary Views in Python (video, 2 minutes)
  • Sets and Set Operations in Python (video, 6 minutes)
  • List Comprehensions in Python (video, 6 minutes)
  • Generator Expressions in Python (video, 4 minutes)
  • Generator Functions in Python (video, 7 minutes)
  • Key Terms (reading, 10 minutes)
  • Lesson Reflection (reading, 10 minutes)
  • Key Terms (reading, 10 minutes)
  • Lesson Reflection (reading, 10 minutes)
  • Key Terms (reading, 10 minutes)
  • Lesson Reflection (reading, 10 minutes)
  • Essential Python Concepts (quiz, 30 minutes)
  • Sequence Operations (quiz, 30 minutes)
  • Lists and Tuples (quiz, 30 minutes)
  • Range Objects (quiz, 30 minutes)
  • Accessing Data in Dictionaries (quiz, 30 minutes)
  • Sets and Set Operations (quiz, 30 minutes)
  • List Comprehensions (quiz, 30 minutes)
  • Generator Expressions (quiz, 30 minutes)
  • Practicing with Strings in Python (ungraded lab, 60 minutes)
  • Creating Dictionaries in Python (ungraded lab, 60 minutes)
  • Dictionary Views in Python (ungraded lab, 60 minutes)
  • Comprehensions and Generators in Python (ungraded lab, 60 minutes)
  • Practicing Essential Python (ungraded lab, 60 minutes)
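
The data structures listed in Module 2 can be sketched in a few lines. This is an illustrative example under assumed sample data, not course content; `readings`, `inventory`, and `countdown` are hypothetical names.

```python
# Sequences: lists are mutable, tuples are immutable.
readings = [12.5, 7.0, 19.25, 7.0]
point = (3, 4)

# A range object yields integers on demand instead of storing them all.
evens = list(range(0, 10, 2))

# Dictionaries map keys to values; a dictionary view tracks later changes.
inventory = {"apples": 3, "pears": 0}
keys_view = inventory.keys()
inventory["plums"] = 5  # keys_view now includes "plums" as well

# Sets drop duplicates and support operations such as intersection.
unique_readings = set(readings)
common = {1, 2, 3} & {2, 3, 4}

# A list comprehension builds the whole list in memory.
squares = [n * n for n in range(5)]

# A generator expression produces values one at a time.
total = sum(n * n for n in range(5))

def countdown(start):
    """Generator function: yield start, start - 1, ..., 1."""
    while start > 0:
        yield start
        start -= 1

print(evens, sorted(keys_view), common, squares, total, list(countdown(3)))
```

The comprehension and the generator expression compute the same squares; the generator form avoids holding the intermediate list, which matters more as the input grows.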

Module 3: Data in Python: Pandas and Alternatives (12 hours)

  • Introduction to Data in Python: Pandas and Alternatives (video, 0 minutes)
  • Creating Pandas DataFrames in Python (video, 4 minutes)
  • Investigating Data in a Pandas DataFrame (video, 6 minutes)
  • Selecting Data in a Pandas DataFrame (video, 6 minutes)
  • Manipulating Pandas DataFrames (video, 4 minutes)
  • Updating Pandas DataFrame Data (video, 5 minutes)
  • Applying Functions in a Pandas DataFrame (video, 6 minutes)
  • Creating NumPy Arrays in Python (video, 15 minutes)
  • Spark and PySpark DataFrames in Python (video, 6 minutes)
  • Creating Dask DataFrames in Python (video, 6 minutes)
  • Key Terms (reading, 10 minutes)
  • Lesson Reflection (reading, 10 minutes)
  • Key Terms (reading, 10 minutes)
  • Lesson Reflection (reading, 10 minutes)
  • Key Terms (reading, 10 minutes)
  • Polars (reading, 10 minutes)
  • Lesson Reflection (reading, 10 minutes)
  • Pandas and Alternatives (quiz, 30 minutes)
  • NumPy (quiz, 30 minutes)
  • PySpark (quiz, 30 minutes)
  • Dask (quiz, 30 minutes)
  • Creating DataFrames (ungraded lab, 60 minutes)
  • Looking at Data in DataFrames (ungraded lab, 60 minutes)
  • Selecting Data in a Pandas DataFrame (ungraded lab, 60 minutes)
  • Manipulating DataFrames (ungraded lab, 60 minutes)
  • Updating Data in a DataFrame (ungraded lab, 60 minutes)
  • Applying Functions in a Pandas DataFrame (ungraded lab, 60 minutes)
  • Manipulate DataFrames with Polars to gain insights (ungraded lab, 60 minutes)
  • Pandas and Alternatives (ungraded lab, 60 minutes)
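
The Pandas topics in Module 3 (creating, investigating, selecting, updating, and applying functions) might look roughly like the sketch below. It assumes pandas is installed; the city and temperature data and the column names are made up for illustration and do not come from the course.

```python
import pandas as pd

# Create a DataFrame from a dictionary of columns (hypothetical data).
df = pd.DataFrame(
    {"city": ["Lima", "Oslo", "Pune"], "temp_c": [19.0, 4.0, 31.0]}
)

# Investigate: dimensions and column types.
print(df.shape)
print(df.dtypes)

# Select rows with a boolean mask.
warm = df[df["temp_c"] > 10]

# Update matching rows in place with .loc.
df.loc[df["city"] == "Oslo", "temp_c"] = 5.0

# Apply a function to every value in a column to derive a new column.
df["temp_f"] = df["temp_c"].apply(lambda c: c * 9 / 5 + 32)

print(warm)
print(df)
```

For a simple arithmetic conversion like this one, the vectorized form `df["temp_c"] * 9 / 5 + 32` gives the same result without `apply`; `apply` is shown here because the syllabus names it.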

Module 4: Python Development Environments (13 hours)

  • Introduction to Python Development Environments (video, 0 minutes)
  • Introduction to Vim Normal Mode (video, 6 minutes)
  • Switching from Normal to Insert and Visual Modes in Vim (video, 4 minutes)
  • Working with the Vim Command Line (video, 6 minutes)
  • Vim Configuration (video, 3 minutes)
  • Introduction to Visual Studio Code (video, 1 minute)
  • Setting Up Visual Studio Code (video, 2 minutes)
  • Debugging Visual Studio Code (video, 3 minutes)
  • What is Version Control? (video, 3 minutes)
  • Introduction to Git and Git Concepts (video, 7 minutes)
  • Version Control with GitHub (video, 6 minutes)
  • Summary of Python and Pandas for Data Engineering (video, 0 minutes)
  • Key Terms (reading, 10 minutes)
  • Lesson Reflection (reading, 10 minutes)
  • Key Terms (reading, 10 minutes)
  • Lesson Reflection (reading, 10 minutes)
  • Key Terms (reading, 10 minutes)
  • Lesson Reflection (reading, 10 minutes)
  • Next Steps (reading, 10 minutes)
  • Cumulative Python and Pandas for Data Engineering Quiz (quiz, 45 minutes)
  • Insert and Visual Modes (quiz, 30 minutes)
  • Vim Command Line Mode (quiz, 30 minutes)
  • Features of Visual Studio Code (quiz, 30 minutes)
  • Version Control (quiz, 30 minutes)
  • Git Commands (quiz, 30 minutes)
  • Hosted Git (quiz, 30 minutes)
  • Basic Vim Commands (ungraded lab, 60 minutes)
  • Explore Visual Studio Code (ungraded lab, 60 minutes)
  • Visual Studio Code Debugger (ungraded lab, 60 minutes)
  • Setup and Provision a Python Project (ungraded lab, 60 minutes)
  • Pandas Final Challenge: Life Expectancy and Happiness (ungraded lab, 60 minutes)
  • Final Jupyter Sandbox (ungraded lab, 60 minutes)
  • Final VS Code Sandbox (ungraded lab, 60 minutes)
  • Final Sandbox Linux Desktop (ungraded lab, 60 minutes)