Data Processing and Manipulation (Coursera)

The "Data Processing and Manipulation" course provides students with a comprehensive understanding of various data processing and manipulation concepts and tools. Participants will learn how to handle missing values, detect outliers, perform sampling and dimension reduction, apply scaling and discretization techniques, and explore data cube and pivot table operations. This course equips students with essential skills for efficiently preparing and transforming data for analysis and decision-making.

Learning Objectives:

  1. Understand the importance of data processing and manipulation in the data analysis pipeline.
  2. Learn techniques to handle missing values in datasets, including imputation and exclusion strategies.
  3. Identify and detect outliers to assess their impact on data analysis and decision-making.
  4. Explore sampling methods and dimension reduction techniques for large datasets and high-dimensional data.
  5. Apply data scaling techniques to normalize and standardize variables for meaningful comparisons.
  6. Utilize discretization to transform continuous data into categorical representations, simplifying analysis.
  7. Understand the concept of a data cube and perform multidimensional aggregation for exploratory analysis.
  8. Create pivot tables to summarize and reshape data, gaining valuable insights from complex datasets.

Throughout the course, students will actively engage in practical exercises and projects, allowing them to apply data processing and manipulation techniques to real-world datasets. By the end of the course, participants will be well-equipped to effectively prepare, clean, and transform data for subsequent analysis tasks and data-driven decision-making.
This course is part of the Data Wrangling with Python Specialization.

What you'll learn

  • Understand the importance of data processing and manipulation in the data analysis pipeline.
  • Learn techniques for handling missing values and outliers, reducing data, and scaling and discretizing data.
  • Understand the concept of a data cube and perform multidimensional aggregation for exploratory analysis.

Syllabus

Missing Values and Outliers
Module 1
The "Missing Values and Outliers" week focuses on how to handle missing values and detect outliers using the Pandas library. You will learn essential techniques to identify and address missing data effectively, as well as methods to detect and manage outliers in datasets.
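
As an illustration only, the pandas snippet here sketches the two missing-value strategies named in the learning objectives (exclusion and imputation) plus one common outlier rule. The toy DataFrame, its column names, median imputation, and the 1.5 * IQR threshold are assumptions chosen for the example, not necessarily the exact methods taught in the module.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value per column and one extreme price.
df = pd.DataFrame({
    "price": [10.0, 12.0, np.nan, 11.0, 13.0, 95.0],
    "qty": [1, 2, 2, np.nan, 3, 1],
})

# Identify missing data: number of NaNs in each column.
missing_counts = df.isna().sum()

# Exclusion strategy: drop every row that contains a missing value.
dropped = df.dropna()

# Imputation strategy: fill each column's NaNs with that column's median.
imputed = df.fillna(df.median())

# Outlier detection with the 1.5 * IQR rule on the imputed price column.
q1, q3 = imputed["price"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = imputed[(imputed["price"] < lower) | (imputed["price"] > upper)]
```

Here only the 95.0 price falls outside the IQR fences. Median imputation is used rather than the mean because the mean would be pulled upward by that same outlier.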

Data Reduction
Module 2
The "Data Reduction" week focuses on how to reduce data through sampling and dimensionality reduction using the Pandas library. You will learn essential techniques to obtain manageable subsets of data while preserving meaningful information for analysis and visualization.
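
A minimal sketch of both reduction routes, assuming hypothetical synthetic data: DataFrame.sample for simple random sampling, groupby(...).sample for stratified sampling (available in pandas 1.1 or later), and principal component analysis computed directly with NumPy's SVD. PCA via NumPy is a swapped-in technique for illustration; the course may use a different tool for this step.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical dataset: 1,000 rows, an unbalanced group label, four numeric features.
n = 1000
df = pd.DataFrame({
    "group": rng.choice(["a", "b"], size=n, p=[0.8, 0.2]),
    "x1": rng.normal(size=n),
})
df["x2"] = 2 * df["x1"] + rng.normal(scale=0.1, size=n)  # nearly redundant with x1
df["x3"] = rng.normal(size=n)
df["x4"] = -df["x3"] + rng.normal(scale=0.1, size=n)  # nearly redundant with x3

# Simple random sampling: keep 10% of the rows, reproducibly.
simple = df.sample(frac=0.1, random_state=42)

# Stratified sampling: 10% of each group, so group proportions are preserved.
stratified = df.groupby("group").sample(frac=0.1, random_state=42)

# Dimensionality reduction: PCA via SVD of the mean-centered feature matrix.
features = df[["x1", "x2", "x3", "x4"]].to_numpy()
centered = features - features.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)  # variance ratio per component

# Project onto the first two principal components.
reduced = pd.DataFrame(centered @ vt[:2].T, columns=["pc1", "pc2"], index=df.index)
```

Stratified sampling keeps the roughly 80/20 group split that simple random sampling only approximates. Two components retain nearly all of the variance here because x2 and x4 are near-copies of x1 and x3.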

Scaling and Discretization
Module 3
The "Scaling and Discretization" week focuses on the importance of data scaling and discretization in data preprocessing. You will learn why and how to scale data to normalize variables and to handle variables measured on different scales. Additionally, you will explore the concept of data discretization and its application in transforming continuous data into categorical representations.
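
The following pandas sketch shows min-max normalization, z-score standardization, and two discretization styles: fixed bin edges with pd.cut and equal-frequency bins with pd.qcut. The columns, bin edges, and labels are hypothetical.

```python
import pandas as pd

# Hypothetical data: two variables on very different scales.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 63, 29],
    "income": [28_000, 52_000, 61_000, 75_000, 90_000, 40_000],
})

# Min-max normalization: rescale each column to the [0, 1] range.
minmax = (df - df.min()) / (df.max() - df.min())

# Z-score standardization: zero mean and unit (sample) standard deviation.
zscore = (df - df.mean()) / df.std()

# Discretization with explicit bin edges; intervals are right-inclusive: (0, 30], (30, 50], (50, 120].
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 50, 120], labels=["young", "middle", "senior"])

# Equal-frequency discretization: each income band holds the same number of rows.
df["income_band"] = pd.qcut(df["income"], q=3, labels=["low", "mid", "high"])
```

Note that df.std() uses the sample standard deviation (ddof=1) by default; pass ddof=0 if the population version is intended.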

Data Warehouse
Module 4
The "Data Warehouse" week focuses on the concepts and methodologies of organizing data using data cubes and pivot tables in Pandas. You will learn the importance of data warehousing for efficient data management and analysis, as well as how to construct data cubes and pivot tables to facilitate multidimensional data exploration.
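
A minimal sketch, assuming a hypothetical sales table: pd.pivot_table builds a two-dimensional summary with row and column totals, and a small loop materializes a data cube by aggregating the measure over every subset of the dimensions (2^3 = 8 cuboids for three dimensions). The dictionary-of-groupbys layout is an illustrative choice, not a pandas cube API.

```python
from itertools import combinations

import pandas as pd

# Hypothetical sales fact table: three dimensions and one measure (revenue).
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "quarter": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "revenue": [100, 150, 200, 120, 130, 90],
})

# Pivot table: regions as rows, quarters as columns, summed revenue, plus totals.
pivot = pd.pivot_table(
    sales, values="revenue", index="region", columns="quarter",
    aggfunc="sum", fill_value=0, margins=True, margins_name="Total",
)

# Data cube: aggregate revenue over every subset of the dimensions.
# The empty subset is the grand total; the full subset is the base cuboid.
dims = ["region", "product", "quarter"]
cube = {}
for k in range(len(dims) + 1):
    for group in combinations(dims, k):
        if group:
            cube[group] = sales.groupby(list(group))["revenue"].sum()
        else:
            cube[group] = sales["revenue"].sum()

# Slice: fix region == "East" in the (region, product) cuboid.
east_slice = cube[("region", "product")].loc["East"]
```

Rolling up corresponds to moving to a cuboid with fewer dimensions (for example, from ("region", "product") to ("region",)); drilling down moves the other way.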

Related Courses

Python for Data Science, AI & Development (Coursera)
IBM

Kickstart your learning of Python for data science, as well as programming in general, with this beginner-friendly introduction to Python. Python is one of the world’s most popular programming languages, and there has never been greater demand for professionals with the ability to apply Python fundamentals to drive business solutions across industries.

Jun 23rd 2026
5-12 Weeks

Machine Learning: Classification (Coursera)
University of Washington

Case Studies: Analyzing Sentiment & Loan Default Prediction. In our case study on analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information, ...). In our second case study for this course, loan default prediction, you will tackle financial data and predict when a loan is likely to be risky or safe for the bank.

Jun 22nd 2026
5-12 Weeks

Data Warehouse Concepts, Design, and Data Integration (Coursera)
University of Colorado System

This is the second course in the Data Warehousing for Business Intelligence specialization. Ideally, the courses should be taken in sequence. In this course, you will learn exciting concepts and skills for designing data warehouses and creating data integration workflows. These are fundamental skills for data warehouse developers and administrators. You will have hands-on experience for data warehouse design and use open source products for manipulating pivot tables and creating data integration workflows.

Jun 22nd 2026
5-12 Weeks

Fitting Statistical Models to Data with Python (Coursera)
University of Michigan

In this course, we will expand our exploration of statistical inference techniques by focusing on the science and art of fitting statistical models to data. We will build on the concepts presented in the Statistical Inference course (Course 2) to emphasize the importance of connecting research questions to our data analysis methods. We will also focus on various modeling objectives, including making inference about relationships between variables and generating predictions for future observations.

Jun 22nd 2026
4 Weeks

Python Classes and Inheritance (Coursera)
University of Michigan

This course introduces classes, instances, and inheritance. You will learn how to use classes to represent data in concise and natural ways. You'll also learn how to override built-in methods and how to create "inherited" classes that reuse functionality. You'll also learn how to design classes. Finally, you will be introduced to the good programming habit of writing automated tests for your own code.

Jun 22nd 2026
3 Weeks

Accounting Data Analytics with Python (Coursera)
University of Illinois at Urbana-Champaign

This course focuses on developing Python skills for assembling business data. It will cover some of the same material from Introduction to Accounting Data Analytics and Visualization, but in a more general purpose programming environment (Jupyter Notebook for Python), rather than in Excel and the Visual Basic Editor. These concepts are taught within the context of one or more accounting data domains (e.g., financial statement data from EDGAR, stock data, loan data, point-of-sale data).

Jun 22nd 2026
5-12 Weeks

Foundations for Big Data Analysis with SQL (Coursera)
Cloudera

In this course, you'll get a big-picture view of using SQL for big data, starting with an overview of data, database systems, and the common querying language (SQL). Then you'll learn the characteristics of big data and SQL tools for working on big data platforms. You'll also install an exercise environment (virtual machine) to be used throughout the specialization courses, and you'll have an opportunity to do some initial exploration of databases and tables in that environment.

Jun 22nd 2026
5-12 Weeks

Machine Learning with Python (Coursera)
IBM

This course dives into the basics of machine learning using an approachable and well-known programming language, Python. In this course, we will be reviewing two main components: First, you will learn about the purpose of machine learning and where it applies to the real world. Second, you will get a general overview of machine learning topics such as supervised vs. unsupervised learning, model evaluation, and machine learning algorithms.

Jun 22nd 2026
5-12 Weeks

Basic Data Processing and Visualization (Coursera)
University of California, San Diego

This is the first course in the four-course specialization Python Data Products for Predictive Analytics, introducing the basics of reading and manipulating datasets in Python. In this course, you will learn what a data product is and go through several Python libraries to perform data retrieval, processing, and visualization.

Jun 22nd 2026
5-12 Weeks