Data Science: Wrangling (edX)

Learn to process and convert raw data into formats needed for analysis. In this course, we cover several standard steps of the data wrangling process like importing data into R, tidying data, string processing, HTML parsing, working with dates and times, and text mining. Rarely are all these wrangling steps necessary in a single analysis, but a data scientist will likely face them all at some point.

This course is part of our Data Science Professional Certificate.
Data is rarely easy to access in a data science project. It is more likely to be in a file, a database, or extracted from documents such as web pages, tweets, or PDFs. In these cases, the first step is to import the data into R and tidy it using the tidyverse package. The steps that convert data from its raw form to the tidy form are called data wrangling.
This process is a critical step for any data scientist. Knowing how to wrangle and clean data will enable you to uncover insights that would otherwise be hidden.

What you'll learn

  • Importing data into R from different file formats
  • Web scraping
  • How to tidy data using the tidyverse to better facilitate analysis
  • String processing with regular expressions (regex)
  • Wrangling data using dplyr
  • How to work with dates and times
  • Text mining

Related Courses

Python for Data Science (edX)
University of California, San Diego, UC San DiegoX

Learn to use powerful, open-source Python tools, including Pandas, Git, and Matplotlib, to manipulate, analyze, and visualize complex datasets. In the information age, data is all around us. Within this data are answers to compelling questions across many societal domains (politics, business, science, etc.). But if you had access to a large dataset, would you be able to find the answers you seek?

Self-Paced
Introduction to Data Science (edX)
IBM

Learn about the world of data science first-hand from real data scientists. The art of uncovering the insights and trends in data has been around for centuries. The ancient Egyptians applied census data to increase efficiency in tax collection and they accurately predicted the flooding of the Nile river every year.

Self-Paced
Computational Thinking and Big Data (edX)
University of Adelaide, AdelaideX

Learn the core concepts of computational thinking and how to collect, clean and consolidate large-scale datasets. Computational thinking is an invaluable skill that can be used across every industry, as it allows you to formulate a problem and express a solution in such a way that a computer can effectively carry it out.

Self-Paced
Programming for Data Science (edX)
University of Adelaide, AdelaideX

Learn how to apply fundamental programming concepts, computational thinking, and data analysis techniques to solve real-world data science problems. There is a rising demand for people with the skills to work with Big Data sets, and this course can start you on your journey through our Big Data MicroMasters program towards a recognised credential in this highly competitive area. Using practical activities, you will learn how digital technologies work and will develop your coding skills through engaging and collaborative assignments.

Self-Paced
Principles, Statistical and Computational Tools for Reproducible Science (edX)
HarvardX, Harvard University

Learn skills and tools that support data science and reproducible research, to ensure you can trust your own research results, reproduce them yourself, and communicate them to others. Today the principles and techniques of reproducible research are more important than ever, across diverse disciplines from astrophysics to political science. No one wants to do research that can’t be reproduced. Thus, this course is for anyone doing data-intensive research. While many of us come from a biomedical background, this course is for a broad audience of data scientists.

Self-Paced
Data Science: Probability (edX)
HarvardX, Harvard University

Learn probability theory — essential for a data scientist — using a case study on the financial crisis of 2007–2008. In this course, you will learn valuable concepts in probability theory. The motivation for this course is the circumstances surrounding the financial crisis of 2007–2008. Part of what caused this financial crisis was that the risk of some securities sold by financial institutions was underestimated. To begin to understand this very complicated event, we need to understand the basics of probability.

Self-Paced
Data Science: Inference and Modeling (edX)
HarvardX, Harvard University

Learn inference and modeling, two of the most widely used statistical tools in data analysis. Statistical inference and modeling are indispensable for analyzing data affected by chance, and thus essential for data scientists. In this course, you will learn these key concepts through a motivating case study on election forecasting.

Self-Paced
Data Science Tools (edX)
IBM

Learn about the most popular data science tools, including how to use them and what their features are. In this course, you'll learn about Data Science tools like Jupyter Notebooks, RStudio IDE, and Watson Studio. You will learn what each tool is used for, what programming languages they can execute, their features and limitations and how data scientists use these tools today.

Self-Paced
Data Science and Agile Systems for Product Management (edX)
University of Maryland, College Park, University System of Maryland - USM, USMx, UMD

Deliver faster, higher-quality, and fault-tolerant products regardless of industry using the latest in Agile, DevOps, and Data Science. Modern systems must be designed for agility in order to outpace the competition. Concepts like Agile, DevOps, and Data Science were once considered relevant only to technology-based companies. Today that means every company, because there is no greater currency than timely information for optimizing operations and meeting the needs of customers.

Self-Paced