Data for Machine Learning (Coursera)

This course is all about data and how it is critical to the success of your applied machine learning model.

Completing this course will give learners the skills to:

  • Understand the critical elements of data in the learning, training and operation phases
  • Understand biases and sources of data
  • Implement techniques to improve the generality of your model
  • Explain the consequences of overfitting and identify mitigation measures
  • Implement appropriate test and validation measures
  • Demonstrate how the accuracy of your model can be improved with thoughtful feature engineering
  • Explore the impact of the algorithm parameters on model strength

To be successful in this course, you should have at least a beginner-level background in Python programming (e.g., be able to read and trace existing code, and be comfortable with conditionals, loops, variables, lists, dictionaries and arrays). You should have a basic understanding of linear algebra (vector notation) and statistics (probability distributions and mean/median/mode).
Course 3 of 4 in the Machine Learning: Algorithms in the Real World Specialization.

Syllabus

WEEK 1
What Does Good Data Look Like?
We all know that data is important for machine learning success, but what does good data really look like? What steps do you need to take to get from scattered, unprocessed data to nice clean learning data? This week takes an overarching view to describe how your problem and data needs interact, and what processes need to be in place for successful data preparation.

WEEK 2
Preparing Your Data for Machine Learning Success
Now that you have identified your data sources, you need to bring them all together. This week describes what you need to do to prepare your data as a whole.

WEEK 3
Feature Engineering for MORE Fun & Profit
Data is particular to a problem. This week we'll discuss how to turn generic data into successful fuel for specific machine learning projects.

WEEK 4
Bad Data
There are so many ways data can go wrong! This week discusses some of the pitfalls in data identification and processing.

Related Courses

Advanced Algorithms and Complexity (Coursera)
University of California, San Diego, Higher School of Economics - HSE University

You've learned the basic algorithms now and are ready to step into the area of more complex problems and algorithms to solve them. Advanced algorithms build upon basic ones and use new ideas. We will start with network flows, which are used in more typical applications such as optimal matchings, finding disjoint paths and flight scheduling as well as more surprising ones like image segmentation in computer vision.

Jun 8th 2026
5-12 Weeks
Text Retrieval and Search Engines (Coursera)
University of Illinois at Urbana-Champaign

Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than a computer system or sensors, and are thus especially valuable for discovering knowledge about people’s opinions and preferences, in addition to many other kinds of knowledge that we encode in text.

Jun 8th 2026
5-12 Weeks
Using Python to Interact with the Operating System (Coursera)
Google

By the end of this course, you’ll be able to manipulate files and processes on your computer’s operating system. You’ll also have learned about regular expressions -- a very powerful tool for processing text files -- and you’ll get practice using the Linux command line on a virtual machine. And, this might feel like a stretch right now, but you’ll also write a program that processes a bunch of errors in an actual log file and then generates a summary file. That’s a super useful skill for IT Specialists to know.

Jun 9th 2026
5-12 Weeks
Foundations of Objective-C App Development (Coursera)
University of California, Irvine

An introduction to the Objective-C programming language. This will prepare you for more extensive iOS app development and build a foundation for advanced iOS development topics. Objective-C programming requires a Mac laptop or desktop computer. An iOS device is optional if the learner is willing to work exclusively with the simulator. Some learners have been able to work with an OS X virtual machine on Windows, but explaining how to do that is beyond the scope of this course.

Jun 8th 2026
4 Weeks
Machine Learning: Clustering & Retrieval (Coursera)
University of Washington

Case Studies: Finding Similar Documents. A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover?

Jun 8th 2026
5-12 Weeks
Interfacing with the Raspberry Pi (Coursera)
University of California, Irvine

The Raspberry Pi uses a variety of input/output devices based on protocols such as HDMI, USB, and Ethernet to communicate with the outside world. In this class you will learn how to use these protocols with other external devices (sensors, motors, GPS, orientation, LCD screens etc.) to get your IoT device to interact with the real world.

Jun 8th 2026
4 Weeks
Introduction to Programming with MATLAB (Coursera)
Vanderbilt University

This course teaches computer programming to those with little to no previous experience. It uses the programming system and language called MATLAB to do so because it is easy to learn, versatile and very useful for engineers and other professionals. MATLAB is a special-purpose language that is an excellent choice for writing moderate-size programs that solve problems involving the manipulation of numbers.

Jun 8th 2026
5-12 Weeks
Understanding Clinical Research: Behind the Statistics (Coursera)
University of Cape Town

If you’ve ever skipped over the results section of a medical paper because terms like “confidence interval” or “p-value” go over your head, then you’re in the right place. You may be a clinical practitioner reading research articles to keep up-to-date with developments in your field or a medical student wondering how to approach your own research. Greater confidence in understanding statistical analysis and the results can benefit both working professionals and those undertaking research themselves.

Jun 8th 2026
5-12 Weeks
Machine Learning: Classification (Coursera)
University of Washington

Case Studies: Analyzing Sentiment & Loan Default Prediction. In our case study on analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information,...). In our second case study for this course, loan default prediction, you will tackle financial data, and predict when a loan is likely to be risky or safe for the bank.

Jun 8th 2026
5-12 Weeks
Python for Data Science, AI & Development (Coursera)
IBM

Kickstart your learning of Python for data science, as well as programming in general, with this beginner-friendly introduction to Python. Python is one of the world’s most popular programming languages, and there has never been greater demand for professionals with the ability to apply Python fundamentals to drive business solutions across industries.

Jun 9th 2026
5-12 Weeks
Introduction to Machine Learning (Coursera)
Duke University

This course will provide you with a foundational understanding of machine learning models (logistic regression, multilayer perceptrons, convolutional neural networks, natural language processing, etc.) as well as demonstrate how these models can solve complex problems in a variety of industries, from medical diagnostics to image recognition to text prediction.

Jun 12th 2026
5-12 Weeks