Coursera

Practical Crowdsourcing for Efficient Machine Learning (Coursera)

Offered by Yandex

This course will teach you efficient and scalable data labeling for ML and various business processes. The key here is the crowdsourcing approach, based on splitting complex challenges into small tasks and distributing them among a vast cloud of performers. You will get acquainted with crowdsourcing as a methodology, mastering certain steps and techniques that ensure quality and stable performance. All these techniques will be implemented in practice straight away: throughout the course, you’ll design your own crowdsourcing project.

What You Will Learn

Understand the applicability, benefits and limits of the crowdsourcing approach
Integrate an on-demand workforce directly into your processes and build human-in-the-loop processes
Control the quality and accuracy of data labeling to develop high performing ML models
Design and run a full-cycle crowdsourcing project: from planning to getting labeled data

Syllabus

WEEK 1
Introduction to crowdsourcing
We will start the course by discussing what crowdsourcing is and how it applies to Machine Learning. By looking at examples of large-scale data labeling processes, we will learn how diverse and powerful crowdsourcing is. We will also go through the steps necessary to prepare a crowdsourcing project. This basic understanding will be developed in the following weeks, along with your own crowdsourcing project. This week you will choose the project most relevant to you and draft its pipeline. Last but not least, you will meet a team of Yandex’s Crowd Solutions Architects. They will give a short introduction to their crowdsourcing projects and share their experience of designing an efficient task pipeline.

WEEK 2
Instructions and interfaces
This week we will dive into designing crowdsourcing projects. After a task has been decomposed into smaller pieces, it is time to create interfaces and guidelines. We will go through some tips on performer-friendly interface design and learn how to compose guidelines that will help performers along the way.
Week 2 is an important step in developing your own crowdsourcing project. Based on the pipeline from last week, you will create your project on a real crowdsourcing platform. Stepping into the performers’ shoes, you will try to label some data and write instructions for it. We recommend investing a decent amount of time in this week’s assignments. It will contribute a lot to your final task of collecting labeled data.

WEEK 3
Quality control
It’s time to talk about ensuring data quality. This week we will discuss how to select and train performers and learn how to configure quality checks depending on task specifics. Most crowdsourcing platforms offer a wide range of quality control mechanisms, but it is important to choose those that are most applicable to your task.
You will also develop training and quality control for your own crowdsourcing project, and our Crowd Solutions Architects will share their experience of setting up complicated quality controls.
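As a rough illustration of what a quality check can look like, here is a minimal sketch of one common mechanism: control tasks with known answers. The course description does not say which mechanisms it covers, so the choice of mechanism, the function names, the data layout and the 0.7 threshold are all assumptions made for this example.

```python
from collections import defaultdict


def performer_accuracy(answers, gold):
    """Compute each performer's accuracy on control tasks.

    answers: iterable of (task_id, performer_id, label) triples (hypothetical layout).
    gold: dict mapping control task_id -> known correct label.
    Tasks that are not in `gold` are ignored.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task_id, performer_id, label in answers:
        if task_id in gold:
            total[performer_id] += 1
            correct[performer_id] += int(label == gold[task_id])
    return {p: correct[p] / total[p] for p in total}


def flag_low_quality(accuracy, threshold=0.7):
    """Return performers whose control-task accuracy is below the threshold."""
    return sorted(p for p, acc in accuracy.items() if acc < threshold)


# Toy data: t1 and t2 are control tasks, t3 is a regular task.
gold = {"t1": "cat", "t2": "dog"}
answers = [
    ("t1", "p1", "cat"),
    ("t2", "p1", "dog"),
    ("t3", "p1", "bird"),
    ("t1", "p2", "dog"),
    ("t2", "p2", "dog"),
]

accuracy = performer_accuracy(answers, gold)
flagged = flag_low_quality(accuracy)
print(accuracy)
print(flagged)
```

In practice the threshold and the number of control tasks per performer would depend on the task, which is the point the paragraph above makes about choosing checks that fit your project.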

WEEK 4
Smart techniques to enhance quality
This week is an introduction to the research field dealing with crowdsourcing challenges. It covers a variety of topics that mostly share the same goal: getting higher quality while staying within budget.
The first aspect we will discuss is performers’ motivation. Even though we say that crowdsourcing is an engineering task, its most important resource is people. It is necessary to think about their possible benefits and their reasons for working on your tasks. The second topic of discussion is enhancing quality by working with the collected answers. There are several answer aggregation algorithms that make it possible to get higher quality out of the same label set. Watch the videos and learn how it works!
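To make the idea of answer aggregation concrete, here is a minimal sketch of majority voting, the simplest way to combine several performers' answers to the same task. The course description does not name the algorithms it teaches, so majority voting, the function name and the data layout are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Aggregate overlapping answers by majority vote.

    labels: iterable of (task_id, performer_id, label) triples (hypothetical layout).
    Returns {task_id: (winning_label, share_of_votes_for_winner)}.
    Tie-breaking is left unspecified in this sketch.
    """
    votes = defaultdict(Counter)
    for task_id, _performer_id, label in labels:
        votes[task_id][label] += 1

    result = {}
    for task_id, counter in votes.items():
        label, count = counter.most_common(1)[0]
        result[task_id] = (label, count / sum(counter.values()))
    return result


# Toy data: three performers label each of two images.
labels = [
    ("img1", "p1", "cat"),
    ("img1", "p2", "cat"),
    ("img1", "p3", "dog"),
    ("img2", "p1", "dog"),
    ("img2", "p2", "dog"),
    ("img2", "p3", "dog"),
]

aggregated = majority_vote(labels)
for task_id, (label, share) in aggregated.items():
    print(task_id, label, round(share, 2))
```

The vote share gives a crude confidence signal: a task where the winning label has a low share could be sent to more performers. More elaborate aggregation methods typically also weight each performer by estimated reliability.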

WEEK 5
How projects are launched and maintained
Wow, we have made it to Week 5! Congratulations :)
This week we will talk about crowdsourcing projects from a long-term perspective. Most of them are not just one-time launches: for most business processes, data needs to be collected and labeled constantly. We will share our experience of turning the cloud of performers into a stable and loyal community, and provide a list of metrics that help you understand what is going on in your projects. The whole team of Crowd Solutions Architects will return to give a full retrospective on the projects they have been talking about previously.

Related Courses

Coursera

McMaster University

Experimentation for Improvement (Coursera)

Statistics & Data Analysis Data Science

We are always using experiments to improve our lives, our community, and our work. Are you doing it efficiently? Or are you (incorrectly) changing one thing at a time and hoping for the best? In this course, you will learn how to plan efficient experiments - testing with many variables. Our goal is to find the best results using only a few experiments. A key part of the course is how to optimize a system.

Jun 22nd 2026

5-12 Weeks

Statistics Data Science Regression Models

Coursera

Johns Hopkins University

Bioconductor for Genomic Data Science (Coursera)

Statistics & Data Analysis Data Science

Learn to use tools from the Bioconductor project to perform analysis of genomic data. This is the fifth course in the Genomic Big Data Specialization from Johns Hopkins University.

Jun 22nd 2026

4 Weeks

Bioinformatics Data Analysis Data Science

Coursera

Vanderbilt University

Data Management for Clinical Research (Coursera)

Health & Society Medicine & Pharmacology

This course presents critical concepts and practical methods to support planning, collection, storage, and dissemination of data in clinical research. Understanding and implementing solid data management principles is critical for any scientific domain. Regardless of your current (or anticipated) role in the research enterprise, a strong working knowledge and skill set in data management principles and practice will increase your productivity and improve your science. Our goal is to use these modules to help you learn and practice this skill set.

Jun 22nd 2026

5-12 Weeks

Data Data Management Clinical Research

Coursera

University of Illinois at Urbana-Champaign

Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud (Coursera)

Security & Networking

Welcome to the Cloud Computing Applications course, the second part of a two-course series designed to give you a comprehensive view on the world of Cloud Computing and Big Data! In this second course we continue Cloud Computing Applications by exploring how the Cloud opens up data analytics of huge volumes of data that are static or streamed at high velocity and represent an enormous variety of information. Cloud applications and data analytics represent a disruptive change in the ways that society is informed by, and uses information.

Jun 22nd 2026

4 Weeks

Cloud Machine Learning Big Data

Coursera

University of Washington

Machine Learning: Regression (Coursera)

Statistics & Data Analysis Data Science

Case Study - Predicting Housing Prices. In our first case study, predicting house prices, you will create models that predict a continuous value (price) from input features (square footage, number of bedrooms and bathrooms,...). This is just one of the many places where regression can be applied. Other applications range from predicting health outcomes in medicine, stock prices in finance, and power usage in high-performance computing, to analyzing which regulators are important for gene expression.

Jun 22nd 2026

5-12 Weeks

Python Algorithms Machine Learning

Coursera

Johns Hopkins University

The Data Scientist's Toolbox (Coursera)

Statistics & Data Analysis Data Science

In this course you will get an introduction to the main tools and ideas in the data scientist's toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program like version control, markdown, git, GitHub, R, and RStudio.

Jun 22nd 2026

4 Weeks

Data Github Data Analysis

Coursera

University of Washington

Machine Learning: Classification (Coursera)

Statistics & Data Analysis Data Science

Case Studies: Analyzing Sentiment & Loan Default Prediction. In our case study on analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information,...). In our second case study for this course, loan default prediction, you will tackle financial data, and predict when a loan is likely to be risky or safe for the bank.

Jun 22nd 2026

5-12 Weeks

Python Machine Learning Classification

Coursera

Johns Hopkins University

Exploratory Data Analysis (Coursera)

Statistics & Data Analysis Data Science

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data.

Jun 22nd 2026

4 Weeks

Statistics Data Analysis Data Science

Coursera

University of Washington

Machine Learning Foundations: A Case Study Approach (Coursera)

Statistics & Data Analysis Data Science

Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning from a series of practical case-studies.

Jun 22nd 2026

5-12 Weeks

Python Machine Learning Clustering

Coursera

University of Washington

Communicating Data Science Results (Coursera)

Statistics & Data Analysis Data Science

Making predictions is not enough! Effective data scientists know how to explain and interpret their results, and communicate findings accurately to stakeholders to inform business decisions. Visualization is the field of research in computer science that studies effective communication of quantitative results by linking perception, cognition, and algorithms to exploit the enormous bandwidth of the human visual cortex. In this course you will learn to recognize, design, and use effective visualizations.

Jun 22nd 2026

3 Weeks

Ethics Cloud Computing Privacy

Coursera

Johns Hopkins University

Regression Models (Coursera)

Statistics & Data Analysis Data Science

Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares and inference using regression models.

Jun 22nd 2026

4 Weeks

Statistics Regression Linear Regression

Coursera

IBM

Data Science Methodology (Coursera)

Statistics & Data Analysis Data Science

Despite the increase in computing power and access to data over the last couple of decades, our ability to use the data within the decision-making process is either lost or not maximized. All too often, we don't have a solid understanding of the questions being asked and how to apply the data correctly to the problem at hand.

Jun 26th 2026

3 Weeks

Data Modeling Methodology Data Science