EdX

Data Creation and Collection for Artificial Intelligence via Crowdsourcing (edX)

A one-stop shop to get started on the key considerations about data for AI! Learn how crowdsourcing offers a viable means to leverage human intelligence at scale for data creation, enrichment and interpretation, with great potential to improve both the performance and the trustworthiness of AI systems and to increase the adoption of AI in general.

Advances in Artificial Intelligence and Machine Learning have led to technological revolutions. Yet AI systems at the forefront of such innovations have become the center of growing concerns. These include reports of system failures when conditions differ only slightly from those seen during training, as well as ethical and societal concerns that arise from the use of these systems.
Machine learning models have been criticized for lacking robustness, fairness and transparency. Such model-related problems can largely be attributed to issues with data. In order to learn comprehensive, fine-grained and unbiased patterns, models have to be trained on a large number of high-quality data instances with a distribution that accurately represents real application scenarios. Creating such data is not only a long, laborious and expensive process, but sometimes even impossible when the data is extremely imbalanced or the distribution constantly evolves over time.
This course will introduce an important method that can be used to gather data for training machine learning models and building AI systems. Crowdsourcing offers a viable means of leveraging human intelligence at scale for data creation, enrichment and interpretation with great potential to improve the performance of AI systems and increase the wider adoption of AI in general.
By the end of this course you will be able to understand and apply crowdsourcing methods to elicit human input as a means of gathering high-quality data for machine learning. You will be able to identify biases that enter datasets through the way they are gathered or created, and choose task designs that optimize data quality. These skills are essential for careers in Data Science, Machine Learning, and the broader field of Artificial Intelligence.

What you'll learn
At the end of this course you will be able to:

  • Examine the use of crowdsourcing for gathering data
  • Explain how cognitive biases and other human factors influence data quality
  • Describe the use of active learning in the creation of crowdsourced training data
  • Demonstrate the design of crowdsourcing tasks with quality control mechanisms
  • Discuss the evaluation of ML models with humans in the loop

Syllabus

Week 1: Crowdsourcing for High-quality Data Collection and The ImageNet Story
Artificial Intelligence is at the center of many recent advancements across areas such as transportation and finance. One of the reasons for this is that in the past decade we have designed methods to harness human intelligence at scale.
We will introduce and discuss the crowdsourcing paradigm and the importance of high-quality data.
Topics we will cover this week:
  • The intuition behind crowdsourcing
  • The role of crowdsourcing platforms
  • The need for high-quality data for AI models
  • What is ImageNet, the gap it filled, and how it was built

Week 2: Quality Control Mechanisms for Crowdsourcing
The quality of crowdsourced human input is one of the most crucial aspects affecting the overall value of the paradigm. In this week we will discuss the challenges that make quality difficult to guarantee.
Topics we will cover this week:
  • Workers' motives and behaviors
  • Quality control mechanisms in crowdsourcing
  • Incentives in crowdsourcing (like gamification)
  • Cognitive aspects and psychometric methods
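
To make the idea of a quality control mechanism concrete, here is a minimal sketch of two common techniques: scoring workers on gold (known-answer) questions and aggregating the remaining labels by majority vote. It is an illustration only, not material from the course; the function names, the `(worker_id, item_id, label)` tuple layout and the `min_accuracy` threshold of 0.7 are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Aggregate (worker_id, item_id, label) tuples per item.

    Returns {item_id: (winning_label, share_of_votes)}.
    On a tie, the label that was seen first wins.
    """
    by_item = defaultdict(list)
    for _worker, item, label in labels:
        by_item[item].append(label)
    result = {}
    for item, votes in by_item.items():
        top, count = Counter(votes).most_common(1)[0]
        result[item] = (top, count / len(votes))
    return result


def worker_accuracy(labels, gold):
    """Share of gold questions each worker answered correctly."""
    correct, seen = Counter(), Counter()
    for worker, item, label in labels:
        if item in gold:
            seen[worker] += 1
            if label == gold[item]:
                correct[worker] += 1
    return {worker: correct[worker] / seen[worker] for worker in seen}


def filter_and_aggregate(labels, gold, min_accuracy=0.7):
    """Drop workers below min_accuracy on gold items, then vote.

    Workers who answered no gold question are treated as untrusted
    (an assumption of this sketch). Gold items are excluded from the output.
    """
    accuracy = worker_accuracy(labels, gold)
    trusted = [
        (worker, item, label)
        for worker, item, label in labels
        if item not in gold and accuracy.get(worker, 0.0) >= min_accuracy
    ]
    return majority_vote(trusted)
```

In practice the vote share returned alongside each label can serve as a rough confidence signal, for example to route low-agreement items to additional workers.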

Week 3: Factors Affecting Quality in Crowdsourcing
Researchers and practitioners in human computation and crowdsourcing have identified several factors that affect the quality of crowdsourced data. In this week we will discuss some recent work on these factors.
Topics we will cover this week:
  • Tradeoff between task pricing and quality of output
  • The role of workers' demographics, qualifications and skills
  • The importance of task clarity and work environments
  • The concepts of task packaging, task framing and task priming

Week 4: Human Input for Data Creation and Model Evaluation in AI
In this week, we will cover the importance of data collection, annotation and engineering.
Topics we will cover this week:
  • The importance of data collection
  • Data generation
  • The role of crowdsourcing in advanced machine learning
  • Taxonomy of microtasks

Week 5: Reducing Worker Effort: Active Learning
In this week we explore the challenges of collecting large-scale data and how to overcome them.
Topics we will cover this week:
  • Approaches to reducing worker effort
  • The implications of reducing labeling effort
  • The key idea of active learning
  • Query strategies for selecting informative instances
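
As an illustration of a query strategy, the sketch below shows uncertainty sampling with an entropy score: given a model's predicted class probabilities for an unlabeled pool, it picks the items the model is least sure about, so that workers label those first. This is one common strategy and is not necessarily the one the course teaches; the function names and the `{item_id: [probabilities]}` input format are assumptions made for this example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, k):
    """Return the k item ids whose predictions have the highest entropy.

    pool_probs maps each unlabeled item id to the model's predicted
    class probabilities for that item (hypothetical input format).
    """
    ranked = sorted(
        pool_probs,
        key=lambda item: entropy(pool_probs[item]),
        reverse=True,
    )
    return ranked[:k]
```

A typical loop would train a model on the labeled data, call `select_most_uncertain` on the pool, send the selected items to the crowd, and repeat with the enlarged training set.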

Week 6: Interpreting, Evaluating, and Debugging ML models
In this week, we discuss strategies for evaluating, debugging, and interpreting machine learning models.
Topics we will cover this week:
  • The notion of model interpretability
  • The role of humans in the interpretability process
  • Debugging ML pipelines and related challenges

Related Courses

Analítica avanzada y seguridad cibernética (edX)
Galileo University, GalileoX

The digitalization of the energy sector offers a great opportunity to achieve a diversified and sustainable energy mix. However, major challenges lie ahead, which can be overcome thanks to advances in advanced analytics systems. In addition, digitalizing the energy sector requires implementing best practices to protect systems and information from cyberattacks and thereby improve the operational security and reliability of those systems.

Self-Paced
Introduction to Computer Science and Programming (edX)
Tokyo Institute of Technology, TokyoTechX

The term “Computation” refers to the action performed by a computer. A computation can be a basic operation, and it can also be a sophisticated computer simulation requiring a large amount of data and substantial resources. This course aims to introduce learners with no prior knowledge to the basics and key concepts of computer science. By following the lectures and exercises of this course you will gain an understanding of algorithms and hands-on experience of programming in Ruby.

Self-Paced
¿Cómo hacer uso responsable de la inteligencia artificial en el sector público? (edX)
Inter-American Development Bank - IDB, IDBx

This MOOC covers the concepts, principles, challenges and opportunities of the ethical and responsible use of artificial intelligence (AI) in the public sector. It presents tools for guaranteeing minimum standards and for strengthening the quality of data and AI models from design through implementation and monitoring, all with the aim of reducing the potential risks associated with AI systems so that the technology is adopted ethically and responsibly.

Self-Paced
Making Evidence-Based Strategic Decisions (edX)
University of Maryland, College Park, University System of Maryland - USM, USMx, UMD

Drive alignment among managers, employees and the organizational goals through data analytics and data products. This course on digital transformation will show you how to turn your organization into a decision-making factory. What makes a good business decision? How can we combine effective data analytics and feed robust foresight and scenario planning processes?

Self-Paced
AI Chatbots without Programming (edX)
IBM

Chatbots are increasingly in demand among global businesses. This course will teach you how to build, analyze, deploy and monetize chatbots - with the help of IBM Watson and the power of AI. Please Note: Learners who successfully complete this IBM course can earn a skill badge — a detailed, verifiable and digital credential that profiles the knowledge and skills you’ve acquired in this course. Enroll to learn more, complete the course and claim your badge!

Self-Paced
Computer Applications of Artificial Intelligence and e-Construction (edX)
Purdue University, PurdueX

Learn the fundamentals of artificial intelligence, machine learning and natural language processing, and their applications in e-Construction. This course is the third in a sequence of interrelated courses on current computer applications in the construction industry. It emphasizes advanced computational tools, including artificial intelligence, machine learning and natural language processing, and their applications in e-Construction.

Mar 28th 2022
5-12 Weeks
Tech for Good: The Role of ICT in Achieving the SDGs (edX)
SDGAcademyX, SDG Academy

What opportunities and challenges do digital technologies present for the development of our society? Tech for Good was developed by UNESCO and Cetic, the Brazilian Network Information Center’s Regional Center for Studies on the Development of the Information Society. It brings together thought leaders and changemakers in the fields of information and communication technologies (ICT) and sustainable development to show how digital technologies are empowering billions of people around the world by providing access to education, healthcare, banking, and government services; and how “big data” is being used to inform smarter, evidence-based policies to improve people’s lives in fundamental ways.

Self-Paced
Computer Vision Fundamentals with Watson and OpenCV (edX)
IBM

Learn about computer vision, one of the most exciting fields in machine learning, artificial intelligence and computer science. It has applications in many industries, such as self-driving cars, robotics, augmented reality and face detection in law enforcement agencies.

Self-Paced
Data Analytics and Visualization in Health Care (edX)
Rochester Institute of Technology, RITx

Learn best practices in data analytics, informatics, and visualization to gain literacy in data-driven, strategic imperatives that affect all facets of health care. Big data is transforming the health care industry relative to improving quality of care and reducing costs—key objectives for most organizations. Employers are desperately searching for professionals who have the ability to extract, analyze, and interpret data from patient health records, insurance claims, financial records, and more to tell a compelling and actionable story using health care data analytics.

Self-Paced