Big Data Capstone Project (edX)

Further develop your knowledge of big data by applying the skills you have learned to a real-world data science project. This project will give you the opportunity to deepen your learning by giving you valuable experience in evaluating, selecting and applying relevant data science techniques, principles and theory to a data science problem. This project will see you plan and execute a reasonably substantial project and demonstrate autonomy, initiative and accountability.

The Big Data Capstone Project will allow you to apply the techniques and theory you have gained from the four courses in this Big Data MicroMasters program to a medium-scale data science project.
Working with organisations and stakeholders of your choice on a real-world dataset, you will further develop your data science skills and knowledge.
You’ll deepen your learning of social and ethical concerns in relation to data science, including an analysis of ethical concerns and ethical frameworks in relation to data selection and data management.
By communicating the knowledge, skills and ideas you have gained to other learners through online collaborative technologies, you will learn valuable communication skills, important for any career. You’ll also deliver a written oral presentation of your project design, plan, methodologies, and outcomes.
This course is part of the Big Data MicroMasters.

What you'll learn
The Big Data Capstone Project will give you the chance to demonstrate in practice what you have learned in the Big Data MicroMasters program, including how to:

  • Evaluate, select and apply data science techniques, principles and theory;
  • Plan and execute a project;
  • Work autonomously using your own initiative;
  • Identify social and ethical concerns around your project;
  • Develop communication skills using online collaborative technologies.

Prerequisites
Candidates interested in pursuing this program are advised to complete the following courses before this one: Programming for Data Science; Computational Thinking and Big Data; Big Data Fundamentals; and Big Data Analytics.

Syllabus

Dataset overview, data selection and ethics
  • Understand ethical issues and concerns around big data projects;
  • Describe how ethical issues apply to the sample dataset;
  • Describe up to three ethical approaches;
  • Apply ethical analysis to scenarios.

Exam (timed, proctored)
The exam will cover content from the first four courses in the Big Data MicroMasters program, as well as the Ethics section of this capstone course, DataCapX. It will include questions on topics such as code structure and testing, variable types, graphs, big data algorithms, regression and ethics.

Project Task 1: Data cleaning and Regression
  • Understand the basic data cleaning and preprocessing steps required in the analysis of a real data set;
  • Create computer code to read data and perform data cleaning and preprocessing;
  • Judge the appropriateness of a fitted regression model to the data;
  • Determine whether simplification of a regression model is appropriate;
  • Apply a fitted regression model to obtain predictions for new observations.
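
As a rough illustration of the kind of work this task describes, the sketch below reads a small CSV, drops rows with missing or malformed values, fits a one-variable least-squares line, checks the fit with R-squared, and predicts a new observation. It is not taken from the course materials: the data, the column names `x` and `y`, and the helper names `load_clean`, `fit_line`, `r_squared` and `predict` are all hypothetical, and it does not cover model simplification.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data: one row has a blank cell, one a non-numeric cell.
RAW = """x,y
1,2.1
2,3.9
3,
4,8.2
five,10.0
6,12.1
"""

def load_clean(text):
    """Read CSV text and keep only rows where both fields parse as floats."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        try:
            rows.append((float(rec["x"]), float(rec["y"])))
        except (TypeError, ValueError):
            continue  # drop rows with missing or malformed values
    return rows

def fit_line(rows):
    """Ordinary least squares for y = intercept + slope * x."""
    xs, ys = zip(*rows)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(rows, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    ys = [y for _, y in rows]
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in rows)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def predict(intercept, slope, x_new):
    """Apply the fitted line to a new observation."""
    return intercept + slope * x_new

rows = load_clean(RAW)
intercept, slope = fit_line(rows)
print("rows kept:", len(rows))
print("slope:", round(slope, 3), "R^2:", round(r_squared(rows, intercept, slope), 3))
print("prediction at x=5:", round(predict(intercept, slope, 5.0), 2))
```

A real project dataset would need more careful handling (for example, deciding whether to impute rather than drop missing values), and R-squared alone is a limited check of whether a model is appropriate.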

Project Task 2: Classification
  • Build classifiers to predict the output of a desired factor;
  • Analyse learned classifiers;
  • Design a feature selection scheme;
  • Design a scheme for evaluating the performance of classifiers.
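
The objectives above can be sketched in a few lines, assuming a toy dataset and a deliberately simple model. The example below is not from the course: it uses a nearest-centroid classifier (chosen only for brevity), scores it with k-fold cross-validation, and compares feature subsets as a very basic feature selection scheme. The data and the names `fit_centroids`, `classify` and `cv_accuracy` are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: (features, label). Feature 0 separates the classes;
# feature 1 is noise by construction.
DATA = [
    ((1.0, 5.0), "a"), ((1.2, 1.0), "a"), ((0.8, 3.0), "a"), ((1.1, 4.0), "a"),
    ((3.0, 4.5), "b"), ((3.2, 1.5), "b"), ((2.9, 3.5), "b"), ((3.1, 2.0), "b"),
]

def fit_centroids(train, cols):
    """Nearest-centroid 'training': average the chosen features per class."""
    by_label = {}
    for feats, label in train:
        by_label.setdefault(label, []).append([feats[c] for c in cols])
    return {lab: [mean(col) for col in zip(*pts)] for lab, pts in by_label.items()}

def classify(centroids, feats, cols):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    point = [feats[c] for c in cols]
    return min(
        centroids,
        key=lambda lab: sum((p - q) ** 2 for p, q in zip(point, centroids[lab])),
    )

def cv_accuracy(data, cols, k=4):
    """k-fold cross-validation: each fold is held out once for testing."""
    correct = 0
    for fold in range(k):
        test = data[fold::k]
        train = [row for i, row in enumerate(data) if i % k != fold]
        centroids = fit_centroids(train, cols)
        correct += sum(classify(centroids, f, cols) == y for f, y in test)
    return correct / len(data)

# Feature selection by exhaustive comparison of the candidate subsets.
scores = {cols: cv_accuracy(DATA, cols) for cols in [(0,), (1,), (0, 1)]}
best = max(scores, key=scores.get)
print("cross-validated accuracy per feature subset:", scores)
print("selected features:", best)
```

Exhaustive subset comparison only works for a handful of features; with more, a greedy or filter-based scheme would be needed, and the evaluation data should be kept separate from the data used to choose features.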

Related Courses

Enabling Technologies for Data Science and Analytics: The Internet of Things (edX) EdX
Columbia University,ColumbiaX

Discover the relationship between Big Data and the Internet of Things (IoT). The Internet of Things is rapidly growing. It is predicted that more than 25 billion devices will be connected by 2020. In this data science course, you will learn about the major components of the Internet of Things and how data is acquired from sensors. You will also examine ways of analyzing event data, sentiment analysis, facial recognition software and how data generated from devices can be used to make decisions.

Self Paced
Self-Paced
Introduction to Computational Thinking and Data Science (edX) EdX
MIT,MITx

This course is an introduction to using computation to understand real-world phenomena. This course will teach you how to use computation to accomplish a variety of goals and provides you with a brief introduction to a variety of topics in computational problem solving. This course is aimed at students with some prior programming experience in Python and a rudimentary knowledge of computational complexity.

Mar 20th 2024
5-12 Weeks
Data Analytics in Health - From Basics to Business (edX) EdX
KU Leuven University

Improve diagnostics, care and curing by effectively applying data analytics in healthcare and spot entrepreneurial opportunities. Many people talk about the promise of “big data” to health care. But how can the application of data analytics to big data actually improve health and health care? We will show that novel data analytics based solutions can result in better diagnosis, better care and better curing. This provides fertile ground for entrepreneurship and the development of new businesses.

No session available
4 Weeks
Probability and Statistics in Data Science using Python (edX) EdX
University of California, San Diego,UC San DiegoX

Using Python, learn statistical and probabilistic approaches to understand and gain insights from data. The job of a data scientist is to glean knowledge from complex and noisy datasets. Reasoning about uncertainty is inherent in the analysis of noisy data. Probability and Statistics provide the mathematical foundation for such reasoning.

Self Paced
Self-Paced
Data Science: R Basics (edX) EdX
HarvardX,Harvard University

Build a foundation in R and learn how to wrangle, analyze, and visualize data. This course will introduce you to the basics of R programming. You can better retain R when you learn it to solve a specific problem, so you’ll use a real-world dataset about crime in the United States. You will learn the R skills needed to answer essential questions about differences in crime across the different states.

Self Paced
Self-Paced
Principles, Statistical and Computational Tools for Reproducible Science (edX) EdX
HarvardX,Harvard University

Learn skills and tools that support data science and reproducible research, to ensure you can trust your own research results, reproduce them yourself, and communicate them to others. Today the principles and techniques of reproducible research are more important than ever, across diverse disciplines from astrophysics to political science. No one wants to do research that can’t be reproduced. Thus, this course is really for anyone who is doing any data intensive research. While many of us come from a biomedical background, this course is for a broad audience of data scientists.

Self Paced
Self-Paced
Knowledge Management and Big Data in Business (edX) EdX
The Hong Kong Polytechnic University,HKPolyUx

Learn why and how knowledge management and Big Data are vital to the new business era. The business landscape is changing so rapidly that traditional management, business and computing courses do not meet the needs for the next generation of workers in the business world. Most traditional methods are of a repetitive, rule-based nature and will be gradually replaced by Artificial Intelligence.

Self Paced
Self-Paced
Computational Thinking and Big Data (edX) EdX
University of Adelaide,AdelaideX

Learn the core concepts of computational thinking and how to collect, clean and consolidate large-scale datasets. Computational thinking is an invaluable skill that can be used across every industry, as it allows you to formulate a problem and express a solution in such a way that a computer can effectively carry it out.

Self Paced
Self-Paced