EdX

Big Data Analytics (edX)

Learn key technologies and techniques, including R and Apache Spark, to analyse large-scale data sets to uncover valuable business information. Gain essential skills in today’s digital age to store, process and analyse data to inform business decisions.

In this course, part of the Big Data MicroMasters program, you will develop your knowledge of big data analytics and enhance your programming and mathematical skills. You will learn to use essential analytic tools such as Apache Spark and R.
Topics covered in this course include:

  • cloud-based big data analysis;
  • predictive analytics, including probabilistic and statistical models;
  • application of large-scale data analysis;
  • analysis of problem space and data needs.

By the end of this course, you will be able to approach large-scale data science problems with creativity and initiative.

What you'll learn

  • How to develop algorithms for the statistical analysis of big data;
  • Knowledge of big data applications;
  • How to apply the fundamental principles of predictive analytics;
  • How to evaluate and apply appropriate principles, techniques and theories to large-scale data science problems.

Prerequisites
Candidates pursuing the MicroMasters program are advised to complete Programming for Data Science, Computational Thinking and Big Data, and Big Data Fundamentals before undertaking this course.

Course Syllabus

Section 1: Simple linear regression
Fit a simple linear regression between two variables in R; Interpret output from R; Use models to predict a response variable; Validate the assumptions of the model.
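
The course teaches these steps in R. As a language-neutral illustration only, the following minimal Python sketch fits a least-squares line, predicts a response and computes residuals (a first step towards checking model assumptions). The function names and the small data set are hypothetical and are not taken from the course material.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predict the response for a single new x value."""
    return intercept + slope * x_new


def residuals(intercept, slope, x, y):
    """Observed minus fitted values; plotting these helps check model assumptions."""
    return [yi - predict(intercept, slope, xi) for xi, yi in zip(x, y)]


# Hypothetical example data (not from the course).
hours = [1, 2, 3, 4, 5]
score = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_simple_linear_regression(hours, score)
print("intercept:", a, "slope:", b)
print("prediction at x = 6:", predict(a, b, 6))
print("residuals:", residuals(a, b, hours, score))
```

In R, the equivalent fit would typically be done with `lm()`; the sketch above only spells out the arithmetic behind it.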

Section 2: Modelling data
Adapt the simple linear regression model in R to deal with multiple variables; Incorporate continuous and categorical variables in your models; Select the best-fitting model by inspecting the R output.

Section 3: Many models
Manipulate nested dataframes in R; Use R to apply simultaneous linear models to large data frames by stratifying the data; Interpret the output of learner models.

Section 4: Classification
Adapt linear models to handle a categorical response variable; Implement logistic regression (LR) in R; Implement generalised linear models (GLMs) in R; Implement linear discriminant analysis (LDA) in R.

Section 5: Prediction using models
Implement the principles of building a model to do prediction using classification; Split data into training and test sets, perform cross-validation and compute model evaluation metrics; Use model selection for explaining data with models; Analyse overfitting and the bias-variance trade-off in prediction problems.
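
As a rough illustration of the splitting and cross-validation steps named in this section, here is a minimal Python sketch (the course itself works in R). It assumes a simple straight-line model and mean squared error as the evaluation metric; all function names and the data are hypothetical.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (training_rows, test_rows)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_splits(rows, k=5, seed=0):
    """Yield (training_rows, held_out_rows) once for each of k folds."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield training, held_out


def fit_line(rows):
    """Least-squares (intercept, slope) for rows of (x, y) pairs."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sxx
    return y_bar - slope * x_bar, slope


def mean_squared_error(model, rows):
    """Average squared prediction error of the model on rows."""
    intercept, slope = model
    return mean((y - (intercept + slope * x)) ** 2 for x, y in rows)


def cross_validated_mse(rows, k=5, seed=0):
    """Average held-out error across the k folds."""
    return mean(
        mean_squared_error(fit_line(training), held_out)
        for training, held_out in k_fold_splits(rows, k, seed)
    )


# Hypothetical noisy data around the line y = 1 + 2x (not from the course).
noise = random.Random(42)
data = [(x, 1 + 2 * x + noise.gauss(0, 0.5)) for x in range(40)]

train, test = train_test_split(data)
print("test-set MSE:", mean_squared_error(fit_line(train), test))
print("5-fold cross-validated MSE:", cross_validated_mse(data))
```

Evaluating only on held-out rows is what lets the error estimate reveal overfitting: a model that merely memorises its training rows would score well on them and poorly here.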

Section 6: Getting bigger
Set up and apply sparklyr; Use logical verbs in R by applying native sparklyr versions of the verbs.

Section 7: Supervised machine learning with sparklyr
Apply sparklyr to machine learning regression and classification models; Use machine learning models for prediction; Illustrate how distributed computing techniques can be used for “bigger” problems.

Section 8: Deep learning
Use massive amounts of data to train multi-layer networks for classification; Understand some of the guiding principles behind training deep networks, including the use of autoencoders, dropout, regularization, and early termination; Use sparklyr and H2O to train deep networks.

Section 9: Deep learning applications and scaling up
Understand some of the ways in which massive amounts of unlabelled and partially labelled data are used to train neural network models; Leverage existing trained networks for targeting new applications; Implement architectures for object classification and object detection and assess their effectiveness.

Section 10: Bringing it all together
Consolidate your understanding of the relationships between the methodologies presented in this course, and of their relative strengths, weaknesses and range of applicability.

Related Courses

Introduction to Apache Spark (edX) EdX
University of California, Berkeley

Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals. Spark is rapidly becoming the compute engine of choice for big data. Spark programs are more concise and often run 10-100 times faster than Hadoop MapReduce jobs. As companies realize this, Spark developers are becoming increasingly valued.

Course Not Available

Biostatistics for Big Data Applications (edX) EdX
University of Texas Medical Branch

Learn data analysis basics for working with biomedical big data with practical hands-on examples using R. This course provides a broad foundation of statistical terms and concepts as well as an introduction to the R statistical software package. The topics covered are fundamental components of biostatistical methods used in both omics and population health research.

No sessions available
5-12 Weeks

Unix Tools: Data, Software and Production Engineering (edX) EdX
Delft University of Technology, DelftX

Grow from being a Unix novice to Unix wizard status! Process big data, analyze software code, run DevOps tasks and excel in your everyday job through the amazing power of the Unix shell and command-line tools. Processing information is the hallmark of all modern organizations, which are increasingly digital: absorbing, processing and generating information is a key element of their business.

Self-Paced

Big Data Strategies to Transform Your Business (edX) EdX
Delft University of Technology, DelftX

Make your organization’s business strategy and model, as well as your own career path, future-proof by using big data’s disruptive power. While big data infiltrates all walks of life, most firms have not changed sufficiently to meet the challenges that come with it. In this course, you will learn how to develop a big data strategy, transform your business model and your organization. This course will enable professionals to take their organization and their own career to the next level, regardless of their background and position.

Self-Paced

Big Data Capstone Project (edX) EdX
University of Adelaide, AdelaideX

Further develop your knowledge of big data by applying the skills you have learned to a real-world data science project. This project will give you the opportunity to deepen your learning by giving you valuable experience in evaluating, selecting and applying relevant data science techniques, principles and theory to a data science problem. This project will see you plan and execute a reasonably substantial project and demonstrate autonomy, initiative and accountability.

Self-Paced

Data Analytics and Visualization in Health Care (edX) EdX
Rochester Institute of Technology, RITx

Learn best practices in data analytics, informatics, and visualization to gain literacy in data-driven, strategic imperatives that affect all facets of health care. Big data is transforming the health care industry relative to improving quality of care and reducing costs—key objectives for most organizations. Employers are desperately searching for professionals who have the ability to extract, analyze, and interpret data from patient health records, insurance claims, financial records, and more to tell a compelling and actionable story using health care data analytics.

Self-Paced

Predictive Analytics (edX) EdX
Indian Institute of Management, Bangalore, IIMBx

Master the tools of predictive analytics in this statistics-based analytics course. Decision makers often struggle with questions such as: What should be the right price for a product? Which customer is likely to default in his/her loan repayment? Which products should be recommended to an existing customer? Finding the right answers to these questions can be challenging yet rewarding.

This course is archived
5-12 Weeks

Data Analytics in Health - From Basics to Business (edX) EdX
KU Leuven University

Improve diagnostics, care and curing by effectively applying data analytics in healthcare and spot entrepreneurial opportunities. Many people talk about the promise of “big data” for health care. But how can the application of data analytics to big data actually improve health and health care? We will show that novel solutions based on data analytics can result in better diagnosis, better care and better curing. This provides fertile ground for entrepreneurship and the development of new businesses.

No session available
4 Weeks

Big Data Fundamentals (edX) EdX
University of Adelaide, AdelaideX

Learn how big data is driving organisational change, along with essential analytical tools and techniques, including data mining and PageRank algorithms. Organizations now have access to massive amounts of data, and it’s influencing the way they operate. They are realizing that, to be successful, they must leverage their data to make effective business decisions.

Self-Paced