Big Data Fundamentals (edX)

Learn how big data is driving organizational change, along with essential analytical tools and techniques, including data mining and PageRank algorithms. Organizations now have access to massive amounts of data, and it is influencing the way they operate. They are realizing that, to be successful, they must leverage their data to make effective business decisions.

In this course, part of the Big Data MicroMasters program, you will learn how big data is driving organizational change and the key challenges organizations face when trying to analyse massive data sets.
You will learn fundamental techniques, such as data mining and stream processing. You will also learn how to design and implement PageRank algorithms using MapReduce, a programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster. You will learn how big data has improved web search and how online advertising systems work.
By the end of this course, you will have a better understanding of the various applications of big data methods in industry and research.
This course is part of the Big Data MicroMasters.

What you'll learn

  • Knowledge and application of MapReduce
  • Understanding the rate of occurrences of events in big data
  • How to design algorithms for stream processing and counting of frequent elements in Big Data
  • Understand and design PageRank algorithms
  • Understand underlying random walk algorithms

Prerequisites: Candidates interested in pursuing the MicroMasters program in Big Data are advised to complete Programming for Data Science and Computational Thinking and Big Data before undertaking this course.

Course Syllabus

Section 1: The basics of working with big data
Understand the four V’s of Big Data (Volume, Velocity, Variety, and Veracity)
Build models for data
Understand the occurrence of rare events in random data

Section 2: Web and social networks
Understand characteristics of the web and social networks
Model social networks
Apply algorithms for community detection in networks

Section 3: Clustering big data
Clustering social networks
Apply hierarchical clustering
Apply k-means clustering
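
As a rough illustration of the k-means item above, here is a minimal sketch in Python. The function name `k_means`, the fixed starting centroids, and the sample points are assumptions made for this example; the course may present the algorithm differently.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm on tuples of numbers, starting from given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
```

In practice the starting centroids are usually chosen at random or with a seeding heuristic; they are fixed here only to keep the example reproducible.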

Section 4: Google web search
Understand the concept of PageRank
Implement the basic PageRank algorithm for strongly connected graphs
Implement PageRank with taxation for graphs that are not strongly connected
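
To make the idea of PageRank with taxation more concrete, the following is a small sketch using power iteration. The damping value `beta=0.85`, the function name, the handling of dead ends, and the toy graph are all assumptions for illustration, not details taken from the course.

```python
def pagerank(links, beta=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Taxation: every page receives an equal share of (1 - beta).
        new_rank = {node: (1.0 - beta) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # The remaining beta share flows along the outgoing links.
                share = beta * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed dead-end rule: spread the rank over all pages.
                share = beta * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: D links into the cycle but nothing links back to D,
# so the graph is not strongly connected.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
```

Because D has no incoming links, its rank comes only from the taxation term, while C collects rank from three pages and ends up highest.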

Section 5: Parallel and distributed computing using MapReduce
Understand the architecture for massive distributed and parallel computing
Apply MapReduce using Hadoop
Compute PageRank using MapReduce
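
The MapReduce pattern itself can be sketched without a cluster. The snippet below is an in-memory simulation in plain Python, not the Hadoop API; `map_reduce`, `word_mapper`, and `count_reducer` are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run the map, shuffle, and reduce phases in a single process."""
    groups = defaultdict(list)
    # Map phase: each record yields zero or more (key, value) pairs.
    # Shuffle phase: values are grouped by key.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: each key's values are combined into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    return sum(counts)


lines = ["big data big ideas", "data streams"]
word_counts = map_reduce(lines, word_mapper, count_reducer)
# word_counts == {"big": 2, "data": 2, "ideas": 1, "streams": 1}
```

On a real cluster the map and reduce calls run on different machines and the shuffle moves data across the network, but the mapper and reducer keep this shape.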

Section 6: Computing similar documents in big data
Measure importance of words in a collection of documents
Measure similarity of sets and documents
Apply locality-sensitive hashing to compute similar documents
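
A short sketch of the set-similarity ideas in this section: character shingles, exact Jaccard similarity, and a MinHash signature that estimates it. The shingle length, the number of hash functions, and the use of salted CRC32 as the hash family are assumptions for this example, and the sketch stops before the banding step that locality-sensitive hashing adds on top.

```python
import zlib


def shingles(text, k=2):
    """Set of all length-k substrings of the text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def minhash_signature(items, num_hashes=64):
    """One minimum per salted hash function; items must be non-empty."""
    return [
        min(zlib.crc32(f"{seed}:{item}".encode()) for item in items)
        for seed in range(num_hashes)
    ]


def estimated_similarity(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


doc_a = shingles("abcd")  # {"ab", "bc", "cd"}
doc_b = shingles("abce")  # {"ab", "bc", "ce"}
exact = jaccard(doc_a, doc_b)  # 2 shared shingles out of 4 distinct: 0.5
estimate = estimated_similarity(minhash_signature(doc_a), minhash_signature(doc_b))
```

The estimate approaches the exact Jaccard value as the number of hash functions grows, and signatures are much smaller than the shingle sets they summarise.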

Section 7: Products frequently bought together in stores
Understand the importance of frequent item sets
Design association rules
Implement the A-priori algorithm
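
The core of the A-priori idea is that a pair can only be frequent if both of its items are frequent, so the second pass counts far fewer candidates. A minimal two-pass sketch follows; the function name, the support threshold, and the baskets are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear in at least min_support baskets."""
    # Pass 1: count single items and keep only the frequent ones.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, count in item_counts.items() if count >= min_support}
    # Pass 2: count only pairs built from frequent items.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {pair: count for pair, count in pair_counts.items() if count >= min_support}


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "jam"],
]
pairs = frequent_pairs(baskets, min_support=3)
# pairs == {("bread", "milk"): 3}
```

Butter and jam fall below the threshold in the first pass, so no pair containing them is ever counted in the second.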

Section 8: Movie and music recommendations
Understand the differences between types of recommendation systems
Design content-based recommendation systems
Design collaborative filtering recommendation systems

Section 9: Google's AdWords™ System
Understand the AdWords System
Analyse online algorithms in terms of competitive ratio
Use online matching to solve the AdWords problem
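
One simple online strategy for an AdWords-style problem is to give each arriving query to the eligible advertiser with the most budget left. The sketch below assumes every bid costs one unit and breaks ties alphabetically; the function name `balance` and the data are hypothetical, and the course may analyse a different variant.

```python
def balance(queries, bids, budgets):
    """Assign each query, in arrival order, to the eligible advertiser
    with the largest remaining budget (each assignment costs 1 unit)."""
    remaining = dict(budgets)
    assignment = []
    for query in queries:
        eligible = [
            advertiser for advertiser in sorted(remaining)
            if query in bids[advertiser] and remaining[advertiser] > 0
        ]
        if not eligible:
            assignment.append(None)  # the query goes unserved
            continue
        chosen = max(eligible, key=lambda advertiser: remaining[advertiser])
        remaining[chosen] -= 1
        assignment.append(chosen)
    return assignment


# Hypothetical setup: A bids only on x, B bids on x and y, budget 2 each.
bids = {"A": {"x"}, "B": {"x", "y"}}
budgets = {"A": 2, "B": 2}
assignment = balance(["x", "x", "y", "y"], bids, budgets)
# assignment == ["A", "B", "B", None]
```

With full knowledge of the query sequence, both x queries could go to A and both y queries to B, serving all four. The online strategy serves three, which is the kind of gap a competitive ratio measures.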

Section 10: Mining rapidly arriving data streams
Understand types of queries for data streams
Analyse sampling methods for data streams
Count distinct elements in data streams
Filter data streams
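
Filtering a stream against a large set of allowed keys is often done with a Bloom filter, which never misses a stored key but may occasionally accept one that was not stored. The class below is a minimal sketch; the bit-array size, the number of hash functions, and the use of salted SHA-256 are assumptions for this example.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the example simple, at the cost of memory.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        # False means definitely absent; True means probably present.
        return all(self.bits[position] for position in self._positions(item))


# Hypothetical allow-list of addresses, then a stream filtered against it.
allowed = BloomFilter()
for address in ["alice@example.com", "bob@example.com"]:
    allowed.add(address)

stream = ["alice@example.com", "mallory@example.com", "bob@example.com"]
passed = [address for address in stream if allowed.might_contain(address)]
```

Every stored address is guaranteed to pass. Whether an unknown address slips through depends on the filter size and the number of stored keys.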

Related Courses

Industry 4.0: How to Revolutionize your Business (edX) EdX
The Hong Kong Polytechnic University, HKPolyUx

An introduction to the fourth industrial revolution, its major systems and technologies, and how new products and services will impact business and society. We have witnessed the power of mechanization in the early nineteenth century, automation in the seventies, and information and the internet in recent decades. Now the adaptation of connected intelligence into the business and social fabric is advancing at an astonishing speed, which will completely change the way we conduct business.

Self-Paced
UX Data Analysis (edX) EdX
HECMontrealX, HEC Montréal

Become a UX data scientist! From qualitative data analysis to big data Web analytics, you will be able to leverage insights from data to make empirically-based recommendations. Do big data and UX speak to you? This MOOC will give you the methods and tools to analyze the whole spectrum of data we handle in UX, from qualitative user research and quantitative user testing data analysis to big data Web analytics.

Self-Paced
Introduction to Computer Science and Programming (edX) EdX
Tokyo Institute of Technology, TokyoTechX

The term “Computation” refers to the action performed by a computer. A computation can be a basic operation, or it can be a sophisticated computer simulation requiring a large amount of data and substantial resources. This course aims to introduce learners with no prior knowledge to the basics and key concepts of computer science. By following the lectures and exercises of this course, you will gain an understanding of algorithms and real experience of programming in the Ruby language.

Self-Paced
Unix Tools: Data, Software and Production Engineering (edX) EdX
Delft University of Technology, DelftX

Grow from being a Unix novice to Unix wizard status! Process big data, analyze software code, run DevOps tasks and excel in your everyday job through the amazing power of the Unix shell and command-line tools. Processing information is the hallmark of all modern organizations, which are increasingly digital: absorbing, processing and generating information is a key element of their business.

Self-Paced
Data Analytics and Visualization in Health Care (edX) EdX
Rochester Institute of Technology, RITx

Learn best practices in data analytics, informatics, and visualization to gain literacy in data-driven, strategic imperatives that affect all facets of health care. Big data is transforming the health care industry relative to improving quality of care and reducing costs—key objectives for most organizations. Employers are desperately searching for professionals who have the ability to extract, analyze, and interpret data from patient health records, insurance claims, financial records, and more to tell a compelling and actionable story using health care data analytics.

Self-Paced
Introduction to Apache Spark (edX) EdX
University of California, Berkeley

Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals. Spark is rapidly becoming the compute engine of choice for big data. Spark programs are more concise and often run 10-100 times faster than Hadoop MapReduce jobs. As companies realize this, Spark developers are becoming increasingly valued.

Course Not Available
Computational Thinking and Big Data (edX) EdX
University of Adelaide, AdelaideX

Learn the core concepts of computational thinking and how to collect, clean and consolidate large-scale datasets. Computational thinking is an invaluable skill that can be used across every industry, as it allows you to formulate a problem and express a solution in such a way that a computer can effectively carry it out.

Self-Paced
Big Data, Hadoop, and Spark Basics (edX) EdX
IBM

This course provides foundational big data practitioner knowledge and analytical skills using popular big data tools, including Hadoop and Spark. Learn and practice your big data skills hands-on. Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to unstructured data such as tweets, posts, pictures, audio files, videos, sensor data, satellite imagery, and more, to identify behaviors and preferences of prospects, clients, competitors, and others.

Self-Paced
Cluster Analysis (edX) EdX
University of Texas at Arlington, UTArlingtonX

Learn how to conduct a cluster analysis to discover important patterns in student behavior using the popular Weka data mining toolkit. In this course, you will learn the basics of cluster analysis, one of the most popular data mining methods for the discovery of patterns in learning data, and its application in learning analytics.

No sessions available
3 Weeks
Knowledge Management and Big Data in Business (edX) EdX
The Hong Kong Polytechnic University, HKPolyUx

Learn why and how knowledge management and Big Data are vital to the new business era. The business landscape is changing so rapidly that traditional management, business and computing courses do not meet the needs for the next generation of workers in the business world. Most traditional methods are of a repetitive, rule-based nature and will be gradually replaced by Artificial Intelligence.

Self-Paced