EdX

Computational Thinking and Big Data (edX)

Learn the core concepts of computational thinking and how to collect, clean and consolidate large-scale datasets. Computational thinking is an invaluable skill that can be used across every industry, as it allows you to formulate a problem and express a solution in such a way that a computer can effectively carry it out.

In this course, part of the Big Data MicroMasters program, you will learn how to apply computational thinking in data science. You will learn core computational thinking concepts including decomposition, pattern recognition, abstraction, and algorithmic thinking.
You will also learn about data representation and analysis and the processes of cleaning, presenting, and visualizing data. You will develop skills in data-driven problem design and algorithms for big data.
The course will also explain mathematical representations, probabilistic and statistical models, dimension reduction and Bayesian models.
You will use tools such as R and Java data processing libraries in associated language environments.

What you'll learn

  • Understand and apply advanced core computational thinking concepts to large-scale data sets
  • Use industry-level tools for data preparation and visualisation, such as R and Java
  • Apply methods for data preparation to large data sets
  • Understand mathematical and statistical techniques for extracting information from large data sets and illuminating relationships between data sets

Course Syllabus

Section 1: Data in R
Identify the components of RStudio; Identify the subjects and types of variables in R; Summarise and visualise univariate data, including histograms and box plots.

Section 2: Visualising relationships
Produce plots in ggplot2 in R to illustrate the relationship between pairs of variables; Understand which type of plot to use for different variables; Identify methods to deal with large datasets.

Section 3: Manipulating and joining data
Organise different data types, including strings, dates and times; Filter subjects in a data frame, select individual variables, group data by variables and calculate summary statistics; Join separate data frames into a single data frame; Learn how to implement these methods in MapReduce.

Section 4: Transforming data and dimension reduction
Transform data so that it is more appropriate for modelling; Use various methods to transform variables, including q-q plots and Box-Cox transformation, so that they are normally distributed; Reduce the number of variables using PCA; Learn how to apply these techniques when modelling data with linear models.

Section 5: Summarising data
Estimate model parameters, both point and interval estimates; Differentiate between the statistical concepts of parameters and statistics; Use statistical summaries to infer population characteristics; Utilise strings; Learn about k-mers in genomics and their relationship to perfect hash functions as an example of text manipulation.
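
As a rough illustration of the k-mer topic above, the sketch below splits a DNA string into its overlapping substrings of length k and maps each one to an integer using two bits per base. Over the alphabet A, C, G, T this mapping never sends two different k-mers of the same length to the same value, which is the sense in which it behaves like a perfect hash. The class name KmerSketch, the method names and the sample sequence are hypothetical and are not taken from the course materials; the encoding assumes k is at most 15 so the result fits in an int.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not taken from the course materials.
class KmerSketch {
    // Return every overlapping substring of length k, left to right.
    static List<String> kmers(String seq, int k) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + k <= seq.length(); i++) {
            out.add(seq.substring(i, i + k));
        }
        return out;
    }

    // Encode a k-mer as a base-4 number (A=0, C=1, G=2, T=3).
    // Assumes k <= 15 so the value fits in a non-negative int.
    static int encode(String kmer) {
        int code = 0;
        for (int i = 0; i < kmer.length(); i++) {
            int base;
            switch (kmer.charAt(i)) {
                case 'A': base = 0; break;
                case 'C': base = 1; break;
                case 'G': base = 2; break;
                case 'T': base = 3; break;
                default:
                    throw new IllegalArgumentException("not a DNA base: " + kmer.charAt(i));
            }
            code = code * 4 + base;
        }
        return code;
    }

    public static void main(String[] args) {
        for (String kmer : kmers("GATTACA", 3)) {
            System.out.println(kmer + " -> " + encode(kmer));
        }
    }
}
```

Because each code lies between 0 and 4^k - 1, it can index directly into an array of counts without any collision handling.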

Section 6: Introduction to Java
Use complex data structures; Implement your own data structures to organise data; Explain the differences between classes and objects; Motivate object-orientation.

Section 7: Graphs
Encode directed and undirected graphs in different data structures, such as matrices and adjacency lists; Execute basic algorithms, such as depth-first search and breadth-first search.
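
To give a feel for the material in this section, here is a minimal sketch of an undirected graph stored as adjacency lists, with a breadth-first search over it. The class name GraphSketch, its methods and the sample edges are hypothetical choices made for this example, not code from the course.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration; not taken from the course materials.
class GraphSketch {
    // adjacency.get(v) holds the neighbours of vertex v.
    private final List<List<Integer>> adjacency = new ArrayList<>();

    GraphSketch(int vertexCount) {
        for (int i = 0; i < vertexCount; i++) {
            adjacency.add(new ArrayList<>());
        }
    }

    // Undirected edge: record it in both adjacency lists.
    void addEdge(int u, int v) {
        adjacency.get(u).add(v);
        adjacency.get(v).add(u);
    }

    // Breadth-first search: vertices in the order they are first visited.
    List<Integer> bfs(int start) {
        boolean[] visited = new boolean[adjacency.size()];
        List<Integer> order = new ArrayList<>();
        Deque<Integer> queue = new ArrayDeque<>();
        visited[start] = true;
        queue.add(start);
        while (!queue.isEmpty()) {
            int current = queue.remove();
            order.add(current);
            for (int next : adjacency.get(current)) {
                if (!visited[next]) {
                    visited[next] = true;
                    queue.add(next);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        GraphSketch g = new GraphSketch(5);
        g.addEdge(0, 1);
        g.addEdge(0, 2);
        g.addEdge(1, 3);
        g.addEdge(2, 4);
        System.out.println(g.bfs(0)); // prints [0, 1, 2, 3, 4]
    }
}
```

For a directed graph, addEdge would record the edge in one list only. Using a stack (or recursion) in place of the queue turns the same traversal into a depth-first search.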

Section 8: Probability
Determine the probability of events occurring when the probability distribution is discrete; How to approximate.

Section 9: Hashing
Apply hash functions to basic data structures in Java; Implement your own hash functions and execute these, as well as built-in ones; Differentiate good from bad hash functions based on the concept of collisions.
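
The idea of judging hash functions by their collisions can be sketched as follows. The example compares a deliberately weak hash, which sums character codes, with a polynomial hash that uses multiplier 31, the same scheme documented for Java's built-in String.hashCode. The class HashSketch, its method names and the three sample keys are hypothetical and chosen only for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;

// Hypothetical illustration; not taken from the course materials.
class HashSketch {
    // Weak hash: sums character codes, so any two anagrams collide.
    static int sumHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // Polynomial hash with multiplier 31: character position affects the result.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Collisions = number of keys minus number of distinct hash values.
    static int collisions(List<String> keys, ToIntFunction<String> hash) {
        Set<Integer> seen = new HashSet<>();
        for (String key : keys) {
            seen.add(hash.applyAsInt(key));
        }
        return keys.size() - seen.size();
    }

    public static void main(String[] args) {
        List<String> keys = List.of("tea", "eat", "ate");
        System.out.println("sumHash collisions:  " + collisions(keys, HashSketch::sumHash));  // 2
        System.out.println("polyHash collisions: " + collisions(keys, HashSketch::polyHash)); // 0
    }
}
```

The three keys are anagrams, so the summing hash maps all of them to the same value, while the polynomial hash keeps them apart. A real comparison would use a much larger and more varied key set.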

Section 10: Bringing it all together
Understand the context of big data in programming.

Related Courses

Software Construction: Object-Oriented Design (edX)
The University of British Columbia, UBCx

Learn how to design large software systems that solve real-world problems using object-oriented design techniques. By the end of the course, you will have a solid foundation in Java and Object-Oriented Design, as well as many software development concepts that can be applied to any language.

Self-Paced

Probability: Basic Concepts & Discrete Random Variables (edX)
Purdue University, PurdueX

Learn fundamental concepts of mathematical probability to prepare for a career in the growing field of information and data science. Our capacity to collect and store data has increased exponentially, but deriving information from data from a scientific perspective requires a foundational knowledge of probability. Are you interested in a career in the emerging data science field, or as an actuarial scientist? Or do you want to better understand statistical theory and mathematical modeling?

No sessions available
5-12 Weeks
Python for Data Science (edX)
University of California, San Diego, UC San DiegoX

Learn to use powerful, open-source Python tools, including Pandas, Git and Matplotlib, to manipulate, analyze, and visualize complex datasets. In the information age, data is all around us. Within this data are answers to compelling questions across many societal domains (politics, business, science, etc.). But if you had access to a large dataset, would you be able to find the answers you seek?

Self-Paced

Data Science Ethics (edX)
University of Michigan, MichiganX

Learn how to think through the ethics surrounding privacy, data sharing, and algorithmic decision-making. As patients, we care about the privacy of our medical record; but as patients, we also wish to benefit from the analysis of data in medical records. As citizens, we want a fair trial before being punished for a crime; but as citizens, we want to stop terrorists before they attack us. As decision-makers, we value the advice we get from data-driven algorithms; but as decision-makers, we also worry about unintended bias.

Self-Paced

Enabling Technologies for Data Science and Analytics: The Internet of Things (edX)
Columbia University, ColumbiaX

Discover the relationship between Big Data and the Internet of Things (IoT). The Internet of Things is rapidly growing. It is predicted that more than 25 billion devices will be connected by 2020. In this data science course, you will learn about the major components of the Internet of Things and how data is acquired from sensors. You will also examine ways of analyzing event data, sentiment analysis, facial recognition software and how data generated from devices can be used to make decisions.

Self-Paced

Big Data and Education (edX)
University of Pennsylvania, PennX

Learn the methods and strategies for using large-scale educational data to improve education and make discoveries about learning. Online and software-based learning tools have been used increasingly in education. This movement has resulted in an explosion of data, which can now be used to improve educational effectiveness and support basic research on learning.

Self-Paced

Big Data Fundamentals (edX)
University of Adelaide, AdelaideX

Learn how big data is driving organisational change, and learn essential analytical tools and techniques, including data mining and PageRank algorithms. Organizations now have access to massive amounts of data, and it is influencing the way they operate. They are realizing that, in order to be successful, they must leverage their data to make effective business decisions.

Self-Paced

Introduction to Computational Thinking and Data Science (edX)
MIT, MITx

This course is an introduction to using computation to understand real-world phenomena. This course will teach you how to use computation to accomplish a variety of goals and provides you with a brief introduction to a variety of topics in computational problem solving. This course is aimed at students with some prior programming experience in Python and a rudimentary knowledge of computational complexity.

Mar 20th 2024
5-12 Weeks
Introduction to Apache Spark (edX)
University of California, Berkeley

Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals. Spark is rapidly becoming the compute engine of choice for big data. Spark programs are more concise and often run 10-100 times faster than Hadoop MapReduce jobs. As companies realize this, Spark developers are becoming increasingly valued.

Course Not Available