Coursera

Fundamentals of Scalable Data Science (Coursera)

Offered by IBM.

Apache Spark is the de facto standard for large-scale data processing. This is the first course in a series leading to the IBM Advanced Data Science Specialization. We strongly believe that it is crucial for success to start by learning a scalable data science platform, since memory and CPU constraints are the most limiting factors when it comes to building advanced machine learning models. In this course we teach you the fundamentals of Apache Spark using Python and PySpark. We'll introduce Apache Spark in the first two weeks and learn how to apply it to basic exploratory and data pre-processing tasks in the last two weeks.

Through this exercise you'll also be introduced to the most fundamental statistical measures and data visualization technologies.
This gives you enough knowledge to take on the role of a data engineer in any modern environment. It also gives you a basis for advancing your career towards data science.
After completing this course, you will be able to:
• Describe how basic statistical measures are used to reveal patterns within the data
• Recognize data characteristics, patterns, trends, deviations or inconsistencies, and potential outliers
• Identify useful techniques for working with big data such as dimension reduction and feature selection methods
• Use advanced tools and charting libraries to:
o Improve the efficiency of big data analysis with partitioning and parallel analysis
o Visualize the data in a number of 2D and 3D formats (Box Plot, Run Chart, Scatter Plot, Pareto Chart, and Multidimensional Scaling)
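As a rough illustration of the partitioning and parallel analysis mentioned in the outcomes above, the sketch below computes a mean and standard deviation from per-partition summaries that are merged afterwards. It uses plain Python rather than Apache Spark, and the function names (`partition`, `summarize`, `merge`, `mean_and_std`) and the sample data are hypothetical, chosen only for this example. It is not taken from the course material.

```python
from functools import reduce
from math import sqrt


def partition(values, num_partitions):
    """Split a list into chunks; a stand-in for how a cluster partitions data."""
    return [values[i::num_partitions] for i in range(num_partitions)]


def summarize(chunk):
    """Per-partition summary: (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def merge(a, b):
    """Combine two partition summaries element-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_std(values, num_partitions=4):
    """Mean and population standard deviation from merged partition summaries.

    Assumes `values` is non-empty; an empty input is not handled here.
    """
    summaries = map(summarize, partition(values, num_partitions))
    n, total, total_sq = reduce(merge, summaries)
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return mean, sqrt(max(variance, 0.0))  # clamp tiny negative rounding errors


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std = mean_and_std(data)
print(mean, std)
```

Because each partition is reduced to three numbers before merging, the summaries could in principle be computed independently on separate workers, which is the general idea behind scaling such statistics.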
For successful completion of the course, the following prerequisites are recommended:
• Basic programming skills in Python
• Basic math
• Basic SQL (you can get it easily from Databases and SQL for Data Science if needed)
In order to complete this course, the following technologies will be used:
(These technologies are introduced in the course as necessary so no previous knowledge is required.)
• Jupyter notebooks (brought to you by IBM Watson Studio for free)
• Apache Spark (brought to you by IBM Watson Studio for free)
• Python
Course 1 of 4 in the Advanced Data Science with IBM Specialization

Syllabus

WEEK 1: Introduction to the course and grading environment
WEEK 2: Tools that support BigData solutions
WEEK 3: Scaling Math for Statistics on Apache Spark
WEEK 4: Data Visualization of Big Data

Related Courses

Coursera

Johns Hopkins University

Algorithms for DNA Sequencing (Coursera)

Statistics & Data Analysis Data Science

We will learn computational methods -- algorithms and data structures -- for analyzing DNA sequencing data. We will learn a little about DNA, genomics, and how DNA sequencing is used. We will use Python to implement key algorithms and data structures and to analyze real genomes and DNA sequencing datasets.

Jun 22nd 2026

4 Weeks

Python Algorithms DNA

Coursera

Northwestern University

The Importance of Listening (Coursera)

Marketing & Communication Business

In this second MOOC in the Social Marketing Specialization - "The Importance of Listening" - you will go deep into the Big Data of social and gain a more complete picture of what can be learned from interactions on social sites. You will be amazed at just how much information can be extracted from a single post, picture, or video.

Jun 22nd 2026

4 Weeks

Marketing Big Data Social Media

Coursera

Johns Hopkins University

Statistical Inference (Coursera)

Statistics & Data Analysis Data Science

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference, including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference.

Jun 22nd 2026

4 Weeks

Statistics Probability Data Analysis

Coursera

INSEAD

Web3 and Blockchain Transformations in Global Supply Chains (Coursera)

Business

The global supply chain is a $50 trillion industry and is the foundation of our global economy. While information technology has improved the flow of goods globally over the last few decades, as the COVID-19 crisis revealed, there is still critical work to do. Today’s supply chains are complex, with parties conducting their transactions through a Byzantine network of computer systems with disparate applications like e-mail, phone, and fax.

Jun 28th 2026

5-12 Weeks

Supply Chain IoT Internet of Things

Coursera

University of Colorado System

Relational Database Support for Data Warehouses (Coursera)

Statistics & Data Analysis Data Science

Relational Database Support for Data Warehouses is the third course in the Data Warehousing for Business Intelligence specialization. In this course, you'll use analytical elements of SQL for answering business intelligence questions. You'll learn features of relational database management systems for managing summary data commonly used in business intelligence reporting. Because of the importance and difficulty of managing implementations of data warehouses, we'll also delve into storage architectures, scalable parallel processing, data governance, and big data impacts. In the assignments in this course, you can use either Oracle or PostgreSQL.

Jun 22nd 2026

5-12 Weeks

Databases SQL Big Data

Coursera

University of Colorado System

Business Intelligence Concepts, Tools, and Applications (Coursera)

Statistics & Data Analysis Data Science

This is the fourth course in the Data Warehouse for Business Intelligence specialization. Ideally, the courses should be taken in sequence. In this course, you will gain the knowledge and skills for using data warehouses for business intelligence purposes and for working as a business intelligence developer. You’ll have the opportunity to work with large data sets in a data warehouse environment and will learn the use of MicroStrategy's Online Analytical Processing (OLAP) and Visualization capabilities to create visualizations and dashboards.

Jun 22nd 2026

5-12 Weeks

Data Analysis Decision Making Business Intelligence

Coursera

Johns Hopkins University

Mathematical Biostatistics Boot Camp 1 (Coursera)

Sci: Mathematics

This class presents the fundamental probability and statistical concepts used in elementary data analysis. It will be taught at an introductory level for students with junior or senior college-level mathematical training including a working knowledge of calculus. A small amount of linear algebra and programming are useful for the class, but not required.

Jun 22nd 2026

4 Weeks

Math Statistics Probability

Coursera

University of California, Irvine

The Arduino Platform and C Programming (Coursera)

CS: Software Engineering

The Arduino is an open-source computer hardware/software platform for building digital devices and interactive objects that can sense and control the physical world around them. In this class you will learn how the Arduino platform works in terms of the physical board and libraries and the IDE (integrated development environment). You will also learn about shields, which are smaller boards that plug into the main Arduino board to perform other functions such as sensing light, heat, GPS tracking, or providing a user interface display. The course will also cover programming the Arduino using C code and accessing the pins on the board via the software to control external devices.

Jun 22nd 2026

4 Weeks

Programming Debugging Arduino

Coursera

University of Minnesota

Interprofessional Healthcare Informatics (Coursera)

Statistics & Data Analysis Data Science

Interprofessional Healthcare Informatics is a graduate-level, hands-on interactive exploration of real informatics tools and techniques offered by the University of Minnesota and the University of Minnesota's National Center for Interprofessional Practice and Education. We will be incorporating technology-enabled educational innovations to bring the subject matter to life. Over the 10 modules, we will create a vital online learning community and a working healthcare informatics network.

Jun 22nd 2026

5-12 Weeks

Healthcare Informatics Telehealth

Coursera

University of California, San Diego

Graph Analytics for Big Data (Coursera)

Statistics & Data Analysis Data Science

Want to understand your data network structure and how it changes under different conditions? Curious to know how to identify closely interacting clusters within a graph? Have you heard of the fast-growing area of graph analytics and want to learn more? This course gives you a broad overview of the field of graph analytics so you can learn new ways to model, store, retrieve and analyze graph-structured data.

Jun 22nd 2026

5-12 Weeks

Big Data Data Analysis Graphs

Coursera

Duke University

Introduction to Machine Learning (Coursera)

Data Science

This course will provide you a foundational understanding of machine learning models (logistic regression, multilayer perceptrons, convolutional neural networks, natural language processing, etc.) as well as demonstrate how these models can solve complex problems in a variety of industries, from medical diagnostics to image recognition to text prediction.

Jun 26th 2026

5-12 Weeks

ML NLP Machine Learning

Coursera

Johns Hopkins University

Exploratory Data Analysis (Coursera)

Statistics & Data Analysis Data Science

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data.

Jun 22nd 2026

4 Weeks

Statistics Data Analysis Data Science