More Data Mining with Weka (FutureLearn)

Offered by the University of Waikato

Learn more about practical data mining, including how to deal with large data sets. Use advanced techniques to mine your own data! This course introduces advanced data mining skills, following on from Data Mining with Weka.


You’ll process a dataset with 10 million instances. You’ll mine a 250,000-word text dataset. You’ll analyze a supermarket dataset representing 5000 shopping baskets. You’ll learn about filters for preprocessing data, selecting attributes, classification, clustering, association rules, and cost-sensitive evaluation. You’ll meet learning curves and automatically optimize learning parameters. Weka originated at the University of Waikato in New Zealand, and Ian Witten has authored a leading book on data mining.

What topics will you cover?

  • Running large-scale data mining experiments
  • Constructing and executing knowledge flows
  • Processing very large datasets
  • Analyzing collections of textual documents
  • Mining association rules
  • Preprocessing data using a range of filters
  • Automatic methods of attribute selection
  • Clustering data
  • Taking account of different decision costs
  • Producing learning curves
  • Optimizing learning parameters in data mining

What will you achieve?

  • Compare the performance of different mining methods on a wide range of datasets
  • Demonstrate how to set up learning tasks as a knowledge flow
  • Solve data mining problems on huge datasets
  • Apply equal-width and equal-frequency binning for discretizing numeric attributes
  • Identify the advantages of supervised vs unsupervised discretization
  • Evaluate different trade-offs between error rates in 2-class classification
  • Classify documents using various techniques
  • Debate the correspondence between decision trees and decision rules
  • Explain how association rules can be generated and used
  • Discuss techniques for representing, generating, and evaluating clusters
  • Perform attribute selection by wrapping a classifier inside a cross-validation loop
  • Describe different techniques for searching through subsets of attributes
  • Develop effective sets of attributes for text classification problems
  • Explain cost-sensitive evaluation, cost-sensitive classification, and cost-sensitive learning
  • Design and evaluate multi-layer neural networks
  • Assess the volume of training data needed for mining tasks
  • Calculate optimal parameter values for a given learning system
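Equal-width binning splits the range of a numeric attribute into k intervals of the same width, while equal-frequency binning places roughly the same number of values in each bin. The minimal sketch below illustrates the difference, assuming a plain list of numbers with no missing values; the function names are hypothetical and this is not Weka's own Discretize filter.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = []
    for v in values:
        if width == 0:
            # All values are identical, so everything falls in one bin.
            bins.append(0)
        else:
            # The maximum would land in bin k, so clamp it into the last bin.
            bins.append(min(int((v - lo) / width), k - 1))
    return bins


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts.

    Tied values may be split across neighbouring bins in this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        # Map rank 0..n-1 onto bin 0..k-1 in equal-sized chunks.
        bins[i] = rank * k // n
    return bins


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(equal_width_bins(data, 3))
print(equal_frequency_bins(data, 3))
```

With the outlier at 100, equal-width binning puts eight of the nine values into the first bin, whereas equal-frequency binning spreads them three per bin, which is one reason the choice of discretization method matters.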

Who is the course for?
This course is aimed at anyone who works with data. It follows on from Data Mining with Weka, and you should have completed that first (or have otherwise acquired a rudimentary knowledge of Weka). As with the previous course, it involves no computer programming, although you need some experience with using computers for everyday tasks. High-school maths is more than enough; some elementary statistics concepts (means and variances) are assumed.

What software or tools do you need?
Before the course starts, download the free Weka software. It runs on any computer, under Windows, Linux, or Mac. It has been downloaded millions of times and is being used all around the world.
(Note: Depending on your computer and system version, you may need admin access to install Weka.)


Related Courses

Chemometrics in Air Pollution (FutureLearn)
University of Malaya

This course briefly introduces the causes and effects of air pollution in Asia, chemometric models, and chemometric applications. Air pollution is a growing concern that we experience in our daily lives, but not everyone has a clear understanding of its sources. Here, you will not only learn how to identify them, but also understand the potential impact of air pollution on our present and future.

May 16th 2022
3 Weeks

Getting Started with Teaching Data Science in Schools (FutureLearn)
University of Glasgow

Learn the basics of data science and how to introduce data science in the classroom. Learn practical ways to teach data science. Understanding how to use and interpret data will be essential for the next generation, but many schools and teachers aren’t equipped to teach basic data science to students. This course will help you introduce data science in the classroom so that your students are prepared for the future.

Sep 13th 2021
3 Weeks

Exploratory Data Analysis (Coursera)
Johns Hopkins University

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data.

Jun 22nd 2026
4 Weeks

Get ready for a Masters in Data Science and AI (FutureLearn)
Coventry University

Identify whether you’re ready for Master’s study, improve your data science skills, and get to grips with the basics of Python. Get a taste of life as a Data Science and AI Master's student. On this course, you’ll have the opportunity to explore the disciplines involved in a Master’s degree in Data Science and Artificial Intelligence (AI).

Apr 17th 2023
2 Weeks

Introduction to Machine Learning and AI (FutureLearn)
Raspberry Pi Foundation, National Centre for Computing Education

Discover the fundamentals of machine learning, how it works, and learn to train your own AI using free online tools. Build your knowledge and skills in machine learning. From self-driving cars to determining someone’s age, artificial intelligence (AI) systems trained with machine learning (ML) are being used more and more. But what is AI, and what does machine learning actually involve?

Jan 2nd 2023
4 Weeks

Big Data: Statistical Inference and Machine Learning (FutureLearn)
Queensland University of Technology

Learn how to apply selected statistical and machine learning techniques and tools to analyse big data. Everyone has heard of big data. Many people have big data. But only some people know what to do with big data when they have it. So what’s the problem? Well, the big problem is that the data is big: the size, complexity and diversity of datasets increase every day. This means that we need new technological or methodological solutions for analysing data. There is a great demand for people with the skills and know-how to do big data analytics.

No sessions available
2 Weeks

Big Data: Data Visualisation (FutureLearn)
Queensland University of Technology

Data visualisation is vital in bridging the gap between data and decisions. Discover the methods, tools and processes involved. Data visualisation is an important visual method for effective communication and analysing large datasets. Through data visualisations we are able to draw conclusions from data that sometimes are not immediately obvious, and interact with the data in an entirely different way.

No sessions available
3 Weeks

Introduction to Machine Learning (Coursera)
Duke University

This course will provide you with a foundational understanding of machine learning models (logistic regression, multilayer perceptrons, convolutional neural networks, natural language processing, etc.) and demonstrate how these models can solve complex problems in a variety of industries, from medical diagnostics to image recognition to text prediction.

Jun 26th 2026
5-12 Weeks

Mathematical Biostatistics Boot Camp 1 (Coursera)
Johns Hopkins University

This class presents the fundamental probability and statistical concepts used in elementary data analysis. It will be taught at an introductory level for students with junior or senior college-level mathematical training including a working knowledge of calculus. A small amount of linear algebra and programming are useful for the class, but not required.

Jun 22nd 2026
4 Weeks

Statistical Inference (Coursera)
Johns Hopkins University

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference, including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference.

Jun 22nd 2026
4 Weeks