Introduction to Text Mining with R (Coursera)

This course gives you access to the text mining techniques used by top data scientists around the world. Since most information available online is in the form of text, knowing when and how to use these techniques, algorithms, and models will not only give you an edge over your competition in the job market but also let you see the world around you from a completely new perspective. The course covers everything from the basics of working with text programmatically to advanced unsupervised learning methods.

The course is taught using the R programming language and starts with a brief introduction to the language itself (and to RStudio, the primary IDE for R programming), together with a short introduction to the Tidyverse, a commonly used set of R packages. Text preprocessing techniques and supervised learning methods are introduced next. The final part of the course covers various unsupervised learning methods for analyzing textual data.
Students are required to complete quizzes (1-2 for each of the 4 instructional weeks) and a final project that applies the knowledge they have gained during the course to open data.

Course 3 of 4 in the Network Analytics for Business Specialization

Syllabus

WEEK 1
R and RStudio Basics
In this module, you will learn how to work with R and RStudio, how to use RMarkdown for literate programming, and how to work with data using basic R data types and structures.

WEEK 2
Working with Tidyverse
In this module, you will learn how to work with data using the Tidyverse set of packages. You will learn how to use tibbles (a Tidyverse alternative to data.frames), the pipe operator from the magrittr package, and how to clean and transform data using the powerful dplyr package. You will also learn how to efficiently work with strings using the stringr package.

WEEK 3
Supervised machine learning with the bag-of-words approach
In this module, you will learn how to obtain text data from Project Gutenberg and how to prepare it for analysis. You will also learn how to use TF-IDF to find the most distinctive words in a corpus of texts, and how to build, interpret, and evaluate supervised learning models for textual data.

WEEK 4
Unsupervised machine learning
In this module, you will learn how to preprocess text data using the preText package, which can compare many types of preprocessing for a particular corpus. You will also learn how to train, interpret, and compare topic models.

WEEK 5
Final Project
This module is dedicated entirely to the final project of the course, in which you will apply all the knowledge you have gained to carry out a real analysis of real texts on your own. You will download data from the Project Gutenberg database, explore it, and then apply both supervised and unsupervised machine learning techniques. Finally, you will review and grade the work of your peers.

Related Courses

Experimentation for Improvement (Coursera)
McMaster University

We are always using experiments to improve our lives, our community, and our work. Are you doing it efficiently? Or are you (incorrectly) changing one thing at a time and hoping for the best? In this course, you will learn how to plan efficient experiments that test many variables at once. Our goal is to find the best results using only a few experiments. A key part of the course is how to optimize a system.

Jun 22nd 2026
5-12 Weeks
Exploratory Data Analysis (Coursera)
Johns Hopkins University

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data.

Jun 22nd 2026
4 Weeks
Big Data, Genes, and Medicine (Coursera)
The State University of New York

This course distills for you expert knowledge and skills mastered by professionals in Health Big Data Science and Bioinformatics. You will learn exciting facts about the biology and chemistry of the human body, genetics, and medicine, intertwined with the science of Big Data and the skills needed to harness the avalanche of openly available data at your fingertips, which we are just starting to make sense of.

Jun 22nd 2026
5-12 Weeks
The Data Scientist's Toolbox (Coursera)
Johns Hopkins University

In this course you will get an introduction to the main tools and ideas in the data scientist's toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program like version control, markdown, git, GitHub, R, and RStudio.

Jun 22nd 2026
4 Weeks
Linear Regression and Modeling (Coursera)
Duke University

This course introduces simple and multiple linear regression models. These models allow you to assess the relationship between variables in a data set and a continuous response variable. Is there a relationship between the physical attractiveness of a professor and their student evaluation scores? Can we predict the test score for a child based on certain characteristics of his or her mother? In this course, you will learn the fundamental theory behind linear regression and, through data examples, learn to fit, examine, and utilize regression models to examine relationships between multiple variables, using the free statistical software R and RStudio.

Jun 22nd 2026
4 Weeks
Sequence Models (Coursera)
DeepLearning.AI

This course will teach you how to build models for natural language, audio, and other sequence data. Thanks to deep learning, sequence algorithms are working far better than just two years ago, and this is enabling numerous exciting applications in speech recognition, music synthesis, chatbots, machine translation, natural language understanding, and many others.

Jun 22nd 2026
3 Weeks
Machine Learning Foundations: A Case Study Approach (Coursera)
University of Washington

Do you have data and wonder what it can tell you? Do you need a deeper understanding of the core ways in which machine learning can improve your business? Do you want to be able to converse with specialists about anything from regression and classification to deep learning and recommender systems? In this course, you will get hands-on experience with machine learning through a series of practical case studies.

Jun 22nd 2026
5-12 Weeks