Data Wrangling with MongoDB (Udacity)

In this course, we will explore how to wrangle data from diverse sources and shape it to enable data-driven applications. Some data scientists spend the bulk of their time doing this! Students will learn how to gather and extract data from widely used data formats. They will learn how to assess the quality of data and explore best practices for data cleaning. We will also introduce students to MongoDB, covering the essentials of storing data and the MongoDB query language together with exploratory analysis using the MongoDB aggregation framework.

This is a great course for those interested in entry-level data science positions as well as current business/data analysts looking to add big data to their repertoire, and managers working with data professionals or looking to leverage big data.
This course is also a part of our Data Analyst Nanodegree.

What You Will Learn

Lesson 1
Data Extraction Fundamentals

  • Assessing the Quality of Data
  • Intro to Tabular Formats
  • Parsing CSV
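
As a rough illustration of the kind of task this lesson covers, the sketch below parses tabular text with Python's standard csv module. The sample data and the helper name parse_csv are assumptions made for this example and are not taken from the course material.

```python
import csv
import io

# Hypothetical sample data; a real exercise would read from a file instead.
SAMPLE_CSV = (
    "city,country,population\n"
    "Springfield,USA,30000\n"
    "Shelbyville,USA,\n"
)


def parse_csv(text):
    """Return each data row as a dict keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


rows = parse_csv(SAMPLE_CSV)
for row in rows:
    print(row)
```

Every value comes back as a string, and the missing population in the second row arrives as an empty string, which is exactly the kind of detail a later data quality check has to account for.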

Lesson 2
Data in More Complex Formats

  • XML Design Principles
  • Parsing XML
  • Web Scraping

Lesson 3
Data Quality

  • Sources of Dirty Data
  • A Blueprint for Cleaning
  • Auditing Data
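
One simple way to audit a field is to count what kinds of values it actually holds. The sketch below is a minimal example under stated assumptions: the category labels, the treatment of "NULL" as missing, and the names classify and audit_field are all illustrative choices, not the course's own blueprint.

```python
from collections import Counter


def classify(value):
    """Label a raw string value as missing, int, float, or other."""
    # Assumption: empty strings and the literal "NULL" both mean "missing".
    if value is None or value.strip() in ("", "NULL"):
        return "missing"
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "other"


def audit_field(rows, field):
    """Count how many rows fall into each category for one field."""
    return Counter(classify(row.get(field)) for row in rows)


# Hypothetical rows, e.g. as produced by csv.DictReader.
sample_rows = [
    {"population": "30000"},
    {"population": ""},
    {"population": "1.5e3"},
    {"population": "unknown"},
    {},
]
report = audit_field(sample_rows, "population")
print(report)
```

A report like this shows at a glance whether a field is uniform or mixes several representations, which helps in deciding what cleaning is needed.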

Lesson 4
Working with MongoDB

  • Data Modelling in MongoDB
  • Introduction to PyMongo
  • Field Queries

Lesson 5
Analyzing Data

  • Examples of Aggregation Framework
  • The Aggregation Pipeline
  • Aggregation Operators: $match, $project, $unwind, $group
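
To give a sense of how these operators combine, the sketch below builds an aggregation pipeline as a plain Python list of stage documents. The field names (country, languages), the collection name in the comment, and the helper build_pipeline are hypothetical. The block only constructs the pipeline; running it would require PyMongo and a MongoDB server, where such a list is passed to a collection's aggregate method.

```python
def build_pipeline(country):
    """Build a pipeline counting documents per language for one country."""
    return [
        # Keep only documents for the requested country.
        {"$match": {"country": country}},
        # Emit one document per element of the "languages" array.
        {"$unwind": "$languages"},
        # Group by language and count the documents in each group.
        {"$group": {"_id": "$languages", "count": {"$sum": 1}}},
        # Reshape the output: rename _id to "language" and keep the count.
        {"$project": {"_id": 0, "language": "$_id", "count": 1}},
    ]


pipeline = build_pipeline("Germany")
for stage in pipeline:
    print(stage)

# With PyMongo and a running server, usage would look roughly like:
#   results = db.cities.aggregate(pipeline)   # "cities" is a hypothetical collection
```

Each stage receives the documents produced by the stage before it, so the order of the list matters.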

Lesson 6
Case Study - OpenStreetMap Data

  • Using iterative parsing for large data files
  • OpenStreetMap XML Overview
  • Exercises around OpenStreetMap data
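
Iterative parsing processes an XML file element by element instead of loading the whole tree into memory. The sketch below shows one way to do this with iterparse from Python's standard xml.etree.ElementTree module. The tiny sample document and the helper name count_tags are assumptions for illustration; a real OpenStreetMap extract is far larger.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature document in the general shape of OSM XML.
SAMPLE_OSM = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm>
  <node id="1" lat="0.0" lon="0.0"/>
  <node id="2" lat="0.1" lon="0.1">
    <tag k="amenity" v="cafe"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""


def count_tags(source):
    """Count element names while streaming through the document."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        # Release the element's contents once it has been processed.
        elem.clear()
    return counts


counts = count_tags(io.BytesIO(SAMPLE_OSM))
print(counts)
```

The same function accepts a file path in place of the in-memory buffer, so it can be pointed at a large data file without changing the logic.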

Prerequisites and Requirements
The ideal student should have the following skills:

  • Programming experience in Python, or a willingness to read a little documentation to understand examples and exercises throughout the course.
  • The ability to perform rudimentary system administration on Windows or Unix.

Some experience using a Unix shell or Windows PowerShell will be helpful, but is not required. No prior experience with databases is needed.

Why Take This Course
At the end of the class, students should be able to:

  • Programmatically extract data stored in common formats such as CSV, Microsoft Excel, JSON, and XML, and scrape websites to parse data from HTML.
  • Audit data for quality (validity, accuracy, completeness, consistency, and uniformity) and critically assess options for cleaning data in different contexts.
  • Store, retrieve, and analyze data using MongoDB.

This course concludes with a final project where students incorporate what they have learned to address a real-world data analysis problem.

Related Courses

C++ For Programmers (Udacity)

Learn features and constructs for C++. C++ for Programmers is designed for students who are familiar with a programming language and wish to learn C++. This course focuses on 'how' as opposed to 'what'. For example, in the lesson on functions, we do not teach what a function is, but rather how to create a function in C++. The lessons are taught by several different instructors who have used C++ in their professional careers, so students get to experience different perspectives.

Self-Paced

Big Data Analytics in Healthcare (Udacity)
Georgia Institute of Technology, Udacity

Data science plays an important role in many industries. Faced with massive amounts of heterogeneous data, data scientists increasingly depend on scalable machine learning and data mining algorithms and systems. Growth in the volume, complexity, and speed of data drives the need for scalable data analytics algorithms and systems. In this course, we study such algorithms and systems in the context of healthcare applications.

Self-Paced

Intro to JavaScript (Udacity)

Learn the fundamentals of JavaScript, the most popular programming language for both front-end and back-end web development. Applications for JavaScript span from interactive websites to the Internet of Things, making it a great choice for beginners and experienced developers looking to learn a new programming language.

Self-Paced

Swift for Developers (Udacity)

Your Next Programming Language. This course offers a quick practical introduction to Swift basics, including types, variables, constants, and functions. It combines syntax exercises with hands-on iOS development in Xcode. By the end of the course students will build their first iOS app, an app that creates and displays song lyrics customized to user input.

Self-Paced

Model Building and Validation (Udacity)

Advanced Techniques for Analyzing Data. This course will teach you how to start from scratch in answering questions about the real world using data. Machine learning happens to be a small part of this process. The model building process involves setting up ways of collecting data, understanding and paying attention to what is important in the data to answer the questions you are asking, and finding a statistical, mathematical, or simulation model to gain understanding and make predictions.

Self-Paced

HTML5 Canvas (Udacity)

From Pixels to Animation! Canvas is an HTML5 element that gives you a drawable surface inside your web pages that you can control with JavaScript. It is powerful enough to use for compositing images and even creating games. In this course, through several sample projects, you'll learn how to use the canvas; how to make compositions using shapes, images, and text; how to create effects and filters on images; and how to create animations.

Self-Paced

Data Science Interview Prep (Udacity)

Confidently take on the tech interview. Data science job interviews can be daunting. Technical interviewers often ask you to design an experiment or model. You may need to solve problems using Python and SQL. You will likely need to show how you connect data skills to business decisions and strategy. In this course, you'll review the common questions asked in data science, data analyst, and machine learning interviews.

Self-Paced