Data Collection and Integration (Coursera)

The "Data Collection and Integration" course teaches techniques for gathering data from diverse sources, including files, relational databases, web pages, and APIs. Participants gain practical experience in collecting and integrating data for further processing and analysis. The course emphasizes the use of appropriate tools and packages, such as Pandas, Beautiful Soup, and SQL, to handle real-life datasets and address data integration challenges.

What you'll learn

  • How to use Python and Python packages to collect data from various sources
  • How to integrate data collected from various sources into a unified dataset for further processing and analysis

This course is part of the Data Wrangling with Python Specialization.

Syllabus

Collect Data From Files
Module 1
The "Collect Data from Files" week focuses on equipping you with the necessary skills to handle various file formats, such as txt, csv, json, xml, html, and more, for effective data collection. You will learn how to read, parse, and extract relevant data from different file types, enabling you to gather valuable information from diverse sources.
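
As a rough illustration of reading several of these formats, the sketch below parses CSV, JSON, and XML using only the Python standard library. The sample strings, field names, and helper functions are hypothetical stand-ins for files on disk; they are not taken from the course materials.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the contents of files on disk.
CSV_TEXT = "id,city\n1,Oslo\n2,Lima\n"
JSON_TEXT = '[{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}]'
XML_TEXT = "<rows><row id='1' city='Oslo'/><row id='2' city='Lima'/></rows>"


def read_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_json(text):
    """Parse JSON text into native Python lists and dicts."""
    return json.loads(text)


def read_xml(text):
    """Collect the attributes of every <row> element as a dict."""
    root = ET.fromstring(text)
    return [dict(row.attrib) for row in root.iter("row")]


csv_rows = read_csv(CSV_TEXT)
json_rows = read_json(JSON_TEXT)
xml_rows = read_xml(XML_TEXT)
```

Note that the CSV and XML readers return every value as a string, while JSON preserves numeric types, so the three results usually need type normalization before they can be combined.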

Collect Data From Web
Module 2
The "Collect Data from Web" week focuses on empowering you with the skills to extract data from various webpage formats using Python libraries like requests and Beautiful Soup. You will learn how to access web pages, retrieve HTML content, and parse the data to collect relevant information effectively.
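
The parsing step can be sketched as follows. The module names requests and Beautiful Soup as its tools; this example deliberately swaps in the standard library's html.parser so that it runs without extra installs, and it parses an inline HTML string rather than fetching a live page. The HTML snippet and the LinkCollector class are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page content; in practice this would be the body of an HTTP response.
HTML = """
<html><body>
  <h1>Courses</h1>
  <a href="/course/a">Course A</a>
  <a href="/course/b">Course B</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags that carry an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector()
collector.feed(HTML)
links = collector.links
```

The same idea carries over to a dedicated parsing library: retrieve the HTML, locate the elements of interest, and pull out attributes and text.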

Collect Data From Database
Module 3
The "Collect Data from Database" week focuses on equipping you with the skills to interact with various SQL-like databases using Python packages. You will learn how to connect to databases, execute queries, and retrieve data from different database systems, enabling you to collect and utilize data efficiently.
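
A minimal sketch of the connect, query, and retrieve cycle, assuming SQLite through Python's built-in sqlite3 module. The course does not say which database systems it covers, and the orders table, its columns, and the total_by_customer helper are hypothetical.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ann", 10.0), ("bob", 5.5), ("ann", 4.5)],
)
conn.commit()


def total_by_customer(connection, min_total):
    """Return (customer, total) rows whose summed amount is at least min_total."""
    cursor = connection.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer HAVING SUM(amount) >= ? ORDER BY customer",
        (min_total,),  # parameter binding avoids building SQL from strings
    )
    return cursor.fetchall()


totals = total_by_customer(conn, 5.0)
conn.close()
```

Passing values through `?` placeholders rather than string formatting is the usual way to keep queries safe from SQL injection.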

Collect Data From APIs
Module 4
The "Collect Data from APIs" week focuses on enabling you to interact with various websites that provide Application Programming Interfaces (APIs). You will learn how to access APIs, retrieve data in structured formats (e.g., JSON or XML), and utilize Python to process and extract valuable information from API responses.
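
The request and extraction steps might look like the sketch below, which uses the standard library's urllib and json modules. The response shape (a "results" list of objects with "id" and "name") is an assumption made for illustration, not a real API. The fetch_json helper is defined but not called, so the example runs offline against a canned response.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """GET a URL and decode its JSON body. Defined for illustration; not called below."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_records(payload):
    """Keep only the fields of interest from a hypothetical API payload."""
    return [
        {"id": item["id"], "name": item["name"]}
        for item in payload.get("results", [])
    ]


# Canned response standing in for what fetch_json would return from a live API.
SAMPLE_RESPONSE = (
    '{"count": 2, "results": ['
    '{"id": 1, "name": "alpha", "extra": true}, '
    '{"id": 2, "name": "beta"}]}'
)
records = extract_records(json.loads(SAMPLE_RESPONSE))
```

Separating the network call from the extraction logic makes the extraction easy to test without contacting the API.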

Data Integration
Module 5
The "Data Integration" week focuses on the techniques and methodologies for integrating data collected from various sources. You will learn how to combine and merge datasets, handle data inconsistencies, and create a unified dataset for further analysis and decision-making.
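
One common inconsistency is a join key that is formatted differently in each source. The sketch below normalizes the key and then combines two small record sets in plain Python; the course lists Pandas among its tools, where the merge function plays a similar role. The data, field names, and helper functions here are hypothetical.

```python
def normalize_key(value):
    """Trim whitespace and lowercase so equivalent keys compare equal."""
    return str(value).strip().lower()


# Hypothetical records from two sources with inconsistently formatted emails.
customers = [
    {"email": "Ann@Example.com ", "name": "Ann"},
    {"email": "bob@example.com", "name": "Bob"},
]
orders = [
    {"email": "ann@example.com", "amount": 10.0},
    {"email": "BOB@example.com", "amount": 5.5},
    {"email": "carol@example.com", "amount": 7.0},
]


def integrate(order_rows, customer_rows):
    """Left-join orders to customers on the normalized email address."""
    names = {normalize_key(c["email"]): c["name"] for c in customer_rows}
    unified = []
    for order in order_rows:
        email = normalize_key(order["email"])
        unified.append(
            {
                "email": email,
                "amount": order["amount"],
                "name": names.get(email),  # None when no customer matches
            }
        )
    return unified


unified = integrate(orders, customers)
```

Orders without a matching customer are kept with a name of None, which mirrors a left join and makes unmatched records easy to find afterwards.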

Case Study
Module 6
The "Case Study" week offers you the opportunity to apply the knowledge you have learned throughout the course in a practical and comprehensive case study. You will engage in data collection from various sources, including files, SQL-like databases, and web APIs, and then integrate the collected data into a unified dataset for further analysis. This week serves as a culminating activity, allowing you to demonstrate your skills in data collection, integration, and preparation for analysis.

Related Courses

Applied Text Mining in Python (Coursera) Coursera
University of Michigan

This course will introduce the learner to text mining and text manipulation basics. The course begins with an understanding of how text is handled by Python, the structure of text both to the machine and to humans, and an overview of the NLTK framework for manipulating text. The second week focuses on common manipulation needs, including regular expressions (searching for text), cleaning text, and preparing text for use by machine learning processes. The third week will apply basic natural language processing methods to text, and demonstrate how text classification is accomplished. The final week will explore more advanced methods for detecting the topics in documents and grouping them by similarity (topic modelling).

Jun 22nd 2026
4 Weeks
Deploying Machine Learning Models (Coursera) Coursera
University of California, San Diego

In this course we will learn about Recommender Systems (which we will study for the Capstone project), and also look at deployment issues for data products. By the end of this course, you should be able to implement a working recommender system (e.g. to predict ratings, or generate lists of related products), and you should understand the tools and techniques required to deploy such a working system on real-world, large-scale datasets.

Jun 22nd 2026
4 Weeks
Data Manipulation at Scale: Systems and Algorithms (Coursera) Coursera
University of Washington

Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making: we are drowning in data. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources, but the programming abstractions to use them effectively. The abstractions that emerged in the last decade blend ideas from parallel databases, distributed systems, and programming languages to create a new class of scalable data analytics platforms that form the foundation for data science at realistic scales.

Jun 22nd 2026
4 Weeks
Accounting Data Analytics with Python (Coursera) Coursera
University of Illinois at Urbana-Champaign

This course focuses on developing Python skills for assembling business data. It will cover some of the same material from Introduction to Accounting Data Analytics and Visualization, but in a more general purpose programming environment (Jupyter Notebook for Python), rather than in Excel and the Visual Basic Editor. These concepts are taught within the context of one or more accounting data domains (e.g., financial statement data from EDGAR, stock data, loan data, point-of-sale data).

Jun 22nd 2026
5-12 Weeks
Six Sigma Advanced Define and Measure Phases (Coursera) Coursera
University System of Georgia

This course is for you if you are looking to dive deeper into Six Sigma or strengthen and expand your knowledge of the basic components of the green belt level of Six Sigma and Lean. Six Sigma skills are widely sought by employers both nationally and internationally. These skills have been proven to help improve business processes and performance. This course will take you deeper into the principles and tools associated with the "Define" and "Measure" phases of the DMAIC structure of Six Sigma.

Jun 22nd 2026
5-12 Weeks
Surveillance Systems: The Building Blocks (Coursera) Coursera
Johns Hopkins University

Epidemiology is often described as the cornerstone science of public health, and public health surveillance is a cornerstone of epidemiology. This course will help you build your technical awareness and skills for working with a variety of surveillance systems. Along the way, we'll focus on system objectives, data reporting, the core surveillance attributes, and performance assessment.

Jun 22nd 2026
4 Weeks
Framework for Data Collection and Analysis (Coursera) Coursera
University of Maryland, College Park

This course will provide you with an overview of existing data products and a good understanding of the data collection landscape. With the help of various examples you will learn how to identify which data sources are likely to match your research question, how to turn your research question into measurable pieces, and how to think about an analysis plan.

Jun 22nd 2026
4 Weeks
Building Scalable Java Microservices with Spring Boot and Spring Cloud (Coursera) Coursera
Google Cloud

"Microservices" describes a software design pattern in which an application is a collection of loosely coupled services. These services are fine-grained, and can be individually maintained and scaled. The microservices architecture is ideal for the public cloud, with its focus on elastic scaling with on-demand resources. In this course, you will learn how to build Java applications using Spring Boot and Spring Cloud on Google Cloud Platform.

Jun 23rd 2026
2 Weeks
Six Sigma Tools for Define and Measure (Coursera) Coursera
University System of Georgia

This course is for you if you are looking to learn more about Six Sigma or refresh your knowledge of the basic components of Six Sigma and Lean. Six Sigma skills are widely sought by employers both nationally and internationally. These skills have been proven to help improve business processes and performance. This course will cover the Define phase and introduce you to the Measure phase of the DMAIC (Define, Measure, Analyze, Improve, and Control) process. You will learn about Six Sigma project development and implementation, you will become familiar with project management tools, you will be introduced to statistics and understand its significance to Six Sigma, and finally you will learn about data collection and its importance to an organization.

Jun 22nd 2026
4 Weeks
Fitting Statistical Models to Data with Python (Coursera) Coursera
University of Michigan

In this course, we will expand our exploration of statistical inference techniques by focusing on the science and art of fitting statistical models to data. We will build on the concepts presented in the Statistical Inference course (Course 2) to emphasize the importance of connecting research questions to our data analysis methods. We will also focus on various modeling objectives, including making inference about relationships between variables and generating predictions for future observations.

Jun 22nd 2026
4 Weeks