Big Data: capstone project (Coursera)

In this final course of the Big Data Specialization, students will have the opportunity to apply some of the tools and methods learned in the previous courses to a practical case. The aim of this Capstone Project is to show an example of the work carried out daily in the Cosmology department of the Port d’Informació Científica, in Barcelona.

The task is to build a classifier for galaxy images, using data from the GalaxyZoo project together with images and data from the Sloan Digital Sky Survey telescope. Guided assignments and exercises will lead students through the exploration and analysis of these data, culminating in an automatic Machine Learning tool.
The process students follow in this course could be applied in any other discipline, for example in the social sciences, in a market study, or in any field that involves making decisions from a large volume of data.
Course 5 of 5 in the Big Data – Introducción al uso práctico de datos masivos Specialization.

Syllabus

WEEK 1
Introduction
The Virtual Machine
NOTE: If you already installed the virtual machine in the previous course of the Specialization, you do not need to do so again. Otherwise, this section explains how to download and install the virtual machine on your computer. The Cloudera VM (MV-Cloudera) requires a computer with the following characteristics: (1) a 64-bit machine, (2) at least 6 GB of memory (8 GB recommended), and (3) 20 GB of free disk space. Keep in mind that downloading and installing the virtual machine will take time, given its size and complexity.
Data Exploration
This week we will get to know the project and make a first exploration of some of the data we will be working with. We will familiarize ourselves with the contents of these files and do the preliminary work needed to apply it later to large volumes of data.

WEEK 2
Data Model
This week we will learn to load the data into Hive, build its data model, and understand the task of classifying a galaxy according to its shape.

WEEK 3
Classification
This week we will normalize a data model, study in depth the votes provided by users, and generate the information needed to build an automatic classifier.
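
As a rough illustration of how user votes might be turned into training labels, the sketch below picks the most-voted shape for a galaxy and accepts it only when its share of the votes is high enough. The class names, the `label_from_votes` helper, and the 0.8 threshold are assumptions made for this example; the course materials may define the labelling differently.

```python
def label_from_votes(votes, min_fraction=0.8):
    """Return the most-voted class if its vote share reaches min_fraction.

    `votes` maps a (hypothetical) class name to the number of user votes.
    Galaxies without a clear majority are labelled "uncertain".
    """
    total = sum(votes.values())
    if total == 0:
        return "uncertain"  # no votes at all, nothing to decide
    winner, count = max(votes.items(), key=lambda kv: kv[1])
    return winner if count / total >= min_fraction else "uncertain"


# Hypothetical vote counts for a single galaxy (not real GalaxyZoo data).
example = {"elliptical": 42, "spiral": 5, "merger": 3}
print(label_from_votes(example))
```

Keeping an explicit "uncertain" label makes it easy to exclude ambiguous galaxies from the training set later on.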

WEEK 4
Machine Learning
This week we will introduce the galaxy image dataset and prepare two Artificial Intelligence algorithms for automatically classifying galaxies from an image.

WEEK 5
Final Project
It is time to prepare the final report on the work done so far. You will need to have the work from the previous weeks at hand.

Related Courses

Machine Learning for All (Coursera)
University of London

Machine Learning, often called Artificial Intelligence or AI, is one of the most exciting areas of technology at the moment. We see daily news stories that herald new breakthroughs in facial recognition technology, self-driving cars, or computers that can have a conversation just like a real person. Machine Learning technology is set to revolutionise almost every area of human life and work, and will affect all our lives, so you are likely to want to find out more about it.

Jun 29th 2026
4 Weeks

Google Cloud Platform Fundamentals: Core Infrastructure (Coursera)
Google

This course introduces you to important concepts and terminology for working with Google Cloud Platform (GCP). You learn about, and compare, many of the computing and storage services available in Google Cloud Platform, including Google App Engine, Google Compute Engine, Google Kubernetes Engine, Google Cloud Storage, Google Cloud SQL, and BigQuery. You learn about important resource and policy management tools, such as the Google Cloud Resource Manager hierarchy and Google Cloud Identity and Access Management. Hands-on labs give you foundational skills for working with GCP.

Jun 29th 2026
1 Week

Guided Tour of Machine Learning in Finance (Coursera)
New York University

This course aims to provide a broad introductory overview of the field of ML, with a focus on applications in Finance. Supervised Machine Learning methods are used in the capstone project to predict bank closures. While this course can be taken on its own, it also serves as a preview of topics covered in more detail in subsequent modules of the specialization Machine Learning and Reinforcement Learning in Finance.

Jun 29th 2026
4 Weeks

Data Science for Business Innovation (Coursera)
Politecnico di Milano, EIT Digital

The course is a compendium of the must-have expertise in data science for executive and middle-management to foster data-driven innovation. It consists of introductory lectures spanning big data, machine learning, data valorization and communication. Topics cover the essential concepts and intuitions on data needs, data analysis, machine learning methods, respective pros and cons, and practical applicability issues.

Jun 29th 2026
4 Weeks

Machine Learning for Accounting with Python (Coursera)
University of Illinois at Urbana-Champaign

This course, Machine Learning for Accounting with Python, introduces machine learning algorithms (models) and their applications in accounting problems. It covers classification, regression, clustering, text analysis, and time series analysis. It also discusses model evaluation and model optimization. This course provides an entry point for students to apply appropriate machine learning models to business-related datasets with Python to solve various problems.

Jun 29th 2026
5-12 Weeks

Advanced Algorithms and Complexity (Coursera)
University of California, San Diego, Higher School of Economics - HSE University

You've learned the basic algorithms now and are ready to step into the area of more complex problems and the algorithms that solve them. Advanced algorithms build upon basic ones and use new ideas. We will start with network flows, which are used in typical applications such as optimal matchings, finding disjoint paths, and flight scheduling, as well as in more surprising ones like image segmentation in computer vision.

Jun 29th 2026
5-12 Weeks

Prediction and Control with Function Approximation (Coursera)
University of Alberta, Alberta Machine Intelligence Institute

In this course, you will learn how to solve problems with large, high-dimensional, and potentially infinite state spaces. You will see that estimating value functions can be cast as a supervised learning problem---function approximation---allowing you to build agents that carefully balance generalization and discrimination in order to maximize reward.

Jun 29th 2026
4 Weeks

Developing AI Applications on Azure (Coursera)
LearnQuest

This course introduces the concepts of Artificial Intelligence and Machine learning. We'll discuss machine learning types and tasks, and machine learning algorithms. You'll explore Python as a popular programming language for machine learning solutions, including using some scientific ecosystem packages which will help you implement machine learning.

Jun 29th 2026
5-12 Weeks

Big Data Science with the BD2K-LINCS Data Coordination and Integration Center (Coursera)
Icahn School of Medicine at Mount Sinai

In this course we briefly introduce the DCIC and the various Centers that collect data for LINCS. We then cover metadata and how metadata is linked to ontologies. We then present data processing and normalization methods to clean and harmonize LINCS data. This is followed by discussions about how data is served through RESTful APIs. Most importantly, the course covers computational methods including data clustering, gene-set enrichment analysis, interactive data visualization, and supervised learning. Finally, we introduce crowdsourcing/citizen-science projects where students can work together in teams to extract expression signatures from public databases and then query such collections of signatures against LINCS data to predict small molecules as potential therapeutics.

Jun 29th 2026
5-12 Weeks

Practical Machine Learning (Coursera)
Johns Hopkins University

Prediction and machine learning are among the most common tasks performed by data scientists and data analysts. This course will cover the basic components of building and applying prediction functions, with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates.

Jun 29th 2026
4 Weeks

Big Data, Artificial Intelligence, and Ethics (Coursera)
University of California, Davis

This course gives you context and first-hand experience with the two major catalyzers of the computational science revolution: big data and artificial intelligence. With more than 99% of all mediated information in digital format and with 98% of the world population using digital technology, humanity produces an impressive digital footprint.

Jun 29th 2026
4 Weeks

Accounting Analytics (Coursera)
University of Pennsylvania

Accounting Analytics explores how financial statement data and non-financial metrics can be linked to financial performance. In this course, taught by Wharton’s acclaimed accounting professors, you’ll learn how data is used to assess what drives financial performance and to forecast future financial scenarios. While many accounting and financial organizations deliver data, accounting analytics deploys that data to deliver insight, and this course will explore the many areas in which accounting data provides insight into other business areas including consumer behavior predictions, corporate strategy, risk management, optimization, and more.

Jun 29th 2026
4 Weeks