This project demonstrates an end-to-end ETL process for analyzing Olympics data using Microsoft Azure services: Azure Data Factory, Azure Databricks, Azure Storage, and Azure Synapse Analytics. The dataset includes four tables: Athletes, Coaches, Teams, and Medals.
Azure Data Factory: Orchestrated the ETL pipeline by extracting data from Azure Storage, transforming it in Databricks, and loading it into Azure Synapse Analytics.
Azure Databricks: Handled data transformation and analysis using PySpark. The cleaned and processed data was prepared for deeper insights, such as athlete performance trends and medal distributions.
Azure Storage: Stored the raw Olympics data (CSV files) as the source for the pipeline.
Azure Synapse Analytics: Served as the data warehouse, allowing complex SQL queries for team performance and medal analysis.
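To give a feel for the kind of medal-distribution aggregation mentioned above, here is a minimal sketch in plain Python. It is not the project's Databricks notebook, which uses PySpark. The column names (`country`, `medal_type`), the sample rows, and the `medal_counts` helper are all assumptions made for illustration, and the real Medals CSV may be laid out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real Medals CSV may use different column names.
SAMPLE_MEDALS_CSV = """country,medal_type
United States,Gold
China,Gold
United States,Silver
France,Bronze
China,Gold
"""


def medal_counts(csv_text):
    """Count medals per country from CSV text that has a 'country' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        country = (row.get("country") or "").strip()
        if country:  # skip rows where the country is missing or blank
            counts[country] += 1
    return counts


counts = medal_counts(SAMPLE_MEDALS_CSV)
for country, total in counts.most_common():
    print(country, total)
```

In PySpark the same idea would typically be a group-by on the country column followed by a count. The plain-Python version is used here only so the sketch runs without a Spark cluster.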
The dataset includes:
- Athletes
- Coaches
- Teams
- Medals
It is available on Kaggle: https://www.kaggle.com/datasets/piterfm/paris-2024-olympic-summer-games/data
https://github.com/user-attachments/assets/7320647c-7230-4502-9a48-660f9afd45b1
This is a link to my live Power BI dashboard. Click the image below to open it:
