Leveraging SQL and data analysis to uncover corruption and optimize water services at Maji Ndogo.
Updated Jul 6, 2024 - Jupyter Notebook
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Class mapping to combine classes with the same shapes
Demonstrating the application of clustering algorithms with various parameter sets and feature combinations, along with necessary data preprocessing tasks in Python.
This repository contains the NYC Taxi Data Engineering Pipeline project, which aims to build a comprehensive data engineering pipeline using NYC taxi data from the years 2022 and 2023. The pipeline involves extracting, transforming and loading (ETL) data into a Snowflake database, followed by creating a dashboard for visualisation.
object flow treatment, data transformation
Stock market analytics with ML, trading simulation, automation, and more. The final project from Stock Market Analytics Zoomcamp. Hosted by Python Invest and Data Talks Club.
Data transformation framework for ETL processing with SQL-like syntax and GIS extensions, based on Apache Spark
A web application developed for the data collection team at the University of the Free State. The app, named MARS, combines weekly attendance register files into a single aggregated attendance file.
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
DeltaFi is a flexible, code-light data transformation and normalization platform.
Sourced from the UCI Machine Learning Repository, the Dry Bean Dataset (licensed under CC BY 4.0) offers insights into bean classification and is a useful resource for machine learning enthusiasts.
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
A collection of actions for working with ROS data
A collection of actions for working with PX4 data
Creation of a Data Warehouse (DW) using dimensional modeling in a star schema.
🔧 Laravel + Symfony Serializer. This package provides a bridge between Laravel and Symfony Serializer.
A visual data pipeline builder with various backends
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark