This project is inspired by Fiber.Dev
Live: mini-etl.vercel.app GitHub: mini-etl
This is a mini-ETL project like Fiber (YC), but smaller in scope. It allows users to extract data from a source, transform it, and load it into a destination. The project is built with Next.js (frontend) and NestJS (backend).
- Currently it supports only GitHub as a data source.
This is just a demonstration project; it can be extended to support other data sources such as GitLab, Bitbucket, etc.
- GitHub OAuth authentication: users can log in using their GitHub accounts.
- Extract data from GitHub (public repositories, issues, and pull requests).
- Transform the data.
- Load the data into a destination (PostgreSQL and S3 are supported).
- Data source management: users can add and manage data sources such as S3 buckets and PostgreSQL databases.
- Automatic and manual data synchronization: data is synced automatically at regular intervals, with an option for manual synchronization.
- Data viewing: users can view their synchronized data in a user-friendly interface.
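The transform step of a pipeline like this can be sketched as a pure function that flattens a GitHub API response into a row ready for loading. The field names below are illustrative assumptions, not the project's actual schema.

```typescript
// Hypothetical sketch of the transform step: flattening a GitHub API
// repository object into a flat row for loading into PostgreSQL or S3.
// Field names are illustrative; the real project's schema may differ.

interface RepoRow {
  id: number;
  fullName: string;
  stars: number;
  openIssues: number;
  syncedAt: string; // ISO timestamp of this sync run
}

// `raw` mirrors the shape returned by GitHub's REST API for a repository
// (only the fields we care about are typed here).
function transformRepo(raw: {
  id: number;
  full_name: string;
  stargazers_count: number;
  open_issues_count: number;
}): RepoRow {
  return {
    id: raw.id,
    fullName: raw.full_name,
    stars: raw.stargazers_count,
    openIssues: raw.open_issues_count,
    syncedAt: new Date().toISOString(),
  };
}

const row = transformRepo({
  id: 42,
  full_name: "octocat/Hello-World",
  stargazers_count: 1967,
  open_issues_count: 3,
});
console.log(row.fullName); // "octocat/Hello-World"
```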
- ApiGateway: built with NestJS; handles all incoming REST API calls and routes them to the appropriate microservice.
- SyncService (microservice): a dedicated microservice for handling data synchronization tasks.
- NestJS (Node.js Framework)
- PostgreSQL (Database)
- PG, Prisma and Drizzle (ORM)
- Docker (Containerization)
- RabbitMQ (Message Broker)
- DigitalOcean (Deployment)
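The ApiGateway and SyncService talk over RabbitMQ. As a rough sketch of that contract, the snippet below models a sync job message and a handler keyed by message pattern; in the real app the job would be emitted through a NestJS `ClientProxy` and consumed from the queue. The pattern name and payload fields are assumptions for illustration.

```typescript
// Hypothetical message contract between ApiGateway and SyncService.
// The pattern name and payload shape are assumptions, not the real
// queue contract used by the project.

type SyncJob = {
  pattern: "sync.github"; // message pattern the SyncService listens on
  payload: {
    userId: string;
    provider: "github";
    destination: "postgres" | "s3";
  };
};

function buildSyncJob(userId: string, destination: "postgres" | "s3"): SyncJob {
  return {
    pattern: "sync.github",
    payload: { userId, provider: "github", destination },
  };
}

// A minimal in-process dispatcher standing in for the broker: in the real
// app, RabbitMQ delivers the job and the SyncService handles it.
const handlers: Record<string, (p: SyncJob["payload"]) => string> = {
  "sync.github": (p) => `syncing github data for ${p.userId} into ${p.destination}`,
};

const job = buildSyncJob("user-1", "postgres");
console.log(handlers[job.pattern](job.payload));
```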
To run this project, you need Node.js installed on your machine. You can download it from here. This project has two parts:

- Frontend (Next.js): `cd frontend`
- Backend (NestJS)
  - ApiGateway: `cd api_gateway`
  - SyncService: `cd sync_service`
SyncService
First, we need to run the SyncService. To run it, you need RabbitMQ and PostgreSQL connection strings. Create a `.env` file in the `sync_service` directory, following the `.env.example` file:
```
RABBITMQ_QUEUE=""
RABBITMQ_URL=""
DATABASE_URL=""
DATABASE_URL_DRIZZLE="" # not needed
```
After creating the `.env` file, run the following commands:

```bash
pnpm install
# generate the prisma client and push the schema
npx prisma generate && npx prisma db push
pnpm start:dev
```
Next, run the ApiGateway. To run it, you also need RabbitMQ and PostgreSQL connection strings. Create a `.env` file in the `api_gateway` directory, following the `.env.example` file:
```
DATABASE_URL=
GITHUB_CALLBACK_URL=http://localhost:3000/auth/callback/github
GITHUB_CLIENT_ID=
GITHUB_CLIENT_SECRET=
JWT_SECRET=
RABBITMQ_QUEUE=
RABBITMQ_URL=
AUTH_FRONTEND_REDIRECT_URL=""
FRONTEND_URL=""
```
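Since the gateway will not work with these variables unset, it can help to fail fast at startup. The helper below is a hypothetical sketch (not part of the actual codebase) that reports which of the `.env` keys above are still missing or empty.

```typescript
// Hypothetical startup check: report required environment variables that
// are missing or empty. The key names match the .env file above; the
// helper itself is illustrative, not part of the project.

type Env = Record<string, string | undefined>;

function requireEnv(env: Env, keys: string[]): string[] {
  // A key counts as missing if it is absent or set to the empty string.
  return keys.filter((k) => !env[k]);
}

const missing = requireEnv(
  { DATABASE_URL: "postgres://localhost/etl", JWT_SECRET: "" },
  ["DATABASE_URL", "JWT_SECRET", "RABBITMQ_URL"],
);
console.log(missing); // JWT_SECRET and RABBITMQ_URL are still unset
```

In a real NestJS app the same check is usually done with `ConfigModule` validation, but the idea is the same: refuse to boot with an incomplete configuration.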
After creating the `.env` file, run the following commands:

```bash
pnpm install
npx prisma generate && npx prisma db push
pnpm start:dev
```
To run the frontend, you need the following environment variable. Create a `.env.local` file in the `frontend` directory, following the `.env.example` file:
```
NEXT_PUBLIC_API_URL=http://localhost:3000/api
```
After creating the `.env.local` file, run the following commands:

```bash
pnpm install
pnpm dev
```
Here's how Mini-ETL works.
1. User Authentication:
   - Users log in using GitHub OAuth.
   - Upon successful login, a JWT token is generated and stored in the user's cookies.
2. Adding Data Sources:
   - Users can add data sources by providing the required credentials.
   - Supported destinations include S3 buckets (with optional Cloudflare R2) and PostgreSQL databases.
   - The ApiGateway validates these credentials via the SyncService.
3. Data Source Validation:
   - If the data source credentials are valid, the data source is marked as valid.
   - Users can then connect their GitHub provider to this valid data source.
4. Data Synchronization:
   - The SyncService automatically synchronizes data (public repositories, issues, and pull requests) from GitHub to the specified destination every ten minutes.
   - Users can also trigger synchronization manually via a button in the app console.
5. Viewing Data:
   - In the app console, users can see all connected providers and data sources.
   - Synced data is displayed in a nicely formatted table.