This project is inspired by Fiber.Dev
Live: mini-etl.vercel.app GitHub: mini-etl
This is a mini-ETL project like Fiber (YC), but smaller in scope. It allows users to extract data from a source, transform it, and load it into a destination. The project is built with Next.js (frontend) and NestJS (backend).
- Currently it supports only GitHub as a data source.
This is just a demonstration project; it can be extended to support other data sources such as GitLab, Bitbucket, etc.
- GitHub OAuth authentication: users can log in using their GitHub accounts.
- Extract data from GitHub (public repositories, issues, and pull requests).
- Transform the data.
- Load the data into a destination (PostgreSQL and S3 are supported).
- Data source management: users can add and manage data sources such as S3 buckets and PostgreSQL databases.
- Automatic and manual data synchronization: data is synced automatically at regular intervals, with an option for manual synchronization.
- Data viewing: users can view their synchronized data in a user-friendly interface.
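The transform step of a pipeline like this can be sketched as a pure function that flattens a GitHub API response into a row ready for loading. The field names below are illustrative assumptions, not the project's actual schema.

```typescript
// Hypothetical sketch of the transform step: flattening a GitHub API
// repository object into a flat row for loading into PostgreSQL or S3.
// Field names are illustrative; the real project's schema may differ.

interface RepoRow {
  id: number;
  fullName: string;
  stars: number;
  openIssues: number;
  syncedAt: string; // ISO timestamp of this sync run
}

// `raw` mirrors the shape returned by GitHub's REST API for a repository
// (only the fields we care about are typed here).
function transformRepo(raw: {
  id: number;
  full_name: string;
  stargazers_count: number;
  open_issues_count: number;
}): RepoRow {
  return {
    id: raw.id,
    fullName: raw.full_name,
    stars: raw.stargazers_count,
    openIssues: raw.open_issues_count,
    syncedAt: new Date().toISOString(),
  };
}

const row = transformRepo({
  id: 42,
  full_name: "octocat/Hello-World",
  stargazers_count: 1967,
  open_issues_count: 3,
});
console.log(row.fullName); // "octocat/Hello-World"
```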
- ApiGateway: built with NestJS; handles all incoming REST API calls and routes them to the appropriate microservice.
- SyncService (microservice): a dedicated microservice for handling data synchronization tasks.
- NestJS (Node.js Framework)
- PostgreSQL (Database)
- PG, Prisma and Drizzle (ORM)
- Docker (Containerization)
- RabbitMQ (Message Broker)
- DigitalOcean (Deployment)
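The ApiGateway and SyncService talk over RabbitMQ. As a rough sketch of that contract, the snippet below models a sync job message and a handler keyed by message pattern; in the real app the job would be emitted through a NestJS `ClientProxy` and consumed from the queue. The pattern name and payload fields are assumptions for illustration.

```typescript
// Hypothetical message contract between ApiGateway and SyncService.
// The pattern name and payload shape are assumptions, not the real
// queue contract used by the project.

type SyncJob = {
  pattern: "sync.github"; // message pattern the SyncService listens on
  payload: {
    userId: string;
    provider: "github";
    destination: "postgres" | "s3";
  };
};

function buildSyncJob(userId: string, destination: "postgres" | "s3"): SyncJob {
  return {
    pattern: "sync.github",
    payload: { userId, provider: "github", destination },
  };
}

// A minimal in-process dispatcher standing in for the broker: in the real
// app, RabbitMQ delivers the job and the SyncService handles it.
const handlers: Record<string, (p: SyncJob["payload"]) => string> = {
  "sync.github": (p) => `syncing github data for ${p.userId} into ${p.destination}`,
};

const job = buildSyncJob("user-1", "postgres");
console.log(handlers[job.pattern](job.payload));
```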
To run this project, you need Node.js installed on your machine. You can download it from here. This project has two parts:

- Frontend (Next.js): `cd frontend`
- Backend (NestJS)
  - ApiGateway: `cd api_gateway`
  - SyncService: `cd sync_service`
SyncService
First, we need to run the SyncService. To run it, you need RabbitMQ and PostgreSQL connection strings. Create a `.env` file in the `sync_service` directory, following the `.env.example` file:
```
RABBITMQ_QUEUE=""
RABBITMQ_URL=""
DATABASE_URL=""
DATABASE_URL_DRIZZLE="" # not needed
```
After creating the `.env` file, run the following commands:

```bash
pnpm install
# generate the prisma client and push the schema
npx prisma generate && npx prisma db push
pnpm start:dev
```
Next, run the ApiGateway. To run it, you also need RabbitMQ and PostgreSQL connection strings. Create a `.env` file in the `api_gateway` directory, following the `.env.example` file:
```
DATABASE_URL=
GITHUB_CALLBACK_URL=http://localhost:3000/auth/callback/github
GITHUB_CLIENT_ID=
GITHUB_CLIENT_SECRET=
JWT_SECRET=
RABBITMQ_QUEUE=
RABBITMQ_URL=
AUTH_FRONTEND_REDIRECT_URL=""
FRONTEND_URL=""
```
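Since the gateway will not work with these variables unset, it can help to fail fast at startup. The helper below is a hypothetical sketch (not part of the actual codebase) that reports which of the `.env` keys above are still missing or empty.

```typescript
// Hypothetical startup check: report required environment variables that
// are missing or empty. The key names match the .env file above; the
// helper itself is illustrative, not part of the project.

type Env = Record<string, string | undefined>;

function requireEnv(env: Env, keys: string[]): string[] {
  // A key counts as missing if it is absent or set to the empty string.
  return keys.filter((k) => !env[k]);
}

const missing = requireEnv(
  { DATABASE_URL: "postgres://localhost/etl", JWT_SECRET: "" },
  ["DATABASE_URL", "JWT_SECRET", "RABBITMQ_URL"],
);
console.log(missing); // JWT_SECRET and RABBITMQ_URL are still unset
```

In a real NestJS app the same check is usually done with `ConfigModule` validation, but the idea is the same: refuse to boot with an incomplete configuration.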
After creating the `.env` file, run the following commands:

```bash
pnpm install
npx prisma generate && npx prisma db push
pnpm start:dev
```
To run the frontend, you need the following environment variable. Create a `.env.local` file in the `frontend` directory, following the `.env.example` file:
```
NEXT_PUBLIC_API_URL=http://localhost:3000/api
```
After creating the `.env.local` file, run the following commands:

```bash
pnpm install
pnpm dev
```
Here's how Mini-ETL works.
1. User Authentication:
   - Users log in using GitHub OAuth.
   - Upon successful login, a JWT token is generated and stored in the user's cookies.
2. Adding Data Sources:
   - Users can add data sources by providing the required credentials.
   - Supported destinations include S3 buckets (with optional Cloudflare R2) and PostgreSQL databases.
   - The ApiGateway validates these credentials via the SyncService.
3. Data Source Validation:
   - If the data source credentials are valid, the data source is marked as valid.
   - Users can then connect their GitHub provider to this valid data source.
4. Data Synchronization:
   - The SyncService automatically synchronizes data (public repositories, issues, and pull requests) from GitHub to the specified destination every ten minutes.
   - Users can also trigger synchronization manually via a button in the app console.
5. Viewing Data:
   - In the app console, users can see all connected providers and data sources.
   - Synced data is displayed in a nicely formatted table.