This repository contains a full retrieval-augmented generation (RAG) application. It uses Terraform for infrastructure as code (IaC), LangChain as the framework, AWS Bedrock for the LLM and embedding models, and AWS OpenSearch as the vector database, with deployment on an AWS OpenSearch endpoint.
Main Steps
- Data Ingestion: load data into an OpenSearch index
- Embedding and Model: Bedrock Titan
- Vector Store and Endpoint: OpenSearch
- IaC: Terraform
- Data: original PDF document and generated JSON file with embeddings
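To make the embedding and vector-store steps above more concrete, here is a minimal sketch of the payloads involved. It is not the repository's code: the field names `text` and `embedding`, the helper names, and the 1536-dimension figure (which matches Titan Embeddings G1 - Text, model ID `amazon.titan-embed-text-v1`) are assumptions for illustration. The scripts in `src` may use a different model or schema.

```python
import json

# Assumption: Titan Embeddings G1 - Text, which returns 1536-dimensional vectors.
# Change this value if the project uses a different embedding model.
EMBEDDING_DIM = 1536


def titan_embedding_request(text: str) -> str:
    """Build the JSON request body for a Titan text-embedding call.

    The resulting string would be passed as `body` to the Bedrock runtime
    `invoke_model` call; the response JSON carries the vector under "embedding".
    """
    return json.dumps({"inputText": text})


def knn_index_body(dimension: int = EMBEDDING_DIM) -> dict:
    """Build an OpenSearch index definition with a k-NN vector field.

    The field names "text" and "embedding" are hypothetical.
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def to_index_document(text: str, embedding: list[float]) -> dict:
    """Pair a text chunk with its embedding, checking the vector length."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}"
        )
    return {"text": text, "embedding": embedding}
```

The dimension in the index mapping has to match the length of the vectors produced by the embedding model, which is why the sketch validates it before building a document.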
Feel free to ⭐ and clone this repo 😉
The project has been structured with the following files:
- terraform: IaC
- tests: unittest and mock tests
- src: scripts with the app logic
- requirements.txt: project requirements
- Makefile: commands for testing, linting and formatting
- pyproject.toml: linting/formatting requirements
The Python version used for this project is Python 3.11.
- Clone the repo (or download it as a zip file):

    git clone https://github.com/benitomartin/aws-bedrock-opensearch-langchain.git

- Create the virtual environment named main-env using Conda with Python version 3.11:

    conda create -n main-env python=3.11
    conda activate main-env

- Install the packages listed in requirements.txt, using either of:

    pip install -r requirements.txt
    make req

- Create the infrastructure from the terraform folder (this can take up to 30 minutes):

    conda install conda-forge::terraform
    terraform init
    terraform plan
    terraform apply

- Generate embeddings from the documents:

    python src/generate_embeddings.py

- Create the index:

    python src/create_index.py

- Ingest the documents into the index:

    python src/ingest_docs_with_embeddings.py

- Test the app to get a reply:

    python src/app.py
The app contains a predefined question. You can edit it to test other scenarios.
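Once the infrastructure exists, the four Python steps above can be chained from a single script. The following is a minimal sketch, not part of the repository: the script paths come from the steps above, while `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical names. It assumes it is run from the repository root with the `main-env` environment active.

```python
import subprocess
import sys

# Script paths taken from the steps above, in the order they must run.
# Assumption: the Terraform infrastructure has already been created.
PIPELINE = [
    "src/generate_embeddings.py",
    "src/create_index.py",
    "src/ingest_docs_with_embeddings.py",
    "src/app.py",
]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Return one command (as an argument list) per pipeline script."""
    return [[python, script] for script in PIPELINE]


def run_pipeline() -> None:
    """Run each script in order, stopping at the first failure."""
    for command in build_commands():
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if a script exits non-zero,
        # so later steps never run against a half-built index.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` executes the steps sequentially. Using `sys.executable` keeps every script on the same interpreter as the caller, which matters when several Conda environments are installed.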