RAG locally with LangChain RunnableSequence. HOWTOs

This repo contains heavily commented scripts that explain every step, as well as more advanced, ready-to-run scripts with fewer details and comments. Together they aim to provide concise guidance and practical examples for creating and evaluating a RAG system from scratch.

Beginners start: start_here.ipynb or start_here.py

Further methods: Advanced option to rule RAG

What is RAG?

"Baby, don't hurt me..."

RAG = Retrieval Augmented Generation

  • Retrieval - the process of searching for and extracting relevant information (retriever).
  • Retrieval Augmented - supplementing the user's query with found relevant information.
  • Retrieval Augmented Generation - generating a response to the user while taking into account additionally found relevant information.

Walkthrough example:

  1. User query: "Baby, don't hurt me..."
  2. RAG process:
    • Input Interpretation: The system receives the user's plea and detects a potential for a song lyric reference.
    • Data Retrieval: It quickly scours the attached database for relevant information, focusing on the lyrics of the song "What is Love" by Haddaway.
    • Augmentation: Next, it augments the user's query with additional context, ensuring a deep understanding of the reference.
    • Generation: Armed with the knowledge of the song's lyrics, the system crafts a witty response, perhaps something like: "No worries, user! I'll only hurt you with my endless knowledge of 90s pop hits."
  3. RAG delivery: Finally, the system delivers the response with a touch of humor, leaving the user amused and impressed by the AI's cleverness.
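
The retrieve, augment and generate steps of the walkthrough can be sketched in plain Python. This is only a toy illustration under loud assumptions: the `DOCUMENTS` list, the word-overlap scoring and the `generate` stub are hypothetical stand-ins for a real vector store, embedding model and LLM, and none of it is taken from the scripts in this repo.

```python
import re
from collections import Counter

# Hypothetical mini knowledge base; a real system would query a vector store.
DOCUMENTS = [
    "The song What Is Love by Haddaway contains the line: baby, don't hurt me.",
    "FAISS and Chroma are vector stores that hold document embeddings.",
    "A reranking model reorders the list of retrieved documents.",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def retrieve(query, documents, top_k=1):
    """Retrieval: rank documents by the number of tokens shared with the query."""
    query_tokens = Counter(tokenize(query))

    def overlap(doc):
        return sum((query_tokens & Counter(tokenize(doc))).values())

    return sorted(documents, key=overlap, reverse=True)[:top_k]


def augment(query, context_docs):
    """Augmentation: put the retrieved context in front of the user's query."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


def generate(prompt):
    """Generation: placeholder for an LLM call (assumption: no model is loaded here)."""
    return f"[stub LLM answer for a prompt of {len(prompt)} characters]"


def rag_answer(query):
    """Run the three steps in order: retrieve, augment, generate."""
    docs = retrieve(query, DOCUMENTS)
    return generate(augment(query, docs))


if __name__ == "__main__":
    print(rag_answer("Baby, don't hurt me..."))
```

In a real pipeline the overlap score would be replaced by embedding similarity and the stub by a generator model, but the order of the three steps stays the same.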

Why RAG?

  • Economically Efficient Deployment: Chatbot development typically starts with base models, that is, LLMs trained on generalized data. RAG offers a more cost-effective way to incorporate new data into an LLM without fine-tuning the whole model.

  • Up-to-Date Information: RAG makes it possible to feed rapidly changing, up-to-date data directly into generative models. By connecting an LLM to real-time social media feeds or news websites, users receive the most current information.

  • Increased User Trust: With RAG, an LLM can provide accurate information while citing sources, boosting user confidence in the generative AI. Users can verify information by accessing the source documents themselves, enhancing trust in the system.

How to read and create RAG:

  • with RunnableSequences (langchain) (if you want a clean, structured approach and easy-to-follow code sequences)
  • with HuggingFace models (if you want to try some of the most recent releases and cutting-edge technology)
  • locally (if you love the smell of code in the morning)

You can start with the start_here.ipynb or start_here.py file and then proceed with the other files and notebooks from the tutorials section, which are exceptionally detailed for beginners.

Where to find the model and how to choose one:

How to choose a retrieval model (LLM embedder)? --> mteb/leaderboard, tab: Retrieval or Retrieval w/Instruction

How to choose a reranking model (reorders the list of relevant documents)? --> mteb/leaderboard, tab: Reranking

How to choose a generator model (LLM that generates the final answer)? --> open-llm-leaderboard/open_llm_leaderboard

Advanced option to rule RAG

Please refer to the other options and files listed below for less commented but more advanced scripts, examples and techniques.

Basic tutorials

| HOWTO | Option | Go-to file | Outer documentation |
|---|---|---|---|
| Basic and simple | default | start_here.ipynb, start_here.py | |

Run scripts for full RAG system

| HOWTO | Option | Go-to file | Outer documentation |
|---|---|---|---|
| How to run HuggingFace models | locally: with HuggingFaceEmbeddings, with HuggingFacePipeline | local_rag_chain_simple.py, local_rag_retrieval_qa_class.py | |
| | remotely: with HuggingFaceHub | in progress... release imminent | Hugging Face Hub documentation |
| How to evaluate and monitor an application with LangSmith | | in progress... release imminent | Get started with LangSmith |

Individual components and elements

How to store and embed documents?

| HOWTO | Option | Go-to file | Outer documentation |
|---|---|---|---|
| How to store embeddings in vectorstore (FAISS or Chroma) | default, with: text splitter; progress bar on creating vectorstore; dump and load from disk | get_vectorstore.py, create_vectorstore.py | FAISS, Chroma |
| How to embed documents | default | create_llm_emb_default.py | Text embedding models |
| | with Caching (saves time the next time embeddings are created) | create_llm_emb_cached.py | Caching Embeddings |
| | with Compressing (saves RAM while storing and retrieving) | in progress... release imminent | |
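
The text splitter and caching options listed above can be illustrated without any library. The sketch below is a minimal, hypothetical version: `split_text`, `CachedEmbedder` and `toy_embed` are names invented here, the "embedding" is just a vowel count, and the cache is an in-memory dict rather than anything stored on disk. It is not how get_vectorstore.py or create_llm_emb_cached.py are implemented.

```python
import hashlib


def split_text(text, chunk_size=40, chunk_overlap=10):
    """Split text into fixed-size character chunks that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def toy_embed(text):
    """Toy stand-in for an embedding model: counts of each vowel in the text."""
    lowered = text.lower()
    return [lowered.count(vowel) for vowel in "aeiou"]


class CachedEmbedder:
    """Wrap an embedding function and reuse results for texts seen before."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.cache = {}  # maps SHA-256 of the text to its vector
        self.calls = 0   # how many times the wrapped function actually ran

    def embed(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.cache[key] = self.embed_fn(text)
            self.calls += 1
        return self.cache[key]


if __name__ == "__main__":
    embedder = CachedEmbedder(toy_embed)
    for chunk in split_text("Baby, don't hurt me, don't hurt me, no more.", 20, 5):
        print(repr(chunk), embedder.embed(chunk))
```

Hashing the chunk text gives a stable cache key, so re-running the indexing step over unchanged documents skips the expensive embedding call.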

How RunnableSequence chains work?

| HOWTO | Option | Go-to file | Outer documentation |
|---|---|---|---|
| How to retrieve documents | default | local_rag_chain_simple.py, combine_simple_RAG_chains.py | |
| | with Multiple Queries Generation | local_rag_chain_multi_query.py, multiple_queries_chain.py | |
| | with chain_type: stuff, map_reduce, refine, map_rerank | in progress... release imminent | |
| | with Prompting. Hint: ask GPT to provide instructions for your RAG system and use them as the prompt template | prompt_templates_retrieve.py | |
| How to generate an answer | default | create_llm_gen_default.py | |
| | with Prompting. Hint: ask GPT to provide instructions for your RAG system and use them as the prompt template | prompt_templates_generate.py | |
| | with GPTQQuantizer (saves RAM and speeds up generation); pip install optimum auto-gptq | create_llm_gen_default.py | |
| | with vLLM (if you encounter "RuntimeError: probability tensor contains either inf, nan or element < 0" during GPTQQuantizer inference); pip install vllm | create_llm_gen_vLLM.py | vLLM in LangChain |
| | with LlamaCpp (saves RAM and speeds up generation); pip install llama-cpp-python | create_llm_gen_llama_cpp.py | LlamaCpp in LangChain |
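
As a rough mental model of how the retrieve, prompt and generate steps above are chained, the sketch below mimics the idea of composing steps with `|` and running them with `invoke`. The `Step` class is hypothetical and written only for this illustration; it is not LangChain's implementation, and the retriever and LLM are stubs.

```python
class Step:
    """A single callable step that can be piped into another step with |."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Composing two steps yields a new step: run self first, then other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        """Run this step (or composed sequence of steps) on the input value."""
        return self.func(value)


# Stub retriever (assumption: always returns the same hard-coded context).
retrieve = Step(lambda question: {
    "question": question,
    "context": "What Is Love is a song by Haddaway.",
})

# Prompt step: fill a template with the retrieved context and the question.
build_prompt = Step(
    lambda data: f"Context: {data['context']}\nQuestion: {data['question']}"
)

# Stub generator standing in for a real LLM.
fake_llm = Step(lambda prompt: f"[stub answer for a prompt of {len(prompt)} characters]")

# The output of each step becomes the input of the next one.
chain = retrieve | build_prompt | fake_llm

if __name__ == "__main__":
    print(chain.invoke("Who sings 'Baby, don't hurt me'?"))
```

Because every step has the same `invoke` interface, swapping the stub generator for a GPTQ, vLLM or LlamaCpp backed model would not change the rest of the chain.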

How to further improve your chain?

| HOWTO | Option | Go-to file | Outer documentation |
|---|---|---|---|
| Advanced chain elements | Amplification | in progress... release imminent | Christiano et al. 2018. Supervising strong learners by amplifying weak experts |
| | Debate | in progress... release imminent | Irving et al. 2018. AI safety via debate |
| Advanced prompt techniques | default | in progress... release imminent | Schulhoff et al. 2024. The Prompt Report: A Systematic Survey of Prompting Techniques |

Further reading:

Mrs Wallbreaker or: How I Learned to Stop Worrying and Love the AGI.

About AI Risk, AI Alignment, AI Safety, AI Ethics

https://t.me/MrsWallbreaker