rsr-droid/langchain-literature-rag

A Jupyter notebook that creates a vector database of all the research papers I have read for my thesis, building the source knowledge for my chatbot.
Large Language Models (LLMs) have a data freshness problem. Even some of the most powerful models, like GPT-4, have no idea about recent events.

The world, according to LLMs, is frozen in time. They only know the world as it appeared through their training data.

That creates problems for any use case that relies on up-to-date information or a particular dataset. For example, you may have internal company documents you’d like to interact with via an LLM.

The first challenge is getting those documents into the LLM. We could try training the LLM on them, but this is time-consuming and expensive. And what happens when a new document is added? Training for every new document added is beyond inefficient; it is simply not practical.

So, how do we handle this problem? We can use retrieval augmentation. This technique allows us to retrieve relevant information from an external knowledge base and give that information to our LLM.
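As a rough illustration of the pattern (not the exact code from the notebook), the retrieval step embeds the question, pulls the most similar chunks from a vector store, and prepends them to the prompt. Here `vectordb` stands in for the knowledge base built in the sketch further below, and the query text is just an example:

```python
# Illustrative retrieval-augmentation loop; `vectordb` is any LangChain
# vector store (see the knowledge-base sketch below).
query = "What embedding models do the papers compare?"

# 1. Retrieve the chunks most similar to the query.
docs = vectordb.similarity_search(query, k=3)

# 2. Join them into source knowledge for the prompt.
source_knowledge = "\n\n".join(doc.page_content for doc in docs)

# 3. Augment the prompt so the LLM answers from the retrieved context
#    instead of relying only on its training data.
augmented_prompt = (
    "Answer the question using the context below.\n\n"
    f"Context:\n{source_knowledge}\n\n"
    f"Question: {query}"
)
```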

To help our LLM, we need to give it access to relevant source knowledge. To do that, we need to create our knowledge base.

We start with a dataset. The dataset used naturally depends on the use case.

In our example, the dataset is the set of research papers I have read for my thesis. We then use the OpenAI API to build a chatbot that can answer questions about those papers, helping me deepen my understanding of my thesis project.
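The notebook itself is not reproduced here, but a minimal LangChain pipeline along these lines would do the job. The PDF loader, chunk sizes, and the Chroma store are assumptions for illustration, not necessarily what the notebook uses, and the file paths are placeholders:

```python
# Minimal sketch of the knowledge-base + chatbot pipeline.
# Assumes OPENAI_API_KEY is set in the environment.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Load every thesis paper (paths are placeholders).
papers = ["papers/paper1.pdf", "papers/paper2.pdf"]
documents = []
for path in papers:
    documents.extend(PyPDFLoader(path).load())

# Split into overlapping chunks small enough to embed and retrieve.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# Embed the chunks and index them in a local vector store.
vectordb = Chroma.from_documents(chunks, OpenAIEmbeddings())

# Wire the store to a chat model: the retrieved chunks are "stuffed"
# into the prompt before each answer.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
)

print(qa.run("What gap in the literature does my thesis address?"))
```

Chroma keeps the index on the local machine, which suits a personal literature collection; swapping in another store such as FAISS or Pinecone would only change the `from_documents` call.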
