Skip to content

gsajko/dharmaQA

Repository files navigation

dharmaQA

This is my "Hello World!" in the realm of RAGs.

This is a project to build a basic question answering system with RAG (Retrieval-Augmented Generation). Dataset used is dataset thats important to me, which is dataset made from Rob Burbea's Dharma talks.

Unfortunatelly, that dataset isn't well-suited for RAG system - it's not factual, it has long-winded answers, that are sometimes not directly related to the question.

For this kind of dataset, fine-tuning a language model would be more appropriate.

I'll explore RAG using a different dataset, and then come back to this dataset later.

Notes

App is deployed with streamlit cloud

It retrieves context from transcripts of Rob Burbea's Dharma talks, and generates a response based on the context.

Transcripts where downloaded from https://airtable.com/appe9WAZCVxfdGDnX/shr9OS6jqmWvWTG5g/tblHlCKWIIhZzEFMk/viw3k0IfSo0Dve9ZJ in the form of a pdf files.

I used marker to convert pdf to Markdown files before ingesting them.

About

chatbot using lancedb, langchain and streamlit

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages