
RAG MCP tool

This repository contains a minimal Retrieval‑Augmented Generation (RAG) MCP server example implemented in Python. The server exposes a single MCP tool named retrieve that looks up short, directly relevant snippets from a local corpus of research papers (the papers/ directory) using embeddings and a vector store.

How it works

rag.py contains the core code for the server. It first loads the Markdown files under papers/, splits them into chunks, embeds them using a HuggingFace embedding model (default all-MiniLM-L6-v2), and stores the embeddings in a Chroma vector store, which also provides the similarity search. The server exposes an MCP tool retrieve(prompt: str) that returns the top-k closest text chunks for a keyword/topic prompt.
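
A rough sketch of the shape of this pipeline is below. It is illustrative only: the exact loader, variable names, and the way retrieve formats its output are assumptions, so see rag.py for the actual implementation.

```python
# Illustrative sketch of the rag.py pipeline (see rag.py for the real code).
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from mcp.server.fastmcp import FastMCP

# Load the Markdown papers and split them into overlapping chunks.
docs = DirectoryLoader("papers/", glob="**/*.md", loader_cls=TextLoader).load()
chunks = CharacterTextSplitter(chunk_size=500, chunk_overlap=100).split_documents(docs)

# Embed the chunks and index them in an in-memory Chroma store.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

mcp = FastMCP("rag")

@mcp.tool()
def retrieve(prompt: str) -> str:
    """Return the top-k chunks most similar to the keyword/topic prompt."""
    results = retriever.invoke(prompt)
    return "\n\n---\n\n".join(doc.page_content for doc in results)

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```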

The retrieved text chunks can then be processed by another LLM, e.g. GPT-5 in VSCode Copilot. Another option would be to have an extra LLM process/summarise the chunks first and then pass the summary to the orchestrator agent (e.g. Copilot), but I'm concerned this would be too lossy. The workflow above gives the orchestrator direct access to the retrieved chunks.

Using the tool (VSCode)

For VSCode, create a file at .vscode/mcp.json with the following contents:

```json
{
    "servers": {
        "rag": {
            "type": "stdio",
            "command": "python",
            "args": [
                "rag.py"
            ]
        }
    }
}
```

We can then ask Copilot to call the tool, e.g. in agent mode: "use the rag MCP tool to give an overview of simulation-based inference". If you have any issues starting the server (it should start automatically; you may need to restart VSCode), see https://code.visualstudio.com/docs/copilot/customization/mcp-servers.

I found it cumbersome to constantly ask Copilot to call the MCP tool. A more effective way is to set up a prompt file (https://code.visualstudio.com/docs/copilot/customization/prompt-files). I've put an example in .github/prompts/rag.prompt.md, which tells the orchestrator (whichever LLM you've selected in Copilot) how to use the MCP tool, e.g. by translating the user's query into a set of keywords before the vector search. You can then prompt Copilot to do things like /rag write an introduction to simulation-based inference in intro.txt, including references.
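
As an illustration only (the actual contents of .github/prompts/rag.prompt.md, and the exact front-matter fields VSCode supports, may differ), a prompt file along these lines works:

```markdown
---
mode: agent
description: Answer using the local papers corpus via the rag MCP server
---
Use the `retrieve` tool from the rag MCP server to answer the user's request.
First translate the request into a short set of keywords/topics, call
`retrieve` with those keywords (more than once if needed), then write the
answer based only on the returned chunks, citing the source papers.
```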

Cursor, Claude, etc. should have a very similar setup process to the above; the main difference, I think, is the syntax of the JSON configuration file.
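
For example, Claude Desktop configures MCP servers under an mcpServers key in its claude_desktop_config.json. The snippet below is a sketch I haven't tested with this repo (you will likely need an absolute path to rag.py):

```json
{
    "mcpServers": {
        "rag": {
            "command": "python",
            "args": ["rag.py"]
        }
    }
}
```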

Notes

Dependencies/customizations:

  • langchain and langchain_community for document loading and text splitting.
    • Chunking: CharacterTextSplitter is configured with chunk_size=500 and chunk_overlap=100.
  • HuggingFaceEmbeddings from langchain_huggingface.embeddings (model: all-MiniLM-L6-v2) for generating dense embeddings. This can be replaced, e.g. with OllamaEmbeddings.
  • Chroma vector store and search via langchain_community.vectorstores.
    • Currently the index is an in-memory Chroma instance; a persistent index should just need a change to the Chroma configuration (see the sketch after this list).
    • Requests k=10 top matches by default; change search_kwargs to adjust the number of retrieved snippets.
  • mcp.server.fastmcp.FastMCP to expose a simple MCP server interface.
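
A minimal sketch of a persistent index, assuming the langchain Chroma wrapper's persist_directory argument (the directory name is hypothetical, and this isn't what rag.py currently does):

```python
# Sketch: persist the Chroma index to disk and reuse it across runs.
from langchain_community.vectorstores import Chroma
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# First run: build the index from `chunks` (as produced by the splitter in the
# pipeline sketch above) and write it to ./chroma_db (hypothetical directory).
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")

# Later runs: reload the persisted index instead of re-embedding the corpus.
vectorstore = Chroma(persist_directory="chroma_db", embedding_function=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
```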
