This repository contains a minimal Retrieval-Augmented Generation (RAG) MCP server example implemented in Python. The server exposes a single MCP tool named `retrieve` that looks up short, directly relevant snippets from a local corpus of research papers (the `papers/` directory) using embeddings and a vector store.
`rag.py` contains the core code for the server. It loads the Markdown files under `papers/`, splits them into chunks, embeds them with a HuggingFace embedding model (default `all-MiniLM-L6-v2`), and stores the embeddings in a Chroma vector store, which also handles similarity search. The server exposes an MCP tool `retrieve(prompt: str)` that returns the top-k closest text chunks for a keyword/topic prompt.
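As a rough sketch of what that pipeline looks like (illustrative only; the loader choice and variable names here are assumptions, see `rag.py` for the actual code):

```python
# Illustrative sketch of the rag.py pipeline; see rag.py for the real implementation.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("rag")

# Load the Markdown files under papers/ and split them into overlapping chunks.
docs = DirectoryLoader("papers", glob="**/*.md", loader_cls=TextLoader).load()
chunks = CharacterTextSplitter(chunk_size=500, chunk_overlap=100).split_documents(docs)

# Embed the chunks and build an in-memory Chroma index with top-k similarity search.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
retriever = Chroma.from_documents(chunks, embeddings).as_retriever(search_kwargs={"k": 10})

@mcp.tool()
def retrieve(prompt: str) -> str:
    """Return the text of the top-k chunks most similar to the prompt."""
    return "\n\n".join(doc.page_content for doc in retriever.invoke(prompt))

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, matching the .vscode/mcp.json config below
```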
The retrieved text chunks can then be processed by another LLM, e.g. GPT-5 in VS Code Copilot. Another option would be to add an intermediate LLM that processes/summarises the chunks first and then passes the summary to the orchestrator agent (e.g. Copilot), but I'm concerned this would be too lossy; the workflow above gives the orchestrator direct access to the retrieved chunks.
For VS Code, create a file at `.vscode/mcp.json` with the contents:

```json
{
  "servers": {
    "rag": {
      "type": "stdio",
      "command": "python",
      "args": ["rag.py"]
    }
  }
}
```
We can then ask Copilot to call the tool, e.g. in agent mode: "use the rag MCP tool to give an overview of simulation-based inference". The server should start automatically; if it doesn't (a VS Code restart sometimes helps), see https://code.visualstudio.com/docs/copilot/customization/mcp-servers.
I found it cumbersome to constantly ask Copilot to call the MCP tool. A more effective approach is to set up a prompt file (https://code.visualstudio.com/docs/copilot/customization/prompt-files). I've put an example in `.github/prompts/rag.prompt.md`, which tells the orchestrator (whichever LLM you've selected in Copilot) how to use the MCP tool, e.g. by translating the user's query into a set of keywords before the vector search. You can then prompt Copilot with things like `/rag write an introduction to simulation-based inference in intro.txt, including references`.
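If you'd rather write your own prompt file, something along these lines should work (an illustration only, not the exact contents of the file in this repo; the front-matter fields are described in the VS Code docs linked above):

```markdown
---
mode: agent
description: Answer research questions using the rag MCP tool
---
Translate the user's request into a short list of keywords/topics, call the
`retrieve` tool from the `rag` MCP server for each of them, and answer using
only the returned snippets, citing the source papers where possible.
```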
Cursor, Claude, etc. should have a very similar setup process; the main difference, I think, is the syntax of the JSON config file.
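For example, I believe Cursor reads a project-level `.cursor/mcp.json` along these lines (check Cursor's MCP docs for the exact schema):

```json
{
  "mcpServers": {
    "rag": {
      "command": "python",
      "args": ["rag.py"]
    }
  }
}
```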
Dependencies/customizations:

- `langchain` and `langchain_community` for document loading and text splitting.
- Chunking: `CharacterTextSplitter` configured with `chunk_size=500` and `chunk_overlap=100`.
- Embeddings: `HuggingFaceEmbeddings` from `langchain_huggingface.embeddings` (model: `all-MiniLM-L6-v2`) for generating dense embeddings. This can be swapped out, e.g. for `OllamaEmbeddings`.
- `Chroma` vector store and search via `langchain_community.vectorstores`. Currently the index is an in-memory Chroma instance; I still need to set up a persistent index, which I imagine just requires changing the Chroma configuration (see the sketch after this list).
- Retrieval: requests the `k=10` top matches by default; change `search_kwargs` to adjust the number of retrieved snippets.
- `mcp.server.fastmcp.FastMCP` to expose a simple MCP server interface.
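On the persistence point above, my understanding is that the Chroma wrapper only needs a `persist_directory` to write the index to disk; a minimal untested sketch (the directory name is arbitrary, and `chunks` would come from the splitter in `rag.py`):

```python
# Rough sketch of a persistent Chroma index (assumption: persist_directory is enough;
# older versions of the wrapper may also require calling vectorstore.persist()).
from langchain_core.documents import Document
from langchain_community.vectorstores import Chroma
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
chunks = [Document(page_content="example chunk")]  # placeholder; rag.py gets these from the splitter

# First run: build the index and write it to ./chroma_db on disk.
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")

# Later runs: reopen the existing index instead of re-embedding papers/.
vectorstore = Chroma(persist_directory="chroma_db", embedding_function=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
```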