Spring AI RAG Demo

Retrieval-augmented generation (RAG) demo built with Spring AI 1.1.0, PGVector, and OpenAI. Includes conversation memory, tool calling, streaming, and an OpenAI-compatible API for Open WebUI integration.

Stack

Spring Boot 3.5.8 / Spring AI 1.1.0
OpenAI GPT-4o-mini
PostgreSQL 16 + PGVector
Open WebUI (optional)

Quick Start

# Set your OpenAI API key
export OPENAI_API_KEY=sk-your-key

# Start PostgreSQL + PGVector
docker compose up -d

# Run the app
./mvnw spring-boot:run

The app listens on http://localhost:8080.

Testing the API

Health Check

curl http://localhost:8080/actuator/health

Basic Chat (with RAG)

curl -X POST http://localhost:8080/api/v2/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is Spring AI?"}'
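The same request can be sent from Java with the JDK's built-in HTTP client (Java 11+). This is a minimal sketch: `ChatApiClient` is a hypothetical helper name, and the JSON body simply mirrors the curl example. The response format of `/api/v2/chat` is not documented here, so the sketch returns the raw body.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for POST /api/v2/chat using java.net.http (no Spring dependencies).
class ChatApiClient {
    static final String BASE_URL = "http://localhost:8080";

    // Builds the request without sending it, so it can be inspected or reused.
    static HttpRequest buildChatRequest(String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + "/api/v2/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Sends the request and returns the raw response body (the app must be running).
    static String send(String jsonBody) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildChatRequest(jsonBody), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

For example, `ChatApiClient.send("{\"message\": \"What is Spring AI?\"}")` performs the same call as the curl command.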

With conversation memory (reuse the same conversationId across requests):

curl -X POST http://localhost:8080/api/v2/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "My name is Alice", "conversationId": "session-123"}'

curl -X POST http://localhost:8080/api/v2/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is my name?", "conversationId": "session-123"}'
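When building these bodies in code, the message text must be JSON-escaped, and `conversationId` is only needed when the chat should remember earlier turns. A small sketch, assuming the request body contains only the two fields shown in the curl examples; `ChatBody` is a hypothetical name and a real project would more likely use a JSON library such as Jackson.

```java
// Hypothetical builder for the /api/v2/chat request body.
// Assumes only "message" and an optional "conversationId", as in the curl examples.
class ChatBody {
    // Escapes the characters that must be escaped inside a JSON string literal.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    // Remaining control characters use the six-character hex escape form.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Omits conversationId when it is null, which gives a one-off (stateless) request.
    static String of(String message, String conversationId) {
        String body = "{\"message\": " + jsonString(message);
        if (conversationId != null) {
            body += ", \"conversationId\": " + jsonString(conversationId);
        }
        return body + "}";
    }
}
```

`ChatBody.of("My name is Alice", "session-123")` produces the same body as the first memory example.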

Streaming Chat

curl -N "http://localhost:8080/api/v2/chat/stream?message=Explain+RAG+in+3+sentences"
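The message travels as a query parameter, so it has to be URL-encoded; in the example, `+` stands for a space. The sketch below (hypothetical helper `StreamUrl`) builds the URL with `URLEncoder`, which applies form-encoding rules: spaces become `+`, while a literal `+` in the text becomes `%2B` so it is not mistaken for a space.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds the streaming URL for an arbitrary message.
class StreamUrl {
    static URI forMessage(String message) {
        // Form encoding: ' ' -> '+', '+' -> "%2B", '?' -> "%3F", letters and digits unchanged.
        String encoded = URLEncoder.encode(message, StandardCharsets.UTF_8);
        return URI.create("http://localhost:8080/api/v2/chat/stream?message=" + encoded);
    }
}
```

`StreamUrl.forMessage("Explain RAG in 3 sentences")` yields exactly the URL used in the curl command.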

Upload Documents

# Upload a file to the vector store
curl -X POST http://localhost:8080/api/v2/documents \
  -F "file=@/path/to/document.pdf"

# Clear all documents
curl -X DELETE http://localhost:8080/api/v2/documents
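Before an uploaded file can be searched, its text is typically split into chunks that are embedded and stored individually; that is what later lets retrieval return "relevant chunks" rather than whole files. The sketch below uses fixed character windows with overlap purely for illustration. Spring AI ships a token-based `TokenTextSplitter`, and how this project's `DocumentService` actually splits documents is not shown here, so treat the sizes and strategy as assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual chunker: fixed-size character windows that overlap by `overlap` characters.
// Not necessarily what DocumentService does; shown only to illustrate chunking.
class Chunker {
    static List<String> split(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        if (text.isEmpty()) return chunks;
        int step = size - overlap; // how far each window advances
        for (int start = 0; ; start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // the last window reached the end of the text
        }
        return chunks;
    }
}
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks.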

OpenAI-Compatible API

Two models are available:

spring-ai-chat — General conversation (no RAG)
spring-ai-rag — Uses uploaded documents for context, includes source citations
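The citations from `spring-ai-rag` appear as a footer listing the files the retrieved chunks came from. A sketch of how such a footer could be assembled, assuming each retrieved chunk carries its source file name; `SourcesFooter` is hypothetical and the app's controller may build it differently. Several chunks often come from the same file, so names are de-duplicated while keeping retrieval order.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical formatter for a footer like "---\n**Sources:** faq.txt, spring-ai-reference.md".
class SourcesFooter {
    static String format(Collection<String> chunkSourceNames) {
        // LinkedHashSet drops duplicates but keeps first-seen (retrieval) order.
        Set<String> unique = new LinkedHashSet<>(chunkSourceNames);
        if (unique.isEmpty()) return ""; // nothing retrieved, so no footer
        return "---\n**Sources:** " + String.join(", ", unique);
    }
}
```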

# List available models
curl http://localhost:8080/v1/models

# General chat (no RAG)
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer my-session" \
  -d '{
    "model": "spring-ai-chat",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'

# RAG-enabled chat (uses documents, shows sources)
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer my-session" \
  -d '{
    "model": "spring-ai-rag",
    "messages": [{"role": "user", "content": "What is Spring AI?"}]
  }'
# Response includes: "---\n**Sources:** faq.txt, spring-ai-reference.md"

# Streaming response
curl -N -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "spring-ai-chat",
    "messages": [{"role": "user", "content": "Tell me a joke"}],
    "stream": true
  }'
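With `"stream": true`, OpenAI-style endpoints send server-sent events: each event is a `data: {...}` line carrying a JSON chunk, and the stream ends with `data: [DONE]`. The sketch below assumes this app follows that convention (it advertises OpenAI compatibility, but the exact framing is not documented here). It only collects the JSON payloads; pulling the text delta out of each one is left to a JSON library, and the leading-space handling is simplified relative to the full SSE rules.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: extract the JSON payloads from OpenAI-style SSE lines, stopping at the [DONE] sentinel.
class SseReader {
    static List<String> dataPayloads(List<String> lines) {
        List<String> payloads = new ArrayList<>();
        for (String line : lines) {
            if (!line.startsWith("data:")) continue; // skip blank separators and comment lines
            String data = line.substring("data:".length()).strip();
            if (data.equals("[DONE]")) break;       // end-of-stream marker
            payloads.add(data);
        }
        return payloads;
    }
}
```

With `java.net.http`, `HttpResponse.BodyHandlers.ofLines()` delivers the response as a `Stream<String>` of lines that can be processed the same way as they arrive.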

Conversation memory is keyed on the Authorization header: requests that send the same header value continue the same conversation.
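One way to derive a conversation id from that header is to strip the `Bearer ` scheme and use the token itself. This is a hypothetical sketch, not the app's controller code: the case-insensitive prefix handling and the shared `"default"` fallback for requests without the header (such as the streaming example above) are assumptions.

```java
// Hypothetical mapping from an Authorization header value to a conversation id.
class ConversationKeys {
    static final String DEFAULT_ID = "default"; // assumed fallback when no header is sent

    static String fromAuthorization(String headerValue) {
        if (headerValue == null || headerValue.isBlank()) return DEFAULT_ID;
        String value = headerValue.strip();
        // Drop a case-insensitive "Bearer " scheme prefix so the id is just the token.
        if (value.regionMatches(true, 0, "Bearer ", 0, 7)) {
            value = value.substring(7).strip();
        }
        return value.isEmpty() ? DEFAULT_ID : value;
    }
}
```

With this mapping, `Authorization: Bearer my-session` becomes conversation id `my-session`.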

Open WebUI Integration

# Start Open WebUI alongside PostgreSQL
docker compose --profile ui up -d

Open http://localhost:3000, create an account, then:

Go to Settings → Admin → Connections
Add OpenAI connection: http://host.docker.internal:8080/v1
Select model: spring-ai-chat (general) or spring-ai-rag (document-grounded)

Architecture

Request → ChatController → ChatClient
                              ├── RetrievalAugmentationAdvisor (RAG)
                              ├── MessageChatMemoryAdvisor (conversation history)
                              └── DocumentTools (@Tool functions)
                                    │
              ┌─────────────────────┼─────────────────────┐
              ▼                     ▼                     ▼
           OpenAI              PGVector              PostgreSQL
        (GPT-4o-mini)         (vectors)            (chat memory)

Project Layout

src/main/java/com/arvindand/rag/
├── config/
│   ├── ChatClientConfig.java    # ChatClient + advisors
│   └── MemoryConfig.java        # JDBC chat memory
├── controller/
│   ├── ChatController.java      # /api/v2/chat
│   ├── DocumentController.java  # /api/v2/documents
│   └── OpenAICompatibleController.java
├── service/
│   └── DocumentService.java     # document ingestion
└── tools/
    └── DocumentTools.java       # @Tool methods

Key Patterns

RetrievalAugmentationAdvisor: the Spring AI 1.1 approach to RAG. It searches the vector store for chunks relevant to the user's question and injects them into the prompt context.
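Conceptually, the retrieval step ranks stored chunks by similarity to the question's embedding, keeps the top k, and places their text in the prompt. The in-memory sketch below shows that idea with cosine similarity over toy 2-dimensional vectors. It is not Spring AI's `RetrievalAugmentationAdvisor` API: in the app, PGVector performs the similarity search, embeddings come from the embedding model, and the prompt template here is a made-up placeholder. Requires Java 17 (records), which Spring Boot 3.x already needs.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Conceptual retrieve-then-augment sketch; NaiveRetriever and its prompt text are hypothetical.
class NaiveRetriever {
    record Chunk(String text, double[] embedding) {}

    // Cosine similarity: dot product divided by the product of vector lengths.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Most similar chunks first, keeping at most k of them.
    static List<Chunk> topK(List<Chunk> store, double[] query, int k) {
        return store.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> cosine(c.embedding(), query)).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    // Places the retrieved text ahead of the user's question.
    static String augment(String question, List<Chunk> context) {
        String joined = context.stream().map(Chunk::text).collect(Collectors.joining("\n"));
        return "Context information is below.\n---\n" + joined + "\n---\nQuestion: " + question;
    }
}
```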

MessageWindowChatMemory: keeps the last N messages of each conversation, persisted to PostgreSQL via JDBC.
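The "last N messages per conversation" behaviour amounts to a bounded queue per conversation id that evicts the oldest entry when full. The sketch below is in-memory only and is not Spring AI's class: the real `MessageWindowChatMemory` stores messages through a JDBC repository and applies its own rules (for example around system messages) that this sketch ignores. `WindowMemory` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory sliding window of messages, one window per conversation id.
class WindowMemory {
    private final int maxMessages;
    private final Map<String, Deque<String>> byConversation = new HashMap<>();

    WindowMemory(int maxMessages) {
        if (maxMessages <= 0) throw new IllegalArgumentException("maxMessages must be positive");
        this.maxMessages = maxMessages;
    }

    void add(String conversationId, String message) {
        Deque<String> window = byConversation.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        window.addLast(message);
        while (window.size() > maxMessages) {
            window.removeFirst(); // evict the oldest message once the window is full
        }
    }

    // Oldest-to-newest copy of the window; empty for unknown conversations.
    List<String> get(String conversationId) {
        return new ArrayList<>(byConversation.getOrDefault(conversationId, new ArrayDeque<>()));
    }
}
```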

@Tool annotation: marks methods the LLM can invoke. Spring AI generates the JSON schema for each method and handles the function-calling protocol.
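The protocol that `@Tool` automates works roughly like this: tools are registered under a name with a description that is sent to the model; when the model replies with a tool call, the matching function runs and its result is returned to the model. The plain-Java sketch below models only the registry and dispatch parts. `ToolRegistry` and the `countWords` tool are hypothetical and are not the methods in `DocumentTools`; in Spring AI, arguments arrive as JSON and are converted for you, whereas here they are a simple string map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Conceptual tool registry: name -> (description, function), plus dispatch by name.
class ToolRegistry {
    record Tool(String description, Function<Map<String, String>, String> fn) {}

    private final Map<String, Tool> tools = new LinkedHashMap<>();

    void register(String name, String description, Function<Map<String, String>, String> fn) {
        tools.put(name, new Tool(description, fn));
    }

    // The names and descriptions the model is told it may call.
    Map<String, String> describe() {
        Map<String, String> out = new LinkedHashMap<>();
        tools.forEach((name, tool) -> out.put(name, tool.description()));
        return out;
    }

    // Runs the tool the model asked for and returns its result as text.
    String invoke(String name, Map<String, String> args) {
        Tool tool = tools.get(name);
        if (tool == null) throw new IllegalArgumentException("Unknown tool: " + name);
        return tool.fn().apply(args);
    }
}
```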

Config

spring:
  ai:
    openai:
      chat:
        options:
          model: gpt-4o-mini
          temperature: 0.7
    vectorstore:
      pgvector:
        dimensions: 1536
        index-type: hnsw
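`dimensions: 1536` has to match the length of the vectors produced by the embedding model: OpenAI's `text-embedding-ada-002` and `text-embedding-3-small` return 1536 values, while `text-embedding-3-large` returns 3072 by default. Which embedding model the app uses is not shown in this config, so that pairing is an assumption. A pgvector column declared as `vector(1536)` rejects vectors of any other length; the hypothetical guard below fails earlier with a clearer message.

```java
// Hypothetical pre-insert check that an embedding matches the configured vector dimension.
class EmbeddingGuard {
    static final int CONFIGURED_DIMENSIONS = 1536; // mirrors spring.ai.vectorstore.pgvector.dimensions

    static float[] check(float[] embedding) {
        if (embedding.length != CONFIGURED_DIMENSIONS) {
            throw new IllegalStateException("Embedding has " + embedding.length
                    + " dimensions but the vector store expects " + CONFIGURED_DIMENSIONS
                    + "; align spring.ai.vectorstore.pgvector.dimensions with the embedding model");
        }
        return embedding;
    }
}
```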

Repository: arvindand/spring-ai-rag-demo
