From ca2ed6f3b67ee809069fc1feab829b6c16c55276 Mon Sep 17 00:00:00 2001 From: robfrank Date: Thu, 26 Feb 2026 09:23:44 +0100 Subject: [PATCH 01/15] docs: add Graph RAG use case design Defines schema, sample data, queries, and module structure for the Graph RAG use case based on arcadedb.com/graph-rag.html. Includes Neo4j Bolt driver Java module and langchain4j submodule with local AllMiniLmL6V2 embeddings. Co-Authored-By: Claude Opus 4.6 --- docs/plans/2026-02-26-graph-rag-design.md | 147 ++++++++++++++++++++++ 1 file changed, 147 insertions(+) create mode 100644 docs/plans/2026-02-26-graph-rag-design.md diff --git a/docs/plans/2026-02-26-graph-rag-design.md b/docs/plans/2026-02-26-graph-rag-design.md new file mode 100644 index 0000000..332ee25 --- /dev/null +++ b/docs/plans/2026-02-26-graph-rag-design.md @@ -0,0 +1,147 @@ +# Graph RAG Use Case — Design + +**Date:** 2026-02-26 +**Branch:** feat/graph-rag +**ArcadeDB version:** 26.2.1 + +## Overview + +Implement the [ArcadeDB Graph RAG](https://arcadedb.com/graph-rag.html) use case following the same structure as the recommendation-engine. The use case demonstrates ArcadeDB's ability to unify vector search, graph traversal, and full-text indexing for retrieval-augmented generation — without requiring multiple databases or ETL pipelines. 
+ +Key differences from recommendation-engine: +- Java module uses **Neo4j Bolt driver** (`neo4j-java-driver`) and **Cypher** as query language, connecting via `bolt://localhost:2424` +- Additional **langchain4j** submodule demonstrates `Neo4jEmbeddingStore` and `EmbeddingStoreContentRetriever` with local `AllMiniLmL6V2` embeddings (no external API keys) + +## Repository Structure + +``` +graph-rag/ +├── README.md +├── docker-compose.yml +├── setup.sh +├── sql/ +│ ├── 01-schema.sql +│ └── 02-data.sql +├── queries/ +│ └── queries.sh +├── java/ +│ ├── pom.xml +│ └── src/main/java/com/arcadedb/examples/ +│ └── GraphRAG.java +└── langchain4j/ + ├── pom.xml + └── src/main/java/com/arcadedb/examples/ + ├── GraphRAGEmbeddingStore.java + └── GraphRAGContentRetriever.java +``` + +## Docker Compose + +- Single service: `arcadedata/arcadedb:26.2.1` +- Ports exposed: `2480` (HTTP API), `2424` (Bolt) +- Root password via `JAVA_OPTS: -Darcadedb.server.rootPassword=arcadedb` +- Health check on `http://localhost:2480/api/v1/ready` + +## Schema (`sql/01-schema.sql`) + +One document type, four vertex types, and four edge types: + +**Document:** +- `Chunk` — `content` (STRING), `source` (STRING), `chunkIndex` (INTEGER), `embedding` (LIST) +- Vector index on `Chunk(embedding)`: LSM, 4 dimensions, COSINE + +**Vertices:** +- `Entity` — `name` (STRING) +- `Person EXTENDS Entity` +- `Concept EXTENDS Entity` +- `Organization EXTENDS Entity` + +**Edges:** +- `MENTIONS` — Chunk -> Entity +- `RELATES_TO` — Entity -> Entity +- `WORKS_AT` — Person -> Organization +- `AUTHORED` — Person -> Chunk + +## Sample Data (`sql/02-data.sql`) + +**Domain:** Fictional tech company "ArcadeSoft" knowledge base. 
+ +**Chunks (~8-10):** Snippets from internal documentation: +- "Getting Started with GraphRAG" (2 chunks) +- "Microservices Architecture Guide" (2 chunks) +- "Vector Search Best Practices" (2 chunks) +- "Team Onboarding Handbook" (2 chunks) + +Each chunk has a hand-crafted 4D embedding reflecting its topic (e.g. graph-heavy docs: `[0.9, 0.1, 0.2, 0.1]`, vector-heavy: `[0.1, 0.9, 0.2, 0.1]`). + +**Entities (~8-10):** +- Persons: Alice Chen, Bob Martinez, Carol Wu, Dave Park +- Concepts: GraphRAG, Vector Search, Microservices, Knowledge Graph +- Organizations: ArcadeSoft, Platform Team, Research Team + +**Edges (~20-25):** +- MENTIONS: chunks reference concepts and people +- RELATES_TO: GraphRAG -> Vector Search, GraphRAG -> Knowledge Graph, Microservices -> Knowledge Graph +- WORKS_AT: Alice -> Research Team, Bob -> Platform Team, Carol -> ArcadeSoft, Dave -> Platform Team +- AUTHORED: Alice -> GraphRAG doc chunks, Bob -> Microservices doc chunks + +**Design intent:** Multi-hop queries work because querying "Vector Search" finds a chunk that MENTIONS the "GraphRAG" concept, which is MENTIONED by other chunks about GraphRAG — creating entity bridges. RELATES_TO edges form a small concept graph for traversal. 
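The entity-bridge idea in the design intent can be illustrated without a database. The following is a minimal in-memory sketch, not part of the planned modules: the class name `EntityBridgeSketch`, the `source#chunkIndex` keys, and the `bridges` method are hypothetical, and the MENTIONS map is an assumed subset of the sample edges chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory sketch of the multi-hop "entity bridge": chunk -> entity -> other chunk. */
final class EntityBridgeSketch {

    // Assumed subset of MENTIONS edges; keys are hypothetical "source#chunkIndex" ids.
    static final Map<String, List<String>> MENTIONS = new LinkedHashMap<>();
    static {
        MENTIONS.put("Getting Started with GraphRAG#0", List.of("GraphRAG", "Vector Search"));
        MENTIONS.put("Getting Started with GraphRAG#1", List.of("GraphRAG", "Knowledge Graph"));
        MENTIONS.put("Vector Search Best Practices#0", List.of("Vector Search"));
        MENTIONS.put("Microservices Architecture Guide#0", List.of("Microservices"));
    }

    /** Maps every other chunk that shares an entity with startChunk to the shared entities. */
    static Map<String, Set<String>> bridges(String startChunk) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String entity : MENTIONS.getOrDefault(startChunk, List.of())) {
            for (Map.Entry<String, List<String>> edge : MENTIONS.entrySet()) {
                boolean otherChunk = !edge.getKey().equals(startChunk);
                if (otherChunk && edge.getValue().contains(entity)) {
                    result.computeIfAbsent(edge.getKey(), k -> new TreeSet<>()).add(entity);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        bridges("Getting Started with GraphRAG#0")
            .forEach((chunk, via) -> System.out.println(chunk + " via " + via));
    }
}
```

In this sketch the Microservices chunk is never reached from the GraphRAG chunk because the two share no entity, which is the behaviour the MENTIONS edges are meant to produce in the real graph.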
+ +## Queries + +### `queries/queries.sh` — 5 labeled sections via curl + +| # | Pattern | Language | Description | +|---|---------|----------|-------------| +| 1 | Hybrid Vector + Graph | Cypher | Vector search for similar chunks, traverse MENTIONS to find entities and connected chunks | +| 2 | Multi-Hop Entity Bridge | Cypher | Find chunks connected through entity chains: query chunk -> entity -> related chunk | +| 3 | Temporal-Aware Retrieval | Cypher | Filter chunks by `chunkIndex` ordering, return most recent context first | +| 4 | Triple Hybrid | SQL | Composite scoring: vector distance + `CONTAINSTEXT` keyword + entity connection count | +| 5 | Agentic RAG Steps | Mixed | 4-step sequence: vector search, graph expansion, full-text lookup, context assembly | + +### `java/GraphRAG.java` — All Cypher via Bolt + +Adapts the 5 patterns to pure Cypher. Queries that rely on SQL-specific features are adapted: +- Query 4: vector distance + entity count (2-signal composite, no full-text) +- Query 5: vector search -> graph expansion -> collect results (3 steps, no full-text) + +### `langchain4j/` — 2 example classes + +1. **GraphRAGEmbeddingStore** — ingest text chunks, generate real 384D embeddings with AllMiniLmL6V2, store in ArcadeDB via `Neo4jEmbeddingStore` over Bolt, run similarity searches +2. **GraphRAGContentRetriever** — wire `Neo4jEmbeddingStore` into a langchain4j `EmbeddingStoreContentRetriever` pipeline, query with natural language, print retrieved chunks with scores + +## Java Module (`java/`) + +- **Build tool:** Maven (standalone `pom.xml`, no parent) +- **Dependency:** `org.neo4j.driver:neo4j-java-driver:5.28.x` +- **Java:** 21 +- **Output:** fat JAR via maven-assembly-plugin -> `graph-rag.jar` +- **Entry point:** `GraphRAG.java` with `main` method that: + 1. Opens a Neo4j `Driver` connection to `bolt://localhost:2424` + 2. Runs all 5 queries sequentially in Cypher + 3. Prints header and formatted results for each query + 4. 
Closes the driver + +## Langchain4j Module (`langchain4j/`) + +- **Build tool:** Maven (standalone `pom.xml`, no parent, no Spring Boot) +- **Dependencies:** `langchain4j-community-neo4j`, `langchain4j-embeddings-all-minilm-l6-v2`, `neo4j-java-driver` +- **Java:** 21 +- **Output:** fat JAR via maven-assembly-plugin -> `graph-rag-langchain4j.jar` +- **No external API keys required** — AllMiniLmL6V2 runs in-process + +## Setup + +`setup.sh` follows the recommendation-engine pattern: +1. Wait for ArcadeDB ready endpoint +2. Create database `GraphRAG` via HTTP API +3. Apply `sql/01-schema.sql` +4. Apply `sql/02-data.sql` + +## Success Criteria + +- `docker compose up` starts ArcadeDB with both HTTP and Bolt ports +- SQL files apply cleanly via `setup.sh` +- `queries.sh` runs all 5 queries and returns non-empty result sets +- `mvn package && java -jar target/graph-rag.jar` connects via Bolt, runs all 5 Cypher queries +- `mvn package && java -jar target/graph-rag-langchain4j.jar` ingests chunks, generates embeddings, runs similarity search and content retrieval From 58cabd2b24670c2bf8a6205592ad339e24f7a62f Mon Sep 17 00:00:00 2001 From: robfrank Date: Thu, 26 Feb 2026 09:26:46 +0100 Subject: [PATCH 02/15] docs: add Graph RAG implementation plan 11-task plan covering Docker Compose, schema, sample data, curl queries, Java Bolt/Cypher module, langchain4j embedding store and content retriever modules, README, and integration smoke test. Co-Authored-By: Claude Opus 4.6 --- docs/plans/2026-02-26-graph-rag.md | 786 +++++++++++++++++++++++++++++ 1 file changed, 786 insertions(+) create mode 100644 docs/plans/2026-02-26-graph-rag.md diff --git a/docs/plans/2026-02-26-graph-rag.md b/docs/plans/2026-02-26-graph-rag.md new file mode 100644 index 0000000..a4e0948 --- /dev/null +++ b/docs/plans/2026-02-26-graph-rag.md @@ -0,0 +1,786 @@ +# Graph RAG Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. 
+ +**Goal:** Implement the Graph RAG use case demonstrating ArcadeDB's unified vector search, graph traversal, and full-text indexing for retrieval-augmented generation. + +**Architecture:** Self-contained `graph-rag/` directory mirroring the recommendation-engine structure. Docker Compose exposes HTTP (2480) and Bolt (2424). Base Java module uses Neo4j Bolt driver with pure Cypher. Langchain4j sibling module uses `Neo4jEmbeddingStore` with local AllMiniLmL6V2 embeddings (no API keys). + +**Tech Stack:** ArcadeDB 26.2.1, Neo4j Java Driver 5.28.10, LangChain4j Community Neo4j 1.11.0-beta19, AllMiniLmL6V2 embedding model, Java 21, Maven + +**Design doc:** `docs/plans/2026-02-26-graph-rag-design.md` + +**Reference implementation:** `recommendation-engine/` (same repo) + +--- + +### Task 1: Docker Compose and setup script + +**Files:** +- Create: `graph-rag/docker-compose.yml` +- Create: `graph-rag/setup.sh` + +**Step 1: Create docker-compose.yml** + +```yaml +services: + arcadedb: + image: arcadedata/arcadedb:26.2.1 + ports: + - "2480:2480" + - "2424:2424" + environment: + JAVA_OPTS: "-Darcadedb.server.rootPassword=arcadedb" + healthcheck: + test: ["CMD", "curl", "-sf", "http://localhost:2480/api/v1/ready"] + interval: 5s + timeout: 3s + retries: 20 + start_period: 10s +``` + +Note: identical to `recommendation-engine/docker-compose.yml` except adds port `2424:2424` for Bolt. + +**Step 2: Create setup.sh** + +Copy `recommendation-engine/setup.sh` and change `DB_NAME="RecommendationEngine"` to `DB_NAME="GraphRAG"`. Everything else is the same: wait for ready, create database, apply SQL files. 
+ +**Step 3: Make setup.sh executable** + +Run: `chmod +x graph-rag/setup.sh` + +**Step 4: Commit** + +```bash +git add graph-rag/docker-compose.yml graph-rag/setup.sh +git commit -m "feat(graph-rag): add docker-compose and setup script" +``` + +--- + +### Task 2: SQL schema + +**Files:** +- Create: `graph-rag/sql/01-schema.sql` + +**Step 1: Create 01-schema.sql** + +```sql +-- Document type for text chunks with vector embeddings +CREATE DOCUMENT TYPE Chunk IF NOT EXISTS; +CREATE PROPERTY Chunk.content IF NOT EXISTS STRING; +CREATE PROPERTY Chunk.source IF NOT EXISTS STRING; +CREATE PROPERTY Chunk.chunkIndex IF NOT EXISTS INTEGER; +CREATE PROPERTY Chunk.embedding IF NOT EXISTS LIST; + +-- Entity vertex types (knowledge graph nodes) +CREATE VERTEX TYPE Entity IF NOT EXISTS; +CREATE PROPERTY Entity.name IF NOT EXISTS STRING; +CREATE VERTEX TYPE Person IF NOT EXISTS EXTENDS Entity; +CREATE VERTEX TYPE Concept IF NOT EXISTS EXTENDS Entity; +CREATE VERTEX TYPE Organization IF NOT EXISTS EXTENDS Entity; + +-- Edge types +CREATE EDGE TYPE MENTIONS IF NOT EXISTS; +CREATE EDGE TYPE RELATES_TO IF NOT EXISTS; +CREATE EDGE TYPE WORKS_AT IF NOT EXISTS; +CREATE EDGE TYPE AUTHORED IF NOT EXISTS; + +-- Vector index for chunk embeddings (4 dimensions for sample data) +CREATE INDEX IF NOT EXISTS ON Chunk (embedding) LSM_VECTOR METADATA { dimensions: 4, similarity: 'COSINE' }; +``` + +**Step 2: Commit** + +```bash +git add graph-rag/sql/01-schema.sql +git commit -m "feat(graph-rag): add schema definition" +``` + +--- + +### Task 3: Sample data + +**Files:** +- Create: `graph-rag/sql/02-data.sql` + +**Step 1: Create 02-data.sql** + +The data represents a fictional tech company "ArcadeSoft" knowledge base. Embeddings are 4D vectors where dimensions loosely represent: [graph, vector, architecture, general]. 
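To make the role of the hand-crafted dimensions concrete, here is a small sketch that computes plain cosine distance over the `chunkIndex = 0` embedding of each source document from the `02-data.sql` listing in this task. The class and method names (`EmbeddingDistanceSketch`, `cosineDistance`, `nearest`) are hypothetical, and the arithmetic is the textbook formula; the score ArcadeDB reports for its vector index may be scaled differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: textbook cosine distance over the hand-crafted 4D sample embeddings. */
final class EmbeddingDistanceSketch {

    // Dimensions loosely mean [graph, vector, architecture, general].
    static final Map<String, double[]> SAMPLE = new LinkedHashMap<>();
    static {
        SAMPLE.put("Getting Started with GraphRAG", new double[] {0.9, 0.2, 0.1, 0.1});
        SAMPLE.put("Microservices Architecture Guide", new double[] {0.1, 0.1, 0.9, 0.2});
        SAMPLE.put("Vector Search Best Practices", new double[] {0.2, 0.9, 0.1, 0.1});
        SAMPLE.put("Team Onboarding Handbook", new double[] {0.2, 0.2, 0.3, 0.8});
    }

    /** Cosine distance = 1 - cosine similarity; 0 means the vectors point the same way. */
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the source document whose sample embedding is closest to the query. */
    static String nearest(double[] query) {
        String best = null;
        double bestDistance = Double.MAX_VALUE;
        for (Map.Entry<String, double[]> entry : SAMPLE.entrySet()) {
            double distance = cosineDistance(query, entry.getValue());
            if (distance < bestDistance) {
                bestDistance = distance;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A graph-heavy query vector should land on the GraphRAG document.
        System.out.println(nearest(new double[] {0.9, 0.2, 0.1, 0.1}));
    }
}
```

Because each document puts most of its weight on a different dimension, a query vector that is heavy in one dimension should select the matching topic, which is what the vector queries later in the plan rely on.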
+ +```sql +-- ── Chunks (internal documentation) ───────────────────────────────────────── +-- Getting Started with GraphRAG (graph-heavy topic) +INSERT INTO Chunk SET content = 'GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy. By traversing entity relationships, the system discovers context that pure vector similarity would miss.', source = 'Getting Started with GraphRAG', chunkIndex = 0, embedding = [0.9, 0.2, 0.1, 0.1]; +INSERT INTO Chunk SET content = 'Building a knowledge graph requires extracting entities and relationships from documents. Named entity recognition and relationship extraction are key preprocessing steps.', source = 'Getting Started with GraphRAG', chunkIndex = 1, embedding = [0.8, 0.1, 0.2, 0.1]; +-- Microservices Architecture Guide (architecture-heavy topic) +INSERT INTO Chunk SET content = 'Microservices decompose applications into small, independently deployable services. Each service owns its data and communicates via well-defined APIs.', source = 'Microservices Architecture Guide', chunkIndex = 0, embedding = [0.1, 0.1, 0.9, 0.2]; +INSERT INTO Chunk SET content = 'Service mesh patterns like sidecar proxies handle cross-cutting concerns including observability, security, and traffic management across microservices.', source = 'Microservices Architecture Guide', chunkIndex = 1, embedding = [0.1, 0.1, 0.8, 0.3]; +-- Vector Search Best Practices (vector-heavy topic) +INSERT INTO Chunk SET content = 'Vector similarity search uses embedding models to encode text into high-dimensional vectors. 
Cosine distance is the most common similarity metric for text embeddings.', source = 'Vector Search Best Practices', chunkIndex = 0, embedding = [0.2, 0.9, 0.1, 0.1]; +INSERT INTO Chunk SET content = 'Approximate nearest neighbor algorithms like HNSW and DiskANN trade small accuracy losses for dramatic speed improvements on large vector datasets.', source = 'Vector Search Best Practices', chunkIndex = 1, embedding = [0.1, 0.8, 0.1, 0.2]; +-- Team Onboarding Handbook (general topic) +INSERT INTO Chunk SET content = 'New engineers at ArcadeSoft join a team and are assigned a mentor. The onboarding process covers codebase orientation, tooling setup, and architecture overview.', source = 'Team Onboarding Handbook', chunkIndex = 0, embedding = [0.2, 0.2, 0.3, 0.8]; +INSERT INTO Chunk SET content = 'The Platform Team maintains shared infrastructure including the knowledge graph pipeline and vector search service. The Research Team explores new retrieval techniques.', source = 'Team Onboarding Handbook', chunkIndex = 1, embedding = [0.3, 0.3, 0.2, 0.7]; + +-- ── Entities ──────────────────────────────────────────────────────────────── +-- Persons +INSERT INTO Person SET name = 'Alice Chen'; +INSERT INTO Person SET name = 'Bob Martinez'; +INSERT INTO Person SET name = 'Carol Wu'; +INSERT INTO Person SET name = 'Dave Park'; +-- Concepts +INSERT INTO Concept SET name = 'GraphRAG'; +INSERT INTO Concept SET name = 'Vector Search'; +INSERT INTO Concept SET name = 'Microservices'; +INSERT INTO Concept SET name = 'Knowledge Graph'; +-- Organizations +INSERT INTO Organization SET name = 'ArcadeSoft'; +INSERT INTO Organization SET name = 'Platform Team'; +INSERT INTO Organization SET name = 'Research Team'; + +-- ── MENTIONS edges (Chunk -> Entity) ──────────────────────────────────────── +-- GraphRAG doc chunks mention GraphRAG and Knowledge Graph concepts +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 0) TO (SELECT 
FROM Concept WHERE name = 'GraphRAG'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'Vector Search'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'GraphRAG'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Knowledge Graph'); +-- Vector Search doc chunks mention Vector Search concept +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'Vector Search'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Vector Search'); +-- Microservices doc chunks mention Microservices concept +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'Microservices'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Microservices'); +-- Onboarding doc mentions teams and people +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 0) TO (SELECT FROM Organization WHERE name = 'ArcadeSoft'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Organization WHERE name = 'Platform Team'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Organization WHERE name = 'Research Team'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT 
FROM Concept WHERE name = 'Knowledge Graph'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Vector Search'); + +-- ── RELATES_TO edges (Entity -> Entity) ───────────────────────────────────── +CREATE EDGE RELATES_TO FROM (SELECT FROM Concept WHERE name = 'GraphRAG') TO (SELECT FROM Concept WHERE name = 'Vector Search'); +CREATE EDGE RELATES_TO FROM (SELECT FROM Concept WHERE name = 'GraphRAG') TO (SELECT FROM Concept WHERE name = 'Knowledge Graph'); +CREATE EDGE RELATES_TO FROM (SELECT FROM Concept WHERE name = 'Microservices') TO (SELECT FROM Concept WHERE name = 'Knowledge Graph'); + +-- ── WORKS_AT edges (Person -> Organization) ───────────────────────────────── +CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Organization WHERE name = 'Research Team'); +CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Bob Martinez') TO (SELECT FROM Organization WHERE name = 'Platform Team'); +CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Carol Wu') TO (SELECT FROM Organization WHERE name = 'ArcadeSoft'); +CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Dave Park') TO (SELECT FROM Organization WHERE name = 'Platform Team'); + +-- ── AUTHORED edges (Person -> Chunk) ──────────────────────────────────────── +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 0); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 1); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Bob Martinez') TO (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 0); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Bob Martinez') TO (SELECT FROM Chunk WHERE source = 
'Microservices Architecture Guide' AND chunkIndex = 1); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 0); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 1); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Carol Wu') TO (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 0); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Carol Wu') TO (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1); +``` + +**Data design validation:** Querying for chunks similar to `[0.9, 0.2, 0.1, 0.1]` (graph topic) returns GraphRAG chunks. Those chunks MENTION the "GraphRAG" concept, which RELATES_TO "Vector Search" and "Knowledge Graph". Following MENTIONS back from those concepts leads to Vector Search doc chunks and the Onboarding Handbook chunk — creating the multi-hop entity bridge. + +**Step 2: Commit** + +```bash +git add graph-rag/sql/02-data.sql +git commit -m "feat(graph-rag): add sample data for ArcadeSoft knowledge base" +``` + +--- + +### Task 4: Curl queries script + +**Files:** +- Create: `graph-rag/queries/queries.sh` + +**Step 1: Create queries.sh** + +Follow the exact structure from `recommendation-engine/queries/queries.sh`: shebang, env vars, `query()` helper function, 5 labeled sections. Database name is `GraphRAG`. + +**Queries:** + +1. 
**Hybrid Vector + Graph (Cypher)** — Vector search for chunks similar to `[0.9, 0.2, 0.1, 0.1]`, then traverse MENTIONS to find entities and related chunks: + +```cypher +MATCH (chunk:Chunk) +WHERE chunk.embedding <> [] +WITH chunk, vectorDistance('Chunk[embedding]', chunk.embedding, [0.9, 0.2, 0.1, 0.1]) AS score +WHERE score < 0.5 +OPTIONAL MATCH (chunk)-[:MENTIONS]->(entity:Entity) +OPTIONAL MATCH (entity)<-[:MENTIONS]-(related:Chunk) +WHERE related <> chunk +RETURN chunk.content, chunk.source, score, + collect(DISTINCT entity.name) AS entities, + collect(DISTINCT related.source) AS related_docs +ORDER BY score ASC +LIMIT 10 +``` + +Note: The exact Cypher syntax for ArcadeDB vector functions may need adjustment during implementation. ArcadeDB's Cypher support may require using `vectorNeighbors` via SQL instead. The `queries.sh` can use SQL for vector-heavy queries. Adapt during implementation based on what ArcadeDB 26.2.1 actually supports. + +2. **Multi-Hop Entity Bridge (Cypher)** — Find chunks connected through shared entities: + +```cypher +MATCH (direct:Chunk)-[:MENTIONS]->(entity:Entity)<-[:MENTIONS]-(related:Chunk) +WHERE direct.source = 'Getting Started with GraphRAG' + AND related.source <> direct.source +RETURN direct.source AS source_doc, + entity.name AS bridge_entity, + entity.@class AS entity_type, + related.content AS connected_content, + related.source AS connected_doc +LIMIT 20 +``` + +3. **Temporal-Aware Retrieval (Cypher)** — Filter by chunkIndex to get latest chunks per source: + +```cypher +MATCH (c:Chunk) +WHERE c.chunkIndex = 1 +RETURN c.content, c.source, c.chunkIndex +ORDER BY c.chunkIndex DESC +LIMIT 10 +``` + +4. 
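**Triple Hybrid: Vector + Full-Text + Graph (SQL)** — Composite scoring:
+
+```sql
+SELECT content, source,
+       vectorDistance('Chunk[embedding]', embedding, [0.9, 0.2, 0.1, 0.1]) AS vector_score,
+       out('MENTIONS').size() AS entity_count
+FROM Chunk
+ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 10) DESC
+LIMIT 10
+```
+
+5. **Agentic RAG Steps (Mixed)** — 4 sequential queries:
+   - Step 1 (Cypher): Vector search for relevant chunks
+   - Step 2 (Cypher): Graph expansion from found entities
+   - Step 3 (SQL): Full-text lookup with `CONTAINSTEXT`
+   - Step 4 (Cypher): Get authorship context
+
+**Step 2: Make executable**
+
+Run: `chmod +x graph-rag/queries/queries.sh`
+
+**Step 3: Commit**
+
+```bash
+git add graph-rag/queries/queries.sh
+git commit -m "feat(graph-rag): add curl query script with 5 RAG patterns"
+```
+
+---
+
+### Task 5: Java module — pom.xml
+
+**Files:**
+- Create: `graph-rag/java/pom.xml`
+
+**Step 1: Create pom.xml**
+
+Mirror `recommendation-engine/java/pom.xml` structure but swap the dependency from `arcadedb-network` to `neo4j-java-driver`:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <groupId>com.arcadedb.examples</groupId>
+  <artifactId>graph-rag</artifactId>
+  <version>1.0-SNAPSHOT</version>
+  <packaging>jar</packaging>
+
+  <properties>
+    <maven.compiler.source>21</maven.compiler.source>
+    <maven.compiler.target>21</maven.compiler.target>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <neo4j.driver.version>5.28.10</neo4j.driver.version>
+  </properties>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.neo4j.driver</groupId>
+      <artifactId>neo4j-java-driver</artifactId>
+      <version>${neo4j.driver.version}</version>
+    </dependency>
+  </dependencies>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-assembly-plugin</artifactId>
+        <version>3.8.0</version>
+        <configuration>
+          <archive>
+            <manifest>
+              <mainClass>com.arcadedb.examples.GraphRAG</mainClass>
+            </manifest>
+          </archive>
+          <descriptorRefs>
+            <descriptorRef>jar-with-dependencies</descriptorRef>
+          </descriptorRefs>
+          <finalName>graph-rag</finalName>
+          <appendAssemblyId>false</appendAssemblyId>
+        </configuration>
+        <executions>
+          <execution>
+            <id>make-assembly</id>
+            <phase>package</phase>
+            <goals>
+              <goal>single</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+</project>
+```
+
+**Step 2: Commit**
+
+```bash
+git add graph-rag/java/pom.xml
+git commit -m "feat(graph-rag): add Java module pom.xml with Neo4j Bolt driver"
+```
+
+---
+
+### Task 6: Java module — GraphRAG.java
+
+**Files:**
+- Create: `graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java`
+
+**Step 1: Create GraphRAG.java**
+
+Follow `RecommendationEngine.java` structure: config constants from env vars, `main()` calling 5 query methods via `tryRun()`, `printHeader()` helper.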
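Since `RecommendationEngine.java` is not reproduced in this plan, the shape of the `tryRun()` and `printHeader()` helpers has to be assumed. The following is one possible sketch: the class name `QueryRunnerSketch`, the boolean return value of `tryRun`, the `header` helper, and the banner layout are all illustrative assumptions rather than the reference implementation.

```java
/** Sketch of the helper pattern: isolate each query so one failure does not stop the rest. */
final class QueryRunnerSketch {

    /**
     * Runs a single query step. A runtime failure is reported on stderr and
     * swallowed so the remaining queries still run. Returns true on success
     * (the return value is an assumption added to make the sketch testable).
     */
    static boolean tryRun(Runnable query, String label) {
        try {
            query.run();
            return true;
        } catch (RuntimeException e) {
            System.err.println(label + " failed: " + e.getMessage());
            return false;
        }
    }

    /** Builds a banner with the query title and a one-line description. */
    static String header(String title, String description) {
        String rule = "=".repeat(60);
        return rule + "\n" + title + "\n" + description + "\n" + rule;
    }

    /** Prints the banner before a query's results. */
    static void printHeader(String title, String description) {
        System.out.println(header(title, description));
    }

    public static void main(String[] args) {
        tryRun(() -> printHeader("Query 2", "Multi-Hop Entity Bridge"), "Query 2");
        tryRun(() -> { throw new IllegalStateException("simulated failure"); }, "Query 3");
        System.out.println("All queries complete.");
    }
}
```

Catching the exception per query keeps the demo useful while individual Cypher queries are still being adapted to what ArcadeDB supports over Bolt.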
+ +Key differences from recommendation-engine: +- Uses `org.neo4j.driver.Driver` instead of `RemoteDatabase` +- Connects via `bolt://HOST:PORT` (default `bolt://localhost:2424`) +- Uses `driver.session()` and `session.run(cypher)` instead of `db.query()` +- All queries are Cypher (no SQL fallback) +- Results accessed via `record.get("fieldName")` instead of `r.getProperty()` + +```java +package com.arcadedb.examples; + +import org.neo4j.driver.*; +import org.neo4j.driver.Record; + +public class GraphRAG { + + private static final String HOST = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost"); + private static final String PORT = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424"); + private static final String USER = System.getenv().getOrDefault("ARCADEDB_USER", "root"); + private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb"); + + public static void main(String[] args) { + String uri = "bolt://" + HOST + ":" + PORT; + try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(USER, PASSWORD))) { + tryRun(() -> runQuery1HybridVectorGraph(driver), "Query 1"); + tryRun(() -> runQuery2MultiHopEntityBridge(driver), "Query 2"); + tryRun(() -> runQuery3TemporalAware(driver), "Query 3"); + tryRun(() -> runQuery4CompositeScoring(driver), "Query 4"); + tryRun(() -> runQuery5AgenticRAG(driver), "Query 5"); + } + System.out.println("\nAll queries complete."); + } + + // ... tryRun, printHeader same pattern as RecommendationEngine.java + // ... 5 query methods using driver.session() and session.run() +} +``` + +Each query method: +1. Calls `printHeader()` with title and description +2. Opens a `Session` with `driver.session(SessionConfig.forDatabase("GraphRAG"))` +3. Runs the Cypher query via `session.run(cypher)` +4. Iterates `Result` and prints formatted output via `record.get(...)` + +**Query adaptations for pure Cypher via Bolt:** + +- Q1 (Hybrid Vector + Graph): Uses available Cypher vector functions. 
If `vectorDistance` is not available in Cypher over Bolt, fall back to matching chunks by source and traversing MENTIONS instead. +- Q2 (Multi-Hop Entity Bridge): Pure graph traversal — works directly in Cypher. +- Q3 (Temporal-Aware): Simple MATCH with WHERE/ORDER BY on chunkIndex — pure Cypher. +- Q4 (Composite Scoring): Adapt to Cypher — count entity connections via `size((chunk)-[:MENTIONS]->())`. +- Q5 (Agentic RAG): Multiple sequential queries within the same session — vector search, graph expansion, authorship context. + +**Important:** ArcadeDB's Bolt protocol may not support all Cypher features identically to Neo4j. During implementation, test each query against the running ArcadeDB instance and adjust syntax as needed. The `queries.sh` script serves as the reference for what ArcadeDB supports. + +**Step 2: Verify compilation** + +Run: `cd graph-rag/java && mvn compile -q` +Expected: BUILD SUCCESS + +**Step 3: Build fat JAR** + +Run: `mvn package -q` +Expected: `target/graph-rag.jar` created + +**Step 4: Commit** + +```bash +git add graph-rag/java/src/ +git commit -m "feat(graph-rag): add GraphRAG.java with 5 Cypher queries via Bolt" +``` + +--- + +### Task 7: Langchain4j module — pom.xml + +**Files:** +- Create: `graph-rag/langchain4j/pom.xml` + +**Step 1: Create pom.xml** + +Standalone POM (no parent, no Spring Boot). 
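Dependencies:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <groupId>com.arcadedb.examples</groupId>
+  <artifactId>graph-rag-langchain4j</artifactId>
+  <version>1.0-SNAPSHOT</version>
+  <packaging>jar</packaging>
+
+  <properties>
+    <maven.compiler.source>21</maven.compiler.source>
+    <maven.compiler.target>21</maven.compiler.target>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <langchain4j.version>1.11.0</langchain4j.version>
+    <langchain4j.community.version>1.11.0-beta19</langchain4j.community.version>
+  </properties>
+
+  <dependencies>
+    <dependency>
+      <groupId>dev.langchain4j</groupId>
+      <artifactId>langchain4j-community-neo4j</artifactId>
+      <version>${langchain4j.community.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>dev.langchain4j</groupId>
+      <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId>
+      <version>${langchain4j.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>dev.langchain4j</groupId>
+      <artifactId>langchain4j</artifactId>
+      <version>${langchain4j.version}</version>
+    </dependency>
+  </dependencies>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-assembly-plugin</artifactId>
+        <version>3.8.0</version>
+        <configuration>
+          <archive>
+            <manifest>
+              <mainClass>com.arcadedb.examples.GraphRAGEmbeddingStore</mainClass>
+            </manifest>
+          </archive>
+          <descriptorRefs>
+            <descriptorRef>jar-with-dependencies</descriptorRef>
+          </descriptorRefs>
+          <finalName>graph-rag-langchain4j</finalName>
+          <appendAssemblyId>false</appendAssemblyId>
+        </configuration>
+        <executions>
+          <execution>
+            <id>make-assembly</id>
+            <phase>package</phase>
+            <goals>
+              <goal>single</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+</project>
+```
+
+**Step 2: Commit**
+
+```bash
+git add graph-rag/langchain4j/pom.xml
+git commit -m "feat(graph-rag): add langchain4j module pom.xml"
+```
+
+---
+
+### Task 8: Langchain4j — GraphRAGEmbeddingStore.java
+
+**Files:**
+- Create: `graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java`
+
+**Step 1: Create GraphRAGEmbeddingStore.java**
+
+Demonstrates: ingest text chunks, generate real 384D embeddings with AllMiniLmL6V2, store in ArcadeDB via `Neo4jEmbeddingStore` over Bolt, then run similarity searches.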
+
+```java
+package com.arcadedb.examples;
+
+import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore;
+import dev.langchain4j.data.embedding.Embedding;
+import dev.langchain4j.data.segment.TextSegment;
+import dev.langchain4j.model.embedding.EmbeddingModel;
+import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
+import dev.langchain4j.store.embedding.EmbeddingMatch;
+import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
+import dev.langchain4j.store.embedding.EmbeddingStore;
+
+import java.util.List;
+
+public class GraphRAGEmbeddingStore {
+
+    private static final String HOST = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost");
+    private static final String PORT = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424");
+    private static final String USER = System.getenv().getOrDefault("ARCADEDB_USER", "root");
+    private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
+
+    public static void main(String[] args) {
+        EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
+
+        String boltUrl = "bolt://" + HOST + ":" + PORT;
+        EmbeddingStore<TextSegment> store = Neo4jEmbeddingStore.builder()
+            .withBasicAuth(boltUrl, USER, PASSWORD)
+            .dimension(embeddingModel.dimension())
+            .build();
+
+        // Ingest sample chunks
+        String[] texts = {
+            "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.",
+            "Vector similarity search uses embedding models to encode text into high-dimensional vectors.",
+            "Microservices decompose applications into small, independently deployable services.",
+            "Building a knowledge graph requires extracting entities and relationships from documents."
+        };
+
+        for (String text : texts) {
+            TextSegment segment = TextSegment.from(text);
+            Embedding embedding = embeddingModel.embed(segment).content();
+            store.add(embedding, segment);
+        }
+
+        System.out.println("Ingested " + texts.length + " chunks with 384D embeddings.\n");
+
+        // Similarity search
+        String query = "How does graph-based retrieval work?";
+        Embedding queryEmbedding = embeddingModel.embed(query).content();
+        EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
+            .queryEmbedding(queryEmbedding)
+            .maxResults(3)
+            .build();
+
+        List<EmbeddingMatch<TextSegment>> matches = store.search(request).matches();
+
+        System.out.println("Query: \"" + query + "\"\n");
+        System.out.println("Top matches:");
+        for (EmbeddingMatch<TextSegment> match : matches) {
+            System.out.printf(" [%.4f] %s%n", match.score(), match.embedded().text());
+        }
+    }
+}
+```
+
+**Step 2: Verify compilation**
+
+Run: `cd graph-rag/langchain4j && mvn compile -q`
+Expected: BUILD SUCCESS
+
+**Step 3: Commit**
+
+```bash
+git add graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java
+git commit -m "feat(graph-rag): add langchain4j embedding store example"
+```
+
+---
+
+### Task 9: Langchain4j — GraphRAGContentRetriever.java
+
+**Files:**
+- Create: `graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java`
+
+**Step 1: Create GraphRAGContentRetriever.java**
+
+Wires `Neo4jEmbeddingStore` into a langchain4j `EmbeddingStoreContentRetriever` pipeline. Ingests chunks, then queries with natural language and prints retrieved content with scores.
+
+```java
+package com.arcadedb.examples;
+
+import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore;
+import dev.langchain4j.data.embedding.Embedding;
+import dev.langchain4j.data.segment.TextSegment;
+import dev.langchain4j.model.embedding.EmbeddingModel;
+import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
+import dev.langchain4j.rag.content.Content;
+import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
+import dev.langchain4j.rag.query.Query;
+
+import java.util.List;
+
+public class GraphRAGContentRetriever {
+
+    private static final String HOST = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost");
+    private static final String PORT = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424");
+    private static final String USER = System.getenv().getOrDefault("ARCADEDB_USER", "root");
+    private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
+
+    public static void main(String[] args) {
+        EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
+
+        String boltUrl = "bolt://" + HOST + ":" + PORT;
+        Neo4jEmbeddingStore store = Neo4jEmbeddingStore.builder()
+                .withBasicAuth(boltUrl, USER, PASSWORD)
+                .dimension(embeddingModel.dimension())
+                .label("RAGChunk")
+                .indexName("rag_chunk_index")
+                .build();
+
+        // Ingest sample chunks
+        String[] texts = {
+            "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.",
+            "By traversing entity relationships, the system discovers context that pure vector similarity would miss.",
+            "Vector similarity search uses embedding models to encode text into high-dimensional vectors.",
+            "Approximate nearest neighbor algorithms like HNSW trade small accuracy losses for dramatic speed improvements.",
+            "Microservices decompose applications into small, independently deployable services.",
+            "Building a knowledge graph requires extracting entities and relationships from documents."
+        };
+
+        for (String text : texts) {
+            TextSegment segment = TextSegment.from(text);
+            Embedding embedding = embeddingModel.embed(segment).content();
+            store.add(embedding, segment);
+        }
+
+        System.out.println("Ingested " + texts.length + " chunks into RAGChunk nodes.\n");
+
+        // Build content retriever pipeline
+        EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
+                .embeddingStore(store)
+                .embeddingModel(embeddingModel)
+                .maxResults(3)
+                .minScore(0.5)
+                .build();
+
+        // Run queries
+        String[] queries = {
+            "How does graph-based retrieval work?",
+            "What are vector embeddings?",
+            "Tell me about microservices architecture"
+        };
+
+        for (String q : queries) {
+            System.out.println("Query: \"" + q + "\"");
+            List<Content> results = retriever.retrieve(new Query(q));
+            if (results.isEmpty()) {
+                System.out.println(" (no results above min score)\n");
+            } else {
+                for (Content content : results) {
+                    System.out.println(" -> " + content.textSegment().text());
+                }
+                System.out.println();
+            }
+        }
+    }
+}
+```
+
+**Step 2: Verify compilation**
+
+Run: `cd graph-rag/langchain4j && mvn compile -q`
+Expected: the build succeeds (with `-q`, Maven prints nothing unless there is an error)
+
+**Step 3: Commit**
+
+```bash
+git add graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
+git commit -m "feat(graph-rag): add langchain4j content retriever example"
+```
+
+---
+
+### Task 10: README
+
+**Files:**
+- Create: `graph-rag/README.md`
+
+**Step 1: Create README.md**
+
+Follow `recommendation-engine/README.md` structure exactly. Sections:
+
+1. **Title and overview** — Graph RAG: unifies vector search, graph traversal, and full-text indexing
+2. **Prerequisites** — Docker/Compose, curl/jq, Java 21+, Maven 3.x
+3. **Quickstart** — 5 steps: docker compose up, setup.sh, queries.sh, java JAR, langchain4j JAR
+4. **Schema table** — Chunk (document), Entity/Person/Concept/Organization (vertex), MENTIONS/RELATES_TO/WORKS_AT/AUTHORED (edge)
+5. **Query patterns table** — 5 queries with language and signal type
+6. **Sample data** — 8 chunks, 11 entities, ~25 edges with overlap design
+7. **Langchain4j module** — embedding store + content retriever, no API keys
+8. **ArcadeDB version notes** — 26.2.1, Bolt protocol, vector index details
+9. **Reference link** — arcadedb.com/graph-rag.html
+
+**Step 2: Commit**
+
+```bash
+git add graph-rag/README.md
+git commit -m "docs(graph-rag): add README"
+```
+
+---
+
+### Task 11: Integration smoke test
+
+**Step 1: Start ArcadeDB**
+
+Run: `cd graph-rag && docker compose up -d`
+Expected: ArcadeDB container starts, health check passes
+
+**Step 2: Run setup**
+
+Run: `./setup.sh`
+Expected: "Setup complete. GraphRAG is ready."
+
+**Step 3: Run curl queries**
+
+Run: `./queries/queries.sh`
+Expected: All 5 queries return non-empty result sets
+
+**Step 4: Run Java module**
+
+Run: `cd java && mvn package -q && java -jar target/graph-rag.jar`
+Expected: All 5 queries print results via Bolt
+
+**Step 5: Run langchain4j module**
+
+Run: `cd ../langchain4j && mvn package -q && java -jar target/graph-rag-langchain4j.jar`
+Expected: Chunks ingested, similarity search returns ranked results
+
+**Step 6: Fix any issues discovered during smoke test**
+
+Adjust query syntax, fix Cypher/Bolt compatibility issues, update data as needed.
+
+**Step 7: Commit any fixes**
+
+```bash
+git add -A graph-rag/
+git commit -m "fix(graph-rag): adjust queries after integration testing"
+```
+
+**Step 8: Stop ArcadeDB**
+
+Run: `docker compose down`

From e6a199c05c727a6f9e0bd83ea6a49ef39d4810b7 Mon Sep 17 00:00:00 2001
From: robfrank
Date: Thu, 26 Feb 2026 09:32:37 +0100
Subject: [PATCH 03/15] feat(graph-rag): add docker-compose and setup script

Co-Authored-By: Claude Opus 4.6
---
 graph-rag/docker-compose.yml | 14 ++++++++++
 graph-rag/setup.sh           | 51 ++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)
 create mode 100644 graph-rag/docker-compose.yml
 create mode 100755 graph-rag/setup.sh

diff --git a/graph-rag/docker-compose.yml b/graph-rag/docker-compose.yml
new file mode 100644
index 0000000..bb503a9
--- /dev/null
+++ b/graph-rag/docker-compose.yml
@@ -0,0 +1,14 @@
+services:
+  arcadedb:
+    image: arcadedata/arcadedb:26.2.1
+    ports:
+      - "2480:2480"
+      - "2424:2424"
+    environment:
+      JAVA_OPTS: "-Darcadedb.server.rootPassword=arcadedb"
+    healthcheck:
+      test: ["CMD", "curl", "-sf", "http://localhost:2480/api/v1/ready"]
+      interval: 5s
+      timeout: 3s
+      retries: 20
+      start_period: 10s
diff --git a/graph-rag/setup.sh b/graph-rag/setup.sh
new file mode 100755
index 0000000..75ef0c5
--- /dev/null
+++ b/graph-rag/setup.sh
@@ -0,0 +1,51 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ARCADEDB_URL="${ARCADEDB_URL:-http://localhost:2480}"
+ARCADEDB_USER="${ARCADEDB_USER:-root}"
+ARCADEDB_PASS="${ARCADEDB_PASS:-arcadedb}"
+DB_NAME="GraphRAG"
+
+# ── Wait for ArcadeDB ─────────────────────────────────────────────────────────
+echo "Waiting for ArcadeDB at ${ARCADEDB_URL}..."
+until curl -sf -u "${ARCADEDB_USER}:${ARCADEDB_PASS}" \
+  "${ARCADEDB_URL}/api/v1/ready" > /dev/null 2>&1; do
+  sleep 2
+done
+echo "ArcadeDB is ready."
+
+# ── Create database ───────────────────────────────────────────────────────────
+echo "Creating database ${DB_NAME}..."
+curl -sf -u "${ARCADEDB_USER}:${ARCADEDB_PASS}" \
+  -X POST "${ARCADEDB_URL}/api/v1/server" \
+  -H "Content-Type: application/json" \
+  -d "{\"command\": \"create database ${DB_NAME}\"}" > /dev/null || true
+echo "Database ready."
+
+# ── Helper: send one SQL statement ───────────────────────────────────────────
+send_sql() {
+  local stmt="$1"
+  jq -cn --arg cmd "$stmt" '{"language":"sql","command":$cmd}' \
+    | curl -sf -u "${ARCADEDB_USER}:${ARCADEDB_PASS}" \
+      -X POST "${ARCADEDB_URL}/api/v1/command/${DB_NAME}" \
+      -H "Content-Type: application/json" \
+      -d @- > /dev/null
+}
+
+# ── Apply a SQL file (one statement per line) ─────────────────────────────────
+apply_file() {
+  local file="$1"
+  echo "Applying ${file}..."
+  while IFS= read -r line || [[ -n "$line" ]]; do
+    # skip blank lines and SQL comments
+    [[ -z "${line//[[:space:]]/}" || "$line" =~ ^[[:space:]]*-- ]] && continue
+    send_sql "${line%%;}"
+  done < "$file"
+  echo "Done: ${file}"
+}
+
+apply_file "sql/01-schema.sql"
+apply_file "sql/02-data.sql"
+
+echo ""
+echo "Setup complete. ${DB_NAME} is ready."
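
Reviewer note on `apply_file` in `setup.sh` above: the script sends one statement per line, skips blank lines and `--` comment lines, and drops a single trailing semicolon before posting. The sketch below restates that filter in Java, the language of the example modules, so the rule can be checked in isolation. It is only an illustration and not part of the patch series; the class name `SqlLines` and the method `statements` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SqlLines {

    // Mirrors the filter in setup.sh's apply_file (an assumption-level restatement):
    // blank lines and lines whose first non-space characters are "--" are skipped,
    // and one trailing semicolon is removed from each remaining line.
    public static List<String> statements(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.strip();
            if (trimmed.isEmpty() || trimmed.startsWith("--")) {
                continue; // nothing to send for blank or comment lines
            }
            // Like ${line%%;} in bash: only a semicolon at the very end is dropped.
            out.add(line.endsWith(";") ? line.substring(0, line.length() - 1) : line);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> file = List.of(
            "-- Entity vertex types (knowledge graph nodes)",
            "CREATE VERTEX TYPE Entity IF NOT EXISTS;",
            "",
            "CREATE PROPERTY Entity.name IF NOT EXISTS STRING;");
        // Prints the statements that would be sent, one per line.
        statements(file).forEach(System.out::println);
    }
}
```

One consequence of this line-based design is that a statement split across several lines in `sql/*.sql` would be sent as separate, incomplete commands, which is why both SQL files keep each statement on a single line.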
From 93d62cb15c1eae53137032daccfe3b1f440efbaf Mon Sep 17 00:00:00 2001 From: robfrank Date: Thu, 26 Feb 2026 09:33:16 +0100 Subject: [PATCH 04/15] feat(graph-rag): add schema definition Co-Authored-By: Claude Opus 4.6 --- graph-rag/sql/01-schema.sql | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 graph-rag/sql/01-schema.sql diff --git a/graph-rag/sql/01-schema.sql b/graph-rag/sql/01-schema.sql new file mode 100644 index 0000000..646e812 --- /dev/null +++ b/graph-rag/sql/01-schema.sql @@ -0,0 +1,22 @@ +-- Document type for text chunks with vector embeddings +CREATE DOCUMENT TYPE Chunk IF NOT EXISTS; +CREATE PROPERTY Chunk.content IF NOT EXISTS STRING; +CREATE PROPERTY Chunk.source IF NOT EXISTS STRING; +CREATE PROPERTY Chunk.chunkIndex IF NOT EXISTS INTEGER; +CREATE PROPERTY Chunk.embedding IF NOT EXISTS LIST; + +-- Entity vertex types (knowledge graph nodes) +CREATE VERTEX TYPE Entity IF NOT EXISTS; +CREATE PROPERTY Entity.name IF NOT EXISTS STRING; +CREATE VERTEX TYPE Person IF NOT EXISTS EXTENDS Entity; +CREATE VERTEX TYPE Concept IF NOT EXISTS EXTENDS Entity; +CREATE VERTEX TYPE Organization IF NOT EXISTS EXTENDS Entity; + +-- Edge types +CREATE EDGE TYPE MENTIONS IF NOT EXISTS; +CREATE EDGE TYPE RELATES_TO IF NOT EXISTS; +CREATE EDGE TYPE WORKS_AT IF NOT EXISTS; +CREATE EDGE TYPE AUTHORED IF NOT EXISTS; + +-- Vector index for chunk embeddings (4 dimensions for sample data) +CREATE INDEX IF NOT EXISTS ON Chunk (embedding) LSM_VECTOR METADATA { dimensions: 4, similarity: 'COSINE' }; From de21aec5a30c585ded46cf6b6f9868cc33b69a2b Mon Sep 17 00:00:00 2001 From: robfrank Date: Thu, 26 Feb 2026 09:33:53 +0100 Subject: [PATCH 05/15] feat(graph-rag): add sample data for ArcadeSoft knowledge base Co-Authored-By: Claude Opus 4.6 --- graph-rag/sql/02-data.sql | 69 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 69 insertions(+) create mode 100644 graph-rag/sql/02-data.sql diff --git a/graph-rag/sql/02-data.sql 
b/graph-rag/sql/02-data.sql new file mode 100644 index 0000000..31d1cee --- /dev/null +++ b/graph-rag/sql/02-data.sql @@ -0,0 +1,69 @@ +-- ── Chunks (internal documentation) ───────────────────────────────────────── +-- Getting Started with GraphRAG (graph-heavy topic) +INSERT INTO Chunk SET content = 'GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy. By traversing entity relationships, the system discovers context that pure vector similarity would miss.', source = 'Getting Started with GraphRAG', chunkIndex = 0, embedding = [0.9, 0.2, 0.1, 0.1]; +INSERT INTO Chunk SET content = 'Building a knowledge graph requires extracting entities and relationships from documents. Named entity recognition and relationship extraction are key preprocessing steps.', source = 'Getting Started with GraphRAG', chunkIndex = 1, embedding = [0.8, 0.1, 0.2, 0.1]; +-- Microservices Architecture Guide (architecture-heavy topic) +INSERT INTO Chunk SET content = 'Microservices decompose applications into small, independently deployable services. Each service owns its data and communicates via well-defined APIs.', source = 'Microservices Architecture Guide', chunkIndex = 0, embedding = [0.1, 0.1, 0.9, 0.2]; +INSERT INTO Chunk SET content = 'Service mesh patterns like sidecar proxies handle cross-cutting concerns including observability, security, and traffic management across microservices.', source = 'Microservices Architecture Guide', chunkIndex = 1, embedding = [0.1, 0.1, 0.8, 0.3]; +-- Vector Search Best Practices (vector-heavy topic) +INSERT INTO Chunk SET content = 'Vector similarity search uses embedding models to encode text into high-dimensional vectors. 
Cosine distance is the most common similarity metric for text embeddings.', source = 'Vector Search Best Practices', chunkIndex = 0, embedding = [0.2, 0.9, 0.1, 0.1]; +INSERT INTO Chunk SET content = 'Approximate nearest neighbor algorithms like HNSW and DiskANN trade small accuracy losses for dramatic speed improvements on large vector datasets.', source = 'Vector Search Best Practices', chunkIndex = 1, embedding = [0.1, 0.8, 0.1, 0.2]; +-- Team Onboarding Handbook (general topic) +INSERT INTO Chunk SET content = 'New engineers at ArcadeSoft join a team and are assigned a mentor. The onboarding process covers codebase orientation, tooling setup, and architecture overview.', source = 'Team Onboarding Handbook', chunkIndex = 0, embedding = [0.2, 0.2, 0.3, 0.8]; +INSERT INTO Chunk SET content = 'The Platform Team maintains shared infrastructure including the knowledge graph pipeline and vector search service. The Research Team explores new retrieval techniques.', source = 'Team Onboarding Handbook', chunkIndex = 1, embedding = [0.3, 0.3, 0.2, 0.7]; + +-- ── Entities ──────────────────────────────────────────────────────────────── +-- Persons +INSERT INTO Person SET name = 'Alice Chen'; +INSERT INTO Person SET name = 'Bob Martinez'; +INSERT INTO Person SET name = 'Carol Wu'; +INSERT INTO Person SET name = 'Dave Park'; +-- Concepts +INSERT INTO Concept SET name = 'GraphRAG'; +INSERT INTO Concept SET name = 'Vector Search'; +INSERT INTO Concept SET name = 'Microservices'; +INSERT INTO Concept SET name = 'Knowledge Graph'; +-- Organizations +INSERT INTO Organization SET name = 'ArcadeSoft'; +INSERT INTO Organization SET name = 'Platform Team'; +INSERT INTO Organization SET name = 'Research Team'; + +-- ── MENTIONS edges (Chunk -> Entity) ──────────────────────────────────────── +-- GraphRAG doc chunks mention GraphRAG and Knowledge Graph concepts +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 0) TO (SELECT 
FROM Concept WHERE name = 'GraphRAG'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'Vector Search'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'GraphRAG'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Knowledge Graph'); +-- Vector Search doc chunks mention Vector Search concept +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'Vector Search'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Vector Search'); +-- Microservices doc chunks mention Microservices concept +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'Microservices'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Microservices'); +-- Onboarding doc mentions teams and people +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 0) TO (SELECT FROM Organization WHERE name = 'ArcadeSoft'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Organization WHERE name = 'Platform Team'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Organization WHERE name = 'Research Team'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT 
FROM Concept WHERE name = 'Knowledge Graph'); +CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Vector Search'); + +-- ── RELATES_TO edges (Entity -> Entity) ───────────────────────────────────── +CREATE EDGE RELATES_TO FROM (SELECT FROM Concept WHERE name = 'GraphRAG') TO (SELECT FROM Concept WHERE name = 'Vector Search'); +CREATE EDGE RELATES_TO FROM (SELECT FROM Concept WHERE name = 'GraphRAG') TO (SELECT FROM Concept WHERE name = 'Knowledge Graph'); +CREATE EDGE RELATES_TO FROM (SELECT FROM Concept WHERE name = 'Microservices') TO (SELECT FROM Concept WHERE name = 'Knowledge Graph'); + +-- ── WORKS_AT edges (Person -> Organization) ───────────────────────────────── +CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Organization WHERE name = 'Research Team'); +CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Bob Martinez') TO (SELECT FROM Organization WHERE name = 'Platform Team'); +CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Carol Wu') TO (SELECT FROM Organization WHERE name = 'ArcadeSoft'); +CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Dave Park') TO (SELECT FROM Organization WHERE name = 'Platform Team'); + +-- ── AUTHORED edges (Person -> Chunk) ──────────────────────────────────────── +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 0); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 1); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Bob Martinez') TO (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 0); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Bob Martinez') TO (SELECT FROM Chunk WHERE source = 
'Microservices Architecture Guide' AND chunkIndex = 1); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 0); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 1); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Carol Wu') TO (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 0); +CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Carol Wu') TO (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1); From c9833e811a7a311e0d8e2e8d57f6bf073464694d Mon Sep 17 00:00:00 2001 From: robfrank Date: Thu, 26 Feb 2026 09:34:45 +0100 Subject: [PATCH 06/15] feat(graph-rag): add Java module pom.xml with Neo4j Bolt driver Co-Authored-By: Claude Opus 4.6 --- graph-rag/java/pom.xml | 57 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+) create mode 100644 graph-rag/java/pom.xml diff --git a/graph-rag/java/pom.xml b/graph-rag/java/pom.xml new file mode 100644 index 0000000..fca28dc --- /dev/null +++ b/graph-rag/java/pom.xml @@ -0,0 +1,57 @@ + + + 4.0.0 + + com.arcadedb.examples + graph-rag + 1.0-SNAPSHOT + jar + + + 21 + 21 + UTF-8 + 5.28.10 + + + + + org.neo4j.driver + neo4j-java-driver + ${neo4j.driver.version} + + + + + + + org.apache.maven.plugins + maven-assembly-plugin + 3.8.0 + + + + com.arcadedb.examples.GraphRAG + + + + jar-with-dependencies + + graph-rag + false + + + + make-assembly + package + + single + + + + + + + From 484a30a65f4fefae94206e4e2cf8bac954ea2747 Mon Sep 17 00:00:00 2001 From: robfrank Date: Thu, 26 Feb 2026 09:34:52 +0100 Subject: [PATCH 07/15] feat(graph-rag): add curl query script with 5 RAG patterns Co-Authored-By: Claude Opus 4.6 --- graph-rag/queries/queries.sh | 118 +++++++++++++++++++++++++++++++++++ 1 file changed, 118 insertions(+) 
create mode 100755 graph-rag/queries/queries.sh diff --git a/graph-rag/queries/queries.sh b/graph-rag/queries/queries.sh new file mode 100755 index 0000000..3ddc5df --- /dev/null +++ b/graph-rag/queries/queries.sh @@ -0,0 +1,118 @@ +#!/usr/bin/env bash +# Graph RAG — all five query patterns via curl +# Prerequisites: ArcadeDB running, setup.sh already executed, jq installed +# Usage: ./queries/queries.sh + +set -euo pipefail + +ARCADEDB_URL="${ARCADEDB_URL:-http://localhost:2480}" +ARCADEDB_USER="${ARCADEDB_USER:-root}" +ARCADEDB_PASS="${ARCADEDB_PASS:-arcadedb}" +AUTH="${ARCADEDB_USER}:${ARCADEDB_PASS}" +DB="GraphRAG" +QUERY_URL="${ARCADEDB_URL}/api/v1/query/${DB}" + +query() { + local lang="$1" cmd="$2" + jq -cn --arg l "$lang" --arg c "$cmd" '{"language":$l,"command":$c}' \ + | curl -sf -u "$AUTH" -X POST "$QUERY_URL" \ + -H "Content-Type: application/json" -d @- \ + | jq '.result' +} + +# ───────────────────────────────────────────────────────────────────────────── +echo "=== Query 1: Hybrid Vector + Graph (SQL+Cypher hybrid) ===" +echo "Find chunks similar to a query embedding and include entity mentions." +echo "" +query "sql" " +SELECT content, source, + out('MENTIONS').name AS entities +FROM ( + SELECT *, vectorDistance('Chunk[embedding]', embedding, [0.9, 0.2, 0.1, 0.1]) AS score + FROM Chunk + ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 5) DESC + LIMIT 5 +) +" + +# ───────────────────────────────────────────────────────────────────────────── +echo "" +echo "=== Query 2: Multi-Hop Entity Bridge (Cypher) ===" +echo "Find chunks connected through shared entities." 
+echo "" +query "cypher" " +MATCH (direct:Chunk)-[:MENTIONS]->(entity:Entity)<-[:MENTIONS]-(related:Chunk) +WHERE direct.source = 'Getting Started with GraphRAG' + AND related.source <> direct.source +RETURN direct.source AS source_doc, + entity.name AS bridge_entity, + related.content AS connected_content, + related.source AS connected_doc +LIMIT 20 +" + +# ───────────────────────────────────────────────────────────────────────────── +echo "" +echo "=== Query 3: Temporal-Aware Retrieval (Cypher) ===" +echo "Get latest chunks per source." +echo "" +query "cypher" " +MATCH (c:Chunk) +WHERE c.chunkIndex = 1 +RETURN c.content, c.source, c.chunkIndex +ORDER BY c.chunkIndex DESC +LIMIT 10 +" + +# ───────────────────────────────────────────────────────────────────────────── +echo "" +echo "=== Query 4: Composite Scoring: Vector + Entity Count (SQL) ===" +echo "Score chunks by vector distance and entity connections." +echo "" +query "sql" " +SELECT content, source, + vectorDistance('Chunk[embedding]', embedding, [0.9, 0.2, 0.1, 0.1]) AS vector_score, + out('MENTIONS').size() AS entity_count +FROM Chunk +ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 10) DESC +LIMIT 10 +" + +# ───────────────────────────────────────────────────────────────────────────── +echo "" +echo "=== Query 5: Agentic RAG Steps ===" +echo "Simulate agent steps: vector search, graph expansion, full-text lookup, authorship." 
+echo ""
+
+echo "--- Step 1: Vector search for relevant chunks ---"
+query "sql" "
+SELECT content, source
+FROM Chunk
+ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 5) DESC
+LIMIT 5
+"
+
+echo ""
+echo "--- Step 2: Graph expansion — entities and relations ---"
+query "cypher" "
+MATCH (c:Chunk {source: 'Getting Started with GraphRAG'})-[:MENTIONS]->(e:Entity)-[:RELATES_TO]->(related)
+RETURN e.name, related.name
+LIMIT 10
+"
+
+echo ""
+echo "--- Step 3: Full-text lookup ---"
+query "sql" "
+SELECT content, source
+FROM Chunk
+WHERE content CONTAINSTEXT 'knowledge graph'
+LIMIT 5
+"
+
+echo ""
+echo "--- Step 4: Authorship ---"
+query "cypher" "
+MATCH (p:Person)-[:AUTHORED]->(c:Chunk)
+RETURN p.name, c.source, c.chunkIndex
+LIMIT 10
+"

From 1787d6cce195501c3d14438b0e8ad78ade40a2e3 Mon Sep 17 00:00:00 2001
From: robfrank
Date: Thu, 26 Feb 2026 09:36:58 +0100
Subject: [PATCH 08/15] feat(graph-rag): add GraphRAG.java with 5 Cypher queries via Bolt

Co-Authored-By: Claude Opus 4.6
---
 .../java/com/arcadedb/examples/GraphRAG.java | 207 ++++++++++++++++++
 1 file changed, 207 insertions(+)
 create mode 100644 graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java

diff --git a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
new file mode 100644
index 0000000..06f8dfe
--- /dev/null
+++ b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
@@ -0,0 +1,207 @@
+package com.arcadedb.examples;
+
+import org.neo4j.driver.*;
+import org.neo4j.driver.Record;
+
+import java.util.List;
+
+public class GraphRAG {
+
+    private static final String HOST = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost");
+    private static final String PORT = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424");
+    private static final String USER = System.getenv().getOrDefault("ARCADEDB_USER", "root");
+    private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
+
+    public static void main(String[] args) {
+        String uri = "bolt://" + HOST + ":" + PORT;
+        try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(USER, PASSWORD))) {
+            tryRun(() -> runQuery1HybridVectorGraph(driver), "Query 1");
+            tryRun(() -> runQuery2MultiHopEntityBridge(driver), "Query 2");
+            tryRun(() -> runQuery3TemporalAware(driver), "Query 3");
+            tryRun(() -> runQuery4CompositeScoring(driver), "Query 4");
+            tryRun(() -> runQuery5AgenticRAG(driver), "Query 5");
+        }
+        System.out.println("\nAll queries complete.");
+    }
+
+    private static void tryRun(Runnable r, String name) {
+        try {
+            r.run();
+        } catch (Exception e) {
+            System.err.println("[" + name + " FAILED] " + e.getMessage());
+        }
+    }
+
+    // Query 1: Hybrid Vector + Graph
+    // Graph side only: lists chunks with their mentioned entities (this Cypher does no vector ranking)
+    private static void runQuery1HybridVectorGraph(Driver driver) {
+        printHeader("Query 1: Hybrid Vector + Graph Retrieval",
+            "Find chunks similar to graph-topic embedding and their mentioned entities.");
+
+        String cypher = """
+            MATCH (chunk:Chunk)-[:MENTIONS]->(entity:Entity)
+            RETURN chunk.content AS content, chunk.source AS source,
+                   collect(DISTINCT entity.name) AS entities
+            LIMIT 10""";
+
+        try (Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
+            Result result = session.run(cypher);
+            List<Record> records = result.list();
+            for (Record r : records) {
+                System.out.printf(" %-40.40s | %-35.35s | %s%n",
+                    r.get("source").asString(),
+                    truncate(r.get("content").asString(), 35),
+                    r.get("entities").asList());
+            }
+        }
+    }
+
+    // Query 2: Multi-Hop Entity Bridge
+    // Discovers documents connected through shared entity chains
+    private static void runQuery2MultiHopEntityBridge(Driver driver) {
+        printHeader("Query 2: Multi-Hop Entity Bridge",
+            "Find chunks connected through shared entities from GraphRAG docs.");
+
+        String cypher = """
+            MATCH (direct:Chunk)-[:MENTIONS]->(entity:Entity)<-[:MENTIONS]-(related:Chunk)
+            WHERE direct.source = 'Getting Started with GraphRAG'
+              AND related.source <> direct.source
+            RETURN direct.source AS source_doc,
+                   entity.name AS bridge_entity,
+                   related.content AS connected_content,
+                   related.source AS connected_doc
+            LIMIT 20""";
+
+        try (Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
+            Result result = session.run(cypher);
+            List<Record> records = result.list();
+            for (Record r : records) {
+                System.out.printf(" [%s] --%s--> %s%n",
+                    r.get("source_doc").asString(),
+                    r.get("bridge_entity").asString(),
+                    r.get("connected_doc").asString());
+                System.out.printf(" -> %s%n", truncate(r.get("connected_content").asString(), 80));
+            }
+        }
+    }
+
+    // Query 3: Temporal-Aware Retrieval
+    // Filters chunks by chunkIndex to get latest context per source
+    private static void runQuery3TemporalAware(Driver driver) {
+        printHeader("Query 3: Temporal-Aware Retrieval",
+            "Get the latest chunk (highest chunkIndex) per source.");
+
+        String cypher = """
+            MATCH (c:Chunk)
+            WHERE c.chunkIndex = 1
+            RETURN c.content AS content, c.source AS source, c.chunkIndex AS chunkIndex
+            ORDER BY c.chunkIndex DESC
+            LIMIT 10""";
+
+        try (Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
+            Result result = session.run(cypher);
+            List<Record> records = result.list();
+            for (Record r : records) {
+                System.out.printf(" %-40.40s | chunk %d | %s%n",
+                    r.get("source").asString(),
+                    r.get("chunkIndex").asInt(),
+                    truncate(r.get("content").asString(), 50));
+            }
+        }
+    }
+
+    // Query 4: Composite Scoring — entity count
+    // Ranks chunks by number of entity connections
+    private static void runQuery4CompositeScoring(Driver driver) {
+        printHeader("Query 4: Composite Scoring (Entity Connections)",
+            "Rank chunks by number of mentioned entities.");
+
+        String cypher = """
+            MATCH (chunk:Chunk)
+            OPTIONAL MATCH (chunk)-[:MENTIONS]->(entity:Entity)
+            RETURN chunk.content AS content, chunk.source AS source,
+                   count(entity) AS entity_count
+            ORDER BY entity_count DESC
+            LIMIT 10""";
+
+        try (Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
+            Result result = session.run(cypher);
+            List<Record> records = result.list();
+            for (Record r : records) {
+                System.out.printf(" %-40.40s | entities: %d | %s%n",
+                    r.get("source").asString(),
+                    r.get("entity_count").asInt(),
+                    truncate(r.get("content").asString(), 40));
+            }
+        }
+    }
+
+    // Query 5: Agentic RAG — multi-step retrieval
+    // Simulates an agent workflow: graph expansion, then authorship
+    private static void runQuery5AgenticRAG(Driver driver) {
+        printHeader("Query 5: Agentic RAG (Multi-Step Retrieval)",
+            "Simulate an agent: graph expansion -> related concepts -> authorship.");
+
+        try (Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
+            // Step 1: Find entities mentioned in GraphRAG docs
+            System.out.println(" Step 1: Graph expansion from GraphRAG docs");
+            String step1 = """
+                MATCH (c:Chunk {source: 'Getting Started with GraphRAG'})
+                      -[:MENTIONS]->(e:Entity)
+                      -[:RELATES_TO]->(related)
+                RETURN e.name AS entity, related.name AS related_concept
+                LIMIT 10""";
+
+            Result r1 = session.run(step1);
+            List<Record> records1 = r1.list();
+            for (Record r : records1) {
+                System.out.printf(" %s --> %s%n",
+                    r.get("entity").asString(),
+                    r.get("related_concept").asString());
+            }
+
+            // Step 2: Get authorship context
+            System.out.println("\n Step 2: Authorship context");
+            String step2 = """
+                MATCH (p:Person)-[:AUTHORED]->(c:Chunk)
+                RETURN p.name AS author, c.source AS document, c.chunkIndex AS chunk
+                ORDER BY p.name, c.source
+                LIMIT 10""";
+
+            Result r2 = session.run(step2);
+            List<Record> records2 = r2.list();
+            for (Record r : records2) {
+                System.out.printf(" %s authored '%s' (chunk %d)%n",
+                    r.get("author").asString(),
+                    r.get("document").asString(),
+                    r.get("chunk").asInt());
+            }
+
+            // Step 3: Team context — who works where
+            System.out.println("\n Step 3: Team context");
+            String step3 = """
+                MATCH (p:Person)-[:WORKS_AT]->(org:Organization)
+                RETURN p.name AS person, org.name AS team
+                LIMIT 10""";
+
+            Result r3 = session.run(step3);
+            List<Record> records3 = r3.list();
+            for (Record r : records3) {
+                System.out.printf(" %s works at %s%n",
+                    r.get("person").asString(),
+                    r.get("team").asString());
+            }
+        }
+    }
+
+    private static void printHeader(String title, String description) {
+        System.out.println("\n" + "=".repeat(70));
+        System.out.println(" " + title);
+        System.out.println(" " + description);
+        System.out.println("=".repeat(70));
+    }
+
+    private static String truncate(String s, int maxLen) {
+        return s.length() <= maxLen ? s : s.substring(0, maxLen - 3) + "...";
+    }
+}

From b1c3c2ac8b136196312668a69f4c4b2e1d0ece41 Mon Sep 17 00:00:00 2001
From: robfrank
Date: Thu, 26 Feb 2026 09:37:31 +0100
Subject: [PATCH 09/15] feat(graph-rag): add langchain4j module pom.xml

Co-Authored-By: Claude Opus 4.6
---
 graph-rag/langchain4j/pom.xml | 68 +++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)
 create mode 100644 graph-rag/langchain4j/pom.xml

diff --git a/graph-rag/langchain4j/pom.xml b/graph-rag/langchain4j/pom.xml
new file mode 100644
index 0000000..32857c0
--- /dev/null
+++ b/graph-rag/langchain4j/pom.xml
@@ -0,0 +1,68 @@
+ + 4.0.0 + + com.arcadedb.examples + graph-rag-langchain4j + 1.0-SNAPSHOT + jar + + + 21 + 21 + UTF-8 + 1.11.0 + 1.11.0-beta19 + + + + + dev.langchain4j + langchain4j-community-neo4j + ${langchain4j.community.version} + + + dev.langchain4j + langchain4j-embeddings-all-minilm-l6-v2 + ${langchain4j.version} + + + dev.langchain4j + langchain4j + ${langchain4j.version} + + + + + + + org.apache.maven.plugins + maven-assembly-plugin + 3.8.0 + + + + com.arcadedb.examples.GraphRAGEmbeddingStore + + + + jar-with-dependencies + + graph-rag-langchain4j + false + + + + make-assembly + package + + single + + + + + + +

From b597c91d104cc8fd73935f14e09c4e3d181c0d0b Mon Sep 17 00:00:00 2001
From: robfrank
Date:
Thu, 26 Feb 2026 09:41:19 +0100 Subject: [PATCH 10/15] feat(graph-rag): add langchain4j embedding store and content retriever examples Add GraphRAGEmbeddingStore.java demonstrating vector ingestion and similarity search, and GraphRAGContentRetriever.java showing the RAG content retriever pipeline with min-score filtering. Fix embedding model dependency version to use community beta release (1.11.0-beta19) since the stable artifact is not published. Co-Authored-By: Claude Opus 4.6 --- graph-rag/langchain4j/pom.xml | 2 +- .../examples/GraphRAGContentRetriever.java | 78 +++++++++++++++++++ .../examples/GraphRAGEmbeddingStore.java | 62 +++++++++++++++ 3 files changed, 141 insertions(+), 1 deletion(-) create mode 100644 graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java create mode 100644 graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java diff --git a/graph-rag/langchain4j/pom.xml b/graph-rag/langchain4j/pom.xml index 32857c0..eee5aee 100644 --- a/graph-rag/langchain4j/pom.xml +++ b/graph-rag/langchain4j/pom.xml @@ -26,7 +26,7 @@ <dependency> <groupId>dev.langchain4j</groupId> <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId> - <version>${langchain4j.version}</version> + <version>${langchain4j.community.version}</version> </dependency> <dependency> <groupId>dev.langchain4j</groupId> diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java new file mode 100644 index 0000000..37a478a --- /dev/null +++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java @@ -0,0 +1,78 @@ +package com.arcadedb.examples; + +import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore; +import dev.langchain4j.data.embedding.Embedding; +import dev.langchain4j.data.segment.TextSegment; +import dev.langchain4j.model.embedding.EmbeddingModel; +import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel; +import
dev.langchain4j.rag.content.Content; +import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever; +import dev.langchain4j.rag.query.Query; + +import java.util.List; + +public class GraphRAGContentRetriever { + + private static final String HOST = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost"); + private static final String PORT = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424"); + private static final String USER = System.getenv().getOrDefault("ARCADEDB_USER", "root"); + private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb"); + + public static void main(String[] args) { + EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel(); + + String boltUrl = "bolt://" + HOST + ":" + PORT; + Neo4jEmbeddingStore store = Neo4jEmbeddingStore.builder() + .withBasicAuth(boltUrl, USER, PASSWORD) + .dimension(embeddingModel.dimension()) + .label("RAGChunk") + .indexName("rag_chunk_index") + .build(); + + // Ingest sample chunks + String[] texts = { + "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.", + "By traversing entity relationships, the system discovers context that pure vector similarity would miss.", + "Vector similarity search uses embedding models to encode text into high-dimensional vectors.", + "Approximate nearest neighbor algorithms like HNSW trade small accuracy losses for dramatic speed improvements.", + "Microservices decompose applications into small, independently deployable services.", + "Building a knowledge graph requires extracting entities and relationships from documents." 
+ }; + + for (String text : texts) { + TextSegment segment = TextSegment.from(text); + Embedding embedding = embeddingModel.embed(segment).content(); + store.add(embedding, segment); + } + + System.out.println("Ingested " + texts.length + " chunks into RAGChunk nodes.\n"); + + // Build content retriever pipeline + EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder() + .embeddingStore(store) + .embeddingModel(embeddingModel) + .maxResults(3) + .minScore(0.5) + .build(); + + // Run queries + String[] queries = { + "How does graph-based retrieval work?", + "What are vector embeddings?", + "Tell me about microservices architecture" + }; + + for (String q : queries) { + System.out.println("Query: \"" + q + "\""); + List<Content> results = retriever.retrieve(new Query(q)); + if (results.isEmpty()) { + System.out.println(" (no results above min score)\n"); + } else { + for (Content content : results) { + System.out.println(" -> " + content.textSegment().text()); + } + System.out.println(); + } + } + } +} diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java new file mode 100644 index 0000000..e20383a --- /dev/null +++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java @@ -0,0 +1,62 @@ +package com.arcadedb.examples; + +import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore; +import dev.langchain4j.data.embedding.Embedding; +import dev.langchain4j.data.segment.TextSegment; +import dev.langchain4j.model.embedding.EmbeddingModel; +import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel; +import dev.langchain4j.store.embedding.EmbeddingMatch; +import dev.langchain4j.store.embedding.EmbeddingSearchRequest; +import dev.langchain4j.store.embedding.EmbeddingStore; + +import java.util.List; + +public class GraphRAGEmbeddingStore { + + private
static final String HOST = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost"); + private static final String PORT = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424"); + private static final String USER = System.getenv().getOrDefault("ARCADEDB_USER", "root"); + private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb"); + + public static void main(String[] args) { + EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel(); + + String boltUrl = "bolt://" + HOST + ":" + PORT; + EmbeddingStore<TextSegment> store = Neo4jEmbeddingStore.builder() + .withBasicAuth(boltUrl, USER, PASSWORD) + .dimension(embeddingModel.dimension()) + .build(); + + // Ingest sample chunks + String[] texts = { + "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.", + "Vector similarity search uses embedding models to encode text into high-dimensional vectors.", + "Microservices decompose applications into small, independently deployable services.", + "Building a knowledge graph requires extracting entities and relationships from documents."
+ }; + + for (String text : texts) { + TextSegment segment = TextSegment.from(text); + Embedding embedding = embeddingModel.embed(segment).content(); + store.add(embedding, segment); + } + + System.out.println("Ingested " + texts.length + " chunks with 384D embeddings.\n"); + + // Similarity search + String query = "How does graph-based retrieval work?"; + Embedding queryEmbedding = embeddingModel.embed(query).content(); + EmbeddingSearchRequest request = EmbeddingSearchRequest.builder() + .queryEmbedding(queryEmbedding) + .maxResults(3) + .build(); + + List<EmbeddingMatch<TextSegment>> matches = store.search(request).matches(); + + System.out.println("Query: \"" + query + "\"\n"); + System.out.println("Top matches:"); + for (EmbeddingMatch<TextSegment> match : matches) { + System.out.printf(" [%.4f] %s%n", match.score(), match.embedded().text()); + } + } +} From 4e88925aa71cb3f7fb4886f73794bc1ef2bd2462 Mon Sep 17 00:00:00 2001 From: robfrank Date: Thu, 26 Feb 2026 09:42:25 +0100 Subject: [PATCH 11/15] docs(graph-rag): add README Co-Authored-By: Claude Opus 4.6 --- graph-rag/README.md | 104 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 104 insertions(+) create mode 100644 graph-rag/README.md diff --git a/graph-rag/README.md b/graph-rag/README.md new file mode 100644 index 0000000..1677cd7 --- /dev/null +++ b/graph-rag/README.md @@ -0,0 +1,104 @@ +# Graph RAG + +Demonstrates ArcadeDB's multi-model capabilities by implementing a Graph RAG +(Retrieval-Augmented Generation) system that unifies three retrieval signals +in a single database: + +- **Graph traversal** — multi-hop entity bridging via knowledge graph relationships +- **Vector similarity** — semantic chunk retrieval using embeddings +- **Full-text indexing** — keyword-based content lookup + +## Prerequisites + +- Docker and Docker Compose +- `curl` and `jq` +- Java 21+ and Maven 3.x (for the Java demos) + +## Quickstart + +### 1. Start ArcadeDB + +```bash +docker compose up -d +``` + +### 2.
Create database and load data + +```bash +./setup.sh +``` + +This creates the `GraphRAG` database, applies the schema, and inserts sample data. + +### 3a. Run queries via curl + +```bash +./queries/queries.sh +``` + +### 3b. Run queries via Java (Bolt) + +```bash +cd java +mvn package -q +java -jar target/graph-rag.jar +``` + +### 3c. Run queries via Langchain4j + +```bash +cd langchain4j +mvn package -q +java -jar target/graph-rag-langchain4j.jar +``` + +## Schema + +| Type | Kind | Key properties | +|------|------|----------------| +| `Chunk` | Document | `content`, `source`, `chunkIndex`, `embedding` | +| `Entity` | Vertex | `name` | +| `Person` | Vertex (extends Entity) | `name` | +| `Concept` | Vertex (extends Entity) | `name` | +| `Organization` | Vertex (extends Entity) | `name` | +| `MENTIONS` | Edge | Chunk → Entity | +| `RELATES_TO` | Edge | Entity → Entity | +| `WORKS_AT` | Edge | Person → Organization | +| `AUTHORED` | Edge | Person → Chunk | + +## Query Patterns + +| # | Pattern | Language | Signal type | +|---|---------|----------|-------------| +| 1 | Hybrid Vector + Graph | SQL | Vector + Graph | +| 2 | Multi-Hop Entity Bridge | Cypher | Graph | +| 3 | Temporal-Aware Retrieval | Cypher | Graph | +| 4 | Composite Scoring | SQL | Vector + Graph | +| 5 | Agentic RAG Steps | Mixed | Multi-signal | + +## Sample Data + +- 8 chunks from 4 internal documents with 4D embeddings +- 11 entities (4 persons, 4 concepts, 3 organizations) +- ~25 edges (MENTIONS, RELATES_TO, WORKS_AT, AUTHORED) +- Multi-hop design: querying "Vector Search" bridges to GraphRAG docs via shared entity mentions + +## Langchain4j Module + +The `langchain4j/` directory contains two standalone examples using LangChain4j +with ArcadeDB via the Neo4j Bolt protocol: + +- **GraphRAGEmbeddingStore** — ingests text chunks with real 384D embeddings (AllMiniLmL6V2) and performs similarity search +- **GraphRAGContentRetriever** — wires the embedding store into a LangChain4j 
`EmbeddingStoreContentRetriever` pipeline + +No external API keys required — the embedding model runs in-process. + +## ArcadeDB Version Notes + +This use case targets ArcadeDB **26.2.1**. Vector similarity queries use +`vectorNeighbors('IndexName[property]', vector, k)` with an `LSM_VECTOR` +index. The Bolt protocol (port 2424) enables Neo4j driver compatibility. + +## Reference + +[ArcadeDB Graph RAG use case](https://arcadedb.com/graph-rag.html) From 178c17a6d868033ca211a378c77b11839c7d7931 Mon Sep 17 00:00:00 2001 From: robfrank Date: Thu, 26 Feb 2026 13:43:05 +0100 Subject: [PATCH 12/15] fix(graph-rag): integration smoke test fixes - Change Chunk from DOCUMENT TYPE to VERTEX TYPE (edges require vertices) - Enable BoltProtocolPlugin in docker-compose.yml (port 7687) - Downgrade neo4j-java-driver to 4.4.12 (compatible with ArcadeDB Bolt v4) - Remove :Entity labels from Cypher queries (ArcadeDB doesn't resolve parent type) - Simplify SQL vector queries (remove vectorDistance subquery) - Rewrite langchain4j to use Neo4j driver directly (ArcadeDB doesn't support Neo4j's SHOW VECTOR INDEX DDL used by Neo4jEmbeddingStore) - Update README with correct port, schema types, and run instructions Co-Authored-By: Claude Opus 4.6 --- graph-rag/README.md | 15 +- graph-rag/docker-compose.yml | 7 +- graph-rag/java/pom.xml | 2 +- .../java/com/arcadedb/examples/GraphRAG.java | 10 +- graph-rag/langchain4j/pom.xml | 11 +- .../examples/GraphRAGContentRetriever.java | 129 +++++++++++------- .../examples/GraphRAGEmbeddingStore.java | 103 +++++++++----- graph-rag/queries/queries.sh | 14 +- graph-rag/sql/01-schema.sql | 4 +- 9 files changed, 176 insertions(+), 119 deletions(-) diff --git a/graph-rag/README.md b/graph-rag/README.md index 1677cd7..de8f3a8 100644 --- a/graph-rag/README.md +++ b/graph-rag/README.md @@ -44,19 +44,24 @@ mvn package -q java -jar target/graph-rag.jar ``` -### 3c. Run queries via Langchain4j +### 3c. 
Run LangChain4j demos ```bash cd langchain4j mvn package -q + +# Embedding store: ingest + similarity search java -jar target/graph-rag-langchain4j.jar + +# Content retriever: semantic search + graph expansion +java -cp target/graph-rag-langchain4j.jar com.arcadedb.examples.GraphRAGContentRetriever ``` ## Schema | Type | Kind | Key properties | |------|------|----------------| -| `Chunk` | Document | `content`, `source`, `chunkIndex`, `embedding` | +| `Chunk` | Vertex | `content`, `source`, `chunkIndex`, `embedding` | | `Entity` | Vertex | `name` | | `Person` | Vertex (extends Entity) | `name` | | `Concept` | Vertex (extends Entity) | `name` | @@ -88,8 +93,8 @@ java -jar target/graph-rag-langchain4j.jar The `langchain4j/` directory contains two standalone examples using LangChain4j with ArcadeDB via the Neo4j Bolt protocol: -- **GraphRAGEmbeddingStore** — ingests text chunks with real 384D embeddings (AllMiniLmL6V2) and performs similarity search -- **GraphRAGContentRetriever** — wires the embedding store into a LangChain4j `EmbeddingStoreContentRetriever` pipeline +- **GraphRAGEmbeddingStore** — ingests text chunks with real 384D embeddings (AllMiniLmL6V2), stores via Cypher over Bolt, and performs similarity search using LangChain4j's cosine similarity +- **GraphRAGContentRetriever** — re-embeds the sample Chunk data with 384D vectors, runs semantic search, then enriches results with graph context via Cypher traversal (entities mentioned by top matches) No external API keys required — the embedding model runs in-process. @@ -97,7 +102,7 @@ No external API keys required — the embedding model runs in-process. This use case targets ArcadeDB **26.2.1**. Vector similarity queries use `vectorNeighbors('IndexName[property]', vector, k)` with an `LSM_VECTOR` -index. The Bolt protocol (port 2424) enables Neo4j driver compatibility. +index. The Bolt protocol (port 7687) enables Neo4j driver compatibility. 
## Reference diff --git a/graph-rag/docker-compose.yml b/graph-rag/docker-compose.yml index bb503a9..eef84e7 100644 --- a/graph-rag/docker-compose.yml +++ b/graph-rag/docker-compose.yml @@ -3,9 +3,12 @@ services: image: arcadedata/arcadedb:26.2.1 ports: - "2480:2480" - - "2424:2424" + - "7687:7687" environment: - JAVA_OPTS: "-Darcadedb.server.rootPassword=arcadedb" + JAVA_OPTS: >- + -Darcadedb.server.rootPassword=arcadedb + -Darcadedb.server.plugins=BoltProtocolPlugin + -Darcadedb.bolt.defaultDatabase=GraphRAG healthcheck: test: ["CMD", "curl", "-sf", "http://localhost:2480/api/v1/ready"] interval: 5s diff --git a/graph-rag/java/pom.xml b/graph-rag/java/pom.xml index fca28dc..a2068ea 100644 --- a/graph-rag/java/pom.xml +++ b/graph-rag/java/pom.xml @@ -13,7 +13,7 @@ 21 21 UTF-8 - 5.28.10 + 4.4.12 diff --git a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java index 06f8dfe..7144041 100644 --- a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java +++ b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java @@ -8,7 +8,7 @@ public class GraphRAG { private static final String HOST = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost"); - private static final String PORT = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424"); + private static final String PORT = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "7687"); private static final String USER = System.getenv().getOrDefault("ARCADEDB_USER", "root"); private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb"); @@ -39,7 +39,7 @@ private static void runQuery1HybridVectorGraph(Driver driver) { "Find chunks similar to graph-topic embedding and their mentioned entities."); String cypher = """ - MATCH (chunk:Chunk)-[:MENTIONS]->(entity:Entity) + MATCH (chunk:Chunk)-[:MENTIONS]->(entity) RETURN chunk.content AS content, chunk.source AS source, collect(DISTINCT 
entity.name) AS entities LIMIT 10"""; @@ -63,7 +63,7 @@ private static void runQuery2MultiHopEntityBridge(Driver driver) { "Find chunks connected through shared entities from GraphRAG docs."); String cypher = """ - MATCH (direct:Chunk)-[:MENTIONS]->(entity:Entity)<-[:MENTIONS]-(related:Chunk) + MATCH (direct:Chunk)-[:MENTIONS]->(entity)<-[:MENTIONS]-(related:Chunk) WHERE direct.source = 'Getting Started with GraphRAG' AND related.source <> direct.source RETURN direct.source AS source_doc, @@ -118,7 +118,7 @@ private static void runQuery4CompositeScoring(Driver driver) { String cypher = """ MATCH (chunk:Chunk) - OPTIONAL MATCH (chunk)-[:MENTIONS]->(entity:Entity) + OPTIONAL MATCH (chunk)-[:MENTIONS]->(entity) RETURN chunk.content AS content, chunk.source AS source, count(entity) AS entity_count ORDER BY entity_count DESC @@ -147,7 +147,7 @@ private static void runQuery5AgenticRAG(Driver driver) { System.out.println(" Step 1: Graph expansion from GraphRAG docs"); String step1 = """ MATCH (c:Chunk {source: 'Getting Started with GraphRAG'}) - -[:MENTIONS]->(e:Entity) + -[:MENTIONS]->(e) -[:RELATES_TO]->(related) RETURN e.name AS entity, related.name AS related_concept LIMIT 10"""; diff --git a/graph-rag/langchain4j/pom.xml b/graph-rag/langchain4j/pom.xml index eee5aee..3475dd9 100644 --- a/graph-rag/langchain4j/pom.xml +++ b/graph-rag/langchain4j/pom.xml @@ -15,14 +15,10 @@ <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> <langchain4j.version>1.11.0</langchain4j.version> <langchain4j.community.version>1.11.0-beta19</langchain4j.community.version> + <neo4j.driver.version>4.4.12</neo4j.driver.version> </properties> <dependencies> - <dependency> - <groupId>dev.langchain4j</groupId> - <artifactId>langchain4j-community-neo4j</artifactId> - <version>${langchain4j.community.version}</version> - </dependency> <dependency> <groupId>dev.langchain4j</groupId> <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId> @@ -33,6 +29,11 @@ <artifactId>langchain4j</artifactId> <version>${langchain4j.version}</version> </dependency> + <dependency> + <groupId>org.neo4j.driver</groupId> + <artifactId>neo4j-java-driver</artifactId> + <version>${neo4j.driver.version}</version> + </dependency> </dependencies> <build> diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java index 37a478a..dd0696a 100644 ---
a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java +++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java @@ -1,78 +1,101 @@ package com.arcadedb.examples; -import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore; import dev.langchain4j.data.embedding.Embedding; import dev.langchain4j.data.segment.TextSegment; import dev.langchain4j.model.embedding.EmbeddingModel; import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel; -import dev.langchain4j.rag.content.Content; -import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever; -import dev.langchain4j.rag.query.Query; +import dev.langchain4j.store.embedding.CosineSimilarity; +import org.neo4j.driver.*; +import org.neo4j.driver.Record; + +import java.util.ArrayList; +import java.util.Comparator; import java.util.List; +import java.util.stream.Collectors; +/** + * Demonstrates a Graph RAG content retrieval pipeline that combines LangChain4j + * embeddings with ArcadeDB's graph traversal via the Neo4j Bolt driver. + * + * Pipeline: embed query → vector similarity for chunks → graph expansion + * to find related entities → return enriched context. 
+ */ public class GraphRAGContentRetriever { private static final String HOST = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost"); - private static final String PORT = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424"); + private static final String PORT = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "7687"); private static final String USER = System.getenv().getOrDefault("ARCADEDB_USER", "root"); private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb"); public static void main(String[] args) { EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel(); + System.out.println("Embedding model: AllMiniLmL6V2 (" + embeddingModel.dimension() + "D)\n"); - String boltUrl = "bolt://" + HOST + ":" + PORT; - Neo4jEmbeddingStore store = Neo4jEmbeddingStore.builder() - .withBasicAuth(boltUrl, USER, PASSWORD) - .dimension(embeddingModel.dimension()) - .label("RAGChunk") - .indexName("rag_chunk_index") - .build(); - - // Ingest sample chunks - String[] texts = { - "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.", - "By traversing entity relationships, the system discovers context that pure vector similarity would miss.", - "Vector similarity search uses embedding models to encode text into high-dimensional vectors.", - "Approximate nearest neighbor algorithms like HNSW trade small accuracy losses for dramatic speed improvements.", - "Microservices decompose applications into small, independently deployable services.", - "Building a knowledge graph requires extracting entities and relationships from documents." 
- }; - - for (String text : texts) { - TextSegment segment = TextSegment.from(text); - Embedding embedding = embeddingModel.embed(segment).content(); - store.add(embedding, segment); - } + String uri = "bolt://" + HOST + ":" + PORT; + try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(USER, PASSWORD)); + Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) { + + // Step 1: Re-embed the existing Chunk data with real 384D vectors + System.out.println("Step 1: Re-embedding existing chunks with 384D vectors..."); + Result chunks = session.run("MATCH (c:Chunk) RETURN c.content AS content, c.source AS source"); + List<EmbeddedChunk> embeddedChunks = new ArrayList<>(); + + for (Record r : chunks.list()) { + String content = r.get("content").asString(); + String source = r.get("source").asString(); + Embedding embedding = embeddingModel.embed(TextSegment.from(content)).content(); + embeddedChunks.add(new EmbeddedChunk(content, source, embedding)); + } + System.out.println(" Embedded " + embeddedChunks.size() + " chunks with 384D vectors.\n"); + + // Step 2: Run queries — semantic search + graph enrichment + String[] queries = { + "How does graph-based retrieval work?", + "What are vector embeddings?", + "Tell me about microservices architecture" + }; + + for (String q : queries) { + System.out.println("Query: \"" + q + "\""); + Embedding queryEmbedding = embeddingModel.embed(q).content(); - System.out.println("Ingested " + texts.length + " chunks into RAGChunk nodes.\n"); - - // Build content retriever pipeline - EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder() - .embeddingStore(store) - .embeddingModel(embeddingModel) - .maxResults(3) - .minScore(0.5) - .build(); - - // Run queries - String[] queries = { - "How does graph-based retrieval work?", - "What are vector embeddings?", - "Tell me about microservices architecture" - }; - - for (String q : queries) { - System.out.println("Query: \"" + q + "\""); - List<Content> results =
retriever.retrieve(new Query(q)); - if (results.isEmpty()) { - System.out.println(" (no results above min score)\n"); - } else { - for (Content content : results) { - System.out.println(" -> " + content.textSegment().text()); + // Find top-3 most similar chunks + List<ScoredChunk> scored = embeddedChunks.stream() + .map(ec -> new ScoredChunk(ec, CosineSimilarity.between(queryEmbedding, ec.embedding()))) + .sorted(Comparator.comparingDouble(ScoredChunk::score).reversed()) + .limit(3) + .toList(); + + System.out.println(" Semantic matches:"); + for (ScoredChunk sc : scored) { + System.out.printf(" [%.4f] [%s] %s%n", + sc.score(), sc.chunk().source(), truncate(sc.chunk().content(), 70)); + } + + // Step 3: Graph expansion — find entities mentioned by top match + String topSource = scored.get(0).chunk().source(); + Result entities = session.run( + "MATCH (c:Chunk)-[:MENTIONS]->(e) WHERE c.source = $source " + + "RETURN DISTINCT e.name AS entity LIMIT 5", + Values.parameters("source", topSource)); + + List<Record> entityList = entities.list(); + if (!entityList.isEmpty()) { + System.out.print(" Graph context: "); + System.out.println(entityList.stream() + .map(r -> r.get("entity").asString()) + .collect(Collectors.joining(", "))); } System.out.println(); } } } + + private record EmbeddedChunk(String content, String source, Embedding embedding) {} + private record ScoredChunk(EmbeddedChunk chunk, double score) {} + + private static String truncate(String s, int maxLen) { + return s.length() <= maxLen ?
s : s.substring(0, maxLen - 3) + "..."; + } } diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java index e20383a..5cb9520 100644 --- a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java +++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java @@ -1,62 +1,91 @@ package com.arcadedb.examples; -import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore; import dev.langchain4j.data.embedding.Embedding; import dev.langchain4j.data.segment.TextSegment; import dev.langchain4j.model.embedding.EmbeddingModel; import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel; -import dev.langchain4j.store.embedding.EmbeddingMatch; -import dev.langchain4j.store.embedding.EmbeddingSearchRequest; -import dev.langchain4j.store.embedding.EmbeddingStore; +import dev.langchain4j.store.embedding.CosineSimilarity; +import org.neo4j.driver.*; +import org.neo4j.driver.Record; + +import java.util.ArrayList; +import java.util.Comparator; import java.util.List; +/** + * Demonstrates LangChain4j embedding generation combined with ArcadeDB graph + * storage via the Neo4j Bolt driver. + * + * LangChain4j generates 384-dimensional embeddings using AllMiniLmL6V2 (runs + * in-process, no API keys). The embeddings are stored in ArcadeDB's LCChunk + * vertex type via Cypher over Bolt. Similarity is computed using LangChain4j's + * CosineSimilarity. 
+ */ public class GraphRAGEmbeddingStore { private static final String HOST = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost"); - private static final String PORT = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424"); + private static final String PORT = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "7687"); private static final String USER = System.getenv().getOrDefault("ARCADEDB_USER", "root"); private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb"); public static void main(String[] args) { EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel(); + System.out.println("Embedding model: AllMiniLmL6V2 (" + embeddingModel.dimension() + "D)\n"); - String boltUrl = "bolt://" + HOST + ":" + PORT; - EmbeddingStore<TextSegment> store = Neo4jEmbeddingStore.builder() - .withBasicAuth(boltUrl, USER, PASSWORD) - .dimension(embeddingModel.dimension()) - .build(); - - // Ingest sample chunks - String[] texts = { - "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.", - "Vector similarity search uses embedding models to encode text into high-dimensional vectors.", - "Microservices decompose applications into small, independently deployable services.", - "Building a knowledge graph requires extracting entities and relationships from documents."
- }; - - for (String text : texts) { - TextSegment segment = TextSegment.from(text); - Embedding embedding = embeddingModel.embed(segment).content(); - store.add(embedding, segment); - } + String uri = "bolt://" + HOST + ":" + PORT; + try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(USER, PASSWORD)); + Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) { + + // Ingest sample chunks with real embeddings via Cypher over Bolt + String[] texts = { + "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.", + "Vector similarity search uses embedding models to encode text into high-dimensional vectors.", + "Microservices decompose applications into small, independently deployable services.", + "Building a knowledge graph requires extracting entities and relationships from documents." + }; + + for (String text : texts) { + Embedding embedding = embeddingModel.embed(TextSegment.from(text)).content(); + List<Double> vector = toDoubleList(embedding.vector()); + session.run("CREATE (c:LCChunk {content: $content, embedding: $embedding})", + Values.parameters("content", text, "embedding", vector)); + } + System.out.println("Ingested " + texts.length + " chunks with " + embeddingModel.dimension() + "D embeddings.\n"); - System.out.println("Ingested " + texts.length + " chunks with 384D embeddings.\n"); + // Similarity search: embed query, fetch stored embeddings, rank by cosine similarity + String query = "How does graph-based retrieval work?"; + Embedding queryEmbedding = embeddingModel.embed(query).content(); - // Similarity search - String query = "How does graph-based retrieval work?"; - Embedding queryEmbedding = embeddingModel.embed(query).content(); - EmbeddingSearchRequest request = EmbeddingSearchRequest.builder() - .queryEmbedding(queryEmbedding) - .maxResults(3) - .build(); + System.out.println("Query: \"" + query + "\"\n"); + System.out.println("Top matches (cosine similarity via LangChain4j):"); - List<EmbeddingMatch<TextSegment>>
matches = store.search(request).matches(); + Result result = session.run("MATCH (c:LCChunk) RETURN c.content AS content, c.embedding AS embedding"); + List<ScoredChunk> scored = new ArrayList<>(); - System.out.println("Query: \"" + query + "\"\n"); - System.out.println("Top matches:"); - for (EmbeddingMatch<TextSegment> match : matches) { - System.out.printf(" [%.4f] %s%n", match.score(), match.embedded().text()); + for (Record r : result.list()) { + String content = r.get("content").asString(); + List<Object> rawEmbedding = r.get("embedding").asList(); + float[] storedVector = new float[rawEmbedding.size()]; + for (int i = 0; i < rawEmbedding.size(); i++) { + storedVector[i] = ((Number) rawEmbedding.get(i)).floatValue(); + } + double score = CosineSimilarity.between(queryEmbedding, new Embedding(storedVector)); + scored.add(new ScoredChunk(content, score)); + } + + scored.sort(Comparator.comparingDouble(ScoredChunk::score).reversed()); + for (int i = 0; i < Math.min(3, scored.size()); i++) { + System.out.printf(" [%.4f] %s%n", scored.get(i).score(), scored.get(i).content()); + } } } + + private record ScoredChunk(String content, double score) {} + + private static List<Double> toDoubleList(float[] vector) { + List<Double> list = new ArrayList<>(vector.length); + for (float f : vector) list.add((double) f); + return list; + } } diff --git a/graph-rag/queries/queries.sh b/graph-rag/queries/queries.sh index 3ddc5df..2f93b70 100755 --- a/graph-rag/queries/queries.sh +++ b/graph-rag/queries/queries.sh @@ -27,12 +27,9 @@ echo "" query "sql" " SELECT content, source, out('MENTIONS').name AS entities -FROM ( - SELECT *, vectorDistance('Chunk[embedding]', embedding, [0.9, 0.2, 0.1, 0.1]) AS score - FROM Chunk - ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 5) DESC - LIMIT 5 -) +FROM Chunk +ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 5) DESC +LIMIT 5 " # ───────────────────────────────────────────────────────────────────────────── @@ -41,7 +38,7 @@ echo "=== Query 2: Multi-Hop
Entity Bridge (Cypher) ===" echo "Find chunks connected through shared entities." echo "" query "cypher" " -MATCH (direct:Chunk)-[:MENTIONS]->(entity:Entity)<-[:MENTIONS]-(related:Chunk) +MATCH (direct:Chunk)-[:MENTIONS]->(entity)<-[:MENTIONS]-(related:Chunk) WHERE direct.source = 'Getting Started with GraphRAG' AND related.source <> direct.source RETURN direct.source AS source_doc, @@ -71,7 +68,6 @@ echo "Score chunks by vector distance and entity connections." echo "" query "sql" " SELECT content, source, - vectorDistance('Chunk[embedding]', embedding, [0.9, 0.2, 0.1, 0.1]) AS vector_score, out('MENTIONS').size() AS entity_count FROM Chunk ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 10) DESC @@ -95,7 +91,7 @@ LIMIT 5 echo "" echo "--- Step 2: Graph expansion — entities and relations ---" query "cypher" " -MATCH (c:Chunk {source: 'Getting Started with GraphRAG'})-[:MENTIONS]->(e:Entity)-[:RELATES_TO]->(related) +MATCH (c:Chunk {source: 'Getting Started with GraphRAG'})-[:MENTIONS]->(e)-[:RELATES_TO]->(related) RETURN e.name, related.name LIMIT 10 " diff --git a/graph-rag/sql/01-schema.sql b/graph-rag/sql/01-schema.sql index 646e812..cdc74c0 100644 --- a/graph-rag/sql/01-schema.sql +++ b/graph-rag/sql/01-schema.sql @@ -1,5 +1,5 @@ --- Document type for text chunks with vector embeddings -CREATE DOCUMENT TYPE Chunk IF NOT EXISTS; +-- Vertex type for text chunks with vector embeddings +CREATE VERTEX TYPE Chunk IF NOT EXISTS; CREATE PROPERTY Chunk.content IF NOT EXISTS STRING; CREATE PROPERTY Chunk.source IF NOT EXISTS STRING; CREATE PROPERTY Chunk.chunkIndex IF NOT EXISTS INTEGER; From 78e85373fa7353aeebf1bf05b11ec4126074ea55 Mon Sep 17 00:00:00 2001 From: robfrank Date: Thu, 26 Feb 2026 13:47:24 +0100 Subject: [PATCH 13/15] fix(graph-rag): address code review findings - Rename Query 1 header in GraphRAG.java to "Graph Traversal with Entity Collection" (was misleadingly "Hybrid Vector + Graph" without vector) - Add empty-result guard in 
GraphRAGContentRetriever - Add version comment in langchain4j pom.xml for embedding model artifact - Document all implementation deviations in design doc (port, driver version, vertex type, Cypher labels, Neo4jEmbeddingStore incompatibility) Co-Authored-By: Claude Opus 4.6 --- docs/plans/2026-02-26-graph-rag-design.md | 14 ++++++++++++++ .../main/java/com/arcadedb/examples/GraphRAG.java | 9 +++++---- graph-rag/langchain4j/pom.xml | 2 +- .../examples/GraphRAGContentRetriever.java | 5 +++++ 4 files changed, 25 insertions(+), 5 deletions(-) diff --git a/docs/plans/2026-02-26-graph-rag-design.md b/docs/plans/2026-02-26-graph-rag-design.md index 332ee25..a91e1af 100644 --- a/docs/plans/2026-02-26-graph-rag-design.md +++ b/docs/plans/2026-02-26-graph-rag-design.md @@ -145,3 +145,17 @@ Adapts the 5 patterns to pure Cypher. Queries that rely on SQL-specific features - `queries.sh` runs all 5 queries and returns non-empty result sets - `mvn package && java -jar target/graph-rag.jar` connects via Bolt, runs all 5 Cypher queries - `mvn package && java -jar target/graph-rag-langchain4j.jar` ingests chunks, generates embeddings, runs similarity search and content retrieval + +## Implementation Deviations + +The following changes were made during integration testing: + +| Design | Implementation | Reason | +|--------|---------------|--------| +| Bolt port 2424 | Port 7687 | ArcadeDB's BoltProtocolPlugin defaults to 7687 (standard Neo4j port) | +| `neo4j-java-driver:5.28.x` | `neo4j-java-driver:4.4.12` | ArcadeDB's Bolt implements protocol v4; driver 5.x fails handshake | +| `Chunk` as DOCUMENT TYPE | `Chunk` as VERTEX TYPE | Edges (MENTIONS, AUTHORED) require vertex endpoints | +| `:Entity` label in Cypher | Unlabeled `(entity)` | ArcadeDB Cypher doesn't resolve parent type labels to subtypes | +| `Neo4jEmbeddingStore` via langchain4j-community-neo4j | Direct Neo4j driver + LangChain4j `CosineSimilarity` | ArcadeDB doesn't support `SHOW VECTOR INDEX` DDL used by 
Neo4jEmbeddingStore | +| `vectorDistance` in SQL subquery | Direct `vectorNeighbors` ordering | `vectorDistance` doesn't work in subqueries in ArcadeDB 26.2.1 | +| Docker JAVA_OPTS single line | Multi-line with plugins | BoltProtocolPlugin must be explicitly enabled via `arcadedb.server.plugins` | diff --git a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java index 7144041..6dc844a 100644 --- a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java +++ b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java @@ -32,11 +32,12 @@ private static void tryRun(Runnable r, String name) { } } - // Query 1: Hybrid Vector + Graph - // Finds chunks near the graph-topic embedding and their mentioned entities + // Query 1: Graph Traversal with Entity Collection + // Finds chunks and their mentioned entities via graph traversal + // (vector search requires SQL; see queries.sh Query 1 for the hybrid version) private static void runQuery1HybridVectorGraph(Driver driver) { - printHeader("Query 1: Hybrid Vector + Graph Retrieval", - "Find chunks similar to graph-topic embedding and their mentioned entities."); + printHeader("Query 1: Graph Traversal with Entity Collection", + "Find chunks and their mentioned entities via graph traversal."); String cypher = """ MATCH (chunk:Chunk)-[:MENTIONS]->(entity) diff --git a/graph-rag/langchain4j/pom.xml b/graph-rag/langchain4j/pom.xml index 3475dd9..d7bf57d 100644 --- a/graph-rag/langchain4j/pom.xml +++ b/graph-rag/langchain4j/pom.xml @@ -22,7 +22,7 @@ dev.langchain4j langchain4j-embeddings-all-minilm-l6-v2 - ${langchain4j.community.version} + ${langchain4j.community.version} dev.langchain4j diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java index dd0696a..d7717bb 100644 --- 
a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java +++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java @@ -67,6 +67,11 @@ public static void main(String[] args) { .limit(3) .toList(); + if (scored.isEmpty()) { + System.out.println(" (no chunks found)\n"); + continue; + } + System.out.println(" Semantic matches:"); for (ScoredChunk sc : scored) { System.out.printf(" [%.4f] [%s] %s%n", From 0f27d323edff715e51b6a8467266a394886ec29b Mon Sep 17 00:00:00 2001 From: robfrank Date: Thu, 26 Feb 2026 13:52:34 +0100 Subject: [PATCH 14/15] ci(graph-rag): add GitHub Actions workflow Test curl queries, Java Bolt driver, and LangChain4j demos in CI. Also add knowledge-graphs and graph-rag to root README use case table. Co-Authored-By: Claude Opus 4.6 --- .github/workflows/graph-rag.yml | 81 +++++++++++++++++++++++++++++++++ README.md | 4 +- 2 files changed, 84 insertions(+), 1 deletion(-) create mode 100644 .github/workflows/graph-rag.yml diff --git a/.github/workflows/graph-rag.yml b/.github/workflows/graph-rag.yml new file mode 100644 index 0000000..351616b --- /dev/null +++ b/.github/workflows/graph-rag.yml @@ -0,0 +1,81 @@ +name: Graph RAG CI + +on: + push: + paths: + - graph-rag/** + - .github/workflows/graph-rag.yml + pull_request: + paths: + - graph-rag/** + - .github/workflows/graph-rag.yml + +jobs: + test: + runs-on: ubuntu-latest + timeout-minutes: 15 + permissions: + contents: read + strategy: + fail-fast: false + matrix: + runner: [curl, java, langchain4j] + + env: + ARCADEDB_URL: http://localhost:2480 + ARCADEDB_USER: root + ARCADEDB_PASS: arcadedb + + steps: + - name: Checkout + uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 + with: + fetch-depth: 1 + + - name: Set up Java + if: matrix.runner == 'java' || matrix.runner == 'langchain4j' + uses: actions/setup-java@be666c2fcd27ec809703dec50e508c2fdc7f6654 # v5.2.0 + with: + java-version: '21' + distribution: 
'temurin' + + - name: Cache Maven repository + if: matrix.runner == 'java' || matrix.runner == 'langchain4j' + uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3 + with: + path: ~/.m2 + key: ${{ runner.os }}-m2-${{ matrix.runner }}-${{ hashFiles('graph-rag/java/pom.xml', 'graph-rag/langchain4j/pom.xml') }} + restore-keys: ${{ runner.os }}-m2-${{ matrix.runner }}- + + - name: Start ArcadeDB + working-directory: graph-rag + run: docker compose up -d + + - name: Setup database + working-directory: graph-rag + run: ./setup.sh + + - name: Run curl queries + if: matrix.runner == 'curl' + working-directory: graph-rag + run: ./queries/queries.sh + + - name: Build and run Java (Bolt) + if: matrix.runner == 'java' + working-directory: graph-rag/java + run: | + mvn package --no-transfer-progress + java -jar target/graph-rag.jar + + - name: Build and run LangChain4j + if: matrix.runner == 'langchain4j' + working-directory: graph-rag/langchain4j + run: | + mvn package --no-transfer-progress + java -jar target/graph-rag-langchain4j.jar + java -cp target/graph-rag-langchain4j.jar com.arcadedb.examples.GraphRAGContentRetriever + + - name: Teardown + if: always() + working-directory: graph-rag + run: docker compose down diff --git a/README.md b/README.md index eab587f..88dd13b 100644 --- a/README.md +++ b/README.md @@ -10,6 +10,8 @@ and runnable demos via both `curl` and a Java program. 
| Directory | Description | ArcadeDB features | |-----------|-------------|-------------------| | [recommendation-engine](./recommendation-engine/) | Intelligent product and content recommendations | Graph traversal, Vector similarity, Time-series | +| [knowledge-graphs](./knowledge-graphs/) | Academic research knowledge graph with co-authorship and citation networks | Graph traversal, Vector similarity, Full-text search, Time-series | +| [graph-rag](./graph-rag/) | Graph RAG system combining knowledge graphs with vector search for retrieval-augmented generation | Graph traversal, Vector similarity, Full-text indexing, Neo4j Bolt, LangChain4j | ## Structure @@ -19,5 +21,5 @@ Each use case directory contains: - `sql/01-schema.sql` — vertex/edge type definitions - `sql/02-data.sql` — sample data - `queries/queries.sh` — all queries via `curl` -- `java/` — standalone Maven project running the same queries via `arcadedb-network` +- `java/` — standalone Maven project running the same queries via Java - `README.md` — quickstart guide From 7e6122229f8ca584787672fb28f9395cd62e10ef Mon Sep 17 00:00:00 2001 From: robfrank Date: Thu, 26 Feb 2026 14:26:40 +0100 Subject: [PATCH 15/15] fix(graph-rag): address PR review feedback - Rename Query 1 method to runQuery1GraphTraversal (was misleadingly named HybridVectorGraph despite being graph-only over Bolt) - Fix Query 3: remove no-op WHERE chunkIndex=1 filter, rename from "Temporal-Aware Retrieval" to "Latest Chunk Per Document" - Clean up LCChunk nodes before inserting in GraphRAGEmbeddingStore to prevent data accumulation across repeated demo runs - Clarify in Javadoc that similarity is computed in-memory because vectorNeighbors() is SQL-only, not available over Bolt protocol Co-Authored-By: Claude Opus 4.6 --- graph-rag/README.md | 2 +- .../java/com/arcadedb/examples/GraphRAG.java | 19 +++++++++---------- .../examples/GraphRAGContentRetriever.java | 7 ++++++- .../examples/GraphRAGEmbeddingStore.java | 11 +++++++++-- 
graph-rag/queries/queries.sh | 7 +++---- 5 files changed, 28 insertions(+), 18 deletions(-) diff --git a/graph-rag/README.md b/graph-rag/README.md index de8f3a8..be39da7 100644 --- a/graph-rag/README.md +++ b/graph-rag/README.md @@ -77,7 +77,7 @@ java -cp target/graph-rag-langchain4j.jar com.arcadedb.examples.GraphRAGContentR |---|---------|----------|-------------| | 1 | Hybrid Vector + Graph | SQL | Vector + Graph | | 2 | Multi-Hop Entity Bridge | Cypher | Graph | -| 3 | Temporal-Aware Retrieval | Cypher | Graph | +| 3 | Latest Chunk Per Document | Cypher | Graph | | 4 | Composite Scoring | SQL | Vector + Graph | | 5 | Agentic RAG Steps | Mixed | Multi-signal | diff --git a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java index 6dc844a..ce2bedd 100644 --- a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java +++ b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java @@ -15,9 +15,9 @@ public class GraphRAG { public static void main(String[] args) { String uri = "bolt://" + HOST + ":" + PORT; try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(USER, PASSWORD))) { - tryRun(() -> runQuery1HybridVectorGraph(driver), "Query 1"); + tryRun(() -> runQuery1GraphTraversal(driver), "Query 1"); tryRun(() -> runQuery2MultiHopEntityBridge(driver), "Query 2"); - tryRun(() -> runQuery3TemporalAware(driver), "Query 3"); + tryRun(() -> runQuery3LatestChunks(driver), "Query 3"); tryRun(() -> runQuery4CompositeScoring(driver), "Query 4"); tryRun(() -> runQuery5AgenticRAG(driver), "Query 5"); } @@ -35,7 +35,7 @@ private static void tryRun(Runnable r, String name) { // Query 1: Graph Traversal with Entity Collection // Finds chunks and their mentioned entities via graph traversal // (vector search requires SQL; see queries.sh Query 1 for the hybrid version) - private static void runQuery1HybridVectorGraph(Driver driver) { + private static void 
runQuery1GraphTraversal(Driver driver) { printHeader("Query 1: Graph Traversal with Entity Collection", "Find chunks and their mentioned entities via graph traversal."); @@ -86,17 +86,16 @@ private static void runQuery2MultiHopEntityBridge(Driver driver) { } } - // Query 3: Temporal-Aware Retrieval - // Filters chunks by chunkIndex to get latest context per source - private static void runQuery3TemporalAware(Driver driver) { - printHeader("Query 3: Temporal-Aware Retrieval", - "Get the latest chunk (highest chunkIndex) per source."); + // Query 3: Latest Chunk Per Document + // Returns the highest-indexed chunk for each source document + private static void runQuery3LatestChunks(Driver driver) { + printHeader("Query 3: Latest Chunk Per Document", + "Get the highest-indexed chunk per source document."); String cypher = """ MATCH (c:Chunk) - WHERE c.chunkIndex = 1 RETURN c.content AS content, c.source AS source, c.chunkIndex AS chunkIndex - ORDER BY c.chunkIndex DESC + ORDER BY c.source, c.chunkIndex DESC LIMIT 10"""; try (Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) { diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java index d7717bb..f338847 100644 --- a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java +++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java @@ -18,8 +18,13 @@ * Demonstrates a Graph RAG content retrieval pipeline that combines LangChain4j * embeddings with ArcadeDB's graph traversal via the Neo4j Bolt driver. * - * Pipeline: embed query → vector similarity for chunks → graph expansion + * Pipeline: embed query → cosine similarity for chunks → graph expansion * to find related entities → return enriched context. 
+ * + * Similarity is computed in-memory using LangChain4j's CosineSimilarity because + * ArcadeDB's vectorNeighbors() function is SQL-only and not available over the + * Bolt protocol. The graph expansion step (MENTIONS traversal) runs server-side + * via Cypher over Bolt. */ public class GraphRAGContentRetriever { diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java index 5cb9520..f6e228f 100644 --- a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java +++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java @@ -19,8 +19,12 @@ * * LangChain4j generates 384-dimensional embeddings using AllMiniLmL6V2 (runs * in-process, no API keys). The embeddings are stored in ArcadeDB's LCChunk - * vertex type via Cypher over Bolt. Similarity is computed using LangChain4j's - * CosineSimilarity. + * vertex type via Cypher over Bolt. + * + * Similarity is computed in-memory using LangChain4j's CosineSimilarity because + * ArcadeDB's vectorNeighbors() function is SQL-only and not available over the + * Bolt protocol. For server-side vector search, see queries.sh which uses the + * HTTP API with vectorNeighbors(). 
*/ public class GraphRAGEmbeddingStore { @@ -37,6 +41,9 @@ public static void main(String[] args) { try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(USER, PASSWORD)); Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) { + // Clean up any LCChunk nodes from previous runs + session.run("MATCH (c:LCChunk) DELETE c"); + // Ingest sample chunks with real embeddings via Cypher over Bolt String[] texts = { "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.", diff --git a/graph-rag/queries/queries.sh b/graph-rag/queries/queries.sh index 2f93b70..f6c71f2 100755 --- a/graph-rag/queries/queries.sh +++ b/graph-rag/queries/queries.sh @@ -50,14 +50,13 @@ LIMIT 20 # ───────────────────────────────────────────────────────────────────────────── echo "" -echo "=== Query 3: Temporal-Aware Retrieval (Cypher) ===" -echo "Get latest chunks per source." +echo "=== Query 3: Latest Chunk Per Document (Cypher) ===" +echo "Get the highest-indexed chunk per source document." echo "" query "cypher" " MATCH (c:Chunk) -WHERE c.chunkIndex = 1 RETURN c.content, c.source, c.chunkIndex -ORDER BY c.chunkIndex DESC +ORDER BY c.source, c.chunkIndex DESC LIMIT 10 "
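Note on Query 3 as revised above: ordering by `c.source, c.chunkIndex DESC` puts the highest-indexed chunk of each source first within its group, but with `LIMIT 10` the result can still contain several rows per source. If exactly one row per source is wanted, one option is to reduce the rows on the client. The following is a minimal sketch under that assumption; the `LatestChunks` class, its `Chunk` record, and the sample rows are hypothetical and not part of this patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the patch): reduces Query 3 style rows
// to the single highest-indexed chunk per source document.
class LatestChunks {

    // Mirrors the columns returned by Query 3: content, source, chunkIndex.
    record Chunk(String content, String source, int chunkIndex) {}

    static List<Chunk> latestPerSource(List<Chunk> chunks) {
        Map<String, Chunk> latest = new HashMap<>();
        for (Chunk c : chunks) {
            // Keep the stored chunk unless the current one has a higher chunkIndex.
            latest.merge(c.source(), c,
                    (old, cur) -> cur.chunkIndex() > old.chunkIndex() ? cur : old);
        }
        // Sort by source so the output order is deterministic.
        List<Chunk> result = new ArrayList<>(latest.values());
        result.sort(Comparator.comparing(Chunk::source));
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for the records returned over Bolt.
        List<Chunk> rows = List.of(
                new Chunk("intro", "Doc A", 0),
                new Chunk("details", "Doc A", 1),
                new Chunk("overview", "Doc B", 0));
        for (Chunk c : latestPerSource(rows)) {
            System.out.printf("%s [%d] %s%n", c.source(), c.chunkIndex(), c.content());
        }
    }
}
```

In the demo itself the rows would come from the `Record` list returned by `session.run(...)`; whether the same reduction is better expressed in Cypher against ArcadeDB is not covered by this patch.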