From ca2ed6f3b67ee809069fc1feab829b6c16c55276 Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 09:23:44 +0100
Subject: [PATCH 01/15] docs: add Graph RAG use case design

Defines schema, sample data, queries, and module structure for the
Graph RAG use case based on arcadedb.com/graph-rag.html. Includes
Neo4j Bolt driver Java module and langchain4j submodule with local
AllMiniLmL6V2 embeddings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
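Reviewer note on the Sample Data section: the design relies on hand-crafted
4D embeddings that separate topics under cosine distance. The sketch below is
a minimal check of the two example vectors from the design doc. It assumes
plain cosine distance (1 minus cosine similarity) is a reasonable stand-in for
what the COSINE vector index computes. The class and method names are
hypothetical and not part of the planned modules.

```java
// Hypothetical helper, not part of the planned graph-rag modules.
public class EmbeddingDistanceSketch {

  // Cosine distance = 1 - (a . b) / (|a| * |b|); 0 means same direction.
  // Assumes neither vector is all zeros.
  public static double cosineDistance(double[] a, double[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vectors must have the same dimension");
    }
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    // Example vectors quoted in the design doc's Sample Data section.
    double[] graphHeavy = {0.9, 0.1, 0.2, 0.1};
    double[] vectorHeavy = {0.1, 0.9, 0.2, 0.1};
    System.out.printf("graph vs graph:  %.4f%n", cosineDistance(graphHeavy, graphHeavy));
    System.out.printf("graph vs vector: %.4f%n", cosineDistance(graphHeavy, vectorHeavy));
  }
}
```

With these two vectors the cross-topic distance is roughly 0.74, so a
graph-heavy query should rank graph-heavy chunks well ahead of vector-heavy
ones.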
 docs/plans/2026-02-26-graph-rag-design.md | 147 ++++++++++++++++++++++
 1 file changed, 147 insertions(+)
 create mode 100644 docs/plans/2026-02-26-graph-rag-design.md

diff --git a/docs/plans/2026-02-26-graph-rag-design.md b/docs/plans/2026-02-26-graph-rag-design.md
new file mode 100644
index 0000000..332ee25
--- /dev/null
+++ b/docs/plans/2026-02-26-graph-rag-design.md
@@ -0,0 +1,147 @@
+# Graph RAG Use Case — Design
+
+**Date:** 2026-02-26
+**Branch:** feat/graph-rag
+**ArcadeDB version:** 26.2.1
+
+## Overview
+
+Implement the [ArcadeDB Graph RAG](https://arcadedb.com/graph-rag.html) use case following the same structure as the recommendation-engine. The use case demonstrates ArcadeDB's ability to unify vector search, graph traversal, and full-text indexing for retrieval-augmented generation — without requiring multiple databases or ETL pipelines.
+
+Key differences from recommendation-engine:
+- The Java module uses the **Neo4j Bolt driver** (`neo4j-java-driver`) with **Cypher** as the query language, connecting via `bolt://localhost:2424`
+- An additional **langchain4j** submodule demonstrates `Neo4jEmbeddingStore` and `EmbeddingStoreContentRetriever` with local `AllMiniLmL6V2` embeddings (no external API keys)
+
+## Repository Structure
+
+```
+graph-rag/
+├── README.md
+├── docker-compose.yml
+├── setup.sh
+├── sql/
+│   ├── 01-schema.sql
+│   └── 02-data.sql
+├── queries/
+│   └── queries.sh
+├── java/
+│   ├── pom.xml
+│   └── src/main/java/com/arcadedb/examples/
+│       └── GraphRAG.java
+└── langchain4j/
+    ├── pom.xml
+    └── src/main/java/com/arcadedb/examples/
+        ├── GraphRAGEmbeddingStore.java
+        └── GraphRAGContentRetriever.java
+```
+
+## Docker Compose
+
+- Single service: `arcadedata/arcadedb:26.2.1`
+- Ports exposed: `2480` (HTTP API), `2424` (Bolt)
+- Root password via `JAVA_OPTS: -Darcadedb.server.rootPassword=arcadedb`
+- Health check on `http://localhost:2480/api/v1/ready`
+
+## Schema (`sql/01-schema.sql`)
+
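+Five vertex types and four edge types. `Chunk` is modeled as a vertex rather than a document because ArcadeDB edges such as `MENTIONS` and `AUTHORED` can only connect vertices:
+
+**Chunk vertex:**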
+- `Chunk` — `content` (STRING), `source` (STRING), `chunkIndex` (INTEGER), `embedding` (LIST)
+- Vector index on `Chunk(embedding)`: `LSM_VECTOR`, 4 dimensions, COSINE similarity
+
+**Vertices:**
+- `Entity` — `name` (STRING)
+- `Person EXTENDS Entity`
+- `Concept EXTENDS Entity`
+- `Organization EXTENDS Entity`
+
+**Edges:**
+- `MENTIONS` — Chunk -> Entity
+- `RELATES_TO` — Entity -> Entity
+- `WORKS_AT` — Person -> Organization
+- `AUTHORED` — Person -> Chunk
+
+## Sample Data (`sql/02-data.sql`)
+
+**Domain:** Fictional tech company "ArcadeSoft" knowledge base.
+
+**Chunks (8):** Snippets from internal documentation:
+- "Getting Started with GraphRAG" (2 chunks)
+- "Microservices Architecture Guide" (2 chunks)
+- "Vector Search Best Practices" (2 chunks)
+- "Team Onboarding Handbook" (2 chunks)
+
+Each chunk has a hand-crafted 4D embedding reflecting its topic (e.g. graph-heavy docs: `[0.9, 0.1, 0.2, 0.1]`, vector-heavy: `[0.1, 0.9, 0.2, 0.1]`).
+
+**Entities (11):**
+- Persons: Alice Chen, Bob Martinez, Carol Wu, Dave Park
+- Concepts: GraphRAG, Vector Search, Microservices, Knowledge Graph
+- Organizations: ArcadeSoft, Platform Team, Research Team
+
+**Edges (~20-25):**
+- MENTIONS: chunks reference concepts and people
+- RELATES_TO: GraphRAG -> Vector Search, GraphRAG -> Knowledge Graph, Microservices -> Knowledge Graph
+- WORKS_AT: Alice -> Research Team, Bob -> Platform Team, Carol -> ArcadeSoft, Dave -> Platform Team
+- AUTHORED: Alice -> GraphRAG doc chunks, Bob -> Microservices doc chunks
+
+**Design intent:** Multi-hop queries work through entity bridges: chunks from different documents MENTION the same concept (for example, "GraphRAG"), so starting from one chunk and following MENTIONS out to the entity and back in reaches the others. RELATES_TO edges form a small concept graph for further traversal.
+
+## Queries
+
+### `queries/queries.sh` — 5 labeled sections via curl
+
+| # | Pattern | Language | Description |
+|---|---------|----------|-------------|
+| 1 | Hybrid Vector + Graph | Cypher | Vector search for similar chunks, traverse MENTIONS to find entities and connected chunks |
+| 2 | Multi-Hop Entity Bridge | Cypher | Find chunks connected through entity chains: query chunk -> entity -> related chunk |
+| 3 | Temporal-Aware Retrieval | Cypher | Filter chunks by `chunkIndex` ordering, return most recent context first |
+| 4 | Triple Hybrid | SQL | Composite scoring: vector distance + `CONTAINSTEXT` keyword + entity connection count |
+| 5 | Agentic RAG Steps | Mixed | 4-step sequence: vector search, graph expansion, full-text lookup, context assembly |
+
+### `java/GraphRAG.java` — All Cypher via Bolt
+
+Adapts the 5 patterns to pure Cypher. Queries that rely on SQL-specific features are adapted:
+- Query 4: vector distance + entity count (2-signal composite, no full-text)
+- Query 5: vector search -> graph expansion -> collect results (3 steps, no full-text)
+
+### `langchain4j/` — 2 example classes
+
+1. **GraphRAGEmbeddingStore** — ingest text chunks, generate real 384D embeddings with AllMiniLmL6V2, store in ArcadeDB via `Neo4jEmbeddingStore` over Bolt, run similarity searches
+2. **GraphRAGContentRetriever** — wire `Neo4jEmbeddingStore` into a langchain4j `EmbeddingStoreContentRetriever` pipeline, query with natural language, print retrieved chunks with scores
+
+## Java Module (`java/`)
+
+- **Build tool:** Maven (standalone `pom.xml`, no parent)
+- **Dependency:** `org.neo4j.driver:neo4j-java-driver:5.28.x`
+- **Java:** 21
+- **Output:** fat JAR via maven-assembly-plugin -> `graph-rag.jar`
+- **Entry point:** `GraphRAG.java` with `main` method that:
+  1. Opens a Neo4j `Driver` connection to `bolt://localhost:2424`
+  2. Runs all 5 queries sequentially in Cypher
+  3. Prints header and formatted results for each query
+  4. Closes the driver
+
+## Langchain4j Module (`langchain4j/`)
+
+- **Build tool:** Maven (standalone `pom.xml`, no parent, no Spring Boot)
+- **Dependencies:** `langchain4j-community-neo4j`, `langchain4j-embeddings-all-minilm-l6-v2`, `neo4j-java-driver`
+- **Java:** 21
+- **Output:** fat JAR via maven-assembly-plugin -> `graph-rag-langchain4j.jar`
+- **No external API keys required** — AllMiniLmL6V2 runs in-process
+
+## Setup
+
+`setup.sh` follows the recommendation-engine pattern:
+1. Wait for ArcadeDB ready endpoint
+2. Create database `GraphRAG` via HTTP API
+3. Apply `sql/01-schema.sql`
+4. Apply `sql/02-data.sql`
+
+## Success Criteria
+
+- `docker compose up` starts ArcadeDB with both HTTP and Bolt ports
+- SQL files apply cleanly via `setup.sh`
+- `queries.sh` runs all 5 queries and returns non-empty result sets
+- `mvn package && java -jar target/graph-rag.jar` connects via Bolt, runs all 5 Cypher queries
+- `mvn package && java -jar target/graph-rag-langchain4j.jar` ingests chunks, generates embeddings, runs similarity search and content retrieval

From 58cabd2b24670c2bf8a6205592ad339e24f7a62f Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 09:26:46 +0100
Subject: [PATCH 02/15] docs: add Graph RAG implementation plan

11-task plan covering Docker Compose, schema, sample data, curl queries,
Java Bolt/Cypher module, langchain4j embedding store and content retriever
modules, README, and integration smoke test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
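Reviewer note on the data design validation in Task 3: the multi-hop claim
can be checked without a running database. The sketch below is a plain
in-memory model, not something ArcadeDB executes. It copies the MENTIONS
edges from the Task 3 sample data and follows chunk -> entity -> chunk the
way Query 2 does. The class, record, and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory check, not part of the planned graph-rag modules.
public class EntityBridgeSketch {

  // One MENTIONS edge: (source document, chunkIndex) -> entity name.
  public record Mention(String source, int chunkIndex, String entity) {}

  // MENTIONS edges copied from the Task 3 sample data.
  public static final List<Mention> MENTIONS = List.of(
      new Mention("Getting Started with GraphRAG", 0, "GraphRAG"),
      new Mention("Getting Started with GraphRAG", 0, "Vector Search"),
      new Mention("Getting Started with GraphRAG", 1, "GraphRAG"),
      new Mention("Getting Started with GraphRAG", 1, "Knowledge Graph"),
      new Mention("Vector Search Best Practices", 0, "Vector Search"),
      new Mention("Vector Search Best Practices", 1, "Vector Search"),
      new Mention("Microservices Architecture Guide", 0, "Microservices"),
      new Mention("Microservices Architecture Guide", 1, "Microservices"),
      new Mention("Team Onboarding Handbook", 0, "ArcadeSoft"),
      new Mention("Team Onboarding Handbook", 1, "Platform Team"),
      new Mention("Team Onboarding Handbook", 1, "Research Team"),
      new Mention("Team Onboarding Handbook", 1, "Knowledge Graph"),
      new Mention("Team Onboarding Handbook", 1, "Vector Search"));

  // Documents reachable from startSource via chunk -> entity <- chunk,
  // excluding startSource itself (mirrors related.source <> direct.source).
  public static Set<String> bridgedDocs(String startSource) {
    Set<String> entities = new TreeSet<>();
    for (Mention m : MENTIONS) {
      if (m.source().equals(startSource)) {
        entities.add(m.entity());
      }
    }
    Set<String> docs = new TreeSet<>();
    for (Mention m : MENTIONS) {
      if (entities.contains(m.entity()) && !m.source().equals(startSource)) {
        docs.add(m.source());
      }
    }
    return docs;
  }

  public static void main(String[] args) {
    System.out.println(bridgedDocs("Getting Started with GraphRAG"));
  }
}
```

Starting from the GraphRAG guide, the bridge reaches the Vector Search guide
and the Onboarding Handbook. The Microservices guide shares no MENTIONS
target with any other document, so it is connected only through the
RELATES_TO edge from Microservices to Knowledge Graph.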
 docs/plans/2026-02-26-graph-rag.md | 786 +++++++++++++++++++++++++++++
 1 file changed, 786 insertions(+)
 create mode 100644 docs/plans/2026-02-26-graph-rag.md

diff --git a/docs/plans/2026-02-26-graph-rag.md b/docs/plans/2026-02-26-graph-rag.md
new file mode 100644
index 0000000..a4e0948
--- /dev/null
+++ b/docs/plans/2026-02-26-graph-rag.md
@@ -0,0 +1,786 @@
+# Graph RAG Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Implement the Graph RAG use case demonstrating ArcadeDB's unified vector search, graph traversal, and full-text indexing for retrieval-augmented generation.
+
+**Architecture:** Self-contained `graph-rag/` directory mirroring the recommendation-engine structure. Docker Compose exposes HTTP (2480) and Bolt (2424). Base Java module uses Neo4j Bolt driver with pure Cypher. Langchain4j sibling module uses `Neo4jEmbeddingStore` with local AllMiniLmL6V2 embeddings (no API keys).
+
+**Tech Stack:** ArcadeDB 26.2.1, Neo4j Java Driver 5.28.10, LangChain4j Community Neo4j 1.11.0-beta19, AllMiniLmL6V2 embedding model, Java 21, Maven
+
+**Design doc:** `docs/plans/2026-02-26-graph-rag-design.md`
+
+**Reference implementation:** `recommendation-engine/` (same repo)
+
+---
+
+### Task 1: Docker Compose and setup script
+
+**Files:**
+- Create: `graph-rag/docker-compose.yml`
+- Create: `graph-rag/setup.sh`
+
+**Step 1: Create docker-compose.yml**
+
+```yaml
+services:
+  arcadedb:
+    image: arcadedata/arcadedb:26.2.1
+    ports:
+      - "2480:2480"
+      - "2424:2424"
+    environment:
+      JAVA_OPTS: "-Darcadedb.server.rootPassword=arcadedb"
+    healthcheck:
+      test: ["CMD", "curl", "-sf", "http://localhost:2480/api/v1/ready"]
+      interval: 5s
+      timeout: 3s
+      retries: 20
+      start_period: 10s
+```
+
+Note: this is identical to `recommendation-engine/docker-compose.yml` except that it also publishes port `2424:2424` for Bolt.
+
+**Step 2: Create setup.sh**
+
+Copy `recommendation-engine/setup.sh` and change `DB_NAME="RecommendationEngine"` to `DB_NAME="GraphRAG"`. Everything else is the same: wait for ready, create database, apply SQL files.
+
+**Step 3: Make setup.sh executable**
+
+Run: `chmod +x graph-rag/setup.sh`
+
+**Step 4: Commit**
+
+```bash
+git add graph-rag/docker-compose.yml graph-rag/setup.sh
+git commit -m "feat(graph-rag): add docker-compose and setup script"
+```
+
+---
+
+### Task 2: SQL schema
+
+**Files:**
+- Create: `graph-rag/sql/01-schema.sql`
+
+**Step 1: Create 01-schema.sql**
+
+```sql
+-- Vertex type for text chunks with vector embeddings (a vertex, so MENTIONS and AUTHORED edges can attach)
+CREATE VERTEX TYPE Chunk IF NOT EXISTS;
+CREATE PROPERTY Chunk.content IF NOT EXISTS STRING;
+CREATE PROPERTY Chunk.source IF NOT EXISTS STRING;
+CREATE PROPERTY Chunk.chunkIndex IF NOT EXISTS INTEGER;
+CREATE PROPERTY Chunk.embedding IF NOT EXISTS LIST;
+
+-- Entity vertex types (knowledge graph nodes)
+CREATE VERTEX TYPE Entity IF NOT EXISTS;
+CREATE PROPERTY Entity.name IF NOT EXISTS STRING;
+CREATE VERTEX TYPE Person IF NOT EXISTS EXTENDS Entity;
+CREATE VERTEX TYPE Concept IF NOT EXISTS EXTENDS Entity;
+CREATE VERTEX TYPE Organization IF NOT EXISTS EXTENDS Entity;
+
+-- Edge types
+CREATE EDGE TYPE MENTIONS IF NOT EXISTS;
+CREATE EDGE TYPE RELATES_TO IF NOT EXISTS;
+CREATE EDGE TYPE WORKS_AT IF NOT EXISTS;
+CREATE EDGE TYPE AUTHORED IF NOT EXISTS;
+
+-- Vector index for chunk embeddings (4 dimensions for sample data)
+CREATE INDEX IF NOT EXISTS ON Chunk (embedding) LSM_VECTOR METADATA { dimensions: 4, similarity: 'COSINE' };
+```
+
+**Step 2: Commit**
+
+```bash
+git add graph-rag/sql/01-schema.sql
+git commit -m "feat(graph-rag): add schema definition"
+```
+
+---
+
+### Task 3: Sample data
+
+**Files:**
+- Create: `graph-rag/sql/02-data.sql`
+
+**Step 1: Create 02-data.sql**
+
+The data represents a fictional tech company "ArcadeSoft" knowledge base. Embeddings are 4D vectors where dimensions loosely represent: [graph, vector, architecture, general].
+
+```sql
+-- ── Chunks (internal documentation) ─────────────────────────────────────────
+-- Getting Started with GraphRAG (graph-heavy topic)
+INSERT INTO Chunk SET content = 'GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy. By traversing entity relationships, the system discovers context that pure vector similarity would miss.', source = 'Getting Started with GraphRAG', chunkIndex = 0, embedding = [0.9, 0.2, 0.1, 0.1];
+INSERT INTO Chunk SET content = 'Building a knowledge graph requires extracting entities and relationships from documents. Named entity recognition and relationship extraction are key preprocessing steps.', source = 'Getting Started with GraphRAG', chunkIndex = 1, embedding = [0.8, 0.1, 0.2, 0.1];
+-- Microservices Architecture Guide (architecture-heavy topic)
+INSERT INTO Chunk SET content = 'Microservices decompose applications into small, independently deployable services. Each service owns its data and communicates via well-defined APIs.', source = 'Microservices Architecture Guide', chunkIndex = 0, embedding = [0.1, 0.1, 0.9, 0.2];
+INSERT INTO Chunk SET content = 'Service mesh patterns like sidecar proxies handle cross-cutting concerns including observability, security, and traffic management across microservices.', source = 'Microservices Architecture Guide', chunkIndex = 1, embedding = [0.1, 0.1, 0.8, 0.3];
+-- Vector Search Best Practices (vector-heavy topic)
+INSERT INTO Chunk SET content = 'Vector similarity search uses embedding models to encode text into high-dimensional vectors. Cosine distance is the most common similarity metric for text embeddings.', source = 'Vector Search Best Practices', chunkIndex = 0, embedding = [0.2, 0.9, 0.1, 0.1];
+INSERT INTO Chunk SET content = 'Approximate nearest neighbor algorithms like HNSW and DiskANN trade small accuracy losses for dramatic speed improvements on large vector datasets.', source = 'Vector Search Best Practices', chunkIndex = 1, embedding = [0.1, 0.8, 0.1, 0.2];
+-- Team Onboarding Handbook (general topic)
+INSERT INTO Chunk SET content = 'New engineers at ArcadeSoft join a team and are assigned a mentor. The onboarding process covers codebase orientation, tooling setup, and architecture overview.', source = 'Team Onboarding Handbook', chunkIndex = 0, embedding = [0.2, 0.2, 0.3, 0.8];
+INSERT INTO Chunk SET content = 'The Platform Team maintains shared infrastructure including the knowledge graph pipeline and vector search service. The Research Team explores new retrieval techniques.', source = 'Team Onboarding Handbook', chunkIndex = 1, embedding = [0.3, 0.3, 0.2, 0.7];
+
+-- ── Entities ────────────────────────────────────────────────────────────────
+-- Persons
+INSERT INTO Person SET name = 'Alice Chen';
+INSERT INTO Person SET name = 'Bob Martinez';
+INSERT INTO Person SET name = 'Carol Wu';
+INSERT INTO Person SET name = 'Dave Park';
+-- Concepts
+INSERT INTO Concept SET name = 'GraphRAG';
+INSERT INTO Concept SET name = 'Vector Search';
+INSERT INTO Concept SET name = 'Microservices';
+INSERT INTO Concept SET name = 'Knowledge Graph';
+-- Organizations
+INSERT INTO Organization SET name = 'ArcadeSoft';
+INSERT INTO Organization SET name = 'Platform Team';
+INSERT INTO Organization SET name = 'Research Team';
+
+-- ── MENTIONS edges (Chunk -> Entity) ────────────────────────────────────────
+-- GraphRAG doc chunks mention the GraphRAG, Vector Search, and Knowledge Graph concepts
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'GraphRAG');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'Vector Search');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'GraphRAG');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Knowledge Graph');
+-- Vector Search doc chunks mention Vector Search concept
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'Vector Search');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Vector Search');
+-- Microservices doc chunks mention Microservices concept
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'Microservices');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Microservices');
+-- Onboarding doc mentions organizations and concepts
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 0) TO (SELECT FROM Organization WHERE name = 'ArcadeSoft');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Organization WHERE name = 'Platform Team');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Organization WHERE name = 'Research Team');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Knowledge Graph');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Vector Search');
+
+-- ── RELATES_TO edges (Entity -> Entity) ─────────────────────────────────────
+CREATE EDGE RELATES_TO FROM (SELECT FROM Concept WHERE name = 'GraphRAG') TO (SELECT FROM Concept WHERE name = 'Vector Search');
+CREATE EDGE RELATES_TO FROM (SELECT FROM Concept WHERE name = 'GraphRAG') TO (SELECT FROM Concept WHERE name = 'Knowledge Graph');
+CREATE EDGE RELATES_TO FROM (SELECT FROM Concept WHERE name = 'Microservices') TO (SELECT FROM Concept WHERE name = 'Knowledge Graph');
+
+-- ── WORKS_AT edges (Person -> Organization) ─────────────────────────────────
+CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Organization WHERE name = 'Research Team');
+CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Bob Martinez') TO (SELECT FROM Organization WHERE name = 'Platform Team');
+CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Carol Wu') TO (SELECT FROM Organization WHERE name = 'ArcadeSoft');
+CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Dave Park') TO (SELECT FROM Organization WHERE name = 'Platform Team');
+
+-- ── AUTHORED edges (Person -> Chunk) ────────────────────────────────────────
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 0);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 1);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Bob Martinez') TO (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 0);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Bob Martinez') TO (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 1);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 0);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 1);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Carol Wu') TO (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 0);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Carol Wu') TO (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1);
+```
+
+**Data design validation:** Querying for chunks similar to `[0.9, 0.2, 0.1, 0.1]` (graph topic) returns GraphRAG chunks. Those chunks MENTION the "GraphRAG" concept, which RELATES_TO "Vector Search" and "Knowledge Graph". Following MENTIONS back from those concepts leads to Vector Search doc chunks and the Onboarding Handbook chunk — creating the multi-hop entity bridge.
+
+**Step 2: Commit**
+
+```bash
+git add graph-rag/sql/02-data.sql
+git commit -m "feat(graph-rag): add sample data for ArcadeSoft knowledge base"
+```
+
+---
+
+### Task 4: Curl queries script
+
+**Files:**
+- Create: `graph-rag/queries/queries.sh`
+
+**Step 1: Create queries.sh**
+
+Follow the exact structure from `recommendation-engine/queries/queries.sh`: shebang, env vars, `query()` helper function, 5 labeled sections. Database name is `GraphRAG`.
+
+**Queries:**
+
+1. **Hybrid Vector + Graph (Cypher)** — Vector search for chunks similar to `[0.9, 0.2, 0.1, 0.1]`, then traverse MENTIONS to find entities and related chunks:
+
+```cypher
+MATCH (chunk:Chunk)
+WHERE chunk.embedding <> []
+WITH chunk, vectorDistance('Chunk[embedding]', chunk.embedding, [0.9, 0.2, 0.1, 0.1]) AS score
+WHERE score < 0.5
+OPTIONAL MATCH (chunk)-[:MENTIONS]->(entity:Entity)
+OPTIONAL MATCH (entity)<-[:MENTIONS]-(related:Chunk)
+WHERE related <> chunk
+RETURN chunk.content, chunk.source, score,
+       collect(DISTINCT entity.name) AS entities,
+       collect(DISTINCT related.source) AS related_docs
+ORDER BY score ASC
+LIMIT 10
+```
+
+Note: The exact Cypher syntax for ArcadeDB vector functions may need adjustment during implementation. ArcadeDB's Cypher support may require using `vectorNeighbors` via SQL instead. The `queries.sh` can use SQL for vector-heavy queries. Adapt during implementation based on what ArcadeDB 26.2.1 actually supports.
+
+2. **Multi-Hop Entity Bridge (Cypher)** — Find chunks connected through shared entities:
+
+```cypher
+MATCH (direct:Chunk)-[:MENTIONS]->(entity:Entity)<-[:MENTIONS]-(related:Chunk)
+WHERE direct.source = 'Getting Started with GraphRAG'
+  AND related.source <> direct.source
+RETURN direct.source AS source_doc,
+       entity.name AS bridge_entity,
+       labels(entity) AS entity_type,
+       related.content AS connected_content,
+       related.source AS connected_doc
+LIMIT 20
+```
+
+3. **Temporal-Aware Retrieval (Cypher)** — Filter by chunkIndex to get latest chunks per source:
+
+```cypher
+MATCH (c:Chunk)
+WHERE c.chunkIndex >= 1
+RETURN c.content, c.source, c.chunkIndex
+ORDER BY c.chunkIndex DESC, c.source ASC
+LIMIT 10
+```
+
+4. **Triple Hybrid: Vector + Full-Text + Graph (SQL)** — Composite scoring:
+
+```sql
+SELECT content, source,
+       vectorDistance('Chunk[embedding]', embedding, [0.9, 0.2, 0.1, 0.1]) AS vector_score,
+       out('MENTIONS').size() AS entity_count
+FROM Chunk
+ORDER BY vector_score ASC
+LIMIT 10
+```
+
+5. **Agentic RAG Steps (Mixed)** — 4 sequential queries:
+   - Step 1 (Cypher): Vector search for relevant chunks
+   - Step 2 (Cypher): Graph expansion from found entities
+   - Step 3 (SQL): Full-text lookup with `CONTAINSTEXT`
+   - Step 4 (Cypher): Get authorship context
+
+**Step 2: Make executable**
+
+Run: `chmod +x graph-rag/queries/queries.sh`
+
+**Step 3: Commit**
+
+```bash
+git add graph-rag/queries/queries.sh
+git commit -m "feat(graph-rag): add curl query script with 5 RAG patterns"
+```
+
+---
+
+### Task 5: Java module — pom.xml
+
+**Files:**
+- Create: `graph-rag/java/pom.xml`
+
+**Step 1: Create pom.xml**
+
+Mirror `recommendation-engine/java/pom.xml` structure but swap the dependency from `arcadedb-network` to `neo4j-java-driver`:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <groupId>com.arcadedb.examples</groupId>
+  <artifactId>graph-rag</artifactId>
+  <version>1.0-SNAPSHOT</version>
+  <packaging>jar</packaging>
+
+  <properties>
+    <maven.compiler.source>21</maven.compiler.source>
+    <maven.compiler.target>21</maven.compiler.target>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <neo4j.driver.version>5.28.10</neo4j.driver.version>
+  </properties>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.neo4j.driver</groupId>
+      <artifactId>neo4j-java-driver</artifactId>
+      <version>${neo4j.driver.version}</version>
+    </dependency>
+  </dependencies>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-assembly-plugin</artifactId>
+        <version>3.8.0</version>
+        <configuration>
+          <archive>
+            <manifest>
+              <mainClass>com.arcadedb.examples.GraphRAG</mainClass>
+            </manifest>
+          </archive>
+          <descriptorRefs>
+            <descriptorRef>jar-with-dependencies</descriptorRef>
+          </descriptorRefs>
+          <finalName>graph-rag</finalName>
+          <appendAssemblyId>false</appendAssemblyId>
+        </configuration>
+        <executions>
+          <execution>
+            <id>make-assembly</id>
+            <phase>package</phase>
+            <goals>
+              <goal>single</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+</project>
+```
+
+**Step 2: Commit**
+
+```bash
+git add graph-rag/java/pom.xml
+git commit -m "feat(graph-rag): add Java module pom.xml with Neo4j Bolt driver"
+```
+
+---
+
+### Task 6: Java module — GraphRAG.java
+
+**Files:**
+- Create: `graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java`
+
+**Step 1: Create GraphRAG.java**
+
+Follow `RecommendationEngine.java` structure: config constants from env vars, `main()` calling 5 query methods via `tryRun()`, `printHeader()` helper.
+
+Key differences from recommendation-engine:
+- Uses `org.neo4j.driver.Driver` instead of `RemoteDatabase`
+- Connects via `bolt://HOST:PORT` (default `bolt://localhost:2424`)
+- Uses `driver.session()` and `session.run(cypher)` instead of `db.query()`
+- All queries are Cypher (no SQL fallback)
+- Results accessed via `record.get("fieldName")` instead of `r.getProperty()`
+
+```java
+package com.arcadedb.examples;
+
+import org.neo4j.driver.*;
+import org.neo4j.driver.Record;
+
+public class GraphRAG {
+
+  private static final String HOST     = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost");
+  private static final String PORT     = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424");
+  private static final String USER     = System.getenv().getOrDefault("ARCADEDB_USER", "root");
+  private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
+
+  public static void main(String[] args) {
+    String uri = "bolt://" + HOST + ":" + PORT;
+    try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(USER, PASSWORD))) {
+      tryRun(() -> runQuery1HybridVectorGraph(driver), "Query 1");
+      tryRun(() -> runQuery2MultiHopEntityBridge(driver), "Query 2");
+      tryRun(() -> runQuery3TemporalAware(driver), "Query 3");
+      tryRun(() -> runQuery4CompositeScoring(driver), "Query 4");
+      tryRun(() -> runQuery5AgenticRAG(driver), "Query 5");
+    }
+    System.out.println("\nAll queries complete.");
+  }
+
+  // ... tryRun, printHeader same pattern as RecommendationEngine.java
+  // ... 5 query methods using driver.session() and session.run()
+}
+```
+
+Each query method:
+1. Calls `printHeader()` with title and description
+2. Opens a `Session` with `driver.session(SessionConfig.forDatabase("GraphRAG"))`
+3. Runs the Cypher query via `session.run(cypher)`
+4. Iterates `Result` and prints formatted output via `record.get(...)`
+
+**Query adaptations for pure Cypher via Bolt:**
+
+- Q1 (Hybrid Vector + Graph): Uses available Cypher vector functions. If `vectorDistance` is not available in Cypher over Bolt, fall back to matching chunks by source and traversing MENTIONS instead.
+- Q2 (Multi-Hop Entity Bridge): Pure graph traversal — works directly in Cypher.
+- Q3 (Temporal-Aware): Simple MATCH with WHERE/ORDER BY on chunkIndex — pure Cypher.
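+- Q4 (Composite Scoring): Adapt to Cypher by counting entity connections with `OPTIONAL MATCH (chunk)-[:MENTIONS]->(e)` and `count(e)`; `size()` over a pattern expression, as in `size((chunk)-[:MENTIONS]->())`, is rejected by Neo4j 5 Cypher and may not be supported by ArcadeDB.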
+- Q5 (Agentic RAG): Multiple sequential queries within the same session — vector search, graph expansion, authorship context.
+
+**Important:** ArcadeDB's Bolt protocol may not support all Cypher features identically to Neo4j. During implementation, test each query against the running ArcadeDB instance and adjust syntax as needed. The `queries.sh` script serves as the reference for what ArcadeDB supports.
+
+**Step 2: Verify compilation**
+
+Run: `cd graph-rag/java && mvn compile -q`
+Expected: BUILD SUCCESS
+
+**Step 3: Build fat JAR**
+
+Run: `mvn package -q`
+Expected: `target/graph-rag.jar` created
+
+**Step 4: Commit**
+
+```bash
+git add graph-rag/java/src/
+git commit -m "feat(graph-rag): add GraphRAG.java with 5 Cypher queries via Bolt"
+```
+
+---
+
+### Task 7: Langchain4j module — pom.xml
+
+**Files:**
+- Create: `graph-rag/langchain4j/pom.xml`
+
+**Step 1: Create pom.xml**
+
+Standalone POM (no parent, no Spring Boot). Dependencies:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <groupId>com.arcadedb.examples</groupId>
+  <artifactId>graph-rag-langchain4j</artifactId>
+  <version>1.0-SNAPSHOT</version>
+  <packaging>jar</packaging>
+
+  <properties>
+    <maven.compiler.source>21</maven.compiler.source>
+    <maven.compiler.target>21</maven.compiler.target>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <langchain4j.version>1.11.0</langchain4j.version>
+    <langchain4j.community.version>1.11.0-beta19</langchain4j.community.version>
+  </properties>
+
+  <dependencies>
+    <dependency>
+      <groupId>dev.langchain4j</groupId>
+      <artifactId>langchain4j-community-neo4j</artifactId>
+      <version>${langchain4j.community.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>dev.langchain4j</groupId>
+      <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId>
+      <version>${langchain4j.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>dev.langchain4j</groupId>
+      <artifactId>langchain4j</artifactId>
+      <version>${langchain4j.version}</version>
+    </dependency>
+  </dependencies>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-assembly-plugin</artifactId>
+        <version>3.8.0</version>
+        <configuration>
+          <archive>
+            <manifest>
+              <mainClass>com.arcadedb.examples.GraphRAGEmbeddingStore</mainClass>
+            </manifest>
+          </archive>
+          <descriptorRefs>
+            <descriptorRef>jar-with-dependencies</descriptorRef>
+          </descriptorRefs>
+          <finalName>graph-rag-langchain4j</finalName>
+          <appendAssemblyId>false</appendAssemblyId>
+        </configuration>
+        <executions>
+          <execution>
+            <id>make-assembly</id>
+            <phase>package</phase>
+            <goals>
+              <goal>single</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+</project>
+```
+
+**Step 2: Commit**
+
+```bash
+git add graph-rag/langchain4j/pom.xml
+git commit -m "feat(graph-rag): add langchain4j module pom.xml"
+```
+
+---
+
+### Task 8: Langchain4j — GraphRAGEmbeddingStore.java
+
+**Files:**
+- Create: `graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java`
+
+**Step 1: Create GraphRAGEmbeddingStore.java**
+
+Demonstrates: ingest text chunks, generate real 384D embeddings with AllMiniLmL6V2, store in ArcadeDB via `Neo4jEmbeddingStore` over Bolt, then run similarity searches.
+
+```java
+package com.arcadedb.examples;
+
+import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore;
+import dev.langchain4j.data.embedding.Embedding;
+import dev.langchain4j.data.segment.TextSegment;
+import dev.langchain4j.model.embedding.EmbeddingModel;
+import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
+import dev.langchain4j.store.embedding.EmbeddingMatch;
+import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
+import dev.langchain4j.store.embedding.EmbeddingStore;
+
+import java.util.List;
+
+public class GraphRAGEmbeddingStore {
+
+  private static final String HOST     = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost");
+  private static final String PORT     = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424");
+  private static final String USER     = System.getenv().getOrDefault("ARCADEDB_USER", "root");
+  private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
+
+  public static void main(String[] args) {
+    EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
+
+    String boltUrl = "bolt://" + HOST + ":" + PORT;
+    EmbeddingStore<TextSegment> store = Neo4jEmbeddingStore.builder()
+        .withBasicAuth(boltUrl, USER, PASSWORD)
+        .dimension(embeddingModel.dimension())
+        .build();
+
+    // Ingest sample chunks
+    String[] texts = {
+        "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.",
+        "Vector similarity search uses embedding models to encode text into high-dimensional vectors.",
+        "Microservices decompose applications into small, independently deployable services.",
+        "Building a knowledge graph requires extracting entities and relationships from documents."
+    };
+
+    for (String text : texts) {
+      TextSegment segment = TextSegment.from(text);
+      Embedding embedding = embeddingModel.embed(segment).content();
+      store.add(embedding, segment);
+    }
+
+    System.out.println("Ingested " + texts.length + " chunks with " + embeddingModel.dimension() + "D embeddings.\n");
+
+    // Similarity search
+    String query = "How does graph-based retrieval work?";
+    Embedding queryEmbedding = embeddingModel.embed(query).content();
+    EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
+        .queryEmbedding(queryEmbedding)
+        .maxResults(3)
+        .build();
+
+    List<EmbeddingMatch<TextSegment>> matches = store.search(request).matches();
+
+    System.out.println("Query: \"" + query + "\"\n");
+    System.out.println("Top matches:");
+    for (EmbeddingMatch<TextSegment> match : matches) {
+      System.out.printf("  [%.4f] %s%n", match.score(), match.embedded().text());
+    }
+  }
+}
+```
+
+**Step 2: Verify compilation**
+
+Run: `cd graph-rag/langchain4j && mvn compile -q`
+Expected: exit code 0 and no error output (`-q` suppresses the BUILD SUCCESS banner)
+
+**Step 3: Commit**
+
+```bash
+git add graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java
+git commit -m "feat(graph-rag): add langchain4j embedding store example"
+```
+
+---
+
+### Task 9: Langchain4j — GraphRAGContentRetriever.java
+
+**Files:**
+- Create: `graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java`
+
+**Step 1: Create GraphRAGContentRetriever.java**
+
+Wires `Neo4jEmbeddingStore` into a langchain4j `EmbeddingStoreContentRetriever` pipeline. Ingests chunks, then runs natural-language queries and prints the retrieved content (matches scoring below 0.5 are filtered out).
+
+```java
+package com.arcadedb.examples;
+
+import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore;
+import dev.langchain4j.data.embedding.Embedding;
+import dev.langchain4j.data.segment.TextSegment;
+import dev.langchain4j.model.embedding.EmbeddingModel;
+import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
+import dev.langchain4j.rag.content.Content;
+import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
+import dev.langchain4j.rag.query.Query;
+
+import java.util.List;
+
+public class GraphRAGContentRetriever {
+
+  private static final String HOST     = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost");
+  private static final String PORT     = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424");
+  private static final String USER     = System.getenv().getOrDefault("ARCADEDB_USER", "root");
+  private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
+
+  public static void main(String[] args) {
+    EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
+
+    String boltUrl = "bolt://" + HOST + ":" + PORT;
+    Neo4jEmbeddingStore store = Neo4jEmbeddingStore.builder()
+        .withBasicAuth(boltUrl, USER, PASSWORD)
+        .dimension(embeddingModel.dimension())
+        .label("RAGChunk")
+        .indexName("rag_chunk_index")
+        .build();
+
+    // Ingest sample chunks
+    String[] texts = {
+        "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.",
+        "By traversing entity relationships, the system discovers context that pure vector similarity would miss.",
+        "Vector similarity search uses embedding models to encode text into high-dimensional vectors.",
+        "Approximate nearest neighbor algorithms like HNSW trade small accuracy losses for dramatic speed improvements.",
+        "Microservices decompose applications into small, independently deployable services.",
+        "Building a knowledge graph requires extracting entities and relationships from documents."
+    };
+
+    for (String text : texts) {
+      TextSegment segment = TextSegment.from(text);
+      Embedding embedding = embeddingModel.embed(segment).content();
+      store.add(embedding, segment);
+    }
+
+    System.out.println("Ingested " + texts.length + " chunks into RAGChunk nodes.\n");
+
+    // Build content retriever pipeline
+    EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
+        .embeddingStore(store)
+        .embeddingModel(embeddingModel)
+        .maxResults(3)
+        .minScore(0.5)
+        .build();
+
+    // Run queries
+    String[] queries = {
+        "How does graph-based retrieval work?",
+        "What are vector embeddings?",
+        "Tell me about microservices architecture"
+    };
+
+    for (String q : queries) {
+      System.out.println("Query: \"" + q + "\"");
+      List<Content> results = retriever.retrieve(new Query(q));
+      if (results.isEmpty()) {
+        System.out.println("  (no results above min score)\n");
+      } else {
+        for (Content content : results) {
+          System.out.println("  -> " + content.textSegment().text());
+        }
+        System.out.println();
+      }
+    }
+  }
+}
+```
+
+**Step 2: Verify compilation**
+
+Run: `cd graph-rag/langchain4j && mvn compile -q`
+Expected: exit code 0 and no error output (`-q` suppresses the BUILD SUCCESS banner)
+
+**Step 3: Commit**
+
+```bash
+git add graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
+git commit -m "feat(graph-rag): add langchain4j content retriever example"
+```
+
+---
+
+### Task 10: README
+
+**Files:**
+- Create: `graph-rag/README.md`
+
+**Step 1: Create README.md**
+
+Follow `recommendation-engine/README.md` structure exactly. Sections:
+
+1. **Title and overview** — Graph RAG: unifies vector search, graph traversal, and full-text indexing
+2. **Prerequisites** — Docker/Compose, curl/jq, Java 21+, Maven 3.x
+3. **Quickstart** — 5 steps: docker compose up, setup.sh, queries.sh, java JAR, langchain4j JAR
+4. **Schema table** — Chunk/Entity/Person/Concept/Organization (vertex), MENTIONS/RELATES_TO/WORKS_AT/AUTHORED (edge)
+5. **Query patterns table** — 5 queries with language and signal type
+6. **Sample data** — 8 chunks, 11 entities, ~25 edges with overlap design
+7. **Langchain4j module** — embedding store + content retriever, no API keys
+8. **ArcadeDB version notes** — 26.2.1, Bolt protocol, vector index details
+9. **Reference link** — arcadedb.com/graph-rag.html
+
+**Step 2: Commit**
+
+```bash
+git add graph-rag/README.md
+git commit -m "docs(graph-rag): add README"
+```
+
+---
+
+### Task 11: Integration smoke test
+
+**Step 1: Start ArcadeDB**
+
+Run: `cd graph-rag && docker compose up -d`
+Expected: ArcadeDB container starts, health check passes
+
+**Step 2: Run setup**
+
+Run: `./setup.sh`
+Expected: "Setup complete. GraphRAG is ready."
+
+**Step 3: Run curl queries**
+
+Run: `./queries/queries.sh`
+Expected: All 5 queries return non-empty result sets
+
+**Step 4: Run Java module**
+
+Run: `cd java && mvn package -q && java -jar target/graph-rag.jar`
+Expected: All 5 queries print results via Bolt
+
+**Step 5: Run langchain4j module**
+
+Run: `cd ../langchain4j && mvn package -q && java -jar target/graph-rag-langchain4j.jar`
+Expected: Chunks ingested, similarity search returns ranked results
+
+**Step 6: Fix any issues discovered during smoke test**
+
+Adjust query syntax, fix Cypher/Bolt compatibility issues, update data as needed.
+
+**Step 7: Commit any fixes**
+
+```bash
+git add -A graph-rag/
+git commit -m "fix(graph-rag): adjust queries after integration testing"
+```
+
+**Step 8: Stop ArcadeDB**
+
+Run: `docker compose down`

From e6a199c05c727a6f9e0bd83ea6a49ef39d4810b7 Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 09:32:37 +0100
Subject: [PATCH 03/15] feat(graph-rag): add docker-compose and setup script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 graph-rag/docker-compose.yml | 14 ++++++++++
 graph-rag/setup.sh           | 51 ++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)
 create mode 100644 graph-rag/docker-compose.yml
 create mode 100755 graph-rag/setup.sh

diff --git a/graph-rag/docker-compose.yml b/graph-rag/docker-compose.yml
new file mode 100644
index 0000000..bb503a9
--- /dev/null
+++ b/graph-rag/docker-compose.yml
@@ -0,0 +1,14 @@
+services:
+  arcadedb:
+    image: arcadedata/arcadedb:26.2.1
+    ports:
+      - "2480:2480"
+      - "2424:2424"
+    environment:
+      JAVA_OPTS: "-Darcadedb.server.rootPassword=arcadedb"
+    healthcheck:
+      test: ["CMD", "curl", "-sf", "http://localhost:2480/api/v1/ready"]
+      interval: 5s
+      timeout: 3s
+      retries: 20
+      start_period: 10s
diff --git a/graph-rag/setup.sh b/graph-rag/setup.sh
new file mode 100755
index 0000000..75ef0c5
--- /dev/null
+++ b/graph-rag/setup.sh
@@ -0,0 +1,51 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ARCADEDB_URL="${ARCADEDB_URL:-http://localhost:2480}"
+ARCADEDB_USER="${ARCADEDB_USER:-root}"
+ARCADEDB_PASS="${ARCADEDB_PASS:-arcadedb}"
+DB_NAME="GraphRAG"
+
+# ── Wait for ArcadeDB ─────────────────────────────────────────────────────────
+echo "Waiting for ArcadeDB at ${ARCADEDB_URL}..."
+until curl -sf -u "${ARCADEDB_USER}:${ARCADEDB_PASS}" \
+    "${ARCADEDB_URL}/api/v1/ready" > /dev/null 2>&1; do
+  sleep 2
+done
+echo "ArcadeDB is ready."
+
+# ── Create database ───────────────────────────────────────────────────────────
+echo "Creating database ${DB_NAME}..."
+curl -sf -u "${ARCADEDB_USER}:${ARCADEDB_PASS}" \
+  -X POST "${ARCADEDB_URL}/api/v1/server" \
+  -H "Content-Type: application/json" \
+  -d "{\"command\": \"create database ${DB_NAME}\"}" > /dev/null || true
+echo "Database ready."
+
+# ── Helper: send one SQL statement ───────────────────────────────────────────
+send_sql() {
+  local stmt="$1"
+  jq -cn --arg cmd "$stmt" '{"language":"sql","command":$cmd}' \
+    | curl -sf -u "${ARCADEDB_USER}:${ARCADEDB_PASS}" \
+        -X POST "${ARCADEDB_URL}/api/v1/command/${DB_NAME}" \
+        -H "Content-Type: application/json" \
+        -d @- > /dev/null
+}
+
+# ── Apply a SQL file (one statement per line) ─────────────────────────────────
+apply_file() {
+  local file="$1"
+  echo "Applying ${file}..."
+  while IFS= read -r line || [[ -n "$line" ]]; do
+    # skip blank lines and SQL comments
+    [[ -z "${line//[[:space:]]/}" || "$line" =~ ^[[:space:]]*-- ]] && continue
+    send_sql "${line%%;}"
+  done < "$file"
+  echo "Done: ${file}"
+}
+
+apply_file "sql/01-schema.sql"
+apply_file "sql/02-data.sql"
+
+echo ""
+echo "Setup complete. ${DB_NAME} is ready."

From 93d62cb15c1eae53137032daccfe3b1f440efbaf Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 09:33:16 +0100
Subject: [PATCH 04/15] feat(graph-rag): add schema definition

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 graph-rag/sql/01-schema.sql | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 graph-rag/sql/01-schema.sql

diff --git a/graph-rag/sql/01-schema.sql b/graph-rag/sql/01-schema.sql
new file mode 100644
index 0000000..646e812
--- /dev/null
+++ b/graph-rag/sql/01-schema.sql
@@ -0,0 +1,22 @@
+-- Vertex type for text chunks with vector embeddings (edges can only connect vertices)
+CREATE VERTEX TYPE Chunk IF NOT EXISTS;
+CREATE PROPERTY Chunk.content IF NOT EXISTS STRING;
+CREATE PROPERTY Chunk.source IF NOT EXISTS STRING;
+CREATE PROPERTY Chunk.chunkIndex IF NOT EXISTS INTEGER;
+CREATE PROPERTY Chunk.embedding IF NOT EXISTS LIST;
+
+-- Entity vertex types (knowledge graph nodes)
+CREATE VERTEX TYPE Entity IF NOT EXISTS;
+CREATE PROPERTY Entity.name IF NOT EXISTS STRING;
+CREATE VERTEX TYPE Person IF NOT EXISTS EXTENDS Entity;
+CREATE VERTEX TYPE Concept IF NOT EXISTS EXTENDS Entity;
+CREATE VERTEX TYPE Organization IF NOT EXISTS EXTENDS Entity;
+
+-- Edge types
+CREATE EDGE TYPE MENTIONS IF NOT EXISTS;
+CREATE EDGE TYPE RELATES_TO IF NOT EXISTS;
+CREATE EDGE TYPE WORKS_AT IF NOT EXISTS;
+CREATE EDGE TYPE AUTHORED IF NOT EXISTS;
+
+-- Vector index for chunk embeddings (4 dimensions for sample data)
+CREATE INDEX IF NOT EXISTS ON Chunk (embedding) LSM_VECTOR METADATA { dimensions: 4, similarity: 'COSINE' };

From de21aec5a30c585ded46cf6b6f9868cc33b69a2b Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 09:33:53 +0100
Subject: [PATCH 05/15] feat(graph-rag): add sample data for ArcadeSoft
 knowledge base

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 graph-rag/sql/02-data.sql | 69 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 graph-rag/sql/02-data.sql

diff --git a/graph-rag/sql/02-data.sql b/graph-rag/sql/02-data.sql
new file mode 100644
index 0000000..31d1cee
--- /dev/null
+++ b/graph-rag/sql/02-data.sql
@@ -0,0 +1,69 @@
+-- ── Chunks (internal documentation) ─────────────────────────────────────────
+-- Getting Started with GraphRAG (graph-heavy topic)
+INSERT INTO Chunk SET content = 'GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy. By traversing entity relationships, the system discovers context that pure vector similarity would miss.', source = 'Getting Started with GraphRAG', chunkIndex = 0, embedding = [0.9, 0.2, 0.1, 0.1];
+INSERT INTO Chunk SET content = 'Building a knowledge graph requires extracting entities and relationships from documents. Named entity recognition and relationship extraction are key preprocessing steps.', source = 'Getting Started with GraphRAG', chunkIndex = 1, embedding = [0.8, 0.1, 0.2, 0.1];
+-- Microservices Architecture Guide (architecture-heavy topic)
+INSERT INTO Chunk SET content = 'Microservices decompose applications into small, independently deployable services. Each service owns its data and communicates via well-defined APIs.', source = 'Microservices Architecture Guide', chunkIndex = 0, embedding = [0.1, 0.1, 0.9, 0.2];
+INSERT INTO Chunk SET content = 'Service mesh patterns like sidecar proxies handle cross-cutting concerns including observability, security, and traffic management across microservices.', source = 'Microservices Architecture Guide', chunkIndex = 1, embedding = [0.1, 0.1, 0.8, 0.3];
+-- Vector Search Best Practices (vector-heavy topic)
+INSERT INTO Chunk SET content = 'Vector similarity search uses embedding models to encode text into high-dimensional vectors. Cosine distance is the most common similarity metric for text embeddings.', source = 'Vector Search Best Practices', chunkIndex = 0, embedding = [0.2, 0.9, 0.1, 0.1];
+INSERT INTO Chunk SET content = 'Approximate nearest neighbor algorithms like HNSW and DiskANN trade small accuracy losses for dramatic speed improvements on large vector datasets.', source = 'Vector Search Best Practices', chunkIndex = 1, embedding = [0.1, 0.8, 0.1, 0.2];
+-- Team Onboarding Handbook (general topic)
+INSERT INTO Chunk SET content = 'New engineers at ArcadeSoft join a team and are assigned a mentor. The onboarding process covers codebase orientation, tooling setup, and architecture overview.', source = 'Team Onboarding Handbook', chunkIndex = 0, embedding = [0.2, 0.2, 0.3, 0.8];
+INSERT INTO Chunk SET content = 'The Platform Team maintains shared infrastructure including the knowledge graph pipeline and vector search service. The Research Team explores new retrieval techniques.', source = 'Team Onboarding Handbook', chunkIndex = 1, embedding = [0.3, 0.3, 0.2, 0.7];
+
+-- ── Entities ────────────────────────────────────────────────────────────────
+-- Persons
+INSERT INTO Person SET name = 'Alice Chen';
+INSERT INTO Person SET name = 'Bob Martinez';
+INSERT INTO Person SET name = 'Carol Wu';
+INSERT INTO Person SET name = 'Dave Park';
+-- Concepts
+INSERT INTO Concept SET name = 'GraphRAG';
+INSERT INTO Concept SET name = 'Vector Search';
+INSERT INTO Concept SET name = 'Microservices';
+INSERT INTO Concept SET name = 'Knowledge Graph';
+-- Organizations
+INSERT INTO Organization SET name = 'ArcadeSoft';
+INSERT INTO Organization SET name = 'Platform Team';
+INSERT INTO Organization SET name = 'Research Team';
+
+-- ── MENTIONS edges (Chunk -> Entity) ────────────────────────────────────────
+-- GraphRAG doc chunks mention GraphRAG, Vector Search, and Knowledge Graph concepts
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'GraphRAG');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'Vector Search');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'GraphRAG');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Knowledge Graph');
+-- Vector Search doc chunks mention Vector Search concept
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'Vector Search');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Vector Search');
+-- Microservices doc chunks mention Microservices concept
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 0) TO (SELECT FROM Concept WHERE name = 'Microservices');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Microservices');
+-- Onboarding doc mentions organizations and shared-infrastructure concepts
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 0) TO (SELECT FROM Organization WHERE name = 'ArcadeSoft');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Organization WHERE name = 'Platform Team');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Organization WHERE name = 'Research Team');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Knowledge Graph');
+CREATE EDGE MENTIONS FROM (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1) TO (SELECT FROM Concept WHERE name = 'Vector Search');
+
+-- ── RELATES_TO edges (Entity -> Entity) ─────────────────────────────────────
+CREATE EDGE RELATES_TO FROM (SELECT FROM Concept WHERE name = 'GraphRAG') TO (SELECT FROM Concept WHERE name = 'Vector Search');
+CREATE EDGE RELATES_TO FROM (SELECT FROM Concept WHERE name = 'GraphRAG') TO (SELECT FROM Concept WHERE name = 'Knowledge Graph');
+CREATE EDGE RELATES_TO FROM (SELECT FROM Concept WHERE name = 'Microservices') TO (SELECT FROM Concept WHERE name = 'Knowledge Graph');
+
+-- ── WORKS_AT edges (Person -> Organization) ─────────────────────────────────
+CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Organization WHERE name = 'Research Team');
+CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Bob Martinez') TO (SELECT FROM Organization WHERE name = 'Platform Team');
+CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Carol Wu') TO (SELECT FROM Organization WHERE name = 'ArcadeSoft');
+CREATE EDGE WORKS_AT FROM (SELECT FROM Person WHERE name = 'Dave Park') TO (SELECT FROM Organization WHERE name = 'Platform Team');
+
+-- ── AUTHORED edges (Person -> Chunk) ────────────────────────────────────────
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 0);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Getting Started with GraphRAG' AND chunkIndex = 1);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Bob Martinez') TO (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 0);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Bob Martinez') TO (SELECT FROM Chunk WHERE source = 'Microservices Architecture Guide' AND chunkIndex = 1);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 0);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Alice Chen') TO (SELECT FROM Chunk WHERE source = 'Vector Search Best Practices' AND chunkIndex = 1);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Carol Wu') TO (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 0);
+CREATE EDGE AUTHORED FROM (SELECT FROM Person WHERE name = 'Carol Wu') TO (SELECT FROM Chunk WHERE source = 'Team Onboarding Handbook' AND chunkIndex = 1);

From c9833e811a7a311e0d8e2e8d57f6bf073464694d Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 09:34:45 +0100
Subject: [PATCH 06/15] feat(graph-rag): add Java module pom.xml with Neo4j
 Bolt driver

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 graph-rag/java/pom.xml | 57 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
 create mode 100644 graph-rag/java/pom.xml

diff --git a/graph-rag/java/pom.xml b/graph-rag/java/pom.xml
new file mode 100644
index 0000000..fca28dc
--- /dev/null
+++ b/graph-rag/java/pom.xml
@@ -0,0 +1,57 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <groupId>com.arcadedb.examples</groupId>
+  <artifactId>graph-rag</artifactId>
+  <version>1.0-SNAPSHOT</version>
+  <packaging>jar</packaging>
+
+  <properties>
+    <maven.compiler.source>21</maven.compiler.source>
+    <maven.compiler.target>21</maven.compiler.target>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <neo4j.driver.version>5.28.10</neo4j.driver.version>
+  </properties>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.neo4j.driver</groupId>
+      <artifactId>neo4j-java-driver</artifactId>
+      <version>${neo4j.driver.version}</version>
+    </dependency>
+  </dependencies>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-assembly-plugin</artifactId>
+        <version>3.8.0</version>
+        <configuration>
+          <archive>
+            <manifest>
+              <mainClass>com.arcadedb.examples.GraphRAG</mainClass>
+            </manifest>
+          </archive>
+          <descriptorRefs>
+            <descriptorRef>jar-with-dependencies</descriptorRef>
+          </descriptorRefs>
+          <finalName>graph-rag</finalName>
+          <appendAssemblyId>false</appendAssemblyId>
+        </configuration>
+        <executions>
+          <execution>
+            <id>make-assembly</id>
+            <phase>package</phase>
+            <goals>
+              <goal>single</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+</project>

From 484a30a65f4fefae94206e4e2cf8bac954ea2747 Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 09:34:52 +0100
Subject: [PATCH 07/15] feat(graph-rag): add curl query script with 5 RAG
 patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 graph-rag/queries/queries.sh | 118 +++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)
 create mode 100755 graph-rag/queries/queries.sh

diff --git a/graph-rag/queries/queries.sh b/graph-rag/queries/queries.sh
new file mode 100755
index 0000000..3ddc5df
--- /dev/null
+++ b/graph-rag/queries/queries.sh
@@ -0,0 +1,118 @@
+#!/usr/bin/env bash
+# Graph RAG — all five query patterns via curl
+# Prerequisites: ArcadeDB running, setup.sh already executed, jq installed
+# Usage: ./queries/queries.sh
+
+set -euo pipefail
+
+ARCADEDB_URL="${ARCADEDB_URL:-http://localhost:2480}"
+ARCADEDB_USER="${ARCADEDB_USER:-root}"
+ARCADEDB_PASS="${ARCADEDB_PASS:-arcadedb}"
+AUTH="${ARCADEDB_USER}:${ARCADEDB_PASS}"
+DB="GraphRAG"
+QUERY_URL="${ARCADEDB_URL}/api/v1/query/${DB}"
+
+query() {
+  local lang="$1" cmd="$2"
+  jq -cn --arg l "$lang" --arg c "$cmd" '{"language":$l,"command":$c}' \
+    | curl -sf -u "$AUTH" -X POST "$QUERY_URL" \
+        -H "Content-Type: application/json" -d @- \
+    | jq '.result'
+}
+
+# ─────────────────────────────────────────────────────────────────────────────
+echo "=== Query 1: Hybrid Vector + Graph (SQL) ==="
+echo "Find chunks similar to a query embedding and include entity mentions."
+echo ""
+query "sql" "
+SELECT content, source,
+       out('MENTIONS').name AS entities
+FROM (
+  SELECT *, vectorDistance('Chunk[embedding]', embedding, [0.9, 0.2, 0.1, 0.1]) AS score
+  FROM Chunk
+  ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 5) DESC
+  LIMIT 5
+)
+"
+
+# ─────────────────────────────────────────────────────────────────────────────
+echo ""
+echo "=== Query 2: Multi-Hop Entity Bridge (Cypher) ==="
+echo "Find chunks connected through shared entities."
+echo ""
+query "cypher" "
+MATCH (direct:Chunk)-[:MENTIONS]->(entity:Entity)<-[:MENTIONS]-(related:Chunk)
+WHERE direct.source = 'Getting Started with GraphRAG'
+  AND related.source <> direct.source
+RETURN direct.source AS source_doc,
+       entity.name AS bridge_entity,
+       related.content AS connected_content,
+       related.source AS connected_doc
+LIMIT 20
+"
+
+# ─────────────────────────────────────────────────────────────────────────────
+echo ""
+echo "=== Query 3: Temporal-Aware Retrieval (Cypher) ==="
+echo "Get the last chunk of each source (chunkIndex = 1), using chunk order as a stand-in for recency."
+echo ""
+query "cypher" "
+MATCH (c:Chunk)
+WHERE c.chunkIndex = 1
+RETURN c.content, c.source, c.chunkIndex
+ORDER BY c.chunkIndex DESC
+LIMIT 10
+"
+
+# ─────────────────────────────────────────────────────────────────────────────
+echo ""
+echo "=== Query 4: Composite Scoring: Vector + Entity Count (SQL) ==="
+echo "Score chunks by vector distance and entity connections."
+echo ""
+query "sql" "
+SELECT content, source,
+       vectorDistance('Chunk[embedding]', embedding, [0.9, 0.2, 0.1, 0.1]) AS vector_score,
+       out('MENTIONS').size() AS entity_count
+FROM Chunk
+ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 10) DESC
+LIMIT 10
+"
+
+# ─────────────────────────────────────────────────────────────────────────────
+echo ""
+echo "=== Query 5: Agentic RAG Steps ==="
+echo "Simulate agent steps: vector search, graph expansion, full-text lookup, authorship."
+echo ""
+
+echo "--- Step 1: Vector search for relevant chunks ---"
+query "sql" "
+SELECT content, source
+FROM Chunk
+ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 5) DESC
+LIMIT 5
+"
+
+echo ""
+echo "--- Step 2: Graph expansion — entities and relations ---"
+query "cypher" "
+MATCH (c:Chunk {source: 'Getting Started with GraphRAG'})-[:MENTIONS]->(e:Entity)-[:RELATES_TO]->(related)
+RETURN e.name, related.name
+LIMIT 10
+"
+
+echo ""
+echo "--- Step 3: Full-text lookup ---"
+query "sql" "
+SELECT content, source
+FROM Chunk
+WHERE content CONTAINSTEXT 'knowledge graph'
+LIMIT 5
+"
+
+echo ""
+echo "--- Step 4: Authorship ---"
+query "cypher" "
+MATCH (p:Person)-[:AUTHORED]->(c:Chunk)
+RETURN p.name, c.source, c.chunkIndex
+LIMIT 10
+"

From 1787d6cce195501c3d14438b0e8ad78ade40a2e3 Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 09:36:58 +0100
Subject: [PATCH 08/15] feat(graph-rag): add GraphRAG.java with 5 Cypher
 queries via Bolt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 .../java/com/arcadedb/examples/GraphRAG.java  | 207 ++++++++++++++++++
 1 file changed, 207 insertions(+)
 create mode 100644 graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java

diff --git a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
new file mode 100644
index 0000000..06f8dfe
--- /dev/null
+++ b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
@@ -0,0 +1,207 @@
+package com.arcadedb.examples;
+
+import org.neo4j.driver.*;
+import org.neo4j.driver.Record;
+
+import java.util.List;
+
+public class GraphRAG {
+
+  private static final String HOST     = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost");
+  private static final String PORT     = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424");
+  private static final String USER     = System.getenv().getOrDefault("ARCADEDB_USER", "root");
+  private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
+
+  public static void main(String[] args) {
+    String uri = "bolt://" + HOST + ":" + PORT;
+    try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(USER, PASSWORD))) {
+      tryRun(() -> runQuery1HybridVectorGraph(driver), "Query 1");
+      tryRun(() -> runQuery2MultiHopEntityBridge(driver), "Query 2");
+      tryRun(() -> runQuery3TemporalAware(driver), "Query 3");
+      tryRun(() -> runQuery4CompositeScoring(driver), "Query 4");
+      tryRun(() -> runQuery5AgenticRAG(driver), "Query 5");
+    }
+    System.out.println("\nAll queries complete.");
+  }
+
+  private static void tryRun(Runnable r, String name) {
+    try {
+      r.run();
+    } catch (Exception e) {
+      System.err.println("[" + name + " FAILED] " + e.getMessage());
+    }
+  }
+
+  // Query 1: Hybrid Vector + Graph
+  // Graph half only: lists chunks with the entities they mention (no vector search in this Cypher)
+  private static void runQuery1HybridVectorGraph(Driver driver) {
+    printHeader("Query 1: Hybrid Vector + Graph Retrieval",
+        "Find chunks similar to graph-topic embedding and their mentioned entities.");
+
+    String cypher = """
+        MATCH (chunk:Chunk)-[:MENTIONS]->(entity:Entity)
+        RETURN chunk.content AS content, chunk.source AS source,
+               collect(DISTINCT entity.name) AS entities
+        LIMIT 10""";
+
+    try (Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
+      Result result = session.run(cypher);
+      List<Record> records = result.list();
+      for (Record r : records) {
+        System.out.printf("  %-40.40s | %-35.35s | %s%n",
+            r.get("source").asString(),
+            truncate(r.get("content").asString(), 35),
+            r.get("entities").asList());
+      }
+    }
+  }
+
+  // Query 2: Multi-Hop Entity Bridge
+  // Discovers documents connected through shared entity chains
+  private static void runQuery2MultiHopEntityBridge(Driver driver) {
+    printHeader("Query 2: Multi-Hop Entity Bridge",
+        "Find chunks connected through shared entities from GraphRAG docs.");
+
+    String cypher = """
+        MATCH (direct:Chunk)-[:MENTIONS]->(entity:Entity)<-[:MENTIONS]-(related:Chunk)
+        WHERE direct.source = 'Getting Started with GraphRAG'
+          AND related.source <> direct.source
+        RETURN direct.source AS source_doc,
+               entity.name AS bridge_entity,
+               related.content AS connected_content,
+               related.source AS connected_doc
+        LIMIT 20""";
+
+    try (Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
+      Result result = session.run(cypher);
+      List<Record> records = result.list();
+      for (Record r : records) {
+        System.out.printf("  [%s] --%s--> %s%n",
+            r.get("source_doc").asString(),
+            r.get("bridge_entity").asString(),
+            r.get("connected_doc").asString());
+        System.out.printf("    -> %s%n", truncate(r.get("connected_content").asString(), 80));
+      }
+    }
+  }
+
+  // Query 3: Temporal-Aware Retrieval
+  // Note: hard-codes chunkIndex = 1 instead of computing the highest chunkIndex per source
+  private static void runQuery3TemporalAware(Driver driver) {
+    printHeader("Query 3: Temporal-Aware Retrieval",
+        "Get the latest chunk (highest chunkIndex) per source.");
+
+    String cypher = """
+        MATCH (c:Chunk)
+        WHERE c.chunkIndex = 1
+        RETURN c.content AS content, c.source AS source, c.chunkIndex AS chunkIndex
+        ORDER BY c.chunkIndex DESC
+        LIMIT 10""";
+
+    try (Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
+      Result result = session.run(cypher);
+      List<Record> records = result.list();
+      for (Record r : records) {
+        System.out.printf("  %-40.40s | chunk %d | %s%n",
+            r.get("source").asString(),
+            r.get("chunkIndex").asInt(),
+            truncate(r.get("content").asString(), 50));
+      }
+    }
+  }
+
+  // Query 4: Composite Scoring — entity count
+  // Ranks chunks by number of entity connections
+  private static void runQuery4CompositeScoring(Driver driver) {
+    printHeader("Query 4: Composite Scoring (Entity Connections)",
+        "Rank chunks by number of mentioned entities.");
+
+    String cypher = """
+        MATCH (chunk:Chunk)
+        OPTIONAL MATCH (chunk)-[:MENTIONS]->(entity:Entity)
+        RETURN chunk.content AS content, chunk.source AS source,
+               count(entity) AS entity_count
+        ORDER BY entity_count DESC
+        LIMIT 10""";
+
+    try (Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
+      Result result = session.run(cypher);
+      List<Record> records = result.list();
+      for (Record r : records) {
+        System.out.printf("  %-40.40s | entities: %d | %s%n",
+            r.get("source").asString(),
+            r.get("entity_count").asInt(),
+            truncate(r.get("content").asString(), 40));
+      }
+    }
+  }
+
+  // Query 5: Agentic RAG — multi-step retrieval
+  // Simulates an agent workflow: graph expansion, then authorship, then team context
+  private static void runQuery5AgenticRAG(Driver driver) {
+    printHeader("Query 5: Agentic RAG (Multi-Step Retrieval)",
+        "Simulate an agent: graph expansion -> related concepts -> authorship.");
+
+    try (Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
+      // Step 1: Find entities mentioned in GraphRAG docs
+      System.out.println("  Step 1: Graph expansion from GraphRAG docs");
+      String step1 = """
+          MATCH (c:Chunk {source: 'Getting Started with GraphRAG'})
+                -[:MENTIONS]->(e:Entity)
+                -[:RELATES_TO]->(related)
+          RETURN e.name AS entity, related.name AS related_concept
+          LIMIT 10""";
+
+      Result r1 = session.run(step1);
+      List<Record> records1 = r1.list();
+      for (Record r : records1) {
+        System.out.printf("    %s --> %s%n",
+            r.get("entity").asString(),
+            r.get("related_concept").asString());
+      }
+
+      // Step 2: Get authorship context
+      System.out.println("\n  Step 2: Authorship context");
+      String step2 = """
+          MATCH (p:Person)-[:AUTHORED]->(c:Chunk)
+          RETURN p.name AS author, c.source AS document, c.chunkIndex AS chunk
+          ORDER BY p.name, c.source
+          LIMIT 10""";
+
+      Result r2 = session.run(step2);
+      List<Record> records2 = r2.list();
+      for (Record r : records2) {
+        System.out.printf("    %s authored '%s' (chunk %d)%n",
+            r.get("author").asString(),
+            r.get("document").asString(),
+            r.get("chunk").asInt());
+      }
+
+      // Step 3: Team context — who works where
+      System.out.println("\n  Step 3: Team context");
+      String step3 = """
+          MATCH (p:Person)-[:WORKS_AT]->(org:Organization)
+          RETURN p.name AS person, org.name AS team
+          LIMIT 10""";
+
+      Result r3 = session.run(step3);
+      List<Record> records3 = r3.list();
+      for (Record r : records3) {
+        System.out.printf("    %s works at %s%n",
+            r.get("person").asString(),
+            r.get("team").asString());
+      }
+    }
+  }
+
+  private static void printHeader(String title, String description) {
+    System.out.println("\n" + "=".repeat(70));
+    System.out.println("  " + title);
+    System.out.println("  " + description);
+    System.out.println("=".repeat(70));
+  }
+
+  private static String truncate(String s, int maxLen) {
+    return s.length() <= maxLen ? s : s.substring(0, maxLen - 3) + "...";
+  }
+}

From b1c3c2ac8b136196312668a69f4c4b2e1d0ece41 Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 09:37:31 +0100
Subject: [PATCH 09/15] feat(graph-rag): add langchain4j module pom.xml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 graph-rag/langchain4j/pom.xml | 68 +++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)
 create mode 100644 graph-rag/langchain4j/pom.xml

diff --git a/graph-rag/langchain4j/pom.xml b/graph-rag/langchain4j/pom.xml
new file mode 100644
index 0000000..32857c0
--- /dev/null
+++ b/graph-rag/langchain4j/pom.xml
@@ -0,0 +1,68 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <groupId>com.arcadedb.examples</groupId>
+  <artifactId>graph-rag-langchain4j</artifactId>
+  <version>1.0-SNAPSHOT</version>
+  <packaging>jar</packaging>
+
+  <properties>
+    <maven.compiler.source>21</maven.compiler.source>
+    <maven.compiler.target>21</maven.compiler.target>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <langchain4j.version>1.11.0</langchain4j.version>
+    <langchain4j.community.version>1.11.0-beta19</langchain4j.community.version>
+  </properties>
+
+  <dependencies>
+    <dependency>
+      <groupId>dev.langchain4j</groupId>
+      <artifactId>langchain4j-community-neo4j</artifactId>
+      <version>${langchain4j.community.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>dev.langchain4j</groupId>
+      <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId>
+      <version>${langchain4j.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>dev.langchain4j</groupId>
+      <artifactId>langchain4j</artifactId>
+      <version>${langchain4j.version}</version>
+    </dependency>
+  </dependencies>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-assembly-plugin</artifactId>
+        <version>3.8.0</version>
+        <configuration>
+          <archive>
+            <manifest>
+              <mainClass>com.arcadedb.examples.GraphRAGEmbeddingStore</mainClass>
+            </manifest>
+          </archive>
+          <descriptorRefs>
+            <descriptorRef>jar-with-dependencies</descriptorRef>
+          </descriptorRefs>
+          <finalName>graph-rag-langchain4j</finalName>
+          <appendAssemblyId>false</appendAssemblyId>
+        </configuration>
+        <executions>
+          <execution>
+            <id>make-assembly</id>
+            <phase>package</phase>
+            <goals>
+              <goal>single</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+</project>

From b597c91d104cc8fd73935f14e09c4e3d181c0d0b Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 09:41:19 +0100
Subject: [PATCH 10/15] feat(graph-rag): add langchain4j embedding store and
 content retriever examples

Add GraphRAGEmbeddingStore.java demonstrating vector ingestion and
similarity search, and GraphRAGContentRetriever.java showing the
RAG content retriever pipeline with min-score filtering.

Fix embedding model dependency version to use community beta release
(1.11.0-beta19) since the stable artifact is not published.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 graph-rag/langchain4j/pom.xml                 |  2 +-
 .../examples/GraphRAGContentRetriever.java    | 78 +++++++++++++++++++
 .../examples/GraphRAGEmbeddingStore.java      | 62 +++++++++++++++
 3 files changed, 141 insertions(+), 1 deletion(-)
 create mode 100644 graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
 create mode 100644 graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java
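
Note on min-score filtering: the retriever in this patch is configured
with maxResults(3) and minScore(0.5). The sketch below shows the effect
of such a cut on a list of scored matches, assuming the threshold is
inclusive. ScoredText and topMatches are hypothetical names; this is
plain Java for illustration, not LangChain4j's implementation.

```java
import java.util.Comparator;
import java.util.List;

public class MinScoreFilterSketch {

  // Hypothetical stand-in for a scored match (not LangChain4j's EmbeddingMatch).
  record ScoredText(double score, String text) {}

  // Drops matches below minScore, then returns at most maxResults, best first.
  static List<ScoredText> topMatches(List<ScoredText> matches, int maxResults, double minScore) {
    return matches.stream()
        .filter(m -> m.score() >= minScore)
        .sorted(Comparator.comparingDouble(ScoredText::score).reversed())
        .limit(maxResults)
        .toList();
  }

  public static void main(String[] args) {
    // Scores are made up for the example.
    List<ScoredText> matches = List.of(
        new ScoredText(0.42, "microservices"),
        new ScoredText(0.81, "graph retrieval"),
        new ScoredText(0.67, "vector search"));
    for (ScoredText m : topMatches(matches, 3, 0.5)) {
      System.out.printf("[%.4f] %s%n", m.score(), m.text());
    }
  }
}
```

When no match reaches the threshold, the result is an empty list, which
is the case the "(no results above min score)" branch handles.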

diff --git a/graph-rag/langchain4j/pom.xml b/graph-rag/langchain4j/pom.xml
index 32857c0..eee5aee 100644
--- a/graph-rag/langchain4j/pom.xml
+++ b/graph-rag/langchain4j/pom.xml
@@ -26,7 +26,7 @@
     <dependency>
       <groupId>dev.langchain4j</groupId>
       <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId>
-      <version>${langchain4j.version}</version>
+      <version>${langchain4j.community.version}</version>
     </dependency>
     <dependency>
       <groupId>dev.langchain4j</groupId>
diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
new file mode 100644
index 0000000..37a478a
--- /dev/null
+++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
@@ -0,0 +1,78 @@
+package com.arcadedb.examples;
+
+import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore;
+import dev.langchain4j.data.embedding.Embedding;
+import dev.langchain4j.data.segment.TextSegment;
+import dev.langchain4j.model.embedding.EmbeddingModel;
+import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
+import dev.langchain4j.rag.content.Content;
+import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
+import dev.langchain4j.rag.query.Query;
+
+import java.util.List;
+
+public class GraphRAGContentRetriever {
+
+  private static final String HOST     = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost");
+  private static final String PORT     = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424");
+  private static final String USER     = System.getenv().getOrDefault("ARCADEDB_USER", "root");
+  private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
+
+  public static void main(String[] args) {
+    EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
+
+    String boltUrl = "bolt://" + HOST + ":" + PORT;
+    Neo4jEmbeddingStore store = Neo4jEmbeddingStore.builder()
+        .withBasicAuth(boltUrl, USER, PASSWORD)
+        .dimension(embeddingModel.dimension())
+        .label("RAGChunk")
+        .indexName("rag_chunk_index")
+        .build();
+
+    // Ingest sample chunks
+    String[] texts = {
+        "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.",
+        "By traversing entity relationships, the system discovers context that pure vector similarity would miss.",
+        "Vector similarity search uses embedding models to encode text into high-dimensional vectors.",
+        "Approximate nearest neighbor algorithms like HNSW trade small accuracy losses for dramatic speed improvements.",
+        "Microservices decompose applications into small, independently deployable services.",
+        "Building a knowledge graph requires extracting entities and relationships from documents."
+    };
+
+    for (String text : texts) {
+      TextSegment segment = TextSegment.from(text);
+      Embedding embedding = embeddingModel.embed(segment).content();
+      store.add(embedding, segment);
+    }
+
+    System.out.println("Ingested " + texts.length + " chunks into RAGChunk nodes.\n");
+
+    // Build content retriever pipeline
+    EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
+        .embeddingStore(store)
+        .embeddingModel(embeddingModel)
+        .maxResults(3)
+        .minScore(0.5)
+        .build();
+
+    // Run queries
+    String[] queries = {
+        "How does graph-based retrieval work?",
+        "What are vector embeddings?",
+        "Tell me about microservices architecture"
+    };
+
+    for (String q : queries) {
+      System.out.println("Query: \"" + q + "\"");
+      List<Content> results = retriever.retrieve(new Query(q));
+      if (results.isEmpty()) {
+        System.out.println("  (no results above min score)\n");
+      } else {
+        for (Content content : results) {
+          System.out.println("  -> " + content.textSegment().text());
+        }
+        System.out.println();
+      }
+    }
+  }
+}
diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java
new file mode 100644
index 0000000..e20383a
--- /dev/null
+++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java
@@ -0,0 +1,62 @@
+package com.arcadedb.examples;
+
+import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore;
+import dev.langchain4j.data.embedding.Embedding;
+import dev.langchain4j.data.segment.TextSegment;
+import dev.langchain4j.model.embedding.EmbeddingModel;
+import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
+import dev.langchain4j.store.embedding.EmbeddingMatch;
+import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
+import dev.langchain4j.store.embedding.EmbeddingStore;
+
+import java.util.List;
+
+public class GraphRAGEmbeddingStore {
+
+  private static final String HOST     = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost");
+  private static final String PORT     = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424");
+  private static final String USER     = System.getenv().getOrDefault("ARCADEDB_USER", "root");
+  private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
+
+  public static void main(String[] args) {
+    EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
+
+    String boltUrl = "bolt://" + HOST + ":" + PORT;
+    EmbeddingStore<TextSegment> store = Neo4jEmbeddingStore.builder()
+        .withBasicAuth(boltUrl, USER, PASSWORD)
+        .dimension(embeddingModel.dimension())
+        .build();
+
+    // Ingest sample chunks
+    String[] texts = {
+        "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.",
+        "Vector similarity search uses embedding models to encode text into high-dimensional vectors.",
+        "Microservices decompose applications into small, independently deployable services.",
+        "Building a knowledge graph requires extracting entities and relationships from documents."
+    };
+
+    for (String text : texts) {
+      TextSegment segment = TextSegment.from(text);
+      Embedding embedding = embeddingModel.embed(segment).content();
+      store.add(embedding, segment);
+    }
+
+    System.out.println("Ingested " + texts.length + " chunks with 384D embeddings.\n");
+
+    // Similarity search
+    String query = "How does graph-based retrieval work?";
+    Embedding queryEmbedding = embeddingModel.embed(query).content();
+    EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
+        .queryEmbedding(queryEmbedding)
+        .maxResults(3)
+        .build();
+
+    List<EmbeddingMatch<TextSegment>> matches = store.search(request).matches();
+
+    System.out.println("Query: \"" + query + "\"\n");
+    System.out.println("Top matches:");
+    for (EmbeddingMatch<TextSegment> match : matches) {
+      System.out.printf("  [%.4f] %s%n", match.score(), match.embedded().text());
+    }
+  }
+}

From 4e88925aa71cb3f7fb4886f73794bc1ef2bd2462 Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 09:42:25 +0100
Subject: [PATCH 11/15] docs(graph-rag): add README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 graph-rag/README.md | 104 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)
 create mode 100644 graph-rag/README.md

diff --git a/graph-rag/README.md b/graph-rag/README.md
new file mode 100644
index 0000000..1677cd7
--- /dev/null
+++ b/graph-rag/README.md
@@ -0,0 +1,104 @@
+# Graph RAG
+
+Demonstrates ArcadeDB's multi-model capabilities by implementing a Graph RAG
+(Retrieval-Augmented Generation) system that unifies three retrieval signals
+in a single database:
+
+- **Graph traversal** — multi-hop entity bridging via knowledge graph relationships
+- **Vector similarity** — semantic chunk retrieval using embeddings
+- **Full-text indexing** — keyword-based content lookup
+
+## Prerequisites
+
+- Docker and Docker Compose
+- `curl` and `jq`
+- Java 21+ and Maven 3.x (for the Java demos)
+
+## Quickstart
+
+### 1. Start ArcadeDB
+
+```bash
+docker compose up -d
+```
+
+### 2. Create database and load data
+
+```bash
+./setup.sh
+```
+
+This creates the `GraphRAG` database, applies the schema, and inserts sample data.
+
+### 3a. Run queries via curl
+
+```bash
+./queries/queries.sh
+```
+
+### 3b. Run queries via Java (Bolt)
+
+```bash
+cd java
+mvn package -q
+java -jar target/graph-rag.jar
+```
+
+### 3c. Run queries via Langchain4j
+
+```bash
+cd langchain4j
+mvn package -q
+java -jar target/graph-rag-langchain4j.jar
+```
+
+## Schema
+
+| Type | Kind | Key properties |
+|------|------|----------------|
+| `Chunk` | Document | `content`, `source`, `chunkIndex`, `embedding` |
+| `Entity` | Vertex | `name` |
+| `Person` | Vertex (extends Entity) | `name` |
+| `Concept` | Vertex (extends Entity) | `name` |
+| `Organization` | Vertex (extends Entity) | `name` |
+| `MENTIONS` | Edge | Chunk → Entity |
+| `RELATES_TO` | Edge | Entity → Entity |
+| `WORKS_AT` | Edge | Person → Organization |
+| `AUTHORED` | Edge | Person → Chunk |
+
+## Query Patterns
+
+| # | Pattern | Language | Signal type |
+|---|---------|----------|-------------|
+| 1 | Hybrid Vector + Graph | SQL | Vector + Graph |
+| 2 | Multi-Hop Entity Bridge | Cypher | Graph |
+| 3 | Temporal-Aware Retrieval | Cypher | Graph |
+| 4 | Composite Scoring | SQL | Vector + Graph |
+| 5 | Agentic RAG Steps | Mixed | Multi-signal |
+
+## Sample Data
+
+- 8 chunks from 4 internal documents with 4D embeddings
+- 11 entities (4 persons, 4 concepts, 3 organizations)
+- ~25 edges (MENTIONS, RELATES_TO, WORKS_AT, AUTHORED)
+- Multi-hop design: querying "Vector Search" bridges to GraphRAG docs via shared entity mentions
+
+## LangChain4j Module
+
+The `langchain4j/` directory contains two standalone examples using LangChain4j
+with ArcadeDB via the Neo4j Bolt protocol:
+
+- **GraphRAGEmbeddingStore** — ingests text chunks with real 384D embeddings (AllMiniLmL6V2) and performs similarity search
+- **GraphRAGContentRetriever** — wires the embedding store into a LangChain4j `EmbeddingStoreContentRetriever` pipeline
+
+No external API keys required — the embedding model runs in-process.
+
+## ArcadeDB Version Notes
+
+This use case targets ArcadeDB **26.2.1**. Vector similarity queries use
+`vectorNeighbors('IndexName[property]', vector, k)` with an `LSM_VECTOR`
+index. The Bolt protocol (port 2424) enables Neo4j driver compatibility.
+
+## Reference
+
+[ArcadeDB Graph RAG use case](https://arcadedb.com/graph-rag.html)

From 178c17a6d868033ca211a378c77b11839c7d7931 Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 13:43:05 +0100
Subject: [PATCH 12/15] fix(graph-rag): integration smoke test fixes

- Change Chunk from DOCUMENT TYPE to VERTEX TYPE (edges require vertices)
- Enable BoltProtocolPlugin in docker-compose.yml (port 7687)
- Downgrade neo4j-java-driver to 4.4.12 (compatible with ArcadeDB Bolt v4)
- Remove :Entity labels from Cypher queries (ArcadeDB doesn't resolve parent type)
- Simplify SQL vector queries (remove vectorDistance subquery)
- Rewrite langchain4j to use Neo4j driver directly (ArcadeDB doesn't support
  Neo4j's SHOW VECTOR INDEX DDL used by Neo4jEmbeddingStore)
- Update README with correct port, schema types, and run instructions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 graph-rag/README.md                           |  15 +-
 graph-rag/docker-compose.yml                  |   7 +-
 graph-rag/java/pom.xml                        |   2 +-
 .../java/com/arcadedb/examples/GraphRAG.java  |  10 +-
 graph-rag/langchain4j/pom.xml                 |  11 +-
 .../examples/GraphRAGContentRetriever.java    | 129 +++++++++++-------
 .../examples/GraphRAGEmbeddingStore.java      | 103 +++++++++-----
 graph-rag/queries/queries.sh                  |  14 +-
 graph-rag/sql/01-schema.sql                   |   4 +-
 9 files changed, 176 insertions(+), 119 deletions(-)
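
Note on client-side similarity: with Neo4jEmbeddingStore removed, the
langchain4j examples score vectors on the client using LangChain4j's
CosineSimilarity. The sketch below shows the underlying computation in
plain Java. The class and method names are hypothetical, and returning
0 for a zero-length vector is an assumption of this sketch rather than
documented library behaviour.

```java
public class CosineRankSketch {

  // Cosine similarity: dot(a, b) / (|a| * |b|).
  static double cosine(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vectors must have the same dimension");
    }
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0.0 || normB == 0.0) {
      return 0.0; // assumption: treat a zero vector as unrelated to everything
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    // Tiny made-up vectors; the real examples use 384-dimensional embeddings.
    float[] query = {1f, 0f};
    float[][] stored = {{1f, 0f}, {0f, 1f}, {1f, 1f}};
    for (float[] vector : stored) {
      System.out.printf("cosine = %.4f%n", cosine(query, vector));
    }
  }
}
```

Scoring every stored vector is linear in the number of chunks, which is
fine for a handful of sample rows but is not a substitute for an index.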

diff --git a/graph-rag/README.md b/graph-rag/README.md
index 1677cd7..de8f3a8 100644
--- a/graph-rag/README.md
+++ b/graph-rag/README.md
@@ -44,19 +44,24 @@ mvn package -q
 java -jar target/graph-rag.jar
 ```
 
-### 3c. Run queries via Langchain4j
+### 3c. Run LangChain4j demos
 
 ```bash
 cd langchain4j
 mvn package -q
+
+# Embedding store: ingest + similarity search
 java -jar target/graph-rag-langchain4j.jar
+
+# Content retriever: semantic search + graph expansion
+java -cp target/graph-rag-langchain4j.jar com.arcadedb.examples.GraphRAGContentRetriever
 ```
 
 ## Schema
 
 | Type | Kind | Key properties |
 |------|------|----------------|
-| `Chunk` | Document | `content`, `source`, `chunkIndex`, `embedding` |
+| `Chunk` | Vertex | `content`, `source`, `chunkIndex`, `embedding` |
 | `Entity` | Vertex | `name` |
 | `Person` | Vertex (extends Entity) | `name` |
 | `Concept` | Vertex (extends Entity) | `name` |
@@ -88,8 +93,8 @@ java -jar target/graph-rag-langchain4j.jar
 The `langchain4j/` directory contains two standalone examples using LangChain4j
 with ArcadeDB via the Neo4j Bolt protocol:
 
-- **GraphRAGEmbeddingStore** — ingests text chunks with real 384D embeddings (AllMiniLmL6V2) and performs similarity search
-- **GraphRAGContentRetriever** — wires the embedding store into a LangChain4j `EmbeddingStoreContentRetriever` pipeline
+- **GraphRAGEmbeddingStore** — ingests text chunks with real 384D embeddings (AllMiniLmL6V2), stores via Cypher over Bolt, and performs similarity search using LangChain4j's cosine similarity
+- **GraphRAGContentRetriever** — re-embeds the sample Chunk data with 384D vectors, runs semantic search, then enriches results with graph context via Cypher traversal (entities mentioned by top matches)
 
 No external API keys required — the embedding model runs in-process.
 
@@ -97,7 +102,7 @@ No external API keys required — the embedding model runs in-process.
 
 This use case targets ArcadeDB **26.2.1**. Vector similarity queries use
 `vectorNeighbors('IndexName[property]', vector, k)` with an `LSM_VECTOR`
-index. The Bolt protocol (port 2424) enables Neo4j driver compatibility.
+index. The Bolt protocol (port 7687) enables Neo4j driver compatibility.
 
 ## Reference
 
diff --git a/graph-rag/docker-compose.yml b/graph-rag/docker-compose.yml
index bb503a9..eef84e7 100644
--- a/graph-rag/docker-compose.yml
+++ b/graph-rag/docker-compose.yml
@@ -3,9 +3,12 @@ services:
     image: arcadedata/arcadedb:26.2.1
     ports:
       - "2480:2480"
-      - "2424:2424"
+      - "7687:7687"
     environment:
-      JAVA_OPTS: "-Darcadedb.server.rootPassword=arcadedb"
+      JAVA_OPTS: >-
+        -Darcadedb.server.rootPassword=arcadedb
+        -Darcadedb.server.plugins=BoltProtocolPlugin
+        -Darcadedb.bolt.defaultDatabase=GraphRAG
     healthcheck:
       test: ["CMD", "curl", "-sf", "http://localhost:2480/api/v1/ready"]
       interval: 5s
diff --git a/graph-rag/java/pom.xml b/graph-rag/java/pom.xml
index fca28dc..a2068ea 100644
--- a/graph-rag/java/pom.xml
+++ b/graph-rag/java/pom.xml
@@ -13,7 +13,7 @@
     <maven.compiler.source>21</maven.compiler.source>
     <maven.compiler.target>21</maven.compiler.target>
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
-    <neo4j.driver.version>5.28.10</neo4j.driver.version>
+    <neo4j.driver.version>4.4.12</neo4j.driver.version>
   </properties>
 
   <dependencies>
diff --git a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
index 06f8dfe..7144041 100644
--- a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
+++ b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
@@ -8,7 +8,7 @@
 public class GraphRAG {
 
   private static final String HOST     = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost");
-  private static final String PORT     = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424");
+  private static final String PORT     = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "7687");
   private static final String USER     = System.getenv().getOrDefault("ARCADEDB_USER", "root");
   private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
 
@@ -39,7 +39,7 @@ private static void runQuery1HybridVectorGraph(Driver driver) {
         "Find chunks similar to graph-topic embedding and their mentioned entities.");
 
     String cypher = """
-        MATCH (chunk:Chunk)-[:MENTIONS]->(entity:Entity)
+        MATCH (chunk:Chunk)-[:MENTIONS]->(entity)
         RETURN chunk.content AS content, chunk.source AS source,
                collect(DISTINCT entity.name) AS entities
         LIMIT 10""";
@@ -63,7 +63,7 @@ private static void runQuery2MultiHopEntityBridge(Driver driver) {
         "Find chunks connected through shared entities from GraphRAG docs.");
 
     String cypher = """
-        MATCH (direct:Chunk)-[:MENTIONS]->(entity:Entity)<-[:MENTIONS]-(related:Chunk)
+        MATCH (direct:Chunk)-[:MENTIONS]->(entity)<-[:MENTIONS]-(related:Chunk)
         WHERE direct.source = 'Getting Started with GraphRAG'
           AND related.source <> direct.source
         RETURN direct.source AS source_doc,
@@ -118,7 +118,7 @@ private static void runQuery4CompositeScoring(Driver driver) {
 
     String cypher = """
         MATCH (chunk:Chunk)
-        OPTIONAL MATCH (chunk)-[:MENTIONS]->(entity:Entity)
+        OPTIONAL MATCH (chunk)-[:MENTIONS]->(entity)
         RETURN chunk.content AS content, chunk.source AS source,
                count(entity) AS entity_count
         ORDER BY entity_count DESC
@@ -147,7 +147,7 @@ private static void runQuery5AgenticRAG(Driver driver) {
       System.out.println("  Step 1: Graph expansion from GraphRAG docs");
       String step1 = """
           MATCH (c:Chunk {source: 'Getting Started with GraphRAG'})
-                -[:MENTIONS]->(e:Entity)
+                -[:MENTIONS]->(e)
                 -[:RELATES_TO]->(related)
           RETURN e.name AS entity, related.name AS related_concept
           LIMIT 10""";
diff --git a/graph-rag/langchain4j/pom.xml b/graph-rag/langchain4j/pom.xml
index eee5aee..3475dd9 100644
--- a/graph-rag/langchain4j/pom.xml
+++ b/graph-rag/langchain4j/pom.xml
@@ -15,14 +15,10 @@
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
     <langchain4j.version>1.11.0</langchain4j.version>
     <langchain4j.community.version>1.11.0-beta19</langchain4j.community.version>
+    <neo4j.driver.version>4.4.12</neo4j.driver.version>
   </properties>
 
   <dependencies>
-    <dependency>
-      <groupId>dev.langchain4j</groupId>
-      <artifactId>langchain4j-community-neo4j</artifactId>
-      <version>${langchain4j.community.version}</version>
-    </dependency>
     <dependency>
       <groupId>dev.langchain4j</groupId>
       <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId>
@@ -33,6 +29,11 @@
       <artifactId>langchain4j</artifactId>
       <version>${langchain4j.version}</version>
     </dependency>
+    <dependency>
+      <groupId>org.neo4j.driver</groupId>
+      <artifactId>neo4j-java-driver</artifactId>
+      <version>${neo4j.driver.version}</version>
+    </dependency>
   </dependencies>
 
   <build>
diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
index 37a478a..dd0696a 100644
--- a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
+++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
@@ -1,78 +1,101 @@
 package com.arcadedb.examples;
 
-import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore;
 import dev.langchain4j.data.embedding.Embedding;
 import dev.langchain4j.data.segment.TextSegment;
 import dev.langchain4j.model.embedding.EmbeddingModel;
 import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
-import dev.langchain4j.rag.content.Content;
-import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
-import dev.langchain4j.rag.query.Query;
+import dev.langchain4j.store.embedding.CosineSimilarity;
 
+import org.neo4j.driver.*;
+import org.neo4j.driver.Record;
+
+import java.util.ArrayList;
+import java.util.Comparator;
 import java.util.List;
+import java.util.stream.Collectors;
 
+/**
+ * Demonstrates a Graph RAG content retrieval pipeline that combines LangChain4j
+ * embeddings with ArcadeDB's graph traversal via the Neo4j Bolt driver.
+ *
+ * Pipeline: embed query → vector similarity for chunks → graph expansion
+ * to find related entities → return enriched context.
+ */
 public class GraphRAGContentRetriever {
 
   private static final String HOST     = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost");
-  private static final String PORT     = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424");
+  private static final String PORT     = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "7687");
   private static final String USER     = System.getenv().getOrDefault("ARCADEDB_USER", "root");
   private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
 
   public static void main(String[] args) {
     EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
+    System.out.println("Embedding model: AllMiniLmL6V2 (" + embeddingModel.dimension() + "D)\n");
 
-    String boltUrl = "bolt://" + HOST + ":" + PORT;
-    Neo4jEmbeddingStore store = Neo4jEmbeddingStore.builder()
-        .withBasicAuth(boltUrl, USER, PASSWORD)
-        .dimension(embeddingModel.dimension())
-        .label("RAGChunk")
-        .indexName("rag_chunk_index")
-        .build();
-
-    // Ingest sample chunks
-    String[] texts = {
-        "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.",
-        "By traversing entity relationships, the system discovers context that pure vector similarity would miss.",
-        "Vector similarity search uses embedding models to encode text into high-dimensional vectors.",
-        "Approximate nearest neighbor algorithms like HNSW trade small accuracy losses for dramatic speed improvements.",
-        "Microservices decompose applications into small, independently deployable services.",
-        "Building a knowledge graph requires extracting entities and relationships from documents."
-    };
-
-    for (String text : texts) {
-      TextSegment segment = TextSegment.from(text);
-      Embedding embedding = embeddingModel.embed(segment).content();
-      store.add(embedding, segment);
-    }
+    String uri = "bolt://" + HOST + ":" + PORT;
+    try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(USER, PASSWORD));
+         Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
+
+      // Step 1: Re-embed the existing Chunk data with real 384D vectors
+      System.out.println("Step 1: Re-embedding existing chunks with 384D vectors...");
+      Result chunks = session.run("MATCH (c:Chunk) RETURN c.content AS content, c.source AS source");
+      List<EmbeddedChunk> embeddedChunks = new ArrayList<>();
+
+      for (Record r : chunks.list()) {
+        String content = r.get("content").asString();
+        String source = r.get("source").asString();
+        Embedding embedding = embeddingModel.embed(TextSegment.from(content)).content();
+        embeddedChunks.add(new EmbeddedChunk(content, source, embedding));
+      }
+      System.out.println("  Embedded " + embeddedChunks.size() + " chunks with " + embeddingModel.dimension() + "D vectors.\n");
+
+      // Step 2: Run queries — semantic search + graph enrichment
+      String[] queries = {
+          "How does graph-based retrieval work?",
+          "What are vector embeddings?",
+          "Tell me about microservices architecture"
+      };
+
+      for (String q : queries) {
+        System.out.println("Query: \"" + q + "\"");
+        Embedding queryEmbedding = embeddingModel.embed(q).content();
 
-    System.out.println("Ingested " + texts.length + " chunks into RAGChunk nodes.\n");
-
-    // Build content retriever pipeline
-    EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
-        .embeddingStore(store)
-        .embeddingModel(embeddingModel)
-        .maxResults(3)
-        .minScore(0.5)
-        .build();
-
-    // Run queries
-    String[] queries = {
-        "How does graph-based retrieval work?",
-        "What are vector embeddings?",
-        "Tell me about microservices architecture"
-    };
-
-    for (String q : queries) {
-      System.out.println("Query: \"" + q + "\"");
-      List<Content> results = retriever.retrieve(new Query(q));
-      if (results.isEmpty()) {
-        System.out.println("  (no results above min score)\n");
-      } else {
-        for (Content content : results) {
-          System.out.println("  -> " + content.textSegment().text());
+        // Find top-3 most similar chunks
+        List<ScoredChunk> scored = embeddedChunks.stream()
+            .map(ec -> new ScoredChunk(ec, CosineSimilarity.between(queryEmbedding, ec.embedding())))
+            .sorted(Comparator.comparingDouble(ScoredChunk::score).reversed())
+            .limit(3)
+            .toList();
+
+        System.out.println("  Semantic matches:");
+        for (ScoredChunk sc : scored) {
+          System.out.printf("    [%.4f] [%s] %s%n",
+              sc.score(), sc.chunk().source(), truncate(sc.chunk().content(), 70));
+        }
+
+        // Step 3: Graph expansion — find entities mentioned by top match
+        String topSource = scored.get(0).chunk().source();
+        Result entities = session.run(
+            "MATCH (c:Chunk)-[:MENTIONS]->(e) WHERE c.source = $source " +
+                "RETURN DISTINCT e.name AS entity LIMIT 5",
+            Values.parameters("source", topSource));
+
+        List<Record> entityList = entities.list();
+        if (!entityList.isEmpty()) {
+          System.out.print("  Graph context: ");
+          System.out.println(entityList.stream()
+              .map(r -> r.get("entity").asString())
+              .collect(Collectors.joining(", ")));
         }
         System.out.println();
       }
     }
   }
+
+  private record EmbeddedChunk(String content, String source, Embedding embedding) {}
+  private record ScoredChunk(EmbeddedChunk chunk, double score) {}
+
+  private static String truncate(String s, int maxLen) {
+    return s.length() <= maxLen ? s : s.substring(0, maxLen - 3) + "...";
+  }
 }
diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java
index e20383a..5cb9520 100644
--- a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java
+++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java
@@ -1,62 +1,91 @@
 package com.arcadedb.examples;
 
-import dev.langchain4j.community.store.embedding.neo4j.Neo4jEmbeddingStore;
 import dev.langchain4j.data.embedding.Embedding;
 import dev.langchain4j.data.segment.TextSegment;
 import dev.langchain4j.model.embedding.EmbeddingModel;
 import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
-import dev.langchain4j.store.embedding.EmbeddingMatch;
-import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
-import dev.langchain4j.store.embedding.EmbeddingStore;
+import dev.langchain4j.store.embedding.CosineSimilarity;
 
+import org.neo4j.driver.*;
+import org.neo4j.driver.Record;
+
+import java.util.ArrayList;
+import java.util.Comparator;
 import java.util.List;
 
+/**
+ * Demonstrates LangChain4j embedding generation combined with ArcadeDB graph
+ * storage via the Neo4j Bolt driver.
+ *
+ * LangChain4j generates 384-dimensional embeddings using AllMiniLmL6V2 (runs
+ * in-process, no API keys). The embeddings are stored in ArcadeDB's LCChunk
+ * vertex type via Cypher over Bolt. Similarity is computed using LangChain4j's
+ * CosineSimilarity.
+ */
 public class GraphRAGEmbeddingStore {
 
   private static final String HOST     = System.getenv().getOrDefault("ARCADEDB_HOST", "localhost");
-  private static final String PORT     = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "2424");
+  private static final String PORT     = System.getenv().getOrDefault("ARCADEDB_BOLT_PORT", "7687");
   private static final String USER     = System.getenv().getOrDefault("ARCADEDB_USER", "root");
   private static final String PASSWORD = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
 
   public static void main(String[] args) {
     EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
+    System.out.println("Embedding model: AllMiniLmL6V2 (" + embeddingModel.dimension() + "D)\n");
 
-    String boltUrl = "bolt://" + HOST + ":" + PORT;
-    EmbeddingStore<TextSegment> store = Neo4jEmbeddingStore.builder()
-        .withBasicAuth(boltUrl, USER, PASSWORD)
-        .dimension(embeddingModel.dimension())
-        .build();
-
-    // Ingest sample chunks
-    String[] texts = {
-        "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.",
-        "Vector similarity search uses embedding models to encode text into high-dimensional vectors.",
-        "Microservices decompose applications into small, independently deployable services.",
-        "Building a knowledge graph requires extracting entities and relationships from documents."
-    };
-
-    for (String text : texts) {
-      TextSegment segment = TextSegment.from(text);
-      Embedding embedding = embeddingModel.embed(segment).content();
-      store.add(embedding, segment);
-    }
+    String uri = "bolt://" + HOST + ":" + PORT;
+    try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(USER, PASSWORD));
+         Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
+
+      // Ingest sample chunks with real embeddings via Cypher over Bolt
+      String[] texts = {
+          "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.",
+          "Vector similarity search uses embedding models to encode text into high-dimensional vectors.",
+          "Microservices decompose applications into small, independently deployable services.",
+          "Building a knowledge graph requires extracting entities and relationships from documents."
+      };
+
+      for (String text : texts) {
+        Embedding embedding = embeddingModel.embed(TextSegment.from(text)).content();
+        List<Double> vector = toDoubleList(embedding.vector());
+        session.run("CREATE (c:LCChunk {content: $content, embedding: $embedding})",
+            Values.parameters("content", text, "embedding", vector));
+      }
+      System.out.println("Ingested " + texts.length + " chunks with " + embeddingModel.dimension() + "D embeddings.\n");
 
-    System.out.println("Ingested " + texts.length + " chunks with 384D embeddings.\n");
+      // Similarity search: embed query, fetch stored embeddings, rank by cosine similarity
+      String query = "How does graph-based retrieval work?";
+      Embedding queryEmbedding = embeddingModel.embed(query).content();
 
-    // Similarity search
-    String query = "How does graph-based retrieval work?";
-    Embedding queryEmbedding = embeddingModel.embed(query).content();
-    EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
-        .queryEmbedding(queryEmbedding)
-        .maxResults(3)
-        .build();
+      System.out.println("Query: \"" + query + "\"\n");
+      System.out.println("Top matches (cosine similarity via LangChain4j):");
 
-    List<EmbeddingMatch<TextSegment>> matches = store.search(request).matches();
+      Result result = session.run("MATCH (c:LCChunk) RETURN c.content AS content, c.embedding AS embedding");
+      List<ScoredChunk> scored = new ArrayList<>();
 
-    System.out.println("Query: \"" + query + "\"\n");
-    System.out.println("Top matches:");
-    for (EmbeddingMatch<TextSegment> match : matches) {
-      System.out.printf("  [%.4f] %s%n", match.score(), match.embedded().text());
+      for (Record r : result.list()) {
+        String content = r.get("content").asString();
+        List<Object> rawEmbedding = r.get("embedding").asList();
+        float[] storedVector = new float[rawEmbedding.size()];
+        for (int i = 0; i < rawEmbedding.size(); i++) {
+          storedVector[i] = ((Number) rawEmbedding.get(i)).floatValue();
+        }
+        double score = CosineSimilarity.between(queryEmbedding, new Embedding(storedVector));
+        scored.add(new ScoredChunk(content, score));
+      }
+
+      scored.sort(Comparator.comparingDouble(ScoredChunk::score).reversed());
+      for (int i = 0; i < Math.min(3, scored.size()); i++) {
+        System.out.printf("  [%.4f] %s%n", scored.get(i).score(), scored.get(i).content());
+      }
     }
   }
+
+  private record ScoredChunk(String content, double score) {}
+
+  private static List<Double> toDoubleList(float[] vector) {
+    List<Double> list = new ArrayList<>(vector.length);
+    for (float f : vector) list.add((double) f);
+    return list;
+  }
 }
diff --git a/graph-rag/queries/queries.sh b/graph-rag/queries/queries.sh
index 3ddc5df..2f93b70 100755
--- a/graph-rag/queries/queries.sh
+++ b/graph-rag/queries/queries.sh
@@ -27,12 +27,9 @@ echo ""
 query "sql" "
 SELECT content, source,
        out('MENTIONS').name AS entities
-FROM (
-  SELECT *, vectorDistance('Chunk[embedding]', embedding, [0.9, 0.2, 0.1, 0.1]) AS score
-  FROM Chunk
-  ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 5) DESC
-  LIMIT 5
-)
+FROM Chunk
+ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 5) DESC
+LIMIT 5
 "
 
 # ─────────────────────────────────────────────────────────────────────────────
@@ -41,7 +38,7 @@ echo "=== Query 2: Multi-Hop Entity Bridge (Cypher) ==="
 echo "Find chunks connected through shared entities."
 echo ""
 query "cypher" "
-MATCH (direct:Chunk)-[:MENTIONS]->(entity:Entity)<-[:MENTIONS]-(related:Chunk)
+MATCH (direct:Chunk)-[:MENTIONS]->(entity)<-[:MENTIONS]-(related:Chunk)
 WHERE direct.source = 'Getting Started with GraphRAG'
   AND related.source <> direct.source
 RETURN direct.source AS source_doc,
@@ -71,7 +68,6 @@ echo "Score chunks by vector distance and entity connections."
 echo ""
 query "sql" "
 SELECT content, source,
-       vectorDistance('Chunk[embedding]', embedding, [0.9, 0.2, 0.1, 0.1]) AS vector_score,
        out('MENTIONS').size() AS entity_count
 FROM Chunk
 ORDER BY vectorNeighbors('Chunk[embedding]', [0.9, 0.2, 0.1, 0.1], 10) DESC
@@ -95,7 +91,7 @@ LIMIT 5
 echo ""
 echo "--- Step 2: Graph expansion — entities and relations ---"
 query "cypher" "
-MATCH (c:Chunk {source: 'Getting Started with GraphRAG'})-[:MENTIONS]->(e:Entity)-[:RELATES_TO]->(related)
+MATCH (c:Chunk {source: 'Getting Started with GraphRAG'})-[:MENTIONS]->(e)-[:RELATES_TO]->(related)
 RETURN e.name, related.name
 LIMIT 10
 "
diff --git a/graph-rag/sql/01-schema.sql b/graph-rag/sql/01-schema.sql
index 646e812..cdc74c0 100644
--- a/graph-rag/sql/01-schema.sql
+++ b/graph-rag/sql/01-schema.sql
@@ -1,5 +1,5 @@
--- Document type for text chunks with vector embeddings
-CREATE DOCUMENT TYPE Chunk IF NOT EXISTS;
+-- Vertex type for text chunks with vector embeddings
+CREATE VERTEX TYPE Chunk IF NOT EXISTS;
 CREATE PROPERTY Chunk.content IF NOT EXISTS STRING;
 CREATE PROPERTY Chunk.source IF NOT EXISTS STRING;
 CREATE PROPERTY Chunk.chunkIndex IF NOT EXISTS INTEGER;

From 78e85373fa7353aeebf1bf05b11ec4126074ea55 Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 13:47:24 +0100
Subject: [PATCH 13/15] fix(graph-rag): address code review findings

- Rename Query 1 header in GraphRAG.java to "Graph Traversal with Entity
  Collection"; the old "Hybrid Vector + Graph" title was misleading
  because the Cypher query performs no vector search
- Add empty-result guard in GraphRAGContentRetriever so the top match is
  only read when at least one chunk was scored
- Add version comment in langchain4j pom.xml for embedding model artifact
- Document all implementation deviations in design doc (port, driver version,
  vertex type, Cypher labels, Neo4jEmbeddingStore incompatibility)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 docs/plans/2026-02-26-graph-rag-design.md          | 14 ++++++++++++++
 .../main/java/com/arcadedb/examples/GraphRAG.java  |  9 +++++----
 graph-rag/langchain4j/pom.xml                      |  2 +-
 .../examples/GraphRAGContentRetriever.java         |  5 +++++
 4 files changed, 25 insertions(+), 5 deletions(-)
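
Reviewer note on the empty-result guard: the retriever ranks chunks in memory and then reads the top match, so the guard has to run before that read. The sketch below shows the same shape using only the standard library. `TopKSketch`, `Scored`, `cosine` and `topK` are hypothetical names, the 2D vectors are made-up toy data, and `cosine` is a plain stand-in rather than LangChain4j's `CosineSimilarity` implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TopKSketch {
  record Scored(String text, double score) {}

  // Plain cosine similarity; returns 0.0 when either vector has zero norm (avoids NaN).
  static double cosine(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    if (normA == 0 || normB == 0) {
      return 0.0;
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  // Scores every text against the query and keeps at most k results, best first.
  static List<Scored> topK(float[] query, List<String> texts, List<float[]> vectors, int k) {
    List<Scored> scored = new ArrayList<>();
    for (int i = 0; i < texts.size(); i++) {
      scored.add(new Scored(texts.get(i), cosine(query, vectors.get(i))));
    }
    scored.sort(Comparator.comparingDouble(Scored::score).reversed());
    return List.copyOf(scored.subList(0, Math.min(k, scored.size())));
  }

  public static void main(String[] args) {
    List<String> texts = List.of("graph retrieval", "microservices", "vector search");
    List<float[]> vectors = List.of(
        new float[] {1f, 0f}, new float[] {0f, 1f}, new float[] {1f, 1f});

    List<Scored> top = topK(new float[] {1f, 0f}, texts, vectors, 2);
    if (top.isEmpty()) {
      // Guard first: with no rows there is no top match to expand in the graph.
      System.out.println("(no chunks found)");
      return;
    }
    for (Scored s : top) {
      System.out.printf("[%.4f] %s%n", s.score(), s.text());
    }
  }
}
```

With an empty `texts` list, `topK` returns an empty list and the guard short-circuits before any index access.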

diff --git a/docs/plans/2026-02-26-graph-rag-design.md b/docs/plans/2026-02-26-graph-rag-design.md
index 332ee25..a91e1af 100644
--- a/docs/plans/2026-02-26-graph-rag-design.md
+++ b/docs/plans/2026-02-26-graph-rag-design.md
@@ -145,3 +145,17 @@ Adapts the 5 patterns to pure Cypher. Queries that rely on SQL-specific features
 - `queries.sh` runs all 5 queries and returns non-empty result sets
 - `mvn package && java -jar target/graph-rag.jar` connects via Bolt, runs all 5 Cypher queries
 - `mvn package && java -jar target/graph-rag-langchain4j.jar` ingests chunks, generates embeddings, runs similarity search and content retrieval
+
+## Implementation Deviations
+
+The following changes were made during integration testing:
+
+| Design | Implementation | Reason |
+|--------|---------------|--------|
+| Bolt port 2424 | Port 7687 | ArcadeDB's BoltProtocolPlugin defaults to 7687 (standard Neo4j port) |
+| `neo4j-java-driver:5.28.x` | `neo4j-java-driver:4.4.12` | ArcadeDB's Bolt plugin implements protocol v4; the 5.x driver fails the handshake |
+| `Chunk` as DOCUMENT TYPE | `Chunk` as VERTEX TYPE | Edges (MENTIONS, AUTHORED) require vertex endpoints |
+| `:Entity` label in Cypher | Unlabeled `(entity)` | ArcadeDB Cypher doesn't resolve parent type labels to subtypes |
+| `Neo4jEmbeddingStore` via langchain4j-community-neo4j | Direct Neo4j driver + LangChain4j `CosineSimilarity` | ArcadeDB doesn't support `SHOW VECTOR INDEX` DDL used by Neo4jEmbeddingStore |
+| `vectorDistance` in SQL subquery | Direct `vectorNeighbors` ordering | `vectorDistance` doesn't work in subqueries in ArcadeDB 26.2.1 |
+| Docker JAVA_OPTS single line | Multi-line with plugins | BoltProtocolPlugin must be explicitly enabled via `arcadedb.server.plugins` |
diff --git a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
index 7144041..6dc844a 100644
--- a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
+++ b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
@@ -32,11 +32,12 @@ private static void tryRun(Runnable r, String name) {
     }
   }
 
-  // Query 1: Hybrid Vector + Graph
-  // Finds chunks near the graph-topic embedding and their mentioned entities
+  // Query 1: Graph Traversal with Entity Collection
+  // Finds chunks and their mentioned entities via graph traversal
+  // (vector search requires SQL; see queries.sh Query 1 for the hybrid version)
   private static void runQuery1HybridVectorGraph(Driver driver) {
-    printHeader("Query 1: Hybrid Vector + Graph Retrieval",
-        "Find chunks similar to graph-topic embedding and their mentioned entities.");
+    printHeader("Query 1: Graph Traversal with Entity Collection",
+        "Find chunks and their mentioned entities via graph traversal.");
 
     String cypher = """
         MATCH (chunk:Chunk)-[:MENTIONS]->(entity)
diff --git a/graph-rag/langchain4j/pom.xml b/graph-rag/langchain4j/pom.xml
index 3475dd9..d7bf57d 100644
--- a/graph-rag/langchain4j/pom.xml
+++ b/graph-rag/langchain4j/pom.xml
@@ -22,7 +22,7 @@
     <dependency>
       <groupId>dev.langchain4j</groupId>
       <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId>
-      <version>${langchain4j.community.version}</version>
+      <version>${langchain4j.community.version}</version> <!-- 1.11.0 GA not published for this artifact -->
     </dependency>
     <dependency>
       <groupId>dev.langchain4j</groupId>
diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
index dd0696a..d7717bb 100644
--- a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
+++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
@@ -67,6 +67,11 @@ public static void main(String[] args) {
             .limit(3)
             .toList();
 
+        if (scored.isEmpty()) {
+          System.out.println("  (no chunks found)\n");
+          continue;
+        }
+
         System.out.println("  Semantic matches:");
         for (ScoredChunk sc : scored) {
           System.out.printf("    [%.4f] [%s] %s%n",

From 0f27d323edff715e51b6a8467266a394886ec29b Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 13:52:34 +0100
Subject: [PATCH 14/15] ci(graph-rag): add GitHub Actions workflow

Test curl queries, Java Bolt driver, and LangChain4j demos in CI.
Also add knowledge-graphs and graph-rag to root README use case table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 .github/workflows/graph-rag.yml | 81 +++++++++++++++++++++++++++++++++
 README.md                       |  4 +-
 2 files changed, 84 insertions(+), 1 deletion(-)
 create mode 100644 .github/workflows/graph-rag.yml
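
Reviewer note on connection settings: the workflow exports ARCADEDB_URL, ARCADEDB_USER and ARCADEDB_PASS but neither ARCADEDB_HOST nor ARCADEDB_BOLT_PORT, so the LangChain4j demos in this series fall back to their defaults (localhost and 7687). The sketch below mirrors that lookup as a pure function. `BoltUriSketch` and `boltUri` are hypothetical names, and the sketch assumes docker-compose.yml publishes the Bolt port on 7687, which this patch does not show.

```java
import java.util.Map;

class BoltUriSketch {
  // Builds the Bolt URI from an environment map, using the same variable
  // names and defaults as the demo classes in this series.
  static String boltUri(Map<String, String> env) {
    String host = env.getOrDefault("ARCADEDB_HOST", "localhost");
    String port = env.getOrDefault("ARCADEDB_BOLT_PORT", "7687");
    return "bolt://" + host + ":" + port;
  }

  public static void main(String[] args) {
    // In CI neither variable is set, so this resolves to the defaults.
    System.out.println(boltUri(System.getenv()));
  }
}
```

Setting ARCADEDB_BOLT_PORT in the workflow `env` block would override the default without touching the Java code.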

diff --git a/.github/workflows/graph-rag.yml b/.github/workflows/graph-rag.yml
new file mode 100644
index 0000000..351616b
--- /dev/null
+++ b/.github/workflows/graph-rag.yml
@@ -0,0 +1,81 @@
+name: Graph RAG CI
+
+on:
+  push:
+    paths:
+      - graph-rag/**
+      - .github/workflows/graph-rag.yml
+  pull_request:
+    paths:
+      - graph-rag/**
+      - .github/workflows/graph-rag.yml
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    timeout-minutes: 15
+    permissions:
+      contents: read
+    strategy:
+      fail-fast: false
+      matrix:
+        runner: [curl, java, langchain4j]
+
+    env:
+      ARCADEDB_URL: http://localhost:2480
+      ARCADEDB_USER: root
+      ARCADEDB_PASS: arcadedb
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          fetch-depth: 1
+
+      - name: Set up Java
+        if: matrix.runner == 'java' || matrix.runner == 'langchain4j'
+        uses: actions/setup-java@be666c2fcd27ec809703dec50e508c2fdc7f6654 # v5.2.0
+        with:
+          java-version: '21'
+          distribution: 'temurin'
+
+      - name: Cache Maven repository
+        if: matrix.runner == 'java' || matrix.runner == 'langchain4j'
+        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
+        with:
+          path: ~/.m2
+          key: ${{ runner.os }}-m2-${{ matrix.runner }}-${{ hashFiles('graph-rag/java/pom.xml', 'graph-rag/langchain4j/pom.xml') }}
+          restore-keys: ${{ runner.os }}-m2-${{ matrix.runner }}-
+
+      - name: Start ArcadeDB
+        working-directory: graph-rag
+        run: docker compose up -d
+
+      - name: Setup database
+        working-directory: graph-rag
+        run: ./setup.sh
+
+      - name: Run curl queries
+        if: matrix.runner == 'curl'
+        working-directory: graph-rag
+        run: ./queries/queries.sh
+
+      - name: Build and run Java (Bolt)
+        if: matrix.runner == 'java'
+        working-directory: graph-rag/java
+        run: |
+          mvn package --no-transfer-progress
+          java -jar target/graph-rag.jar
+
+      - name: Build and run LangChain4j
+        if: matrix.runner == 'langchain4j'
+        working-directory: graph-rag/langchain4j
+        run: |
+          mvn package --no-transfer-progress
+          java -jar target/graph-rag-langchain4j.jar
+          java -cp target/graph-rag-langchain4j.jar com.arcadedb.examples.GraphRAGContentRetriever
+
+      - name: Teardown
+        if: always()
+        working-directory: graph-rag
+        run: docker compose down
diff --git a/README.md b/README.md
index eab587f..88dd13b 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,8 @@ and runnable demos via both `curl` and a Java program.
 | Directory | Description | ArcadeDB features |
 |-----------|-------------|-------------------|
 | [recommendation-engine](./recommendation-engine/) | Intelligent product and content recommendations | Graph traversal, Vector similarity, Time-series |
+| [knowledge-graphs](./knowledge-graphs/) | Academic research knowledge graph with co-authorship and citation networks | Graph traversal, Vector similarity, Full-text search, Time-series |
+| [graph-rag](./graph-rag/) | Graph RAG system combining knowledge graphs with vector search for retrieval-augmented generation | Graph traversal, Vector similarity, Full-text indexing, Neo4j Bolt, LangChain4j |
 
 ## Structure
 
@@ -19,5 +21,5 @@ Each use case directory contains:
 - `sql/01-schema.sql` — vertex/edge type definitions
 - `sql/02-data.sql` — sample data
 - `queries/queries.sh` — all queries via `curl`
-- `java/` — standalone Maven project running the same queries via `arcadedb-network`
+- `java/` — standalone Maven project running the same queries via Java
 - `README.md` — quickstart guide

From 7e6122229f8ca584787672fb28f9395cd62e10ef Mon Sep 17 00:00:00 2001
From: robfrank <ro.franchini@gmail.com>
Date: Thu, 26 Feb 2026 14:26:40 +0100
Subject: [PATCH 15/15] fix(graph-rag): address PR review feedback

- Rename Query 1 method to runQuery1GraphTraversal (was misleadingly
  named HybridVectorGraph despite being graph-only over Bolt)
- Fix Query 3: remove no-op WHERE chunkIndex=1 filter, rename from
  "Temporal-Aware Retrieval" to "Latest Chunk Per Document"
- Clean up LCChunk nodes before inserting in GraphRAGEmbeddingStore
  to prevent data accumulation across repeated demo runs
- Clarify in Javadoc that similarity is computed in-memory because
  vectorNeighbors() is SQL-only, not available over Bolt protocol

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 graph-rag/README.md                           |  2 +-
 .../java/com/arcadedb/examples/GraphRAG.java  | 19 +++++++++----------
 .../examples/GraphRAGContentRetriever.java    |  7 ++++++-
 .../examples/GraphRAGEmbeddingStore.java      | 11 +++++++++--
 graph-rag/queries/queries.sh                  |  7 +++----
 5 files changed, 28 insertions(+), 18 deletions(-)
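
Reviewer note on Query 3: the reworked Cypher orders by source and then by chunkIndex descending with LIMIT 10, so a source with several chunks can contribute more than one row. If exactly one row per document is wanted, a client can reduce the result set as sketched below. `LatestChunkSketch`, `ChunkRow` and `latestPerSource` are hypothetical names and the rows are made-up sample data; this is an in-memory illustration, not a change to the query.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LatestChunkSketch {
  record ChunkRow(String source, int chunkIndex, String content) {}

  // Keeps, for each source, the row with the highest chunkIndex.
  // A TreeMap keeps the sources in sorted order, matching ORDER BY c.source.
  static Map<String, ChunkRow> latestPerSource(List<ChunkRow> rows) {
    Map<String, ChunkRow> latest = new TreeMap<>();
    for (ChunkRow row : rows) {
      latest.merge(row.source(), row,
          (kept, candidate) -> candidate.chunkIndex() > kept.chunkIndex() ? candidate : kept);
    }
    return latest;
  }

  public static void main(String[] args) {
    List<ChunkRow> rows = List.of(
        new ChunkRow("Doc A", 0, "first chunk of A"),
        new ChunkRow("Doc A", 2, "third chunk of A"),
        new ChunkRow("Doc B", 1, "second chunk of B"),
        new ChunkRow("Doc A", 1, "second chunk of A"));

    latestPerSource(rows).forEach((source, row) ->
        System.out.println(source + " -> [" + row.chunkIndex() + "] " + row.content()));
  }
}
```

The reduction does not depend on the order of the incoming rows, so it works whether or not the server applied the ORDER BY.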

diff --git a/graph-rag/README.md b/graph-rag/README.md
index de8f3a8..be39da7 100644
--- a/graph-rag/README.md
+++ b/graph-rag/README.md
@@ -77,7 +77,7 @@ java -cp target/graph-rag-langchain4j.jar com.arcadedb.examples.GraphRAGContentR
 |---|---------|----------|-------------|
 | 1 | Hybrid Vector + Graph | SQL | Vector + Graph |
 | 2 | Multi-Hop Entity Bridge | Cypher | Graph |
-| 3 | Temporal-Aware Retrieval | Cypher | Graph |
+| 3 | Latest Chunk Per Document | Cypher | Graph |
 | 4 | Composite Scoring | SQL | Vector + Graph |
 | 5 | Agentic RAG Steps | Mixed | Multi-signal |
 
diff --git a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
index 6dc844a..ce2bedd 100644
--- a/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
+++ b/graph-rag/java/src/main/java/com/arcadedb/examples/GraphRAG.java
@@ -15,9 +15,9 @@ public class GraphRAG {
   public static void main(String[] args) {
     String uri = "bolt://" + HOST + ":" + PORT;
     try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(USER, PASSWORD))) {
-      tryRun(() -> runQuery1HybridVectorGraph(driver), "Query 1");
+      tryRun(() -> runQuery1GraphTraversal(driver), "Query 1");
       tryRun(() -> runQuery2MultiHopEntityBridge(driver), "Query 2");
-      tryRun(() -> runQuery3TemporalAware(driver), "Query 3");
+      tryRun(() -> runQuery3LatestChunks(driver), "Query 3");
       tryRun(() -> runQuery4CompositeScoring(driver), "Query 4");
       tryRun(() -> runQuery5AgenticRAG(driver), "Query 5");
     }
@@ -35,7 +35,7 @@ private static void tryRun(Runnable r, String name) {
   // Query 1: Graph Traversal with Entity Collection
   // Finds chunks and their mentioned entities via graph traversal
   // (vector search requires SQL; see queries.sh Query 1 for the hybrid version)
-  private static void runQuery1HybridVectorGraph(Driver driver) {
+  private static void runQuery1GraphTraversal(Driver driver) {
     printHeader("Query 1: Graph Traversal with Entity Collection",
         "Find chunks and their mentioned entities via graph traversal.");
 
@@ -86,17 +86,16 @@ private static void runQuery2MultiHopEntityBridge(Driver driver) {
     }
   }
 
-  // Query 3: Temporal-Aware Retrieval
-  // Filters chunks by chunkIndex to get latest context per source
-  private static void runQuery3TemporalAware(Driver driver) {
-    printHeader("Query 3: Temporal-Aware Retrieval",
-        "Get the latest chunk (highest chunkIndex) per source.");
+  // Query 3: Latest Chunk Per Document
+  // Lists chunks ordered by source, highest chunkIndex first within each source
+  private static void runQuery3LatestChunks(Driver driver) {
+    printHeader("Query 3: Latest Chunk Per Document",
+        "List chunks per source document, highest chunkIndex first.");
 
     String cypher = """
         MATCH (c:Chunk)
-        WHERE c.chunkIndex = 1
         RETURN c.content AS content, c.source AS source, c.chunkIndex AS chunkIndex
-        ORDER BY c.chunkIndex DESC
+        ORDER BY c.source, c.chunkIndex DESC
         LIMIT 10""";
 
     try (Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
index d7717bb..f338847 100644
--- a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
+++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGContentRetriever.java
@@ -18,8 +18,13 @@
  * Demonstrates a Graph RAG content retrieval pipeline that combines LangChain4j
  * embeddings with ArcadeDB's graph traversal via the Neo4j Bolt driver.
  *
- * Pipeline: embed query → vector similarity for chunks → graph expansion
+ * Pipeline: embed query → cosine similarity for chunks → graph expansion
  * to find related entities → return enriched context.
+ *
+ * Similarity is computed in-memory using LangChain4j's CosineSimilarity because
+ * ArcadeDB's vectorNeighbors() function is SQL-only and not available over the
+ * Bolt protocol. The graph expansion step (MENTIONS traversal) runs server-side
+ * via Cypher over Bolt.
  */
 public class GraphRAGContentRetriever {
 
diff --git a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java
index 5cb9520..f6e228f 100644
--- a/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java
+++ b/graph-rag/langchain4j/src/main/java/com/arcadedb/examples/GraphRAGEmbeddingStore.java
@@ -19,8 +19,12 @@
  *
  * LangChain4j generates 384-dimensional embeddings using AllMiniLmL6V2 (runs
  * in-process, no API keys). The embeddings are stored in ArcadeDB's LCChunk
- * vertex type via Cypher over Bolt. Similarity is computed using LangChain4j's
- * CosineSimilarity.
+ * vertex type via Cypher over Bolt.
+ *
+ * Similarity is computed in-memory using LangChain4j's CosineSimilarity because
+ * ArcadeDB's vectorNeighbors() function is SQL-only and not available over the
+ * Bolt protocol. For server-side vector search, see queries.sh which uses the
+ * HTTP API with vectorNeighbors().
  */
 public class GraphRAGEmbeddingStore {
 
@@ -37,6 +41,9 @@ public static void main(String[] args) {
     try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(USER, PASSWORD));
          Session session = driver.session(SessionConfig.forDatabase("GraphRAG"))) {
 
+      // Clean up any LCChunk nodes from previous runs
+      session.run("MATCH (c:LCChunk) DELETE c");
+
       // Ingest sample chunks with real embeddings via Cypher over Bolt
       String[] texts = {
           "GraphRAG combines knowledge graphs with vector search to improve retrieval accuracy.",
diff --git a/graph-rag/queries/queries.sh b/graph-rag/queries/queries.sh
index 2f93b70..f6c71f2 100755
--- a/graph-rag/queries/queries.sh
+++ b/graph-rag/queries/queries.sh
@@ -50,14 +50,13 @@ LIMIT 20
 
 # ─────────────────────────────────────────────────────────────────────────────
 echo ""
-echo "=== Query 3: Temporal-Aware Retrieval (Cypher) ==="
-echo "Get latest chunks per source."
+echo "=== Query 3: Latest Chunk Per Document (Cypher) ==="
+echo "List chunks per source document, highest chunkIndex first."
 echo ""
 query "cypher" "
 MATCH (c:Chunk)
-WHERE c.chunkIndex = 1
 RETURN c.content, c.source, c.chunkIndex
-ORDER BY c.chunkIndex DESC
+ORDER BY c.source, c.chunkIndex DESC
 LIMIT 10
 "