Feat/knowledge graphs by robfrank · Pull Request #9 · ArcadeData/arcadedb-usecases

robfrank · 2026-02-25T23:04:01Z

No description provided.

gemini-code-assist · 2026-02-25T23:04:24Z

Summary of Changes

Hello @robfrank, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, self-contained knowledge-graphs use case within the arcadedb-usecases repository. The primary goal is to showcase ArcadeDB's unique ability to unify graph traversal, vector similarity, full-text search, and time-series data within a single database, applied to an academic research knowledge graph domain. The implementation includes a robust CI setup to ensure the use case's functionality across different execution environments.

Highlights

- New Use Case: Knowledge Graphs: Introduced a new, self-contained knowledge-graphs use case to demonstrate ArcadeDB's multi-model capabilities.
- Multi-Model Demonstration: The use case showcases graph traversal, vector similarity, full-text search, and time-series data within a single academic research knowledge graph.
- CI Workflow Integration: Implemented a GitHub Actions CI workflow for the knowledge-graphs use case, ensuring functionality across curl and java runners.
- Comprehensive Documentation: Provided detailed design and implementation plans, along with a comprehensive README, to guide users through the new use case.

Changelog

docs/plans/2026-02-25-knowledge-graphs-ci.md
- Added CI implementation plan for the knowledge graphs use case.
docs/plans/2026-02-25-knowledge-graphs-design.md
- Added design document for the knowledge graphs use case.
docs/plans/2026-02-25-knowledge-graphs.md
- Added main implementation plan for the knowledge graphs use case.
knowledge-graphs/README.md
- Added a comprehensive README for the new knowledge graphs use case.
knowledge-graphs/docker-compose.yml
- Added Docker Compose configuration to run ArcadeDB for the use case.
knowledge-graphs/java/pom.xml
- Added Maven Project Object Model (POM) for the Java example application.
knowledge-graphs/java/src/main/java/com/arcadedb/examples/KnowledgeGraph.java
- Added the main Java application demonstrating knowledge graph queries.
knowledge-graphs/queries/queries.sh
- Added a shell script to execute various curl queries against the knowledge graph.
knowledge-graphs/setup.sh
- Added a setup script to initialize the ArcadeDB database and load schema and data.
knowledge-graphs/sql/01-schema.sql
- Added SQL schema definition for the knowledge graph database.
knowledge-graphs/sql/02-data.sql
- Added SQL statements to populate the knowledge graph with sample data.

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/knowledge-graphs.yml

Activity

No human activity (comments, reviews) has been recorded on this pull request yet.
One commit was co-authored by 'Claude Sonnet 4.6'.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

- Replace SEARCH_CLASS() with SEARCH_INDEX('Paper[abstract]', ...) for full-text search (Q3): SEARCH_CLASS is not a known ArcadeDB function
- Replace Q5 MATCH subquery with hardcoded IDs ['p2','p8','p4'] as ArcadeDB does not support subqueries inside MATCH where clauses
- Fix docker-compose.yml healthcheck to use wget instead of curl since the arcadedb container image does not include curl

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- docker-compose.yml: switch healthcheck from wget/CMD-SHELL to curl/CMD, matching recommendation-engine and avoiding wget dependency in the image
- sql/02-data.sql: replace date() with fixed ISO 8601 datetime strings to correct the DATE/DATETIME type mismatch and make test data reproducible
- setup.sh: add retry counter (30 × 2s = 60s max) to the startup loop so ArcadeDB failures surface quickly instead of hanging until CI timeout
- queries/queries.sh: document that Query 5 paper IDs are hardcoded due to ArcadeDB not supporting SELECT subqueries inside MATCH where clauses
- remove redundant .gitkeep files from queries/ and java/examples/ dirs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces a new knowledge-graphs use case, including the application code in Java, query scripts, SQL schema and data, and associated documentation and implementation plans. The implementation is well-structured, mirroring the existing recommendation-engine use case, and the code is clean and correct. The use of fixed versions, robust shell scripting, and clear documentation is commendable.

My review focuses on improving the implementation plan documents to align them with the superior implementation, ensuring that the documentation remains accurate and maintainable. I've identified a few areas in docs/plans/2026-02-25-knowledge-graphs.md where the plan could be updated to reflect the more robust and reproducible code that was actually implemented.

I am having trouble creating individual review comments. Click here to see my feedback.

docs/plans/2026-02-25-knowledge-graphs.md (136-139)

The setup.sh script in the implementation includes a timeout for waiting for ArcadeDB to start, which is more robust than the infinite loop in this plan. Consider updating the plan to reflect this improvement. The implemented version is:

retries=0
until curl -sf -u "${ARCADEDB_USER}:${ARCADEDB_PASS}" \
    "${ARCADEDB_URL}/api/v1/ready" > /dev/null 2>&1; do
  (( retries++ >= 30 )) && { echo "ArcadeDB did not start after 60s"; exit 1; }
  sleep 2
done

docs/plans/2026-02-25-knowledge-graphs.md (340-349)

The plan uses date() for timestamps in PaperActivity records. This makes time-series queries non-deterministic because the results would depend on when the setup script is run. The implementation correctly uses fixed timestamp strings (e.g., '2024-01-15T10:00:00'). The plan should be updated to use fixed timestamps for reproducibility.

docs/plans/2026-02-25-knowledge-graphs.md (441)

The plan for Query 3 (Full-Text Search) uses SEARCH_CLASS. However, the implementation correctly uses SEARCH_INDEX('Paper[abstract]', '...') which is more specific and aligns with the created FULL_TEXT index on the abstract property. The plan (including the queries.sh, KnowledgeGraph.java, and README.md sections) should be updated to use SEARCH_INDEX for consistency and correctness.

WHERE SEARCH_INDEX('Paper[abstract]', 'distributed AND consensus') = true

claude · 2026-02-25T23:09:11Z

PR Review: knowledge-graphs use case

Overall this is a clean, well-structured addition that follows the recommendation-engine pattern closely. The fixes in this final batch of commits address the prior review feedback well. A few observations below.

What's working well

- Healthcheck fix (docker-compose.yml): switching to ["CMD", "curl", "-sf", ...] is the right call and now matches recommendation-engine/docker-compose.yml exactly.
- Retry timeout (setup.sh): the (( retries++ >= 30 )) pattern is correct: 30 iterations × 2 s = 60 s max wait, and the arithmetic expression is safely exempt from set -e because it is the first element in a && chain.
- Fixed datetime literals (sql/02-data.sql): replacing date() with ISO 8601 strings eliminates the DATE/DATETIME type mismatch and makes test data fully reproducible.
- Q5 comment (queries.sh): documenting the ArcadeDB subquery limitation honestly is better than silently hardcoding IDs.
- .gitkeep cleanup: removing empty placeholder files is correct hygiene.
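
For readers who want to see the set -e point in isolation, here is a minimal sketch of the same retry pattern as a reusable function. The names wait_until and flaky_probe are hypothetical and not part of setup.sh; flaky_probe stands in for the curl readiness check so the snippet runs without ArcadeDB.

```shell
#!/usr/bin/env bash
set -euo pipefail

# wait_until MAX_RETRIES DELAY PROBE...
# Runs PROBE until it succeeds; returns 1 once MAX_RETRIES retries are used up.
wait_until() {
  local max_retries=$1 delay=$2
  shift 2
  local retries=0
  until "$@" > /dev/null 2>&1; do
    # (( ... )) returns non-zero while the limit is not reached. Because it is
    # not the last command of the && list, set -e does not abort the script.
    (( retries++ >= max_retries )) && { echo "gave up after ${max_retries} retries" >&2; return 1; }
    sleep "$delay"
  done
}

# Hypothetical stand-in for the readiness check: succeeds on the third call.
attempts_file=$(mktemp)
echo 0 > "$attempts_file"
flaky_probe() {
  local n
  n=$(( $(cat "$attempts_file") + 1 ))
  echo "$n" > "$attempts_file"
  (( n >= 3 ))
}

# A delay of 0 keeps the demo instant; setup.sh sleeps 2 seconds per retry.
wait_until 30 0 flaky_probe && echo "ready after $(cat "$attempts_file") attempts"
# prints: ready after 3 attempts
```

Passing the probe as arguments keeps the timeout logic separate from the curl call, which is one possible way to reuse it across use cases.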

Issues worth addressing

1. queries.sh exits on the first failing query

The script runs under set -euo pipefail, so a single query error terminates the whole run with no output for subsequent queries. The Java counterpart wraps every query in tryRun() which catches and logs failures individually. Consider wrapping each query call similarly:

query "cypher" "..." || echo "[Query 1 FAILED]"
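
A slightly fuller sketch of that idea, assuming a small wrapper that logs and counts failures so the script can still report a summary at the end. try_query and fake_query are hypothetical names; fake_query only imitates the script's query helper so the example runs offline, and the query strings are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

failures=0

# try_query LABEL COMMAND...: run one query command; log and count a failure
# instead of letting set -e abort the whole script.
try_query() {
  local label=$1
  shift
  if "$@"; then
    echo "[${label} OK]"
  else
    echo "[${label} FAILED]" >&2
    # Plain assignment: (( failures++ )) would return non-zero when failures
    # is 0 and trip set -e.
    failures=$(( failures + 1 ))
  fi
}

# Hypothetical stand-in for the real query helper, which would call curl.
fake_query() {
  [[ $2 != *BROKEN* ]] && echo "ran $1: $2"
}

try_query "Query 1" fake_query cypher "MATCH (p:Paper) RETURN p"
try_query "Query 2" fake_query sql "BROKEN STATEMENT"
try_query "Query 3" fake_query sql "SELECT count(*) FROM Paper"

echo "${failures} of 3 queries failed"
```

In CI the script would probably still want to exit non-zero when failures is greater than 0, so that a broken query is not hidden by the friendlier output.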

2. Researcher.embedding has no vector index

01-schema.sql creates LSM_VECTOR indexes on Paper(embedding) and Topic(embedding), but not on Researcher(embedding), even though embedding is populated for every researcher in 02-data.sql. Either:

- add CREATE INDEX IF NOT EXISTS ON Researcher (embedding) LSM_VECTOR METADATA { dimensions: 4, similarity: 'COSINE' }; to the schema (enables researcher-similarity queries in the future), or
- drop the embedding property from Researcher to avoid misleading readers.

3. Hardcoded Query 5 seed IDs are difficult to verify

The comment says the IDs ['p2', 'p8', 'p4'] are "the top-3 vector-similarity results" for [0.8, 0.2, 0.1, 0.1], but a reader running the queries in order has no easy way to confirm this without re-running Query 2 and cross-referencing the output. A brief note pointing to Query 2 output (or an inline comment listing the cosine scores) would help users validate the example.
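
One way to produce such scores without a running database is a small awk helper like the sketch below. The cosine function name is hypothetical, and the second vector is made up for illustration; the real paper embeddings live in sql/02-data.sql and are not reproduced here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# cosine "a1,a2,..." "b1,b2,...": print the cosine similarity to 4 decimals.
# Exits non-zero on a dimension mismatch or a zero vector.
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); m = split(b, y, ",")
    if (n != m || n == 0) exit 1
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]; na += x[i] * x[i]; nb += y[i] * y[i]
    }
    if (na == 0 || nb == 0) exit 1
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

query="0.8,0.2,0.1,0.1"

# Same direction as the query vector, so the score rounds to 1.0000.
cosine "$query" "0.8,0.2,0.1,0.1"

# Hypothetical embedding, NOT taken from 02-data.sql.
cosine "$query" "0.1,0.1,0.2,0.8"
```

Running this over the p2, p8 and p4 embeddings from the data file would give the scores to paste into the queries.sh comment.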

4. apply_file silently skips multi-line statements

setup.sh processes SQL files one line at a time. Any statement that spans multiple lines would be split, mangled, and sent as broken fragments (the remainder would be silently dropped since blank/comment lines are skipped). All current statements fit on one line, so there is no immediate bug, but this is a fragile assumption. Worth a comment in the function:

# NOTE: each SQL statement must occupy exactly one line in the file.
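
Going one step beyond a comment, the function could also refuse input that breaks the assumption. The following is only a sketch of how a line-at-a-time apply_file might look with such a guard; it is not the code in setup.sh, and it assumes every statement ends with a semicolon. The sender command is passed in as arguments so the example can use echo instead of curl.

```shell
#!/usr/bin/env bash
set -euo pipefail

# apply_file FILE SENDER...: hand each SQL statement in FILE to SENDER, one
# line at a time, and stop at the first line that is not a complete statement.
apply_file() {
  local file=$1
  shift
  local line lineno=0
  local comment='^[[:space:]]*--'
  local stmt_end=';[[:space:]]*$'
  while IFS= read -r line || [[ -n $line ]]; do
    lineno=$(( lineno + 1 ))
    # Skip blank lines and SQL comments.
    [[ -z ${line//[[:space:]]/} || $line =~ $comment ]] && continue
    # Guard: a line without a trailing ';' probably belongs to a statement
    # that spans several lines.
    if [[ ! $line =~ $stmt_end ]]; then
      echo "${file}:${lineno}: statement does not end with ';' (multi-line?)" >&2
      return 1
    fi
    "$@" "$line"
  done < "$file"
}

# Demo with a temporary file; echo stands in for the HTTP call.
sql_file=$(mktemp)
printf '%s\n' \
  '-- schema' \
  'CREATE VERTEX TYPE Paper IF NOT EXISTS;' \
  '' \
  'CREATE VERTEX TYPE Topic IF NOT EXISTS;' > "$sql_file"

apply_file "$sql_file" echo "would send:"
```

With the guard in place, a statement split across lines fails loudly with a file and line number instead of being sent as fragments.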

Nits

- queries.sh re-echoes the description of Q5 twice (once in the echo "Find topics..." line and again in the new note). Minor redundancy but not a problem.
- PaperActivity uses paperId (camelCase) as a foreign-key string rather than an actual graph edge. This is intentional to demonstrate the document/time-series pillar, but a schema comment explaining why it is not an edge would help newcomers.

Summary

The blocking issues are minor — the most actionable ones are the queries.sh error-handling gap (#1) and the orphaned Researcher.embedding property (#2). Everything else is polish. The use case is a solid, runnable demo of ArcadeDB's multi-model capabilities.

robfrank and others added 10 commits February 26, 2026 00:04

feat: scaffold knowledge-graphs directory structure

92efa23

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add knowledge-graphs docker-compose and setup script

8c9d204

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add knowledge-graphs schema

b037e18

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add knowledge-graphs curl queries

6571655

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add KnowledgeGraph Java program

2fa6d29

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add knowledge-graphs README

6155847

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: correct SEARCH_INDEX syntax in README

0846d24

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: add semicolons to schema SQL for consistency with recommendation…

ccaaa75

…-engine Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

robfrank force-pushed the feat/knowledge-graphs branch from 521cf97 to a74cf18 Compare February 25, 2026 23:06

gemini-code-assist Bot reviewed Feb 25, 2026

robfrank merged commit d33a9f0 into main Feb 25, 2026
6 checks passed

claude Bot mentioned this pull request Mar 2, 2026

feat: add fraud detection use case #16

Merged

8 tasks

robfrank merged 10 commits into main from feat/knowledge-graphs
