
Feat/knowledge graphs #9

Merged
robfrank merged 10 commits into main from feat/knowledge-graphs
Feb 25, 2026

@robfrank

Contributor

No description provided.

@gemini-code-assist

Summary of Changes

Hello @robfrank, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, self-contained knowledge-graphs use case within the arcadedb-usecases repository. The primary goal is to showcase ArcadeDB's unique ability to unify graph traversal, vector similarity, full-text search, and time-series data within a single database, applied to an academic research knowledge graph domain. The implementation includes a robust CI setup to ensure the use case's functionality across different execution environments.

Highlights

  • New Use Case: Knowledge Graphs: Introduced a new, self-contained knowledge-graphs use case to demonstrate ArcadeDB's multi-model capabilities.
  • Multi-Model Demonstration: The use case showcases graph traversal, vector similarity, full-text search, and time-series data within a single academic research knowledge graph.
  • CI Workflow Integration: Implemented a GitHub Actions CI workflow for the knowledge-graphs use case, ensuring functionality across curl and java runners.
  • Comprehensive Documentation: Provided detailed design and implementation plans, along with a comprehensive README, to guide users through the new use case.

Changelog
  • docs/plans/2026-02-25-knowledge-graphs-ci.md
    • Added CI implementation plan for the knowledge graphs use case.
  • docs/plans/2026-02-25-knowledge-graphs-design.md
    • Added design document for the knowledge graphs use case.
  • docs/plans/2026-02-25-knowledge-graphs.md
    • Added main implementation plan for the knowledge graphs use case.
  • knowledge-graphs/README.md
    • Added a comprehensive README for the new knowledge graphs use case.
  • knowledge-graphs/docker-compose.yml
    • Added Docker Compose configuration to run ArcadeDB for the use case.
  • knowledge-graphs/java/pom.xml
    • Added Maven Project Object Model (POM) for the Java example application.
  • knowledge-graphs/java/src/main/java/com/arcadedb/examples/KnowledgeGraph.java
    • Added the main Java application demonstrating knowledge graph queries.
  • knowledge-graphs/queries/queries.sh
    • Added a shell script to execute various curl queries against the knowledge graph.
  • knowledge-graphs/setup.sh
    • Added a setup script to initialize the ArcadeDB database and load schema and data.
  • knowledge-graphs/sql/01-schema.sql
    • Added SQL schema definition for the knowledge graph database.
  • knowledge-graphs/sql/02-data.sql
    • Added SQL statements to populate the knowledge graph with sample data.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/knowledge-graphs.yml
Activity
  • No human activity (comments, reviews) has been recorded on this pull request yet.
  • One commit was co-authored by 'Claude Sonnet 4.6'.

robfrank and others added 10 commits February 26, 2026 00:04
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace SEARCH_CLASS() with SEARCH_INDEX('Paper[abstract]', ...) for
  full-text search (Q3): SEARCH_CLASS is not a known ArcadeDB function
- Replace Q5 MATCH subquery with hardcoded IDs ['p2','p8','p4'] as
  ArcadeDB does not support subqueries inside MATCH where clauses
- Fix docker-compose.yml healthcheck to use wget instead of curl since
  the arcadedb container image does not include curl

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-engine

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docker-compose.yml: switch healthcheck from wget/CMD-SHELL to curl/CMD,
  matching recommendation-engine and avoiding wget dependency in the image
- sql/02-data.sql: replace date() with fixed ISO 8601 datetime strings to
  correct the DATE/DATETIME type mismatch and make test data reproducible
- setup.sh: add retry counter (30 × 2s = 60s max) to the startup loop so
  ArcadeDB failures surface quickly instead of hanging until CI timeout
- queries/queries.sh: document that Query 5 paper IDs are hardcoded due to
  ArcadeDB not supporting SELECT subqueries inside MATCH where clauses
- remove redundant .gitkeep files from queries/ and java/examples/ dirs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@robfrank force-pushed the feat/knowledge-graphs branch from 521cf97 to a74cf18 on February 25, 2026 23:06

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new knowledge-graphs use case, including the application code in Java, query scripts, SQL schema and data, and associated documentation and implementation plans. The implementation is well-structured, mirroring the existing recommendation-engine use case, and the code is clean and correct. The use of fixed versions, robust shell scripting, and clear documentation is commendable.

My review focuses on improving the implementation plan documents to align them with the superior implementation, ensuring that the documentation remains accurate and maintainable. I've identified a few areas in docs/plans/2026-02-25-knowledge-graphs.md where the plan could be updated to reflect the more robust and reproducible code that was actually implemented.

I am having trouble creating individual review comments, so my feedback is listed here instead.

docs/plans/2026-02-25-knowledge-graphs.md (136-139)

Severity: medium

The setup.sh script in the implementation includes a timeout for waiting for ArcadeDB to start, which is more robust than the infinite loop in this plan. Consider updating the plan to reflect this improvement. The implemented version is:

retries=0
until curl -sf -u "${ARCADEDB_USER}:${ARCADEDB_PASS}" \
    "${ARCADEDB_URL}/api/v1/ready" > /dev/null 2>&1; do
  (( retries++ >= 30 )) && { echo "ArcadeDB did not start after 60s"; exit 1; }
  sleep 2
done

docs/plans/2026-02-25-knowledge-graphs.md (340-349)

Severity: medium

The plan uses date() for timestamps in PaperActivity records. This makes time-series queries non-deterministic because the results would depend on when the setup script is run. The implementation correctly uses fixed timestamp strings (e.g., '2024-01-15T10:00:00'). The plan should be updated to use fixed timestamps for reproducibility.

docs/plans/2026-02-25-knowledge-graphs.md (441)

Severity: medium

The plan for Query 3 (Full-Text Search) uses SEARCH_CLASS. However, the implementation correctly uses SEARCH_INDEX('Paper[abstract]', '...') which is more specific and aligns with the created FULL_TEXT index on the abstract property. The plan (including the queries.sh, KnowledgeGraph.java, and README.md sections) should be updated to use SEARCH_INDEX for consistency and correctness.

WHERE SEARCH_INDEX('Paper[abstract]', 'distributed AND consensus') = true

@robfrank robfrank merged commit d33a9f0 into main Feb 25, 2026
6 checks passed
@claude

claude Bot commented Feb 25, 2026

PR Review: knowledge-graphs use case

Overall this is a clean, well-structured addition that follows the recommendation-engine pattern closely. The fixes in this final batch of commits address the prior review feedback well. A few observations below.


What's working well

  • Healthcheck fix (docker-compose.yml): switching to ["CMD", "curl", "-sf", ...] is the right call and now matches recommendation-engine/docker-compose.yml exactly.
  • Retry timeout (setup.sh): the (( retries++ >= 30 )) pattern is correct — 30 iterations × 2 s = 60 s max wait, and the arithmetic expression is safely exempt from set -e because it is the first element in a && chain.
  • Fixed datetime literals (sql/02-data.sql): replacing date() with ISO 8601 strings eliminates the DATE/DATETIME type mismatch and makes test data fully reproducible.
  • Q5 comment (queries.sh): documenting the ArcadeDB subquery limitation honestly is better than silently hardcoding IDs.
  • .gitkeep cleanup: removing empty placeholder files is correct hygiene.
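
To make the retry-timeout point concrete, here is a minimal sketch of the same bounded-wait pattern pulled into a function. The names wait_until and flaky are hypothetical and are not part of setup.sh; the stand-in command fails twice and then succeeds, so the loop can run without a live ArcadeDB instance.

```shell
#!/usr/bin/env bash
set -euo pipefail

# wait_until (hypothetical helper): run "$@" until it succeeds,
# giving up after max_retries retries with `delay` seconds between tries.
wait_until() {
  local max_retries=$1 delay=$2
  shift 2
  local retries=0
  until "$@" > /dev/null 2>&1; do
    # While the limit is not reached, (( ... )) returns non-zero. set -e
    # ignores that status because it is not the last command of the && list.
    (( retries++ >= max_retries )) && {
      echo "gave up after ${max_retries} retries" >&2
      return 1
    }
    sleep "$delay"
  done
}

# Stand-in for the readiness check: fails twice, then succeeds.
attempts=0
flaky() { (( ++attempts >= 3 )); }

wait_until 30 0 flaky
echo "succeeded after ${attempts} attempts"
```

With max_retries=30 and delay=2 this gives the same 60 s ceiling as the snippet in setup.sh. The delay is 0 here only so the demo finishes immediately.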

Issues worth addressing

1. queries.sh exits on the first failing query

The script runs under set -euo pipefail, so a single query error terminates the whole run with no output for subsequent queries. The Java counterpart wraps every query in tryRun() which catches and logs failures individually. Consider wrapping each query call similarly:

query "cypher" "..." || echo "[Query 1 FAILED]"
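
A slightly fuller sketch of the same idea, assuming a hypothetical run_query wrapper (not present in queries.sh) that records failures instead of aborting. The true and false commands stand in for the real query calls.

```shell
#!/usr/bin/env bash
set -euo pipefail

failed=0

# run_query (hypothetical wrapper): run a labelled command and record a
# failure rather than letting set -e terminate the whole script.
run_query() {
  local label=$1
  shift
  if "$@"; then
    echo "[${label} OK]"
  else
    echo "[${label} FAILED]" >&2
    failed=$(( failed + 1 ))
  fi
}

# Stand-ins for the real calls, e.g. run_query "Query 1" query "cypher" "..."
run_query "Query 1" true
run_query "Query 2" false
run_query "Query 3" true

echo "${failed} query(ies) failed"
```

Returning a non-zero exit status at the very end when failed is greater than zero would keep the CI job red while still printing output for every query.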

2. Researcher.embedding has no vector index

01-schema.sql creates LSM_VECTOR indexes on Paper(embedding) and Topic(embedding), but not on Researcher(embedding), even though embedding is populated for every researcher in 02-data.sql. Either:

  • add CREATE INDEX IF NOT EXISTS ON Researcher (embedding) LSM_VECTOR METADATA { dimensions: 4, similarity: 'COSINE' }; to the schema (enables researcher-similarity queries in the future), or
  • drop the embedding property from Researcher to avoid misleading readers.

3. Hardcoded Query 5 seed IDs are difficult to verify

The comment says the IDs ['p2', 'p8', 'p4'] are "the top-3 vector-similarity results" for [0.8, 0.2, 0.1, 0.1], but a reader running the queries in order has no easy way to confirm this without re-running Query 2 and cross-referencing the output. A brief note pointing to Query 2 output (or an inline comment listing the cosine scores) would help users validate the example.
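
One way to make the scores checkable offline is a small cosine-similarity helper, sketched here with awk. The cosine function is hypothetical, and the second vector in the usage line is a made-up placeholder, not an embedding taken from 02-data.sql; the real paper embeddings would need to be pasted in.

```shell
#!/usr/bin/env bash
set -euo pipefail

# cosine (hypothetical helper): print the cosine similarity of two
# comma-separated vectors, rounded to four decimal places.
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); m = split(b, y, ",")
    if (n != m) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # running dot product
      na  += x[i] * x[i]   # squared norm of a
      nb  += y[i] * y[i]   # squared norm of b
    }
    if (na == 0 || nb == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

# Query vector from the review comment against a placeholder embedding.
cosine "0.8,0.2,0.1,0.1" "0.9,0.1,0.1,0.0"
```

Running this once per paper embedding and sorting the results would show whether p2, p8 and p4 really are the three closest papers.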

4. apply_file silently skips multi-line statements

setup.sh processes SQL files one line at a time. Any statement that spans multiple lines would be split, mangled, and sent as broken fragments (the remainder would be silently dropped since blank/comment lines are skipped). All current statements fit on one line, so there is no immediate bug, but this is a fragile assumption. Worth a comment in the function:

# NOTE: each SQL statement must occupy exactly one line in the file.
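
If multi-line statements ever become necessary, one option is to accumulate lines until one ends with a semicolon. The sketch below assumes hypothetical names (apply_file_multiline, run_sql) and replaces the real HTTP call with a print. It does not handle semicolons inside string literals or trailing comments after a statement.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_sql: stand-in for the real HTTP call; here it only prints the statement.
run_sql() { printf 'SQL> %s\n' "$1"; }

# apply_file_multiline (hypothetical variant): join lines into one statement
# until a line ends with ';', then send the joined statement.
apply_file_multiline() {
  local file=$1 line stmt=""
  local comment_re='^[[:space:]]*--'
  local end_re=';[[:space:]]*$'
  while IFS= read -r line || [[ -n "$line" ]]; do
    # Skip blank lines and full-line "--" comments.
    [[ -z "${line//[[:space:]]/}" || "$line" =~ $comment_re ]] && continue
    stmt+="${stmt:+ }${line}"
    if [[ "$line" =~ $end_re ]]; then
      run_sql "$stmt"
      stmt=""
    fi
  done < "$file"
  # Flush a final statement that lacks a terminating ';'.
  [[ -n "$stmt" ]] && run_sql "$stmt"
  return 0
}

# Demo: one statement split across two lines, a comment, and a one-liner.
demo_file=$(mktemp)
printf 'CREATE VERTEX TYPE Paper\nIF NOT EXISTS;\n-- a comment\n\nSELECT 1;\n' > "$demo_file"
apply_file_multiline "$demo_file"
rm -f "$demo_file"
```

Keeping the one-statement-per-line rule and documenting it, as suggested above, is the smaller change. This variant only matters if the schema grows statements that are hard to read on a single line.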

Nits

  • queries.sh echoes the description of Q5 twice (once in the echo "Find topics..." line and again in the new note). Minor redundancy but not a problem.
  • PaperActivity uses paperId (camelCase) as a foreign-key string rather than an actual graph edge. This is intentional to demonstrate the document/time-series pillar, but a schema comment explaining why it is not an edge would help newcomers.

Summary

The blocking issues are minor — the most actionable ones are the queries.sh error-handling gap (#1) and the orphaned Researcher.embedding property (#2). Everything else is polish. The use case is a solid, runnable demo of ArcadeDB's multi-model capabilities.

@claude claude Bot mentioned this pull request Mar 2, 2026
8 tasks