feat: add fraud detection use case#16
Conversation
Demonstrate ArcadeDB's multi-model fraud detection capabilities across four signal types: graph traversal, vector similarity, time-series analysis, and full-text fuzzy matching with 8 query patterns. Includes docker-compose, setup script, SQL schema/data, curl queries, Java runner, CI workflow, and design docs. Targets ArcadeDB 26.3.1-SNAPSHOT. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new, self-contained fraud detection use case for ArcadeDB. It showcases the database's multi-model strengths by combining graph, vector, time-series, and full-text capabilities to identify complex fraud patterns. The implementation includes a complete environment with Docker, sample data, and both command-line and Java application examples, ensuring a ready-to-run and verifiable demonstration of advanced fraud analytics. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Ignored Files
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Code Review Overall a well-structured addition that follows recommendation-engine conventions closely. A few issues worth addressing before merging. HIGH: Query 3 will become flaky over time. The circular-flow Cypher query uses datetime() - duration('P30D'). The transfer data is hardcoded to Feb 5-25 2026, so this passes today but returns zero rows after April 7 2026, making CI silently green on a broken query. HIGH: Java runner always exits 0 even if all queries fail. tryRun() catches exceptions but main() always returns normally, so the java CI matrix job can never fail. Track failures and call System.exit(1) if any query threw. MEDIUM: queries.sh has no result assertions. A zero-row result still exits 0. The test plan calls out specific expected matches (acct-B through acct-E, the F/G pair, acct-H) but these are never verified. MEDIUM: Confirm time_bucket() exists in ArcadeDB SQL. This is a TimescaleDB/PostgreSQL function. If ArcadeDB does not support it, Query 4 always throws, tryRun() swallows it, and CI stays green while the feature is completely broken. MEDIUM: Remove docs/plans/ files. The two files under docs/plans/ are internal implementation-planning artifacts with step-by-step notes and internal tooling instructions. They should not be committed to the public repo. LOW: SNAPSHOT Docker image is mutable. arcadedata/arcadedb:26.3.1-SNAPSHOT can change between CI runs. Pin to a stable release when available. LOW: Missing .java-version file. recommendation-engine/java/ has one with value 21; fraud-detection/java/ does not. Generated with Claude Code |
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive fraud detection use case, showcasing ArcadeDB's multi-model capabilities. The implementation is well-structured, with clear separation of concerns between setup scripts, queries, and application code. My review focuses on improving security practices in the Docker configuration, ensuring consistency across documentation and code, and applying Java best practices. I've identified a few discrepancies in the design documents compared to the implementation, which should be aligned for clarity. Overall, this is a great addition to the use cases repository.
| ports: | ||
| - "2480:2480" | ||
| environment: | ||
| JAVA_OPTS: "-Darcadedb.server.rootPassword=arcadedb" |
There was a problem hiding this comment.
Hardcoding credentials, even for a local demo, is a security risk and a bad practice. It's better to use a .env file to manage secrets and reference them here. This prevents credentials from being checked into version control and makes the configuration more flexible. You can use variable substitution with a default value to maintain ease of use.
You would then have the option to create a .env file in the same directory with content like:
ARCADEDB_PASS=arcadedb
And add .env to your .gitignore file.
JAVA_OPTS: "-Darcadedb.server.rootPassword=${ARCADEDB_PASS:-arcadedb}"| SELECT t.id, t.amount, t.merchant, | ||
| vectorDistance(t.behavior_embedding, c.profile_embedding) AS deviation | ||
| FROM Transaction t | ||
| JOIN Customer c ON t.customer_id = c.id |
There was a problem hiding this comment.
The join condition ON t.customer_id = c.id appears to be a typo. The Transaction type uses account_id to link to an account, and the implementation in the Java/shell files correctly uses t.account_id. The documentation should be updated for consistency.
| JOIN Customer c ON t.customer_id = c.id | |
| JOIN Customer c ON t.account_id = c.id |
| ```sql | ||
| SELECT account_id, rate(ts) AS current_tps | ||
| FROM Transaction | ||
| WHERE ts > now() - INTERVAL '5m' | ||
| GROUP BY account_id | ||
| HAVING current_tps > 2 | ||
| ``` |
There was a problem hiding this comment.
This query using now() - INTERVAL '5m' and rate() is inconsistent with the static dataset and the actual implementation. The implementation uses a fixed time window and count(*), which is more appropriate for this demo. The design document should reflect the implemented query.
Suggested update:
SELECT account_id, count(*) AS txn_count, min(ts) AS first_txn, max(ts) AS last_txn
FROM Transaction
WHERE ts BETWEEN '2026-03-01T13:00:00Z' AND '2026-03-01T13:05:00Z'
GROUP BY account_id
HAVING txn_count > 5| ```sql | ||
| SELECT correlate(a.amount, b.amount) AS correlation | ||
| FROM Transaction a, Transaction b | ||
| WHERE a.account_id = 'acct-A' AND b.account_id = 'acct-B' | ||
| AND a.ts > now() - INTERVAL '30d' | ||
| AND b.ts > now() - INTERVAL '30d' | ||
| ``` |
There was a problem hiding this comment.
This query is significantly different from the implementation in the Java and shell files. The documentation uses correlate() and a relative time window with now(), while the implementation calculates averages and counts over a fixed historical period. The documentation should be updated to match the code for clarity and accuracy.
Suggested update:
SELECT a.account_id AS account_a, b.account_id AS account_b,
avg(a.amount) AS avg_a, avg(b.amount) AS avg_b,
count(*) AS matching_txns
FROM Transaction a, Transaction b
WHERE a.account_id = 'acct-A' AND b.account_id = 'acct-B'
AND a.ts >= '2026-02-01T00:00:00Z'
AND b.ts >= '2026-02-01T00:00:00Z'|
|
||
| - Docker and Docker Compose | ||
| - `curl` and `jq` | ||
| - Java 17+ and Maven 3.x (for the Java demo) |
There was a problem hiding this comment.
The prerequisite for Java is listed as Java 17+, but the pom.xml file is configured for Java 21. To avoid confusion and potential build issues for users, this should be made consistent. It's best to specify Java 21 here.
| - Java 17+ and Maven 3.x (for the Java demo) | |
| - Java 21+ and Maven 3.x (for the Java demo) |
| import com.arcadedb.query.sql.executor.ResultSet; | ||
| import com.arcadedb.remote.RemoteDatabase; | ||
|
|
||
| public class FraudDetection { |
There was a problem hiding this comment.
This class consists only of static methods and is not designed for instantiation. It's a good practice to make such utility classes final and provide a private constructor to prevent accidental instantiation.
Example:
public final class FraudDetection {
private FraudDetection() { /* prevent instantiation */ }
// ...
}| public class FraudDetection { | |
| public final class FraudDetection { |
🧪 CI InsightsHere's what we observed from your CI run for f088c17. 🟢 All jobs passed!But CI Insights is watching 👀 |
- Fix typo in design doc: customer_id → account_id in Query 5 - Update design doc Queries 6 and 7 to match implementation (remove aspirational rate()/correlate()/now() syntax) - Fix README Java prerequisite: 17+ → 21+ to match pom.xml and CI - Add .java-version file for consistency with recommendation-engine Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Addressed review feedback in bef3b4f. Here's the breakdown: Fixed:
Won't fix (with reasoning):
🤖 Generated with Claude Code |
PR Review: feat/fraud-detectionGood overall structure — this closely mirrors the Blocking Issues1. Missing snapshot repository in
Fix — add to <repositories>
<repository>
<id>sonatype-snapshots</id>
<url>https://s01.oss.sonatype.org/content/repositories/snapshots</url>
<snapshots><enabled>true</enabled></snapshots>
<releases><enabled>false</enabled></releases>
</repository>
</repositories>2. private static void tryRun(Runnable r, String name) {
try { r.run(); }
catch (Exception e) { System.err.println("[" + name + " FAILED] " + e.getMessage()); }
}Any query that throws prints to stderr but the program exits 0. The CI 3. Q3 (Circular Money Flow) is a time-bomb in CI WHERE all(t IN relationships(path)
WHERE t.ts > datetime() - duration('P30D'))The transfers are hardcoded to Feb 5–25, 2026. This query returns results only while CI runs within 30 days of those dates — i.e. before ~March 25, 2026. After that, Q3 silently returns empty. Since Fix: use a static lower bound matching the seed data, e.g. Important Issues4.
5. AI planning artifacts committed to the repo
These are internal development artifacts, not user documentation. They should be removed before merge. 6. SNAPSHOT Docker image — fragile CI
Minor / Non-blocking7. Missing indexes on query-critical columns
Negligible at 70 rows but worth adding as a reference implementation. 8.
9. Structuring deposits missing for accounts D and E
10. This bypasses CI for all Dependabot merges, meaning a dependency regression can land on main without validation. Summary
Fix the three blocking issues before merge; the rest can be addressed in follow-up. Generated with Claude Code |
Code Review — Fraud Detection Use CaseOverall this is a well-structured addition that follows the established repo conventions (same layout as Bug / CorrectnessQuery 7 — Cartesian product instead of correlation The query joins Build / Runtime
PerformanceNo index on Query 4 (structuring detection) filters and groups on both Query 5 — The function appears in both the Design
Query 6 — hardcoded timestamp window This only works for the seeded test data. A brief comment in Repo Hygiene
MinorUnsafe cast in ((Number) r.getProperty("amount")).doubleValue()If Summary
The CI structure, SHA-pinned actions, schema design, and breadth of fraud patterns (graph ring, synthetic identity, velocity, structuring, behavioral anomaly) are all solid. Fixing the Query 7 correctness issue and the missing Maven SNAPSHOT repository declaration are the highest-priority items before merging. |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review — Fraud Detection Use CaseThis is a well-structured addition that follows the existing Bugs / Correctness1. Query 7 is a Cartesian product, not a correlation query
SELECT a.account_id AS account_a, b.account_id AS account_b,
avg(a.amount) AS avg_a, avg(b.amount) AS avg_b,
count(*) AS matching_txns
FROM Transaction a, Transaction b
WHERE a.account_id = 'acct-A' AND b.account_id = 'acct-B'
AND a.ts >= '2026-02-01T00:00:00Z'
AND b.ts >= '2026-02-01T00:00:00Z'This is a cross-join with no join predicate. 2.
private static void tryRun(Runnable r, String name) {
try {
r.run();
} catch (Exception e) {
System.err.println("[" + name + " FAILED] " + e.getMessage());
}
}All 8 queries are wrapped in 3. Database creation errors are silently suppressed
curl -sf ... -d '{"command": "create database ..."}' > /dev/null || trueThe 4. Inconsistent and fragile type cast in Java
((Number) r.getProperty("amount")).doubleValue()This is the only place in the file that explicitly casts a property to Performance5. Query 7 cross-join scales as O(n²) Related to bug #1 — the Cartesian product degrades quadratically as Transaction records grow. For the current demo data this is invisible, but the query should be rewritten before it becomes a pattern copied into production. Reliability / CI6. SNAPSHOT Docker image in CI
7. The Minor / Style8. Internal directive in
9. This file applies to the entire repository, not just the fraud-detection use case. The auto-merge condition on approved Dependabot PRs should be reviewed by repo maintainers independently of this PR. 10. The schema stores Summary
The most important fixes are #1 (Query 7 logic) and #2 (CI exit code) — both undermine CI's value as a correctness gate. The rest are polish. Generated with Claude Code |
Code Review — Fraud Detection Use CaseOverall this is a well-structured use case that mirrors the recommendation-engine layout and demonstrates ArcadeDB's multi-model capabilities clearly. A few items to address before merging: 🔴 Issues1. image: arcadedata/arcadedb:latest # ❌The existing image: arcadedata/arcadedb:26.3.1-SNAPSHOT # ✅2.
| curl -sf … -d @- > /dev/null # exit 0 on HTTP 2xx, but errors in body are lostArcadeDB returns HTTP 200 for a statement that hits a constraint violation or syntax error, with an send_sql() {
local stmt="$1"
local response
response=$(jq -cn --arg cmd "$stmt" '{"language":"sql","command":$cmd}' \
| curl -sf -u "${ARCADEDB_USER}:${ARCADEDB_PASS}" \
-X POST "${ARCADEDB_URL}/api/v1/command/${DB_NAME}" \
-H "Content-Type: application/json" \
-d @-)
if echo "$response" | jq -e '.error' > /dev/null 2>&1; then
echo "ERROR executing: $stmt" >&2
echo "$response" | jq '.error' >&2
return 1
fi
}3. Queries 4 (structuring), 5 (behavioral anomaly), and 6 (velocity) rely on full scans of
For a demo with 70 rows this is invisible; at any realistic scale these queries will be full scans. 4. In ((Number) r.getProperty("amount")).doubleValue() // NPE if amount is nullAdd a null guard or use 🟡 Improvements5. Query 3 (Circular Money Flow): no LIMIT on cycle traversal MATCH path = (origin:Account)-[:TRANSFERRED_TO*3..6]->(origin)With 6. Query 7 (Correlated Activity): implicit Cartesian product FROM Transaction a, Transaction b
WHERE a.account_id = 'acct-A' AND b.account_id = 'acct-B'
AND a.ts >= '2026-02-01T00:00:00Z'
AND b.ts >= '2026-02-01T00:00:00Z'This computes the full N×M cross-join between 7. Hardcoded timestamps in Queries 6 & 7 WHERE ts BETWEEN '2026-03-01T13:00:00Z' AND '2026-03-01T13:05:00Z'These match the sample data, so the queries work for the demo. A brief comment (or README note) explaining that the timestamp window is intentionally fixed to the seed data would prevent confusion. 8.
9. CI: no result-count assertion Both ✅ What's done well
|
ArcadeDB does not support JOIN, HAVING, comma-separated FROM tables, vectorDistance(), time_bucket(), or .similarity(). Rewrite all 7 failing queries using proven ArcadeDB patterns: - Q2: SEARCH_INDEX() instead of .similarity() cross-join - Q3: Explicit 5-hop Cypher path instead of all() predicate - Q4: Subquery wrapper instead of HAVING - Q5: vectorCosineSimilarity() instead of vectorDistance() JOIN - Q6: Subquery wrapper instead of HAVING - Q7: GROUP BY account_id instead of cross-join - Q8: Subquery IN filter instead of JOIN Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR Review: Fraud Detection Use CaseOverall this is a well-structured, clearly documented use case that demonstrates ArcadeDB's multi-model capabilities across graph, vector, time-series, and full-text signals. The shell scripts are clean, the CI workflow is solid, and the Java code is idiomatic. Here are my findings: Bugs / Correctness Issues1. Silent CI failures — Java runner (
2. Silent CI failures — curl runner (
3. Docker image version mismatch (
4. Query 2 description inaccurate — SSN check is not programmatic The description says "Find accounts matching 'Smith' via full-text index, then check for shared SSN" but both the curl and Java implementations simply return matching rows — no SSN cross-check is performed by the query. The shared SSN is only visible to a human reading the output. Either update the description to "inspect for shared SSN" or extend the query to surface only accounts sharing an SSN (e.g. via a self-join or subquery grouping by Performance Considerations5. No indexes on commonly filtered columns Queries 4-7 filter on 6. Structuring query omits the time dimension (Query 4) The query groups deposits only by amount range and count, with no time window. True structuring involves multiple sub-threshold transactions within a short window (e.g., same day). The test data happens to place all deposits on the same day, but the query would also flag three deposits spread across a year, making the "time-series" characterisation slightly misleading. Scoping by Code Quality / Minor Issues7. The function reads one line = one SQL statement. Any statement split across multiple lines will be sent as separate invalid fragments. This works for the current single-line-per-statement files, but a comment noting this constraint would prevent accidental breakage when the SQL files are extended. 8. The intent is idempotency (DB may already exist), but 9. Query 8 "multi-model" label is misleading Query 8 is described as "Multi-Model Investigation (Combined)" but is a straightforward SQL subquery joining two vertex types via a scalar field filter. It does not combine graph traversal, vector search, or full-text lookups. Renaming it or extending it with one additional signal type would better justify the label. 10. The step pins Summary
The structural foundation is solid — fixing the silent-failure issues in the CI runners is the highest-priority change before merging. Generated with Claude Code |
SEARCH_INDEX() is not supported in WHERE clauses in ArcadeDB. Use a simple SSN equality filter instead for synthetic identity detection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove FULL_TEXT index (no longer used by any query) - Update Query 2 description from Full-Text to Document signal type - Remove SEARCH_INDEX reference from version notes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review — Fraud Detection Use CaseGood overall structure that mirrors the existing Bugs / Correctness1. The rest of the PR consistently references Fix: pin to 2. Missing The design doc, README, and query table all describe Query 2 as a full-text / fuzzy-matching operation. The schema file never creates the index: -- missing from fraud-detection/sql/01-schema.sql
CREATE INDEX IF NOT EXISTS ON Account (full_name) FULL_TEXT;Without it, any future query that relies on the full-text index (e.g. the 3. Query 2 doesn't actually do fuzzy name matching The design doc and PR description show Query 2 using SELECT id, full_name, ssn FROM Account WHERE ssn = '123-45-6789' ORDER BY idThis works for the demo data, but it doesn't demonstrate the full-text / fuzzy-matching capability advertised in the docs. Either implement 4. Query 5 uses a hardcoded embedding instead of joining the In both vectorCosineSimilarity(behavior_embedding, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])The vector happens to match SELECT t.id, t.amount, t.merchant, t.account_id,
vectorCosineSimilarity(t.behavior_embedding, c.profile_embedding) AS similarity
FROM Transaction AS t, Customer AS c
WHERE t.account_id = c.id
AND vectorCosineSimilarity(t.behavior_embedding, c.profile_embedding) < 0.5
ORDER BY similarity5. Query 8 doesn't demonstrate multi-model investigation The current implementation: SELECT id, name FROM Account
WHERE id IN (SELECT id FROM Customer WHERE recent_behavior IN ['suspicious', 'anomalous'])The design doc and README both describe Query 8 as blending graph traversal, vector deviation, and temporal patterns into a composite risk score. The current implementation uses none of those — it's weaker than several earlier queries. Either bring the implementation closer to the design, or update the description to match what's actually implemented. Code Quality6. Planning documents committed to the repo
7. A Mergify configuration is a repo-level change that belongs in a separate PR with its own review. It currently auto-merges Dependabot PRs with 8. curl -sf ... -d "{\"command\": \"create database ${DB_NAME}\"}" > /dev/null || true
9. until curl -sf ... "${ARCADEDB_URL}/api/v1/ready" > /dev/null 2>&1; do
sleep 2
doneIf ArcadeDB never becomes healthy the script loops forever. The Docker Minor
Summary
The most important fix before merging is pinning the Docker image version to match the Java driver snapshot. The query correctness issues (3, 4, 5) are worth addressing to ensure the use case actually demonstrates what the README and docs promise. |
Code Review: feat: add fraud detection use caseOverall this is a well-structured addition that follows existing repo conventions. The 8-query multi-model demo is a compelling showcase for ArcadeDB. Below are my findings by area. Critical / Bugs
retries=0
until curl -sf -u "${ARCADEDB_USER}:${ARCADEDB_PASS}" \
"${ARCADEDB_URL}/api/v1/ready" > /dev/null 2>&1; do
retries=$((retries + 1))
[ "$retries" -ge 30 ] && { echo "ERROR: ArcadeDB not ready after 60s"; exit 1; }
sleep 2
doneAlternatively, use
private static boolean hadFailure = false;
private static void tryRun(Runnable r, String name) {
try { r.run(); }
catch (Exception e) {
hadFailure = true;
System.err.println("[" + name + " FAILED] " + e.getMessage());
}
}
// in main(), before "All queries complete.":
if (hadFailure) System.exit(1);
If these INSERTs are genuinely missing, Query 6 (velocity attack, requires Reproducibility / Version Pinning
The SNAPSHOT dependency in Missing IndexesThe schema creates indexes on primary ID columns and two LSM_VECTOR indexes, but several hot query paths have none:
For ~70 records this is harmless today, but as a best-practices demo it is worth modelling what production schemas would look like. Minor Issues
((Number) r.getProperty("amount")).doubleValue()If Design Note
Summary
The use case structure, CI workflow patterns, and multi-model query design are solid and consistent with the existing repo. The two most important items before merge are the silent Java exit-0 issue and confirming the data file is complete. |
- Pin docker-compose.yml to arcadedata/arcadedb:26.3.1-SNAPSHOT to match Maven dependency version - Rename Query 8 from "Multi-Model Investigation" to "Cross-Type Investigation" — the query is a SQL subquery across vertex types, not a combination of graph/vector/time-series signals Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review responseFixed (2 items)3. Docker image version mismatch — Fixed. Pinned 9. Query 8 "multi-model" label misleading — Agreed, the query is a SQL subquery across two vertex types, not a combination of graph + vector + time-series. Renamed to "Cross-Type Investigation (SQL Subquery)". (726d4c7) Already resolved / stale (1 item)4. Query 2 description inaccurate — This was based on an earlier version. Query 2 was rewritten in 5f9858e to use Pushback (7 items)1. Java runner exits 0 on failure — The 2. curl runner exits 0 on empty results — Same as above — the 5. No indexes on commonly filtered columns — At demo scale with ~70 records, these indexes would add schema complexity without demonstrable benefit. The schema already includes 3 UNIQUE indexes and 2 LSM_VECTOR indexes. Adding a README note about production indexing would be over-documenting for a demo. 6. Structuring query omits time dimension — The query is intentionally simplified. The sample data is constructed so that the fraud ring deposits cluster on the same day, making the query return correct results. Adding 7. 8. 10. |
PR Review: Fraud Detection Use CaseGreat addition! The overall structure mirrors the existing recommendation-engine well, and the multi-model approach (graph + vector + time-series + full-text) is a compelling showcase for ArcadeDB. Here are my findings across several categories. Bugs / Correctness Issues1. FULL_TEXT index missing from schema ( The design doc and README both mention a CREATE INDEX IF NOT EXISTS ON Account (full_name) FULL_TEXT;Without it, Query 2's fuzzy-matching intent cannot be fulfilled by the index. 2. Query 2 does not use full-text / fuzzy matching The design doc specifies 3. Query 3 uses a fixed 5-hop pattern instead of variable-length cycle detection The design doc shows: MATCH path = (origin:Account)-[:TRANSFERRED_TO*3..6]->(origin)The implementation hard-codes exactly 5 hops (A->B->C->D->E->A). A variable-length pattern is both more general and a better demonstration of ArcadeDB's Cypher capabilities. 4. Query 4 drops the The design specifies grouping by day ( 5. Query 5 uses a hardcoded embedding instead of joining vectorCosineSimilarity(behavior_embedding, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])This vector happens to equal JOIN Customer c ON t.account_id = c.id
WHERE vectorDistance(t.behavior_embedding, c.profile_embedding) > 0.76. Query 7 is not a correlation query The description says "compare transfer patterns between two accounts to detect coordination", but the implementation is just a Performance Considerations7. Missing index on Queries 4-8 all filter or group by CREATE INDEX IF NOT EXISTS ON Transaction (account_id) NOTUNIQUE;
CREATE INDEX IF NOT EXISTS ON Deposit (account_id) NOTUNIQUE;would improve query performance and better illustrate ArcadeDB indexing. 8. The graph schema models account relationships (device, phone, address, transfers) as edges, yet transactions are linked to accounts via a plain string foreign key. This is inconsistent with the graph-centric design and forces SQL-style lookups instead of graph traversal. Consider modeling it as a CI / Reliability9. SNAPSHOT Docker image makes CI non-deterministic image: arcadedata/arcadedb:26.3.1-SNAPSHOTSNAPSHOT images can be overwritten at any time. CI builds will silently change behavior when a new snapshot is published. Pin to a stable release tag (or at least document that this is intentional and temporary). 10. until curl -sf ... /api/v1/ready > /dev/null 2>&1; do
sleep 2
doneIf ArcadeDB never becomes healthy, this loops forever until the 15-minute CI timeout kills the job. A counter (e.g. 50 retries x 2 s = 100 s) with an explicit error message would fail faster and give a better diagnostic. 11. No result validation — empty results silently pass Both Code Quality12. NPE risk in Java Query 5 ( ((Number) r.getProperty("amount")).doubleValue()If 13. curl ... -d "{\"command\": \"create database ${DB_NAME}\"}" > /dev/null || trueThis suppresses all errors, including network failures or auth problems. It would be better to tolerate only a "database already exists" response by inspecting the HTTP status code or response body. Scope / Repository Hygiene14.
These are internal scaffolding documents that have no value to end users and should not be committed to the repository. 15. This is a new repo-level Mergify configuration that affects all PRs and branches, not just the fraud-detection use case. It should be introduced in a separate, focused PR with explicit discussion from maintainers. Minor / Suggestions
Summary
The most impactful items to address before merging are #1 (missing FULL_TEXT index), #9 (SNAPSHOT image), #11 (no result validation), and #14-15 (repo hygiene). The query simplifications (#2-#6) are worth discussing — they work for the demo but diverge from the stated design goals. 🤖 Generated with Claude Code |
- Remove .mergify.yml (repo-wide config unrelated to fraud-detection, should be introduced in a separate PR) - Update design doc Query 2 to reflect SSN lookup implementation with note explaining why SEARCH_INDEX() was dropped - Update design doc Query 3 to reflect explicit 5-hop pattern with note explaining ArcadeDB Cypher limitations - Fix Query 2 signal type label: Full-Text → Document - Fix Query 8 label: Multi-Model → Cross-Type Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review responseFixed (3 items)9. Query 8 label misleading — Renamed to "Cross-Type Investigation (SQL Subquery)" in the previous commit (726d4c7). Design doc also updated. 15. Minor: README vs design doc inconsistency on Query 2 — Design doc updated to match the README ("Document" signal type) with a note explaining why SEARCH_INDEX() was dropped. (8095ca4) Already resolved / stale (2 items)1. FULL_TEXT index missing — Intentionally removed in 05d4a56. 2. Query 2 doesn't use full-text — Correct. Pushback (10 items)3. Query 3 fixed 5-hop vs variable-length — ArcadeDB's Cypher does not support 4. Query 4 missing 5. Query 5 hardcoded embedding — ArcadeDB does not support 6. Query 7 not a correlation query — ArcadeDB does not support comma-separated tables in FROM (cross-joins) or self-joins. The 7. Missing indexes on 8. 9 (CI). SNAPSHOT Docker image — This is intentional and documented in the README ("ArcadeDB Version Notes"). The features this demo uses ( 10. Wait loop no timeout — Same pattern as recommendation-engine ( 11. No result validation — Same pattern as recommendation-engine (both 12. NPE risk in Query 5 — The 13. 14. |
Code Review: Fraud Detection Use CaseGood addition to the repository - the structure mirrors the recommendation-engine pattern cleanly and the multi-model angle (graph + vector + time-series + document) is a strong showcase for ArcadeDB. A few issues to address before merging: Blockers1. SNAPSHOT image/artifact availability
Options:
2. Plan documents should not be committed
Bugs / Correctness3. Query 6 - hard-coded timestamp window WHERE ts BETWEEN '2026-03-01T13:00:00Z' AND '2026-03-01T13:05:00Z'This exact window must match the test data. If the data file ever changes dates, this query silently returns 0 rows and the test still passes (the curl script only checks for non-error exit code, not non-empty results). Either add a result-count assertion in 4. Java Query 5 - unsafe cast ((Number) r.getProperty("amount")).doubleValue()If Object amt = r.getProperty("amount");
double amount = amt instanceof Number n ? n.doubleValue() : 0.0;5. Query 8 - implicit Customer/Account ID coupling WHERE id IN (SELECT id FROM Customer WHERE recent_behavior IN ['suspicious', 'anomalous'])This assumes Design / Maintainability6. until curl -sf ... ; do sleep 2 ; doneNo timeout. If ArcadeDB never becomes healthy (e.g., image pull fails silently), this loops until the CI job's 15-minute timeout with no diagnostic output. A counter-based guard would improve this: attempts=0
until curl -sf ...; do
sleep 2
(( attempts++ )) && (( attempts >= 50 )) && { echo "ArcadeDB did not start"; exit 1; }
done(The same issue exists in recommendation-engine.) 7. curl ... -d '{"command": "create database ..."}' > /dev/null || trueThe 8.
-- IMPORTANT: each statement must fit on a single line (setup.sh applies one line at a time)Minor / Suggestions9. README - reference vector in Query 5 unexplained Query 5 uses 10. Velocity window in Query 6 is magic The specific window 11. CI - consider The Docker healthcheck is configured but the CI step does not wait for it - it relies on the What's working well
|
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review — PR #16: Fraud Detection Use CaseThis is a solid addition to the repo and follows the established Critical / High1. Query 5 uses a hardcoded reference vector instead of joining with
SELECT id, amount, merchant, account_id,
vectorCosineSimilarity(behavior_embedding, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]) AS profile_similarity
FROM Transaction
WHERE account_id = 'acct-H'
ORDER BY profile_similarityThe hardcoded vector SELECT t.id, t.amount, t.merchant,
vectorDistance(t.behavior_embedding, c.profile_embedding) AS deviation
FROM Transaction t
JOIN Customer c ON t.account_id = c.id
WHERE vectorDistance(t.behavior_embedding, c.profile_embedding) > 0.7
ORDER BY deviation DESCThe JOIN approach demonstrates genuine anomaly detection across all accounts dynamically, which is far more instructive. The hardcoded version also breaks the 2. Missing structuring deposits for
3.
Medium4. Query 4 loses time-bucketing semantics silently The design doc specifies per-day structuring detection via -- Design intent: flag accounts with 3+ deposits on the SAME day
-- Actual: flag accounts with 3+ deposits EVER matching the amount range
SELECT FROM (
SELECT account_id, count(*) AS deposit_count
FROM Deposit
WHERE amount BETWEEN 8000 AND 9999
GROUP BY account_id
) WHERE deposit_count >= 3The README notes that 5. Query 7 doesn't actually detect correlation The description says "Correlated Account Activity — Compare transfer patterns between two accounts to detect coordination," but the query just shows per-account aggregate stats independently (no cross-account comparison): SELECT account_id, avg(amount) AS avg_amount, count(*) AS txn_count
FROM Transaction
WHERE account_id IN ['acct-A', 'acct-B']
AND ts >= '2026-02-01T00:00:00Z'
GROUP BY account_idThis doesn't demonstrate correlation. Consider adding a comment explaining that a proper temporal correlation would require a self-join (which may have ArcadeDB limitations), or rephrase the query description to match what the query actually does. 6. README signal-type label mismatch The root It should be 7. The Low8. No FULL_TEXT index in The design doc's schema section lists:
and the top of the fraud-detection README mentions "full-text fuzzy matching" in the original design goals. The index is never created in the actual schema. If the intent changed (to SSN equality), remove the mention from the design doc to avoid confusion. 9. until curl -sf ... "${ARCADEDB_URL}/api/v1/ready" > /dev/null 2>&1; do
sleep 2
doneThis is consistent with 10. Missing
Positives
🤖 Generated with Claude Code |
Code Review — Fraud Detection Use CaseOverall this is a solid addition. The use case demonstrates ArcadeDB's multi-model capabilities well, the structure mirrors the existing use cases, and the CI workflow follows established patterns. Below are my findings, from most critical to minor. Critical: Java runner silently swallows failures — CI can pass with broken queries
Suggestion: Track a failure count and call // Add a counter field
private static int failures = 0;
// In tryRun catch block: failures++;
// At end of main:
if (failures > 0) {
System.err.println(failures + " query/queries failed.");
System.exit(1);
}Critical:
|
| Severity | Issue |
|---|---|
| Critical | Java tryRun() swallows failures — CI always exits 0 even when queries fail |
| Critical | queries.sh does not assert non-empty results — curl CI path always passes |
| Bug | Accounts D and E missing structuring deposits (Query 4 returns incomplete ring) |
| Schema | FULL_TEXT index on Account(full_name) described but not created |
| Design | Query 5 hardcodes the profile vector instead of reading from the Customer table |
| Robustness | setup.sh readiness loop has no timeout/retry bound |
| Clarity | Query 7 description promises correlation but delivers independent aggregates |
| Performance | No indexes on account_id columns used by SQL queries 4–7 |
The docs/plans/ artifacts follow the repo's established convention. The CI workflow structure, SHA-pinned action versions, and fail-fast: false matrix are all correct.
Summary
Test plan
docker compose up -dinfraud-detection/starts ArcadeDB./setup.shcreates FraudDetection database, applies schema + data without errors./queries/queries.shruns all 8 queries and returns non-empty resultscd java && mvn package -q && java -jar target/fraud-detection.jarruns all 8 queriesFraud Detection CIpasses bothtest (curl)andtest (java)matrix jobs🤖 Generated with Claude Code