Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 42 additions & 0 deletions 3952-exists-match-subquery-relationship-type.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
# Fix #3952: EXISTS { MATCH } subquery returns false for relationship types containing Cypher keyword fragments

## Summary

`EXISTS { MATCH (p)-[:WORKS_WITH]->(:Person) }` returned `false` for all rows even when the pattern matched,
while the equivalent `WHERE (p)-[:WORKS_WITH]->()` predicate worked correctly.

## Root Cause

`ExistsExpression.matchesKeywordAt()` used `Character.isLetterOrDigit()` to detect word boundaries when
scanning the subquery string for Cypher clause keywords (WITH, WHERE, RETURN, ...).

Because underscore `_` is NOT a letter or digit in Java, the check incorrectly accepted it as a word
boundary. This caused "WITH" inside "WORKS_WITH" to be falsely recognised as the Cypher `WITH` clause
keyword, making `injectWhereConditions()` split the relationship type name and produce an invalid query
such as:

```
MATCH (p), (p)-[:WORKS_WHERE id(p) = $__exists_p WITH]->(:Person)
```

The invalid query threw an exception that was silently caught in `evaluate()`, which returned `false`.

The bug affected any relationship type whose name ends with a Cypher keyword after an underscore:
`_WITH`, `_WHERE`, `_RETURN`, `_ORDER`, `_SKIP`, `_LIMIT`, `_UNION`.

## Fix

`engine/src/main/java/com/arcadedb/query/opencypher/ast/ExistsExpression.java`

Replaced `Character.isLetterOrDigit(c)` in the boundary checks of `matchesKeywordAt()` with a new helper
`isCypherIdentifierChar(c)` that also returns `true` for `_`, matching Cypher identifier rules.

## Tests

New test class: `engine/src/test/java/com/arcadedb/query/opencypher/CypherExistsUnderscoreRelationshipTypeTest.java`

- `existsWithUnderscoreKeywordWithInRelationshipType` - reproduces issue #3952 (WORKS_WITH embeds "WITH")
- `existsWithUnderscoreKeywordWhereInRelationshipType` - KNOWS_WHERE embeds "WHERE"
- `existsWithSimpleRelationshipTypeStillWorks` - control: KNOWS (no embedded keyword) still works

All 3 new tests pass. All 5835 existing Cypher/OpenCypher tests continue to pass.
Original file line number Diff line number Diff line change
Expand Up @@ -200,22 +200,28 @@
/**
* Checks if the uppercase query string has a keyword at the given position,
* ensuring it's a word boundary (not part of a longer identifier).
* Underscore is treated as an identifier character so that relationship type names
* such as WORKS_WITH are not falsely split at embedded keyword fragments (e.g. "WITH").
*/
private static boolean matchesKeywordAt(final String upper, final int pos, final String keyword) {
if (pos + keyword.length() > upper.length())
return false;
if (!upper.startsWith(keyword, pos))
return false;
// Check word boundary before
if (pos > 0 && Character.isLetterOrDigit(upper.charAt(pos - 1)))
if (pos > 0 && isCypherIdentifierChar(upper.charAt(pos - 1)))
return false;
// Check word boundary after
final int end = pos + keyword.length();
if (end < upper.length() && Character.isLetterOrDigit(upper.charAt(end)))
if (end < upper.length() && isCypherIdentifierChar(upper.charAt(end)))

Check notice on line 216 in engine/src/main/java/com/arcadedb/query/opencypher/ast/ExistsExpression.java

View check run for this annotation

Codacy Production / Codacy Static Code Analysis

engine/src/main/java/com/arcadedb/query/opencypher/ast/ExistsExpression.java#L216

Avoid unnecessary if..then..else statements when returning booleans
return false;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The word boundary check for Cypher keywords is still incomplete. While adding the underscore fixes the reported issue for relationship types like WORKS_WITH, the logic will still incorrectly match keywords when they are part of a label (e.g., :WITH), a property access (e.g., .WITH), or a parameter (e.g., $WITH).

To be more robust, the boundary check should also consider characters like :, ., and $ as part of a token, which should prevent a keyword match.

    if (pos > 0) {
      final char c = upper.charAt(pos - 1);
      if (isCypherIdentifierChar(c) || c == ':' || c == '.' || c == '$')
        return false;
    }
    // Check word boundary after
    final int end = pos + keyword.length();
    if (end < upper.length()) {
      final char c = upper.charAt(end);
      if (isCypherIdentifierChar(c) || c == ':' || c == '.' || c == '$')
        return false;
    }

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b0dad27. Added ':' '.' '$' to the boundary checks in matchesKeywordAt alongside the existing isCypherIdentifierChar check, so [:WITH], n.with, and $with are no longer misidentified as standalone clause keywords.

return true;
}

private static boolean isCypherIdentifierChar(final char c) {
return Character.isLetterOrDigit(c) || c == '_';
}
Comment on lines +228 to +230

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This new helper method is useful for identifier boundary detection. It should also be applied to the variableUsedInSubquery method (specifically at lines 104 and 108) to ensure that variables containing underscores are correctly identified and to prevent partial matches (e.g., matching var inside var_2).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b0dad27. variableUsedInSubquery now uses isCypherIdentifierChar (which includes '_') at both boundary positions, so outer variable 'p' no longer false-matches inside 'p_node' or similar names.


@Override
public boolean isAggregation() {
return false;
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,142 @@
/*
* Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
* SPDX-License-Identifier: Apache-2.0
*/
package com.arcadedb.query.opencypher;

import com.arcadedb.database.Database;
import com.arcadedb.database.DatabaseFactory;
import com.arcadedb.graph.MutableVertex;
import com.arcadedb.query.sql.executor.Result;
import com.arcadedb.query.sql.executor.ResultSet;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

import java.util.HashMap;
import java.util.Map;

import static org.assertj.core.api.Assertions.assertThat;

/**
* Regression test for issue #3952: EXISTS { MATCH (p)-[:WORKS_WITH]->() } returned false for all rows because
* the word-boundary check in ExistsExpression.matchesKeywordAt() did not treat underscore as an identifier
* character, so "WITH" inside "WORKS_WITH" was falsely detected as the Cypher WITH clause keyword,
* corrupting the injected subquery and causing a silently-caught exception.
*/
class CypherExistsUnderscoreRelationshipTypeTest {
private Database database;

@BeforeEach
void setup() {
database = new DatabaseFactory("./target/databases/cypherexistsunderscore").create();

database.transaction(() -> {
database.getSchema().createVertexType("Person");
database.getSchema().createEdgeType("WORKS_WITH");
database.getSchema().createEdgeType("KNOWS");
database.getSchema().createEdgeType("KNOWS_WHERE");

final MutableVertex alice = database.newVertex("Person").set("name", "Alice").save();
final MutableVertex bob = database.newVertex("Person").set("name", "Bob").save();
final MutableVertex charlie = database.newVertex("Person").set("name", "Charlie").save();
final MutableVertex david = database.newVertex("Person").set("name", "David").save();

// Alice -[WORKS_WITH]-> Bob (type name embeds keyword "WITH")
alice.newEdge("WORKS_WITH", bob, new Object[0]).save();
// Charlie -[KNOWS]-> David (simple type, control)
charlie.newEdge("KNOWS", david, new Object[0]).save();
// Alice -[KNOWS_WHERE]-> Charlie (type name embeds keyword "WHERE")
alice.newEdge("KNOWS_WHERE", charlie, new Object[0]).save();
});
}

@AfterEach
void teardown() {
if (database != null)
database.drop();
}

/**
* Regression for issue #3952: relationship type containing "_WITH" (a Cypher keyword after underscore)
* must not break the EXISTS { MATCH ... } subquery injection.
*/
@Test
void existsWithUnderscoreKeywordWithInRelationshipType() {
final ResultSet results = database.query("opencypher", """
MATCH (p:Person)
RETURN p.name AS person,
EXISTS { MATCH (p)-[:WORKS_WITH]->(:Person) } AS worksWith
ORDER BY person""");

final Map<String, Boolean> actual = collectBooleanColumn(results, "person", "worksWith");

assertThat(actual).containsEntry("Alice", true);
assertThat(actual).containsEntry("Bob", false);
assertThat(actual).containsEntry("Charlie", false);
assertThat(actual).containsEntry("David", false);
}

/**
* EXISTS with relationship type "KNOWS_WHERE" whose name embeds the keyword "WHERE".
*/
@Test
void existsWithUnderscoreKeywordWhereInRelationshipType() {
final ResultSet results = database.query("opencypher", """
MATCH (p:Person)
RETURN p.name AS person,
EXISTS { MATCH (p)-[:KNOWS_WHERE]->(:Person) } AS knowsWhere
ORDER BY person""");

final Map<String, Boolean> actual = collectBooleanColumn(results, "person", "knowsWhere");

assertThat(actual).containsEntry("Alice", true);
assertThat(actual).containsEntry("Bob", false);
assertThat(actual).containsEntry("Charlie", false);
assertThat(actual).containsEntry("David", false);
}

/**
* Control: EXISTS with a simple relationship type (no embedded keyword) continues to work correctly.
*/
@Test
void existsWithSimpleRelationshipTypeStillWorks() {
final ResultSet results = database.query("opencypher", """
MATCH (p:Person)
RETURN p.name AS person,
EXISTS { MATCH (p)-[:KNOWS]->(:Person) } AS knows
ORDER BY person""");

final Map<String, Boolean> actual = collectBooleanColumn(results, "person", "knows");

assertThat(actual).containsEntry("Alice", false);
assertThat(actual).containsEntry("Bob", false);
assertThat(actual).containsEntry("Charlie", true);
assertThat(actual).containsEntry("David", false);
}

private static Map<String, Boolean> collectBooleanColumn(final ResultSet results, final String keyCol,
final String valueCol) {
final Map<String, Boolean> map = new HashMap<>();
while (results.hasNext()) {
final Result row = results.next();
map.put(row.getProperty(keyCol), row.getProperty(valueCol));
}
results.close();
return map;
}
}
Loading