-
-
Notifications
You must be signed in to change notification settings - Fork 116
#3952 fix: Cypher -> EXISTS { MATCH (p)-[:WORKS_WITH]->() } returns false when relationship type embeds a Cypher keyword after underscore #3962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| # Fix #3952: EXISTS { MATCH } subquery returns false for relationship types containing Cypher keyword fragments | ||
|
|
||
| ## Summary | ||
|
|
||
| `EXISTS { MATCH (p)-[:WORKS_WITH]->(:Person) }` returned `false` for all rows even when the pattern matched, | ||
| while the equivalent `WHERE (p)-[:WORKS_WITH]->()` predicate worked correctly. | ||
|
|
||
| ## Root Cause | ||
|
|
||
| `ExistsExpression.matchesKeywordAt()` used `Character.isLetterOrDigit()` to detect word boundaries when | ||
| scanning the subquery string for Cypher clause keywords (WITH, WHERE, RETURN, ...). | ||
|
|
||
| Because underscore `_` is NOT a letter or digit in Java, the check incorrectly accepted it as a word | ||
| boundary. This caused "WITH" inside "WORKS_WITH" to be falsely recognised as the Cypher `WITH` clause | ||
| keyword, making `injectWhereConditions()` split the relationship type name and produce an invalid query | ||
| such as: | ||
|
|
||
| ``` | ||
| MATCH (p), (p)-[:WORKS_WHERE id(p) = $__exists_p WITH]->(:Person) | ||
| ``` | ||
|
|
||
| The invalid query threw an exception that was silently caught in `evaluate()`, which returned `false`. | ||
|
|
||
| The bug affected any relationship type whose name ends with a Cypher keyword after an underscore: | ||
| `_WITH`, `_WHERE`, `_RETURN`, `_ORDER`, `_SKIP`, `_LIMIT`, `_UNION`. | ||
|
|
||
| ## Fix | ||
|
|
||
| `engine/src/main/java/com/arcadedb/query/opencypher/ast/ExistsExpression.java` | ||
|
|
||
| Replaced `Character.isLetterOrDigit(c)` in the boundary checks of `matchesKeywordAt()` with a new helper | ||
| `isCypherIdentifierChar(c)` that also returns `true` for `_`, matching Cypher identifier rules. | ||
|
|
||
| ## Tests | ||
|
|
||
| New test class: `engine/src/test/java/com/arcadedb/query/opencypher/CypherExistsUnderscoreRelationshipTypeTest.java` | ||
|
|
||
| - `existsWithUnderscoreKeywordWithInRelationshipType` - reproduces issue #3952 (WORKS_WITH embeds "WITH") | ||
| - `existsWithUnderscoreKeywordWhereInRelationshipType` - KNOWS_WHERE embeds "WHERE" | ||
| - `existsWithSimpleRelationshipTypeStillWorks` - control: KNOWS (no embedded keyword) still works | ||
|
|
||
| All 3 new tests pass. All 5835 existing Cypher/OpenCypher tests continue to pass. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -200,22 +200,28 @@ | |
| /** | ||
| * Checks if the uppercase query string has a keyword at the given position, | ||
| * ensuring it's a word boundary (not part of a longer identifier). | ||
| * Underscore is treated as an identifier character so that relationship type names | ||
| * such as WORKS_WITH are not falsely split at embedded keyword fragments (e.g. "WITH"). | ||
| */ | ||
| private static boolean matchesKeywordAt(final String upper, final int pos, final String keyword) { | ||
| if (pos + keyword.length() > upper.length()) | ||
| return false; | ||
| if (!upper.startsWith(keyword, pos)) | ||
| return false; | ||
| // Check word boundary before | ||
| if (pos > 0 && Character.isLetterOrDigit(upper.charAt(pos - 1))) | ||
| if (pos > 0 && isCypherIdentifierChar(upper.charAt(pos - 1))) | ||
| return false; | ||
| // Check word boundary after | ||
| final int end = pos + keyword.length(); | ||
| if (end < upper.length() && Character.isLetterOrDigit(upper.charAt(end))) | ||
| if (end < upper.length() && isCypherIdentifierChar(upper.charAt(end))) | ||
|
Check notice on line 216 in engine/src/main/java/com/arcadedb/query/opencypher/ast/ExistsExpression.java
|
||
| return false; | ||
| return true; | ||
| } | ||
|
|
||
| private static boolean isCypherIdentifierChar(final char c) { | ||
| return Character.isLetterOrDigit(c) || c == '_'; | ||
| } | ||
|
Comment on lines
+228
to
+230
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This new helper method is useful for identifier boundary detection. It should also be applied to the
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed in commit b0dad27. variableUsedInSubquery now uses isCypherIdentifierChar (which includes '_') at both boundary positions, so outer variable 'p' no longer false-matches inside 'p_node' or similar names. |
||
|
|
||
| @Override | ||
| public boolean isAggregation() { | ||
| return false; | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,142 @@ | ||
| /* | ||
| * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) | ||
| * | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| * | ||
| * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) | ||
| * SPDX-License-Identifier: Apache-2.0 | ||
| */ | ||
| package com.arcadedb.query.opencypher; | ||
|
|
||
| import com.arcadedb.database.Database; | ||
| import com.arcadedb.database.DatabaseFactory; | ||
| import com.arcadedb.graph.MutableVertex; | ||
| import com.arcadedb.query.sql.executor.Result; | ||
| import com.arcadedb.query.sql.executor.ResultSet; | ||
| import org.junit.jupiter.api.AfterEach; | ||
| import org.junit.jupiter.api.BeforeEach; | ||
| import org.junit.jupiter.api.Test; | ||
|
|
||
| import java.util.HashMap; | ||
| import java.util.Map; | ||
|
|
||
| import static org.assertj.core.api.Assertions.assertThat; | ||
|
|
||
| /** | ||
| * Regression test for issue #3952: EXISTS { MATCH (p)-[:WORKS_WITH]->() } returned false for all rows because | ||
| * the word-boundary check in ExistsExpression.matchesKeywordAt() did not treat underscore as an identifier | ||
| * character, so "WITH" inside "WORKS_WITH" was falsely detected as the Cypher WITH clause keyword, | ||
| * corrupting the injected subquery and causing a silently-caught exception. | ||
| */ | ||
| class CypherExistsUnderscoreRelationshipTypeTest { | ||
| private Database database; | ||
|
|
||
| @BeforeEach | ||
| void setup() { | ||
| database = new DatabaseFactory("./target/databases/cypherexistsunderscore").create(); | ||
|
|
||
| database.transaction(() -> { | ||
| database.getSchema().createVertexType("Person"); | ||
| database.getSchema().createEdgeType("WORKS_WITH"); | ||
| database.getSchema().createEdgeType("KNOWS"); | ||
| database.getSchema().createEdgeType("KNOWS_WHERE"); | ||
|
|
||
| final MutableVertex alice = database.newVertex("Person").set("name", "Alice").save(); | ||
| final MutableVertex bob = database.newVertex("Person").set("name", "Bob").save(); | ||
| final MutableVertex charlie = database.newVertex("Person").set("name", "Charlie").save(); | ||
| final MutableVertex david = database.newVertex("Person").set("name", "David").save(); | ||
|
|
||
| // Alice -[WORKS_WITH]-> Bob (type name embeds keyword "WITH") | ||
| alice.newEdge("WORKS_WITH", bob, new Object[0]).save(); | ||
| // Charlie -[KNOWS]-> David (simple type, control) | ||
| charlie.newEdge("KNOWS", david, new Object[0]).save(); | ||
| // Alice -[KNOWS_WHERE]-> Charlie (type name embeds keyword "WHERE") | ||
| alice.newEdge("KNOWS_WHERE", charlie, new Object[0]).save(); | ||
| }); | ||
| } | ||
|
|
||
| @AfterEach | ||
| void teardown() { | ||
| if (database != null) | ||
| database.drop(); | ||
| } | ||
|
|
||
| /** | ||
| * Regression for issue #3952: relationship type containing "_WITH" (a Cypher keyword after underscore) | ||
| * must not break the EXISTS { MATCH ... } subquery injection. | ||
| */ | ||
| @Test | ||
| void existsWithUnderscoreKeywordWithInRelationshipType() { | ||
| final ResultSet results = database.query("opencypher", """ | ||
| MATCH (p:Person) | ||
| RETURN p.name AS person, | ||
| EXISTS { MATCH (p)-[:WORKS_WITH]->(:Person) } AS worksWith | ||
| ORDER BY person"""); | ||
|
|
||
| final Map<String, Boolean> actual = collectBooleanColumn(results, "person", "worksWith"); | ||
|
|
||
| assertThat(actual).containsEntry("Alice", true); | ||
| assertThat(actual).containsEntry("Bob", false); | ||
| assertThat(actual).containsEntry("Charlie", false); | ||
| assertThat(actual).containsEntry("David", false); | ||
| } | ||
|
|
||
| /** | ||
| * EXISTS with relationship type "KNOWS_WHERE" whose name embeds the keyword "WHERE". | ||
| */ | ||
| @Test | ||
| void existsWithUnderscoreKeywordWhereInRelationshipType() { | ||
| final ResultSet results = database.query("opencypher", """ | ||
| MATCH (p:Person) | ||
| RETURN p.name AS person, | ||
| EXISTS { MATCH (p)-[:KNOWS_WHERE]->(:Person) } AS knowsWhere | ||
| ORDER BY person"""); | ||
|
|
||
| final Map<String, Boolean> actual = collectBooleanColumn(results, "person", "knowsWhere"); | ||
|
|
||
| assertThat(actual).containsEntry("Alice", true); | ||
| assertThat(actual).containsEntry("Bob", false); | ||
| assertThat(actual).containsEntry("Charlie", false); | ||
| assertThat(actual).containsEntry("David", false); | ||
| } | ||
|
|
||
| /** | ||
| * Control: EXISTS with a simple relationship type (no embedded keyword) continues to work correctly. | ||
| */ | ||
| @Test | ||
| void existsWithSimpleRelationshipTypeStillWorks() { | ||
| final ResultSet results = database.query("opencypher", """ | ||
| MATCH (p:Person) | ||
| RETURN p.name AS person, | ||
| EXISTS { MATCH (p)-[:KNOWS]->(:Person) } AS knows | ||
| ORDER BY person"""); | ||
|
|
||
| final Map<String, Boolean> actual = collectBooleanColumn(results, "person", "knows"); | ||
|
|
||
| assertThat(actual).containsEntry("Alice", false); | ||
| assertThat(actual).containsEntry("Bob", false); | ||
| assertThat(actual).containsEntry("Charlie", true); | ||
| assertThat(actual).containsEntry("David", false); | ||
| } | ||
|
|
||
| private static Map<String, Boolean> collectBooleanColumn(final ResultSet results, final String keyCol, | ||
| final String valueCol) { | ||
| final Map<String, Boolean> map = new HashMap<>(); | ||
| while (results.hasNext()) { | ||
| final Result row = results.next(); | ||
| map.put(row.getProperty(keyCol), row.getProperty(valueCol)); | ||
| } | ||
| results.close(); | ||
| return map; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The word boundary check for Cypher keywords is still incomplete. While adding the underscore fixes the reported issue for relationship types like
WORKS_WITH, the logic will still incorrectly match keywords when they are part of a label (e.g.,:WITH), a property access (e.g.,.WITH), or a parameter (e.g.,$WITH).To be more robust, the boundary check should also consider characters like
:,., and$as part of a token, which should prevent a keyword match.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in commit b0dad27. Added ':' '.' '$' to the boundary checks in matchesKeywordAt alongside the existing isCypherIdentifierChar check, so [:WITH], n.with, and $with are no longer misidentified as standalone clause keywords.