Skip to content

Conversation

@JinwooHwang
Copy link
Contributor

Summary

This PR resolves four nondeterminism warnings generated by ANTLR during the OQL (Object Query Language) grammar compilation process. These warnings indicated parser ambiguity that could lead to unpredictable parsing behavior.

Issue

Fixes GEODE-10508

Problem Description

During the generateGrammarSource task, ANTLR produced the following warnings:

> Task :geode-core:generateGrammarSource
/home/runner/work/geode/geode/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g:574:40: 
warning:nondeterminism between alts 1 and 2 of block upon
    k==1:"sum","avg","min","max","count"
    k==2:TOK_LPAREN

/home/runner/work/geode/geode/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g:578:13: 
warning:nondeterminism between alts 1 and 2 of block upon
    k==1:"sum","avg","min","max","count"
    k==2:TOK_LPAREN

/home/runner/work/geode/geode/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g:961:26: 
warning:nondeterminism between alts 1 and 2 of block upon
    k==1:"distinct"
    k==2:TOK_LPAREN,TOK_PLUS,TOK_MINUS,TOK_DOLLAR,...

/home/runner/work/geode/geode/geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g:979:17: 
warning:nondeterminism between alts 1 and 2 of block upon
    k==1:"distinct"
    k==2:TOK_LPAREN,TOK_PLUS,TOK_MINUS,TOK_DOLLAR,...

Root Cause

Lines 574 & 578 (projection rule):

  • The parser could not distinguish between aggregateExpr and expr alternatives when encountering aggregate function keywords (sum, avg, min, max, count)
  • These keywords are valid both as:
    • Aggregate function identifiers: sum(field)
    • Regular identifiers in expressions: sum as a field name
  • Without lookahead, ANTLR could not deterministically choose which production rule to apply

Lines 961 & 979 (aggregateExpr rule):

  • The optional distinct keyword created ambiguity in aggregate function parsing
  • The parser could not decide whether to match the optional distinct keyword or skip it and proceed directly to the expression
  • Both paths were valid, but ANTLR's default behavior doesn't specify preference

Solution

1. Added Syntactic Predicates (Lines 574 & 578)

Added lookahead predicates to the projection rule:

// Before:
( tok1:aggregateExpr{node = #tok1;} | tok2:expr{node = #tok2;})

// After:
( (("sum"|"avg"|"min"|"max"|"count") TOK_LPAREN)=> tok1:aggregateExpr{node = #tok1;} | tok2:expr{node = #tok2;})

Reasoning:
The predicate (("sum"|"avg"|"min"|"max"|"count") TOK_LPAREN)=> instructs the parser to look ahead and check if an aggregate keyword is followed by a left parenthesis. If true, it chooses aggregateExpr; otherwise, it chooses expr. This provides explicit lookahead logic to resolve the ambiguity.

2. Added Greedy Option (Lines 961 & 979)

Added greedy option for optional distinct keywords:

// Before:
TOK_LPAREN ("distinct"! {distinctOnly = true;} ) ? tokExpr1:expr TOK_RPAREN

// After:
TOK_LPAREN (options {greedy=true;}: "distinct"! {distinctOnly = true;} ) ? tokExpr1:expr TOK_RPAREN

Reasoning:
The greedy option tells the parser to greedily match the distinct keyword whenever it appears, rather than being ambiguous about whether to match or skip. This establishes clear matching priority and eliminates nondeterminism.

3. Updated Test to Use Token Constants

Modified AbstractCompiledValueTestJUnitTest.java:

// Before:
new CompiledJunction(new CompiledValue[] {compiledValue1, compiledValue2}, 89)

// After:
import org.apache.geode.cache.query.internal.parse.OQLLexerTokenTypes;
...
new CompiledJunction(new CompiledValue[] {compiledValue1, compiledValue2}, OQLLexerTokenTypes.LITERAL_or)

Reasoning:
Adding syntactic predicates changes ANTLR's token numbering in the generated lexer (LITERAL_or shifted from 89 to 94). Using the constant ensures test correctness regardless of future grammar changes. This is a best practice for maintaining test stability.

Changes Made

  • geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g

    • Added syntactic predicates to projection rule (2 locations)
    • Added greedy option to aggregateExpr rule (2 locations)
    • Added inline comments explaining the reasoning
  • geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java

    • Added import for OQLLexerTokenTypes
    • Replaced hardcoded token value with constant
    • Added comments explaining the change

Testing

Verification Steps

  • Grammar generation completes without warnings
  • All query parser tests pass
  • AbstractCompiledValueTestJUnitTest passes
  • No regressions in query functionality

Test Commands

./gradlew :geode-core:generateGrammarSource
./gradlew :geode-core:test --tests "org.apache.geode.cache.query.internal.parse.*"
./gradlew :geode-core:test --tests "org.apache.geode.cache.query.internal.Abstract*"

Impact Assessment

Category Impact
Risk Level Low - Changes only affect parser generation, not runtime behavior
Performance None - Modifications are compile-time only
Compatibility Fully maintained - No changes to OQL syntax or semantics
Backward Compatibility Yes - All existing queries work unchanged

Benefits

  • ✅ Zero nondeterminism warnings from ANTLR grammar generation
  • ✅ Improved parser determinism and predictability
  • ✅ Better maintainability with documented reasoning
  • ✅ More robust test suite using constants instead of magic numbers
  • ✅ No breaking changes or behavior modifications

Technical Details

  • Syntactic predicates (=>) are a standard ANTLR 2 feature for lookahead
  • Greedy option is a standard ANTLR feature for optional subrule disambiguation
  • Token constant usage follows best practices for generated code references
  • Changes are compile-time only with no runtime performance impact

Checklist

  • Changes are properly documented with inline comments
  • All tests pass
  • No nondeterminism warnings
  • Backward compatibility maintained
  • Commit message follows project guidelines
  • Code follows existing style and conventions

For all changes, please confirm:

  • Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
  • Has your PR been rebased against the latest commit within the target branch (typically develop)?
  • Is your initial contribution a single, squashed commit?
  • Does gradlew build run cleanly?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

This commit resolves four nondeterminism warnings generated by ANTLR during
the OQL grammar compilation process. These warnings indicated parser ambiguity
that could lead to unpredictable parsing behavior.

Problem Analysis:
-----------------
1. Lines 574 & 578 (projection rule):
   The parser could not distinguish between aggregateExpr and expr alternatives
   when encountering aggregate function keywords (sum, avg, min, max, count).
   These keywords are valid both as:
   - Aggregate function identifiers: sum(field)
   - Regular identifiers in expressions: sum as a field name

   Without lookahead, ANTLR could not deterministically choose which production
   rule to apply, resulting in nondeterminism warnings.

2. Lines 961 & 979 (aggregateExpr rule):
   Optional 'distinct' keyword created ambiguity in aggregate function parsing.
   The parser could not decide whether to:
   - Match the optional 'distinct' keyword, or
   - Skip it and proceed directly to the expression

   Both paths were valid, but ANTLR's default behavior doesn't specify
   preference, causing nondeterminism.

Solution Implemented:
--------------------
1. Added syntactic predicates to projection rule (lines 574, 578):
   Predicate: (('sum'|'avg'|'min'|'max'|'count') TOK_LPAREN)=>

   This instructs the parser to look ahead and check if an aggregate keyword
   is followed by a left parenthesis. If true, it chooses aggregateExpr;
   otherwise, it chooses expr. This resolves the ambiguity by providing
   explicit lookahead logic.

2. Added greedy option to aggregateExpr rule (lines 961, 979):
   Option: options {greedy=true;}

   This tells the parser to greedily match the 'distinct' keyword whenever
   it appears, rather than being ambiguous about whether to match or skip.
   The greedy option eliminates the nondeterminism by establishing clear
   matching priority.

3. Updated test to use token constants (AbstractCompiledValueTestJUnitTest):
   Changed: hardcoded value 89 -> OQLLexerTokenTypes.LITERAL_or

   Rationale: Adding syntactic predicates changes ANTLR's token numbering
   in the generated lexer (LITERAL_or shifted from 89 to 94). Using the
   constant ensures test correctness regardless of future grammar changes.
   This is a best practice for maintaining test stability.

Impact:
-------
- Zero nondeterminism warnings from ANTLR grammar generation
- No changes to OQL syntax or semantics (fully backward compatible)
- No runtime behavior changes (modifications only affect parser generation)
- All existing tests pass with updated token reference
- Improved parser determinism and maintainability

Technical Details:
-----------------
- Syntactic predicates (=>) are standard ANTLR 2 feature for lookahead
- Greedy option is standard ANTLR feature for optional subrule disambiguation
- Token constant usage follows best practices for generated code references
- Changes are compile-time only with no runtime performance impact

Files Modified:
--------------
- geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
- geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java
Fix line length formatting for improved readability.
@JinwooHwang
Copy link
Contributor Author

Hi @sboorlagadda , all checks have passed. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant