Skip to content

Conversation

@songkant-aws
Copy link
Contributor

@songkant-aws songkant-aws commented Nov 26, 2025

Description

Current patterns uses wrong buffer_limit parameter value passed from max_sample_count. Also, this PR fixes the logic of returning agg result of logPatternAggFunction.

Related Issues

Resolves #4866

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for max_sample_count parameter to limit pattern search results.
  • Bug Fixes

    • Improved null-safety checks in pattern parser to prevent errors with invalid aggregate data.
    • Enhanced pattern buffer handling to properly track both patterns and log entries.
  • Tests

    • Added integration test validating pattern search with sample count limits.
    • Expanded test coverage for pattern aggregation with all parameters.

✏️ Tip: You can customize this high-level summary in your review settings.

@opensearch-project opensearch-project deleted a comment from coderabbitai bot Nov 26, 2025
@opensearch-project opensearch-project deleted a comment from coderabbitai bot Nov 26, 2025
@peterzhuamazon
Copy link
Member

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 26, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 26, 2025

Walkthrough

This change fixes two bugs in the patterns command: incorrect parameter mapping where max_sample_count was passed to buffer_limit, and faulty aggregation result termination logic that returned empty results when the buffer size was evenly divisible by row count. Updates introduce proper buffer metric exposure and corrected null-checking logic.

Changes

Cohort / File(s) Summary
Aggregation Logic Fixes
core/src/main/java/org/opensearch/sql/calcite/udf/udaf/LogPatternAggFunction.java
Modified result() method to return null only when both patternGroupMap and log message buffer are empty (instead of only checking patternGroupMap). Added size() and logSize() public methods to expose pattern group count and log message count. Updated buffer-limit check to use logSize() instead of size().
Null-Check Enhancement
core/src/main/java/org/opensearch/sql/expression/function/PatternParserFunctionImpl.java
Added null-check for aggObject in guard condition alongside existing field blank check to prevent null pointer access.
Parameter Mapping Fix
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
Changed default value mapping from "max_sample_count" to "buffer_limit" for PATTERN_BUFFER_LIMIT setting, correcting parameter name resolution.
Test Coverage
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4866.yml
New integration test validating patterns command with max_sample_count=2 and variable_count_threshold=3, verifying correct result structure and sample log collection.
Integration Test Updates
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLPatternsIT.java
Updated expected tokenization patterns in testBrainLabelMode_NotShowNumberedToken and testBrainLabelMode_ShowNumberedToken to reflect new token mapping and header structure.
Unit Test Addition
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLPatternsTest.java
Added test method testPatternsAggregationMode_SpecifyAllParameters_ForBrainMethod() validating Brain method with all parameters including max_sample_count, buffer_limit, show_numbered_token, variable_count_threshold, and frequency_threshold_percentage.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Parser as AstBuilder
    participant Agg as LogPatternAggFunction
    participant Buffer as LogParserAccumulator

    User->>Parser: patterns cmd with max_sample_count=2
    Note over Parser: BEFORE: maps to "max_sample_count"<br/>AFTER: maps to "buffer_limit"
    Parser->>Agg: add() with buffer limit check

    Agg->>Buffer: Check logSize() for limit
    Note over Buffer: BEFORE: check size() (patterns)<br/>AFTER: check logSize() (raw logs)
    Buffer-->>Agg: Buffer status

    Note over Agg: Accumulate patterns & logs

    Agg->>Agg: result() - termination logic
    Note over Agg: BEFORE: null if size() == 0<br/>AFTER: null if size() == 0 AND logSize() == 0
    Agg-->>User: Return patterns + samples
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • LogPatternAggFunction.java: Requires careful review of aggregation termination logic change and buffer metric semantics to ensure fix resolves issue #4866 without introducing regressions
  • AstBuilder.java parameter mapping: Verify that changing "max_sample_count" to "buffer_limit" correctly resolves parameter confusion without breaking existing command parsing
  • Integration test updates: Token mapping changes in CalcitePPLPatternsIT require validation that new tokenization scheme matches implementation intent
  • Duplicate method addition: CalcitePPLPatternsTest appears to have the new test method added twice—verify whether this is intentional or requires consolidation

Poem

🐰 The patterns once stumbled with mixed-up names,
And buffers miscleared in confusing frames,
But now we expose the true metrics so bright,
Both patterns and logs checked just right! ✨
The command hops forward with sample-count true!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly addresses the main fixes: correcting parameter mapping and return result logic in LogPatternAggFunction, which aligns with the core changes in the changeset.
Linked Issues check ✅ Passed The PR addresses both issues from #4866: fixes buffer_limit parameter passing [AstBuilder.java] and corrects the result logic to return non-null when logs exist [LogPatternAggFunction.java, PatternParserFunctionImpl.java].
Out of Scope Changes check ✅ Passed Test updates in CalcitePPLPatternsIT.java and CalcitePPLPatternsTest.java are validating the bug fixes, while the new YAML test validates the max_sample_count functionality; all align with the stated objectives.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b88bf56 and fbb5611.

📒 Files selected for processing (6)
  • core/src/main/java/org/opensearch/sql/calcite/udf/udaf/LogPatternAggFunction.java (3 hunks)
  • core/src/main/java/org/opensearch/sql/expression/function/PatternParserFunctionImpl.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLPatternsIT.java (2 hunks)
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4866.yml (1 hunks)
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLPatternsTest.java (1 hunks)
🔇 Additional comments (9)
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java (1)

964-965: LGTM! Correct fix for the parameter mapping bug.

The option key was incorrectly set to "max_sample_count" which caused buffer_limit to receive the wrong default value. Changing it to "buffer_limit" ensures the correct option is looked up from user-provided command options.

core/src/main/java/org/opensearch/sql/expression/function/PatternParserFunctionImpl.java (1)

108-110: LGTM! Proper null-safety guard added.

Adding the aggObject == null check prevents NPE when LogPatternAggFunction.result() returns null (which now occurs only when both pattern groups and log messages are empty). This defensive guard ensures graceful handling with EMPTY_RESULT.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/LogPatternAggFunction.java (3)

38-41: LGTM! Correct fix for the empty result bug.

The previous logic returned null when acc.size() == 0, but this was incorrect because:

  1. When buffer size evenly divides row count, partialMerge() and clearBuffer() are called
  2. The accumulated patterns exist in patternGroupMap, but logMessages is empty
  3. The old check only looked at patternGroupMap.size(), missing the case where patterns were already merged

Now correctly returns null only when both the pattern groups AND log messages are empty.


92-96: LGTM! Correct fix for buffer limit semantics.

The buffer limit should control when to trigger partial merging based on the raw log messages buffer size (logSize()), not the number of pattern groups (size()). This aligns with the parameter's intended meaning of limiting memory usage from buffered log entries.


154-160: LGTM! Clean API additions to expose accumulator state.

The size() and logSize() methods provide clear semantics for accessing the pattern groups count and log messages buffer size respectively. These are necessary for the corrected logic in result() and add().

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLPatternsTest.java (1)

329-360: LGTM! Good test coverage for the parameter mapping fix.

This test validates that all pattern parameters are correctly passed and ordered:

  • max_sample_count=2 → position 1
  • buffer_limit=1000 → position 2 (the fixed parameter)
  • show_numbered_token=false → position 3
  • frequency_threshold_percentage=0.1 → position 4
  • variable_count_threshold=3 → position 5

The expected Spark SQL pattern(\ENAME`, 2, 1000, FALSE, 0.1, 3)` confirms the fix is working correctly.

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4866.yml (1)

1-65: LGTM! Good regression test for issue #4866.

This integration test:

  1. Creates an index with 4 documents (even count, which could trigger the divisibility bug)
  2. Uses max_sample_count=2 to limit samples per pattern
  3. Validates that 2 pattern groups are returned with correct counts and sample logs

This directly tests the scenario from issue #4866 where patterns with certain row counts previously returned empty results.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLPatternsIT.java (2)

233-234: Test expectation updates for updated tokenization behavior.

The expected pattern changed to use <*> for the "BLOCK*" prefix, reflecting updated tokenization semantics. The rest of the pattern uses appropriate variable placeholders for dynamic segments.


270-287: LGTM! Token mappings updated consistently.

The token map expectations are updated to reflect the new tokenization scheme where:

  • <token1> = "BLOCK*" (now treated as variable)
  • <token2> through <token7> = remaining dynamic segments in order

The updated expectations align with the pattern string changes and maintain consistency across the test assertions.


Comment @coderabbitai help to get the list of available commands and usage tips.

@qianheng-aws qianheng-aws merged commit 885230f into opensearch-project:main Nov 27, 2025
39 of 43 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 27, 2025
…#4868)

Signed-off-by: Songkan Tang <[email protected]>
(cherry picked from commit 885230f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
LantaoJin pushed a commit that referenced this pull request Nov 27, 2025
…#4868) (#4874)

(cherry picked from commit 885230f)

Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
asifabashar pushed a commit to asifabashar/sql that referenced this pull request Dec 10, 2025
@songkant-aws songkant-aws deleted the patterns-buffer-limit-fix branch December 11, 2025 07:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Patterns command returns empty result with wrong parameter

6 participants