Skip to content

Conversation

@songkant-aws
Copy link
Contributor

@songkant-aws songkant-aws commented Sep 29, 2025

Description

This PR will resolve one feature enhancement and a bug fix.

  1. Feature: Since 3.1.0, patterns command for brain algorithm outputs numbered token like <token1>.<token2>@<token3> instead of <*>.<*>@<*>. But it's not always wanted per customer request. So add a new flag option show_numbered_token=true|false in the patterns command to make it optional. By default, it will not show numbered token to be compatible with V2's format.
  2. Bug fix: In some cases, tokens are parsed incorrectly. The brain algorithm could generate pattern like: <*> failed to start for block<*><*><*>. It has continuous variable placeholder <*> but actually they can be merged into one. The parsing logic can't identify continuous variable placeholder. So at the final pattern generation part, we will merge continuous <*> placeholders.

Related Issues

Resolves #4364 , #4363 , #4366, #4362

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@LantaoJin LantaoJin added the bug Something isn't working label Sep 29, 2025
qianheng-aws
qianheng-aws previously approved these changes Sep 30, 2025
* mode: optional. Output mode: ``label`` (default) or ``aggregation``. The mode is configured by the setting ``plugins.ppl.pattern.mode``.
* max_sample_count: optional. Max sample logs returned per pattern in aggregation mode (default: 10). The max_sample_count is configured by the setting ``plugins.ppl.pattern.max.sample.count``.
* buffer_limit: optional. Safeguard parameter for ``brain`` algorithm to limit internal temporary buffer size (default: 100,000, min: 50,000). The buffer_limit is configured by the setting ``plugins.ppl.pattern.buffer.limit``.
* show_numbered_token: optional. The flag to turn on numbered token output format (default: false). The show_numbered_token is configured by the setting ``plugins.ppl.pattern.show.numbered.token``.
Copy link
Member

@LantaoJin LantaoJin Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems when version <3.1.0, the command does not show numbered tokens. 3.1.0 <= version < 3.3.0, this command shows numbered tokens by default. when version >=3.3.0, align with the behaviour of version <3.1.0, right?

Any thoughts to control the default value of show_numbered_token by plugins.ppl.syntax.legacy.preferred in future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will create the issue for it. Another suggestion from user is we could backport the change to older version. Not sure we need some minor version like 3.1.x.

New output format requires Calcite enabled.
PPL query::

PPL> source=apache | patterns message method=simple_pattern mode=aggregation | fields patterns_field, pattern_count | head 1 ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PPL is meaning the query is not executable in doctest. Now almost of PPL rst files moved to calcite enabled queue since we have enabled calcite by default in main branch (3.3.0), please rebase and check if all queries can execute with calcite correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is controlled by docs/category.json

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed doctest after merging main. After Calcite is enabled by default for doctest, patterns command output has some format change in this fix. So I changed doctest for patterns.rst. It's expected change.

Grok and Parse doctest are actually not impacted.

@LantaoJin LantaoJin merged commit e251834 into opensearch-project:main Sep 30, 2025
35 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4402-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e2518341482d2ee94ae4cba9bc6b03c14c7385e4
# Push it to GitHub
git push --set-upstream origin backport/backport-4402-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4402-to-2.19-dev.

songkant-aws added a commit to songkant-aws/sql that referenced this pull request Sep 30, 2025
opensearch-project#4402)

* Fix patterns command incorrectly parsing token and add parameter option

Signed-off-by: Songkan Tang <[email protected]>

* Fix some cases after merge

Signed-off-by: Songkan Tang <[email protected]>

* Fix failed tests

Signed-off-by: Songkan Tang <[email protected]>

* Fix UTs and modify docs

Signed-off-by: Songkan Tang <[email protected]>

* Add Brain parameter explanation to the doc

Signed-off-by: Songkan Tang <[email protected]>

* Modify patterns doctest after calcite default engine enabled

Signed-off-by: Songkan Tang <[email protected]>

* Rephrase some words in patterns.rst

Signed-off-by: Songkan Tang <[email protected]>

---------

Signed-off-by: Songkan Tang <[email protected]>
penghuo pushed a commit that referenced this pull request Sep 30, 2025
#4402) (#4414)

* Fix patterns command incorrectly parsing token and add parameter option



* Fix some cases after merge



* Fix failed tests



* Fix UTs and modify docs



* Add Brain parameter explanation to the doc



* Modify patterns doctest after calcite default engine enabled



* Rephrase some words in patterns.rst



---------

Signed-off-by: Songkan Tang <[email protected]>
@songkant-aws songkant-aws deleted the patterns-command-fix branch October 9, 2025 02:45
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Oct 9, 2025
asifabashar added a commit to asifabashar/sql that referenced this pull request Oct 10, 2025
* main-apple: (218 commits)
  Add ignorePrometheus Flag for integTest and docTest (opensearch-project#4442)
  Create fab-radar.yml
  PPL `fillnull` command enhancement (opensearch-project#4421)
  reverting to _doc + _id (opensearch-project#4435)
  Support `multisearch` command in calcite (opensearch-project#4332)
  Add 3.3 release notes (opensearch-project#4422) (opensearch-project#4423)
  [SQL/PPL] Fix the `count(*)` and `dc(field)` to be capped at MAX_INTEGER opensearch-project#4416 (opensearch-project#4418)
  Change the default search sort tiebreaker to `_shard_doc` for PIT search (opensearch-project#4378)
  [Enhancement] Add error handling for known limitation of sql `JOIN` (opensearch-project#4344)
  Bugfix: SQL type mapping for legacy JDBC output (opensearch-project#3613)
  Version bump: 3.3 (opensearch-project#4417)
  Add max/min eval functions (opensearch-project#4333)
  Support time modifiers in search command  (opensearch-project#4224)
  Fix numbered token bug and make it optional output in patterns command (opensearch-project#4402)
  refactor span (opensearch-project#4334)
  Move release notes categories (opensearch-project#3818)
  [Doc] Enable doctest with Calcite (opensearch-project#4379)
  Mod function should return decimal instead of float when handle the operands are decimal literal (opensearch-project#4407)
  Scale of decimal literal should always be positive in Calcite (opensearch-project#4401)
  Enable Calcite by default and implicit fallback the unsupported commands (opensearch-project#4372)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] [ENHANCEMENT] Allow users to specify if they want to see tokens or standard delimiters on calcite patterns command

3 participants