RyanL1997 (Collaborator) commented Aug 20, 2025

Description

Implementation of regex Command In PPL

Details:

  • Core logic of regex. Syntax:
<source> | [commands] | regex <field>=<pattern> | [commands]
<source> | [commands] | regex <field>!=<pattern> | [commands]
  • field: The name of the field to match the regex against
  • pattern: Java regex pattern string (supports standard regex metacharacters)
  • =: Positive matching; returns records that match the pattern
  • !=: Negated matching; returns records that do NOT match the pattern
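
The positive and negated forms above can be illustrated in plain Java. This is only a sketch, not the code in this PR: the class name RegexFilterSketch, the method filter, and the sample values are hypothetical. It assumes partial-match semantics (Matcher.find()), which is consistent with the REGEXP_CONTAINS operator named in the commit list but is not confirmed by this description.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration of `regex <field>=<pattern>` and `regex <field>!=<pattern>`.
class RegexFilterSketch {

    // negated == false mimics `field=pattern`: keep values that contain a match.
    // negated == true  mimics `field!=pattern`: keep values that contain no match.
    // Assumption: a partial match (find) is enough; the real command may differ.
    static List<String> filter(List<String> values, String pattern, boolean negated) {
        Pattern compiled = Pattern.compile(pattern); // standard Java regex syntax
        return values.stream()
            .filter(v -> compiled.matcher(v).find() != negated)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("Amber", "Hattie", "Nanette", "Dale");
        System.out.println(filter(names, "^[A-H]", false)); // [Amber, Hattie, Dale]
        System.out.println(filter(names, "^[A-H]", true));  // [Nanette]
    }
}
```

Under these assumptions, the two calls in main correspond roughly to `... | regex name="^[A-H]"` and `... | regex name!="^[A-H]"`, where name is a made-up field.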

For the handling of default field (_source / _raw):

Related Issues

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

RyanL1997 added the PPL (Piped processing language), feature, calcite (calcite migration related), and backport 2.19-dev labels Aug 20, 2025
RyanL1997 force-pushed the regex-cmd-java-version branch from 0ae9a4d to cb2fbc7 Aug 20, 2025 08:38
RyanL1997 changed the title from "[WIP][Feature] Implementation of regex Command In PPL" to "[Feature] Implementation of regex Command In PPL" Aug 20, 2025
RyanL1997 marked this pull request as ready for review Aug 20, 2025 16:53
RyanL1997 force-pushed the regex-cmd-java-version branch from a9e65c7 to 5723a9f Aug 29, 2025 02:19
Signed-off-by: Jialiang Liang <[email protected]>
dai-chen (Collaborator) previously approved these changes Aug 29, 2025 and left a comment:

Thanks for the changes!

Signed-off-by: Jialiang Liang <[email protected]>
Swiddis merged commit 9b88f23 into opensearch-project:main Aug 30, 2025
23 checks passed
github-project-automation (bot) moved this to Done in PPL 2025 Aug 30, 2025
opensearch-trigger-bot (Contributor) commented:

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4083-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 9b88f23f343c774a1b8c57f80b6e2943cb602f4f
# Push it to GitHub
git push --set-upstream origin backport/backport-4083-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4083-to-2.19-dev.

RyanL1997 (Collaborator, Author) commented:

I will backport it manually.

ykmr1224 pushed a commit to ykmr1224/sql that referenced this pull request Sep 2, 2025
…t#4083)

* implement regex cmd with calcite support by using java library

Signed-off-by: Jialiang Liang <[email protected]>

* code hygiene fix

Signed-off-by: Jialiang Liang <[email protected]>

* comment clean up

Signed-off-by: Jialiang Liang <[email protected]>

* implement explain it

Signed-off-by: Jialiang Liang <[email protected]>

* disable regex when calcite is disabled and add a test in analyzer

Signed-off-by: Jialiang Liang <[email protected]>

* fix spotless check

Signed-off-by: Jialiang Liang <[email protected]>

* [refactor] refactor some regex fn into a util class for re-usage

Signed-off-by: Jialiang Liang <[email protected]>

* [refactor] revert filter query builder cuz we do not need it anymore

Signed-off-by: Jialiang Liang <[email protected]>

* add rst docs for regex cmd

Signed-off-by: Jialiang Liang <[email protected]>

* add IT for regex cmd

Signed-off-by: Jialiang Liang <[email protected]>

* add IT for calcite no pushdown

Signed-off-by: Jialiang Liang <[email protected]>

* fix regex exp behavior for non string val

Signed-off-by: Jialiang Liang <[email protected]>

* style - remove some verbose comments

Signed-off-by: Jialiang Liang <[email protected]>

* remove string conversion

Signed-off-by: Jialiang Liang <[email protected]>

* use existing operator of REGEXP_CONTAINS

Signed-off-by: Jialiang Liang <[email protected]>

* fix integ test of regex with pushdown after operator commit

Signed-off-by: Jialiang Liang <[email protected]>

* remove some verbose comments and fix some style

Signed-off-by: Jialiang Liang <[email protected]>

* fix explain it in no pushdown

Signed-off-by: Jialiang Liang <[email protected]>

* comment - remove unused fn for string converting

Signed-off-by: Jialiang Liang <[email protected]>

* remove duplicated regex match operator alias

Signed-off-by: Jialiang Liang <[email protected]>

* unit test - initial commit

Signed-off-by: Jialiang Liang <[email protected]>

* anonymizer with test

Signed-off-by: Jialiang Liang <[email protected]>

* fix spotlessApply

Signed-off-by: Jialiang Liang <[email protected]>

* add cross cluster IT

Signed-off-by: Jialiang Liang <[email protected]>

* fix spotless apply

Signed-off-by: Jialiang Liang <[email protected]>

* tomo - fix operator constant

Signed-off-by: Jialiang Liang <[email protected]>

* tomo - fix regex java doc

Signed-off-by: Jialiang Liang <[email protected]>

* tomo - field and pattern handling fix

Signed-off-by: Jialiang Liang <[email protected]>

* tomo - fix LRUCache

Signed-off-by: Jialiang Liang <[email protected]>

* tomo - remove unnecessary delegation layer

Signed-off-by: Jialiang Liang <[email protected]>

* rst doc fix

Signed-off-by: Jialiang Liang <[email protected]>

* tomo - fix comments

Signed-off-by: Jialiang Liang <[email protected]>

* DEFAULT FIELD related change

Signed-off-by: Jialiang Liang <[email protected]>

* DEFAULT FIELD - fix anonymizer tests

Signed-off-by: Jialiang Liang <[email protected]>

* tomo - add unit test for regex util class

Signed-off-by: Jialiang Liang <[email protected]>

* chen - remove code for legacy engine

Signed-off-by: Jialiang Liang <[email protected]>

* chen - remove stale logic for specified field

Signed-off-by: Jialiang Liang <[email protected]>

* chen - merge into 1 grammar in parser

Signed-off-by: Jialiang Liang <[email protected]>

* properly handle non-string field

Signed-off-by: Jialiang Liang <[email protected]>

* remove verbose comments

Signed-off-by: Jialiang Liang <[email protected]>

* remove verbose comments

Signed-off-by: Jialiang Liang <[email protected]>

* address comments

Signed-off-by: Jialiang Liang <[email protected]>

* fix doc test for regex

Signed-off-by: Jialiang Liang <[email protected]>

* fix doc

Signed-off-by: Jialiang Liang <[email protected]>

---------

Signed-off-by: Jialiang Liang <[email protected]>

ykmr1224 (Collaborator) commented Sep 3, 2025

Can you do the manual backport, or check what is conflicting?
It is blocking #4214.

RyanL1997 added a commit to RyanL1997/sql that referenced this pull request Sep 3, 2025
…t#4083)
ykmr1224 added a commit that referenced this pull request Sep 3, 2025
* [Feature] Implementation of `regex` Command In PPL (#4083)

* Fix merge issue (#4214)

Signed-off-by: Tomoyuki Morita <[email protected]>

* Fix AnalyzerTest (#4215)

Signed-off-by: Tomoyuki Morita <[email protected]>

---------

Signed-off-by: Jialiang Liang <[email protected]>
Signed-off-by: Tomoyuki Morita <[email protected]>
Co-authored-by: Tomoyuki MORITA <[email protected]>
LantaoJin added the backport-manually (Filed a PR to backport manually.) label Sep 9, 2025
Labels

backport 2.19-dev, backport-failed, backport-manually (Filed a PR to backport manually.), calcite (calcite migration related), feature, PPL (Piped processing language)

Projects

Status: Done

6 participants