[Feature] Implementation of regex Command In PPL
#4083
Merged: Swiddis merged 44 commits into opensearch-project:main from RyanL1997:regex-cmd-java-version on Aug 30, 2025
Changes from 43 commits
Commits
e230341 implement regex cmd with calcite support by suing java library (RyanL1997)
f215222 code hygiene fix (RyanL1997)
64d7f7b comment clean up (RyanL1997)
b011b50 implement explain it (RyanL1997)
39819f6 disable regex when calcite is disable and add a test in analyzer (RyanL1997)
c3840d5 fix spotless check (RyanL1997)
dbe6004 [refactor] refactor some regex fn into a util class for re-usage (RyanL1997)
5667dcf [refactor] revert filter query builder cuz we do not need it anymore (RyanL1997)
2f91b40 add rst docs for regex cmd (RyanL1997)
6698eb3 add IT for regex cmd (RyanL1997)
f5fe7c2 add IT for calcite no pushdown (RyanL1997)
dda1872 fix regex exp behavior for non string val (RyanL1997)
64d5820 style - remove some verbose comments (RyanL1997)
63c9e9d remove string convertion (RyanL1997)
cfd37a8 use existing operator of REGEXP_CONTAINS (RyanL1997)
4aa435f fix integ test of rgex with pushdown after operator commit (RyanL1997)
470ccbc remove some verbose comments and fix some style (RyanL1997)
31b0c15 fix explain it in no pushdown (RyanL1997)
8eea7e6 comment - remove unused fn for string converting (RyanL1997)
281cb52 remove duplicated regex match operator alias (RyanL1997)
b398932 unit test - initail commit (RyanL1997)
f75ed98 anonymizer with test (RyanL1997)
d1b4d81 fix spotlessApply (RyanL1997)
88cce00 add cross cluster IT (RyanL1997)
b6b1b9d fix spotless apply (RyanL1997)
0431c64 tomo - fix operator constant (RyanL1997)
8203a63 tomo - fix regex java doc (RyanL1997)
ed1380d tomo - field and pattern handling fix (RyanL1997)
c881a75 tomo - fix LRUCache (RyanL1997)
34052b7 tomo - remove unnecessary delegation layer (RyanL1997)
29d6afa rst doc fix (RyanL1997)
83594ae tomo - fix comments (RyanL1997)
fc92846 DEFAULT FIELD related change (RyanL1997)
30f32f4 DEFAULT FIELD - fix anonymizer tests (RyanL1997)
9fa8aaf tomo - add unit test for regex util class (RyanL1997)
9ee6818 chen - remove code for legacy engine (RyanL1997)
d9e29a5 chen - remove stalled logic for spcified field (RyanL1997)
5723a9f chen - merge into 1 grammar in parser (RyanL1997)
9061987 properly handle non-string field (RyanL1997)
ee2f10c remove verbose comments (RyanL1997)
531600a remove verbose comments (RyanL1997)
64432e8 address commetns (RyanL1997)
aebacdb fix doc test for regex (RyanL1997)
7b7f717 fix doc (RyanL1997)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| /* | ||
| * Copyright OpenSearch Contributors | ||
| * SPDX-License-Identifier: Apache-2.0 | ||
| */ | ||
| | ||
| package org.opensearch.sql.ast.tree; | ||
| | ||
| import com.google.common.collect.ImmutableList; | ||
| import java.util.List; | ||
| import lombok.EqualsAndHashCode; | ||
| import lombok.Getter; | ||
| import lombok.Setter; | ||
| import lombok.ToString; | ||
| import org.opensearch.sql.ast.AbstractNodeVisitor; | ||
| import org.opensearch.sql.ast.expression.Literal; | ||
| import org.opensearch.sql.ast.expression.UnresolvedExpression; | ||
| | ||
| @Getter | ||
| @ToString | ||
| @EqualsAndHashCode(callSuper = false) | ||
| public class Regex extends UnresolvedPlan { | ||
| public static final String EQUALS_OPERATOR = "="; | ||
| | ||
| public static final String NOT_EQUALS_OPERATOR = "!="; | ||
| | ||
| private final UnresolvedExpression field; | ||
| | ||
| private final boolean negated; | ||
| | ||
| private final Literal pattern; | ||
| | ||
| @Setter private UnresolvedPlan child; | ||
| | ||
| public Regex(UnresolvedExpression field, boolean negated, Literal pattern) { | ||
| this.field = field; | ||
| this.negated = negated; | ||
| this.pattern = pattern; | ||
| } | ||
| | ||
| @Override | ||
| public Regex attach(UnresolvedPlan child) { | ||
| this.child = child; | ||
| return this; | ||
| } | ||
| | ||
| @Override | ||
| public List<UnresolvedPlan> getChild() { | ||
| return this.child == null ? ImmutableList.of() : ImmutableList.of(this.child); | ||
| } | ||
| | ||
| @Override | ||
| public <T, C> T accept(AbstractNodeVisitor<T, C> nodeVisitor, C context) { | ||
| return nodeVisitor.visitRegex(this, context); | ||
| } | ||
| } | ||
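
As a reading aid, here is a minimal standalone sketch of how the node's `negated` flag and `pattern` could be applied to the values of one field. It is not the PR's planner code. `RegexFilterSketch`, `isNegated`, `keepRow`, `filter`, and the sample names are hypothetical. The sketch assumes that `!=` corresponds to `negated = true`, that matching is partial (`find()`), and that null values are dropped in both modes.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical illustration only; not part of the PR. */
final class RegexFilterSketch {

  // Same literal values as the operator constants declared on the Regex node.
  static final String EQUALS_OPERATOR = "=";
  static final String NOT_EQUALS_OPERATOR = "!=";

  /** Assumption: "!=" maps to negated = true and "=" to negated = false. */
  static boolean isNegated(String operator) {
    if (EQUALS_OPERATOR.equals(operator)) {
      return false;
    }
    if (NOT_EQUALS_OPERATOR.equals(operator)) {
      return true;
    }
    throw new IllegalArgumentException("Unsupported operator: " + operator);
  }

  /** Keeps a value when the pattern is found anywhere in it; inverted when negated. */
  static boolean keepRow(String value, Pattern pattern, boolean negated) {
    if (value == null) {
      return false; // assumption: null values are dropped in both modes
    }
    boolean found = pattern.matcher(value).find();
    return negated != found;
  }

  /** Applies the operator and regex to every value and returns the kept ones. */
  static List<String> filter(List<String> values, String operator, String regex) {
    Pattern pattern = Pattern.compile(regex);
    boolean negated = isNegated(operator);
    return values.stream()
        .filter(v -> keepRow(v, pattern, negated))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> names = List.of("Amber", "Hattie", "Nanette", "Dale");
    System.out.println(filter(names, "=", "^[A-H]")); // [Amber, Hattie, Dale]
    System.out.println(filter(names, "!=", "tt"));    // [Amber, Dale]
  }
}
```

Compiling the pattern once per call and reusing it for every value keeps the per-row work down to a single `find()`.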
108 changes: 108 additions & 0 deletions
core/src/main/java/org/opensearch/sql/expression/parse/RegexCommonUtils.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,108 @@ | ||
| /* | ||
| * Copyright OpenSearch Contributors | ||
| * SPDX-License-Identifier: Apache-2.0 | ||
| */ | ||
| | ||
| package org.opensearch.sql.expression.parse; | ||
| | ||
| import com.google.common.collect.ImmutableList; | ||
| import java.util.Collections; | ||
| import java.util.LinkedHashMap; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
| import java.util.regex.Matcher; | ||
| import java.util.regex.Pattern; | ||
| import java.util.regex.PatternSyntaxException; | ||
| | ||
| /** | ||
| * Common utilities for regex operations. Provides pattern caching and consistent matching behavior. | ||
| */ | ||
| public class RegexCommonUtils { | ||
| | ||
| private static final Pattern NAMED_GROUP_PATTERN = | ||
| Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>"); | ||
| | ||
| private static final int MAX_CACHE_SIZE = 1000; | ||
| | ||
| private static final Map<String, Pattern> patternCache = | ||
| Collections.synchronizedMap( | ||
| new LinkedHashMap<>(MAX_CACHE_SIZE + 1, 0.75f, true) { | ||
| @Override | ||
| protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) { | ||
| return size() > MAX_CACHE_SIZE; | ||
| } | ||
| }); | ||
| | ||
| /** | ||
| * Get compiled pattern from cache or compile and cache it. | ||
| * | ||
| * @param regex The regex pattern string | ||
| * @return Compiled Pattern object | ||
| * @throws PatternSyntaxException if the regex is invalid | ||
| */ | ||
| public static Pattern getCompiledPattern(String regex) { | ||
| Pattern pattern = patternCache.get(regex); | ||
| if (pattern == null) { | ||
| pattern = Pattern.compile(regex); | ||
| patternCache.put(regex, pattern); | ||
| } | ||
| return pattern; | ||
| } | ||
| | ||
| /** | ||
| * Extract list of named group candidates from a regex pattern. | ||
| * | ||
| * @param pattern The regex pattern string | ||
| * @return List of named group names found in the pattern | ||
| */ | ||
| public static List<String> getNamedGroupCandidates(String pattern) { | ||
| ImmutableList.Builder<String> namedGroups = ImmutableList.builder(); | ||
| Matcher m = NAMED_GROUP_PATTERN.matcher(pattern); | ||
| while (m.find()) { | ||
| namedGroups.add(m.group(1)); | ||
| } | ||
| return namedGroups.build(); | ||
| } | ||
| | ||
| /** | ||
| * Match using find() for partial match semantics with string pattern. | ||
| * | ||
| * @param text The text to match against | ||
| * @param patternStr The pattern string | ||
| * @return true if pattern is found anywhere in the text | ||
| * @throws PatternSyntaxException if the regex is invalid | ||
| */ | ||
| public static boolean matchesPartial(String text, String patternStr) { | ||
| if (text == null || patternStr == null) { | ||
| return false; | ||
| } | ||
| Pattern pattern = getCompiledPattern(patternStr); | ||
| return pattern.matcher(text).find(); | ||
| } | ||
| | ||
| /** | ||
| * Extract a specific named group from text using the pattern. Used by parse command regex method. | ||
| * | ||
| * @param text The text to extract from | ||
| * @param pattern The compiled pattern with named groups | ||
| * @param groupName The name of the group to extract | ||
| * @return The extracted value or null if not found | ||
| */ | ||
| public static String extractNamedGroup(String text, Pattern pattern, String groupName) { | ||
| if (text == null || pattern == null || groupName == null) { | ||
| return null; | ||
| } | ||
| | ||
| Matcher matcher = pattern.matcher(text); | ||
| | ||
| if (matcher.matches()) { | ||
| try { | ||
| return matcher.group(groupName); | ||
| } catch (IllegalArgumentException e) { | ||
| return null; | ||
| } | ||
| } | ||
| | ||
| return null; | ||
| } | ||
| } | ||
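
Two behaviours of this utility class are easy to miss when reading the diff. The access-ordered `LinkedHashMap` evicts the least recently used pattern once the size limit is exceeded. Also, `extractNamedGroup` uses `matches()`, which requires the whole input to match, whereas `matchesPartial` uses `find()`. The sketch below reproduces both with plain JDK classes. `RegexCacheSketch`, `newLruCache`, `partial`, `namedGroup`, the sample strings, and the cache size of 2 are illustrative assumptions; the PR itself uses a limit of 1000.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of the PR. */
final class RegexCacheSketch {

  /** Builds a synchronized, access-ordered map that evicts the least recently used entry. */
  static Map<String, Pattern> newLruCache(int maxSize) {
    return Collections.synchronizedMap(
        new LinkedHashMap<String, Pattern>(maxSize + 1, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
            return size() > maxSize;
          }
        });
  }

  /** find(): true if the pattern occurs anywhere in the text. */
  static boolean partial(String text, String regex) {
    return Pattern.compile(regex).matcher(text).find();
  }

  /** matches(): the pattern must cover the whole text before the group is read. */
  static String namedGroup(String text, String regex, String group) {
    Matcher m = Pattern.compile(regex).matcher(text);
    return m.matches() ? m.group(group) : null;
  }

  public static void main(String[] args) {
    Map<String, Pattern> cache = newLruCache(2);
    cache.put("a+", Pattern.compile("a+"));
    cache.put("b+", Pattern.compile("b+"));
    cache.get("a+"); // touch "a+" so that "b+" becomes the eldest entry
    cache.put("c+", Pattern.compile("c+")); // exceeds the limit and evicts "b+"
    System.out.println(cache.keySet()); // [a+, c+]

    String regex = "(?<user>[a-z]+)@example\\.com";
    System.out.println(partial("mail: bob@example.com", regex));            // true
    System.out.println(namedGroup("mail: bob@example.com", regex, "user")); // null
    System.out.println(namedGroup("bob@example.com", regex, "user"));       // bob
  }
}
```

The third line of output shows the practical consequence: text that satisfies a partial match can still yield no named group when the pattern does not span the entire input.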