-
Notifications
You must be signed in to change notification settings - Fork 181
Perform RexNode expression standardization for script push down. #4795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perform RexNode expression standardization for script push down. #4795
Conversation
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
…ardization # Conflicts: # integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java # integ-test/src/test/resources/expectedOutput/calcite/explain_agg_script_udt_arg_push.yaml # integ-test/src/test/resources/expectedOutput/calcite/explain_regexp_match_in_where.json # opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
| SerializationWrapper.wrapWithLangType( | ||
| ScriptEngineType.CALCITE, serializer.serialize(rexNode, rowType, fieldTypes)); | ||
| ScriptEngineType.CALCITE, | ||
| serializer.serialize(rexNode, rowType, fieldTypes, sources, digests, literals)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why include sources, digests, literals as paramater in serialize() function and client create empty array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. It is required when create Script on L1504.
I also found sources, digests, literals exposed been used in multiple place without encapsulation. e.g. ScriptDataContext and standardizeRexNodeExpression.
can we encapsulate our script protocol in a class? e.g.
class ParameterBindings {
void putValue(String name, Object value)
Object getValue(String name)
}
|
|
||
| final int[] currentIndex = {0}; | ||
| final RexShuttle rexShuttle = | ||
| new RexShuttle() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make it a singleton instance? Maybe leverage RexBiVisitorImpl?
| import static com.google.common.base.Preconditions.checkState; | ||
| import static java.lang.String.format; | ||
| import static java.util.Objects.requireNonNull; | ||
| import static javax.swing.UIManager.put; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit question: Is it a right import?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Will remove
| } | ||
|
|
||
| // For filter script, this method will be called after planning phase; | ||
| // For the agg-script, this will be called in planning phase to generate agg builder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not relevant to this PR. It would be nice to be lazy as well. Does the exception come from script generation or else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Our push down operations after aggregation have strongly dependance on the exception throwing mechanism, so it can prevent such push down easily if exception happens.
We'd better extract all cases which is actually not our targets out of the transformation lambda function. But they are too many and too complex to refactor at once in this PR.
| OpenSearchTypeFactory.ExprUDT udt = OpenSearchTypeFactory.ExprUDT.valueOf((String) udtName); | ||
| return ((OpenSearchTypeFactory) typeFactory).createUDT(udt); | ||
| // View IP as string to avoid using a value of customized java type in the script. | ||
| if (udt == ExprUDT.EXPR_IP) return super.toType(typeFactory, o); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing that comes up in my mind, how about the sorting by ExprIpValue? It was one of blocker to prevent last change made by yuanchuan to directly convert IP to string. cc @yuancu
Another discussion I remember is whether IP sorting is necessary. Forgot the result of discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Luckily seems we don't have any function/script return ExprIPValue for now, so I think at least it won't block any script push down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good for now
Signed-off-by: Heng Qian <[email protected]>
| return switch (sources.get(index)) { | ||
| case DOC_VALUE -> getFromDocValue((String) digests.get(index)); | ||
| case SOURCE -> getFromSource((String) digests.get(index)); | ||
| case LITERAL -> getFromLiteral((Integer) digests.get(index)); | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good.
Can ScriptParameterHelper been used to hide protocol?
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java
Show resolved
Hide resolved
…ardization # Conflicts: # opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java # opensearch/src/main/java/org/opensearch/sql/opensearch/util/OpenSearchRelOptUtil.java
yuancu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conflicts exist
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java
Show resolved
Hide resolved
| OpenSearchTypeFactory.ExprUDT udt = OpenSearchTypeFactory.ExprUDT.valueOf((String) udtName); | ||
| return ((OpenSearchTypeFactory) typeFactory).createUDT(udt); | ||
| // View IP as string to avoid using a value of customized java type in the script. | ||
| if (udt == ExprUDT.EXPR_IP) return super.toType(typeFactory, o); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good for now
Signed-off-by: Heng Qian <[email protected]>
|
The latest change: f74600f includes:
|
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4795-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a7c56870d1dcbb0709246768369ef119ad9bf4cd
# Push it to GitHub
git push --set-upstream origin backport/backport-4795-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-devThen, create a pull request where the |
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4795-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a7c56870d1dcbb0709246768369ef119ad9bf4cd
# Push it to GitHub
git push --set-upstream origin backport/backport-4795-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-devThen, create a pull request where the |
…nsearch-project#4795) * RexNode standardization Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 2 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 3 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 4 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 5 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 6 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 7 Signed-off-by: Heng Qian <[email protected]> * Refine code and add doc about script Signed-off-by: Heng Qian <[email protected]> * Add intro-scripts.md Signed-off-by: Heng Qian <[email protected]> * Fix IT Signed-off-by: Heng Qian <[email protected]> * Refine code Signed-off-by: Heng Qian <[email protected]> * Address comments Signed-off-by: Heng Qian <[email protected]> * Address comments Signed-off-by: Heng Qian <[email protected]> --------- Signed-off-by: Heng Qian <[email protected]> (cherry picked from commit a7c5687) Signed-off-by: Heng Qian <[email protected]>
…ript push down. (#4795) (#4849) * Perform RexNode expression standardization for script push down. (#4795) * RexNode standardization Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 2 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 3 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 4 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 5 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 6 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 7 Signed-off-by: Heng Qian <[email protected]> * Refine code and add doc about script Signed-off-by: Heng Qian <[email protected]> * Add intro-scripts.md Signed-off-by: Heng Qian <[email protected]> * Fix IT Signed-off-by: Heng Qian <[email protected]> * Refine code Signed-off-by: Heng Qian <[email protected]> * Address comments Signed-off-by: Heng Qian <[email protected]> * Address comments Signed-off-by: Heng Qian <[email protected]> --------- Signed-off-by: Heng Qian <[email protected]> (cherry picked from commit a7c5687) Signed-off-by: Heng Qian <[email protected]> * Change java style to 11 Signed-off-by: Heng Qian <[email protected]> * Fix IT Signed-off-by: Heng Qian <[email protected]> --------- Signed-off-by: Heng Qian <[email protected]>
…nsearch-project#4795) * RexNode standardization Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 2 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 3 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 4 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 5 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 6 Signed-off-by: Heng Qian <[email protected]> * RexNode standardization 7 Signed-off-by: Heng Qian <[email protected]> * Refine code and add doc about script Signed-off-by: Heng Qian <[email protected]> * Add intro-scripts.md Signed-off-by: Heng Qian <[email protected]> * Fix IT Signed-off-by: Heng Qian <[email protected]> * Refine code Signed-off-by: Heng Qian <[email protected]> * Address comments Signed-off-by: Heng Qian <[email protected]> * Address comments Signed-off-by: Heng Qian <[email protected]> --------- Signed-off-by: Heng Qian <[email protected]>
Description
This PR includes changes:
ROW_TYPEandEXPR_MAPin our script. Then the average script size can be reduced by 2 to 5 times than before.OpenSearchRequestBuilderwhen computing digest forOpenSearchIndexScanOperator, while keep it when generating explain plan.OpenSearchRequestBuilderinPushDownContextand make the related action lazy perform. Since we have change 3, it's less valuable to hold that object in eachPushDownContext.Related Issues
Partly resolves #4757
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.