@qianheng-aws qianheng-aws commented Nov 13, 2025

Description

This PR includes the following changes:

  1. Implement steps 1 and 2 (replace fields and literals with parameters) described in the RFC: [RFC] RexNode standardization for script push down #4757. This raises the hit ratio of our script cache.
  2. Remove ROW_TYPE and EXPR_MAP from our script. This reduces the average script size by a factor of 2 to 5.
  3. Exclude OpenSearchRequestBuilder when computing the digest for OpenSearchIndexScanOperator, while keeping it when generating the explain plan.
  4. Remove OpenSearchRequestBuilder from PushDownContext and perform the related actions lazily. Given change 3, there is little value in holding that object in each PushDownContext.
  5. A small enhancement to the SORT_EXPR parameter; see Pushdown sort by complex expressions to scan #4750 (comment)
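
To illustrate change 1, here is a minimal sketch of replacing fields and literals with positional parameters. It does not use Calcite: `Expr`, `FieldRef`, `Literal`, `Param`, `Call`, `Standardized` and `StandardizeSketch` are hypothetical stand-ins for RexNode and the PR's actual classes, and the layout of `sources`, `digests` and `literals` is an assumption based on how those names are used in this PR's diff. It assumes Java 17 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical mini expression tree; the PR itself operates on Calcite RexNode.
interface Expr {}
record FieldRef(String name) implements Expr {}
record Literal(Object value) implements Expr {}
record Param(int index) implements Expr {}
record Call(String op, List<Expr> operands) implements Expr {}

// Where a parameter's value comes from at execution time (assumed subset).
enum Source { DOC_VALUE, LITERAL }

// Hypothetical result: the parameterized script text plus the extracted values.
record Standardized(
    String script, List<Source> sources, List<Object> digests, List<Object> literals) {}

final class StandardizeSketch {
  static Standardized standardize(Expr expr) {
    List<Source> sources = new ArrayList<>();
    List<Object> digests = new ArrayList<>();
    List<Object> literals = new ArrayList<>();
    Expr rewritten = rewrite(expr, sources, digests, literals);
    return new Standardized(render(rewritten), sources, digests, literals);
  }

  // Replace every field and literal with a positional parameter, recording its origin.
  private static Expr rewrite(
      Expr e, List<Source> sources, List<Object> digests, List<Object> literals) {
    if (e instanceof FieldRef f) {
      sources.add(Source.DOC_VALUE);
      digests.add(f.name()); // assumed: a field's digest is its name
      return new Param(sources.size() - 1);
    }
    if (e instanceof Literal l) {
      sources.add(Source.LITERAL);
      digests.add(literals.size()); // assumed: a literal's digest is its index in `literals`
      literals.add(l.value());
      return new Param(sources.size() - 1);
    }
    if (e instanceof Call c) {
      List<Expr> operands = new ArrayList<>();
      for (Expr operand : c.operands()) {
        operands.add(rewrite(operand, sources, digests, literals));
      }
      return new Call(c.op(), operands);
    }
    return e;
  }

  // Render the parameterized expression as text; this text plays the role of the cache key.
  private static String render(Expr e) {
    if (e instanceof Param p) {
      return "?" + p.index();
    }
    if (e instanceof Call c) {
      return c.op()
          + c.operands().stream()
              .map(StandardizeSketch::render)
              .collect(Collectors.joining(", ", "(", ")"));
    }
    return e.toString();
  }

  public static void main(String[] args) {
    Expr first = new Call(">", List.of(new FieldRef("age"), new Literal(30)));
    Expr second = new Call(">", List.of(new FieldRef("balance"), new Literal(1000)));
    System.out.println(standardize(first).script());
    System.out.println(standardize(second).script());
  }
}
```

In this sketch `age > 30` and `balance > 1000` both render to the same script text, so a cache keyed on that text would compile the script once and reuse it; only the parameter lists differ between the two queries.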

Related Issues

Partly resolves #4757

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
…ardization

# Conflicts:
#	integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
#	integ-test/src/test/resources/expectedOutput/calcite/explain_agg_script_udt_arg_push.yaml
#	integ-test/src/test/resources/expectedOutput/calcite/explain_regexp_match_in_where.json
#	opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
@yuancu yuancu requested a review from penghuo November 14, 2025 07:21
Signed-off-by: Heng Qian <[email protected]>
SerializationWrapper.wrapWithLangType(
ScriptEngineType.CALCITE, serializer.serialize(rexNode, rowType, fieldTypes));
ScriptEngineType.CALCITE,
serializer.serialize(rexNode, rowType, fieldTypes, sources, digests, literals));
Collaborator

Why include sources, digests, and literals as parameters of the serialize() function when the client creates empty arrays?

Collaborator

I see. It is required when creating the Script on L1504.
I also found that sources, digests, and literals are exposed and used in multiple places without encapsulation, e.g. ScriptDataContext and standardizeRexNodeExpression.

Can we encapsulate our script protocol in a class? e.g.

class ParameterBindings {
   void putValue(String name, Object value)
   Object getValue(String name)
}
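
One possible shape for such a class, as a rough sketch only: `ParameterBindingsSketch`, its map-backed storage, the duplicate-name check and the `asMap()` accessor are all assumptions made for illustration, not code from this PR.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical wrapper: callers see named parameters, not the underlying collections.
final class ParameterBindingsSketch {
  private final Map<String, Object> values = new LinkedHashMap<>();

  // Bind a value once; rebinding the same name is treated as a programming error here.
  void putValue(String name, Object value) {
    if (values.containsKey(name)) {
      throw new IllegalArgumentException("Parameter already bound: " + name);
    }
    values.put(name, value);
  }

  // Look up a bound value; unknown names fail loudly instead of returning null.
  Object getValue(String name) {
    if (!values.containsKey(name)) {
      throw new IllegalArgumentException("Unknown parameter: " + name);
    }
    return values.get(name);
  }

  // Read-only view, e.g. for handing the bindings over as a script's params map.
  Map<String, Object> asMap() {
    return Collections.unmodifiableMap(values);
  }
}
```

With a wrapper like this, the serializer and the script runtime would only agree on parameter names, and the storage format could change without touching every call site.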


final int[] currentIndex = {0};
final RexShuttle rexShuttle =
new RexShuttle() {
Contributor

Could we make it a singleton instance? Maybe leverage RexBiVisitorImpl?

import static com.google.common.base.Preconditions.checkState;
import static java.lang.String.format;
import static java.util.Objects.requireNonNull;
import static javax.swing.UIManager.put;
Contributor

nit question: Is this the right import?

Collaborator Author

Thanks! Will remove

}

// For filter script, this method will be called after planning phase;
// For the agg-script, this will be called in planning phase to generate agg builder
Contributor

Not relevant to this PR, but it would be nice for this to be lazy as well. Does the exception come from script generation or from somewhere else?

Collaborator Author

Agree. Our push-down operations after aggregation depend heavily on the exception-throwing mechanism, which makes it easy to prevent such a push down when an exception happens.

We should extract all the cases that are not actually our targets out of the transformation lambda function, but there are too many of them, and they are too complex, to refactor all at once in this PR.

OpenSearchTypeFactory.ExprUDT udt = OpenSearchTypeFactory.ExprUDT.valueOf((String) udtName);
return ((OpenSearchTypeFactory) typeFactory).createUDT(udt);
// View IP as string to avoid using a value of customized java type in the script.
if (udt == ExprUDT.EXPR_IP) return super.toType(typeFactory, o);
Contributor

One more thing that comes to mind: what about sorting by ExprIpValue? It was one of the blockers that prevented the last change made by yuanchuan to directly convert IP to string. cc @yuancu

Another discussion I remember is whether IP sorting is necessary. I forgot the result of that discussion.

Collaborator Author

Luckily, it seems we don't have any function/script that returns ExprIPValue for now, so I think at least it won't block any script push down.

Collaborator

I think this looks good for now

Signed-off-by: Heng Qian <[email protected]>
Comment on lines 207 to 211
return switch (sources.get(index)) {
case DOC_VALUE -> getFromDocValue((String) digests.get(index));
case SOURCE -> getFromSource((String) digests.get(index));
case LITERAL -> getFromLiteral((Integer) digests.get(index));
};
Collaborator

Looks good.
Can ScriptParameterHelper be used to hide the protocol?
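
As a rough sketch of what hiding the protocol could look like, the lookup from the snippet above can live behind a single `resolve(index)` method. `ParameterResolverSketch` and the two maps standing in for doc values and `_source` are hypothetical; only the `DOC_VALUE`/`SOURCE`/`LITERAL` dispatch and the casts mirror the diff. This is not the actual ScriptParameterHelper API, which is not shown in this conversation.

```java
import java.util.List;
import java.util.Map;

// Hypothetical resolver: the script asks for parameter i and never touches the raw lists.
final class ParameterResolverSketch {
  enum Source { DOC_VALUE, SOURCE, LITERAL }

  private final List<Source> sources;
  private final List<Object> digests;
  private final List<Object> literals;
  private final Map<String, Object> docValues; // stand-in for the doc values lookup
  private final Map<String, Object> sourceDoc; // stand-in for the _source document

  ParameterResolverSketch(
      List<Source> sources,
      List<Object> digests,
      List<Object> literals,
      Map<String, Object> docValues,
      Map<String, Object> sourceDoc) {
    this.sources = sources;
    this.digests = digests;
    this.literals = literals;
    this.docValues = docValues;
    this.sourceDoc = sourceDoc;
  }

  // For fields the digest is the field name; for literals it is an index into `literals`.
  Object resolve(int index) {
    return switch (sources.get(index)) {
      case DOC_VALUE -> docValues.get((String) digests.get(index));
      case SOURCE -> sourceDoc.get((String) digests.get(index));
      case LITERAL -> literals.get((Integer) digests.get(index));
    };
  }
}
```

Keeping the casts in one place means a change to the digest format only has to be made inside the resolver.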

penghuo
penghuo previously approved these changes Nov 18, 2025
…ardization

# Conflicts:
#	opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
#	opensearch/src/main/java/org/opensearch/sql/opensearch/util/OpenSearchRelOptUtil.java
Collaborator

@yuancu yuancu left a comment

Conflicts exist

Signed-off-by: Heng Qian <[email protected]>
@qianheng-aws
Collaborator Author

The latest change, f74600f, includes:

  1. An enhancement to the SORT_EXPR parameter: Pushdown sort by complex expressions to scan #4750 (comment)
  2. Applied review comments: Perform RexNode expression standardization for script push down. #4795 (comment)

@qianheng-aws qianheng-aws merged commit a7c5687 into opensearch-project:main Nov 19, 2025
68 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4795-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a7c56870d1dcbb0709246768369ef119ad9bf4cd
# Push it to GitHub
git push --set-upstream origin backport/backport-4795-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4795-to-2.19-dev.

qianheng-aws added a commit to qianheng-aws/sql that referenced this pull request Nov 24, 2025
…nsearch-project#4795)

* RexNode standardization

Signed-off-by: Heng Qian <[email protected]>

* RexNode standardization 2

Signed-off-by: Heng Qian <[email protected]>

* RexNode standardization 3

Signed-off-by: Heng Qian <[email protected]>

* RexNode standardization 4

Signed-off-by: Heng Qian <[email protected]>

* RexNode standardization 5

Signed-off-by: Heng Qian <[email protected]>

* RexNode standardization 6

Signed-off-by: Heng Qian <[email protected]>

* RexNode standardization 7

Signed-off-by: Heng Qian <[email protected]>

* Refine code and add doc about script

Signed-off-by: Heng Qian <[email protected]>

* Add intro-scripts.md

Signed-off-by: Heng Qian <[email protected]>

* Fix IT

Signed-off-by: Heng Qian <[email protected]>

* Refine code

Signed-off-by: Heng Qian <[email protected]>

* Address comments

Signed-off-by: Heng Qian <[email protected]>

* Address comments

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>

(cherry picked from commit a7c5687)
Signed-off-by: Heng Qian <[email protected]>
qianheng-aws added a commit that referenced this pull request Nov 24, 2025
…ript push down. (#4795) (#4849)

* Perform RexNode expression standardization for script push down. (#4795)

* RexNode standardization

Signed-off-by: Heng Qian <[email protected]>

* RexNode standardization 2

Signed-off-by: Heng Qian <[email protected]>

* RexNode standardization 3

Signed-off-by: Heng Qian <[email protected]>

* RexNode standardization 4

Signed-off-by: Heng Qian <[email protected]>

* RexNode standardization 5

Signed-off-by: Heng Qian <[email protected]>

* RexNode standardization 6

Signed-off-by: Heng Qian <[email protected]>

* RexNode standardization 7

Signed-off-by: Heng Qian <[email protected]>

* Refine code and add doc about script

Signed-off-by: Heng Qian <[email protected]>

* Add intro-scripts.md

Signed-off-by: Heng Qian <[email protected]>

* Fix IT

Signed-off-by: Heng Qian <[email protected]>

* Refine code

Signed-off-by: Heng Qian <[email protected]>

* Address comments

Signed-off-by: Heng Qian <[email protected]>

* Address comments

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>

(cherry picked from commit a7c5687)
Signed-off-by: Heng Qian <[email protected]>

* Change java style to 11

Signed-off-by: Heng Qian <[email protected]>

* Fix IT

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Nov 24, 2025
asifabashar pushed a commit to asifabashar/sql that referenced this pull request Dec 10, 2025
…nsearch-project#4795)


Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

[RFC] RexNode standardization for script push down

5 participants