@LantaoJin
Member

@LantaoJin LantaoJin commented Oct 10, 2025

Description

Add two configurable limits for PPL:

  1. plugins.ppl.subsearch.maxout (default value 10000, similar to maxout in [subsearch], ref)
  2. plugins.ppl.join.subsearch_maxout (default value 50000, similar to subsearch_maxout in [join], ref)

Related Issues

Resolves #3731 and #4430

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <[email protected]>
children.forEach(c -> analyze(c, context));
// add join.subsearch_maxout limit to subsearch side
if (context.sysLimit.joinSubsearchLimit() >= 0) {
  replaceTop(
Collaborator

nit: is it possible to avoid accessing the private method?

My 2 cents: add a frame to CalcitePlanContext, where a frame marks the boundary of a subsearch, and define the limit on the frame. When visiting a subsearch, append a LogicalSystemLimit to the subsearch of each frame.

Member Author

@LantaoJin LantaoJin Oct 11, 2025

> nit: is it possible to avoid accessing the private method?

I don't think so.

When visiting the subsearch side (the right side of a join, for example), the right plan has already been pushed onto the stack by analyze:

  public RelNode analyze(UnresolvedPlan unresolved, CalcitePlanContext context) {
    return unresolved.accept(this, context);
  }

RelBuilder.pop() is private too, so we have no way to replace the top of the stack.

Here is my earlier attempt for join:

  public RelNode visitJoin(Join node, CalcitePlanContext context) {
    // visit the main side
    analyze(node.getLeft(), context);
    if (context.sysLimit.joinSubsearchLimit() >= 0) {
      // add join.subsearch_maxout limit to subsearch side
      RelNode withLimit = context.relBuilder.with(
          analyze(node.getRight(), context),
          r -> LogicalSystemLimit.create(
            SystemLimitType.JOIN_SUBSEARCH_MAXOUT,
            r.peek(),
            r.literal(context.sysLimit.joinSubsearchLimit())));
      context.relBuilder.push(withLimit); // push the new subsearch plan
    } else {
      // visit the subsearch side
      analyze(node.getRight(), context);
    }

This code uses relBuilder.with(), but evaluating its first argument, analyze(node.getRight(), context), already pushes the subsearch onto the stack, and with() then pushes it a second time.

  /** Evaluates an expression with a relational expression temporarily on the
   * stack. */
  public <E> E with(RelNode r, Function<RelBuilder, E> fn) {
    try {
      push(r);
      return fn.apply(this);
    } finally {
      stack.pop();
    }
  }

So the stack evolves as follows:

  1. analyze(node.getLeft(), context) pushes the left plan; stack size is 1
  2. the first argument of with(), analyze(node.getRight(), context), pushes the right plan; stack size is 2
  3. the push inside with() pushes the right plan again (a duplicate); stack size is 3
  4. the pop inside with() removes the duplicated right plan; stack size is 2
  5. context.relBuilder.push(withLimit) pushes the new right plan; stack size is 3 (incorrect, it should be 2)
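
The five steps above can be replayed with a toy stack. This is only a sketch: `ToyBuilder`, its string "plans", and the `analyze` helper are hypothetical stand-ins, not Calcite's RelBuilder or the plugin's visitor. Only the shape of `with()` (push, apply, pop in `finally`) follows the snippet quoted above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

/** Toy stand-in for a builder stack; NOT Calcite's RelBuilder. */
class ToyBuilder {
  final Deque<String> stack = new ArrayDeque<>();

  /** Pushes a plan onto the stack. */
  ToyBuilder push(String plan) {
    stack.push(plan);
    return this;
  }

  String peek() {
    return stack.peek();
  }

  /** Same shape as with(): push, apply the function, then pop in finally. */
  <E> E with(String plan, Function<ToyBuilder, E> fn) {
    try {
      push(plan);
      return fn.apply(this);
    } finally {
      stack.pop();
    }
  }

  /** Hypothetical analyze(): pushes the plan as a side effect and returns it. */
  String analyze(String plan) {
    push(plan);
    return plan;
  }
}

class DoublePushDemo {
  /** Replays the five steps and returns the final stack size. */
  static int replay() {
    ToyBuilder b = new ToyBuilder();
    b.analyze("left");                          // step 1: size 1
    String withLimit = b.with(
        b.analyze("right"),                     // step 2: size 2
        r -> "limit(" + r.peek() + ")");        // steps 3 and 4: size 3, then 2
    b.push(withLimit);                          // step 5: size 3, stale "right" remains
    return b.stack.size();
  }

  public static void main(String[] args) {
    System.out.println("final stack size = " + replay());
  }
}
```

In this model the un-limited right plan stays on the stack underneath the limited one, which is why the attempt needs a way to replace the top entry rather than push a new one.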

Collaborator

Would relbuilder.build() + relbuilder.push(newTop) work? relbuilder.build() pops the top of the stack, and it is public.

Member Author

@LantaoJin LantaoJin Oct 13, 2025

Synced offline. We still cannot use relbuilder.build() + relbuilder.push(newTop), because it empties the fields of the Frame, whereas replaceTop keeps them:

  private void replaceTop(RelNode node) {
    final Frame frame = stack.pop();
    stack.push(new Frame(node, frame.fields));    // <--- frame.fields will be kept all the time
  }
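
The difference between the two approaches can be sketched with a toy frame stack. Everything here is a simplified assumption: `FrameDemo`, its string nodes, and the empty field list created by `push` are illustrative only, and the real RelBuilder derives the fields of a fresh frame in its own way. Only `replaceTop` follows the shape of the method quoted above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Toy model of a frame stack; names and behaviour are simplified assumptions. */
class FrameDemo {
  /** A frame pairs a plan node with field metadata attached by earlier steps. */
  static final class Frame {
    final String node;
    final List<String> fields;

    Frame(String node, List<String> fields) {
      this.node = node;
      this.fields = fields;
    }
  }

  final Deque<Frame> stack = new ArrayDeque<>();

  /** Assumed public push: creates a fresh frame without the earlier metadata. */
  void push(String node) {
    stack.push(new Frame(node, List.of()));
  }

  /** Assumed public build: pops the frame and returns only the node. */
  String build() {
    return stack.pop().node;
  }

  /** Same shape as the quoted replaceTop: swap the node, keep the fields. */
  void replaceTop(String node) {
    Frame frame = stack.pop();
    stack.push(new Frame(node, frame.fields));
  }

  /** Builds a demo stack whose top frame carries two qualified fields. */
  static FrameDemo withAliasedTop() {
    FrameDemo d = new FrameDemo();
    d.stack.push(new Frame("scan", List.of("r.id", "r.name")));
    return d;
  }

  public static void main(String[] args) {
    FrameDemo a = withAliasedTop();
    a.replaceTop("limit(" + a.stack.peek().node + ")");
    System.out.println("replaceTop keeps fields: " + a.stack.peek().fields);

    FrameDemo b = withAliasedTop();
    b.push("limit(" + b.build() + ")");
    System.out.println("build + push keeps fields: " + b.stack.peek().fields);
  }
}
```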

Collaborator

How is a SQL join translated to a RelNode? Does it use the private method?

Description
-----------

The size configures the maximum of rows to return from subsearch. The default value is: ``10000``. A value of ``-1`` indicates that the restriction is unlimited.
Collaborator

What if it is set to 0? Will the join/subquery be optimized away by Calcite?

Member Author

For now, I followed the standard database behaviour:

select * from t_outer where exists (select 1 from t_inner where t_outer.id = t_inner.id limit 0);
select * from t_outer where id in (select id from t_inner limit 0);
select * from t_outer where id = (select id from t_inner limit 0);
select * from t_outer where id = (select count(*) from t_inner limit 0);

All of the above queries return an empty result in SQL (PostgreSQL).

The implementation is here https://github.com/opensearch-project/sql/pull/4501/files#diff-e5198d773af75bf3173ef25676a2803a0091cb51e32d6ae30241273519d30261R601-R605

Member Author

@LantaoJin LantaoJin Oct 11, 2025

What are your thoughts: should both 0 and negative values mean unlimited? @penghuo

Collaborator

Discussed offline: 0 and -1 mean unlimited.

Member Author

@LantaoJin LantaoJin Oct 14, 2025

> Discussed offline: 0 and -1 mean unlimited.

Sure, let me update the code and doc so that 0 means unlimited, with minValue=0 in Settings.
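
As a minimal sketch of the agreed semantics, the helper below maps a configured value to an optional row cap. `SubsearchLimit` and `effectiveLimit` are hypothetical names, not code from this pull request. Negative values are treated as unlimited here to match the discussion, even though a minValue of 0 in Settings would presumably reject them before they reach such a check.

```java
import java.util.OptionalInt;

/** Hypothetical helper; not taken from the pull request. */
class SubsearchLimit {
  /**
   * Returns the row cap to apply, or an empty value when the configured
   * setting means "unlimited" (0 or any negative number).
   */
  static OptionalInt effectiveLimit(int configured) {
    return configured > 0 ? OptionalInt.of(configured) : OptionalInt.empty();
  }

  public static void main(String[] args) {
    System.out.println(effectiveLimit(10000)); // a positive value is applied as the cap
    System.out.println(effectiveLimit(0));     // 0 means no cap
    System.out.println(effectiveLimit(-1));    // negative values mean no cap as well
  }
}
```

A caller would add the limit operator to the subsearch plan only when the returned value is present.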

@Override
public RelNode visit(RelNode other) {
  RelNode newInput =
      other.getInputs().isEmpty() ? null : other.getInput(0).accept(this);
Collaborator

[question] Can there be a join or union inside a subsearch? In those cases the operator has more than one input. If so, the current code will construct an incorrect plan, because it only visits the first input.

Member Author

Fixed in the latest commit. For BiRel or SetOp, the visitor just returns.
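
The idea behind the fix can be sketched on a toy plan tree. This is not Calcite's RelNode or RelShuttle API: `Node`, `rewrite`, and the `*` marker are hypothetical, and returning a multi-input node unchanged is an assumption about what "just return" means.

```java
import java.util.List;

/** Toy plan tree; not Calcite's RelNode API. */
class LimitPlacementDemo {
  static final class Node {
    final String name;
    final List<Node> inputs;

    Node(String name, Node... inputs) {
      this.name = name;
      this.inputs = List.of(inputs);
    }
  }

  /**
   * Walks down a single-input chain and marks the leaf with "*".
   * Nodes with more than one input (joins, unions) are returned unchanged,
   * so none of their inputs is dropped or rebuilt.
   */
  static Node rewrite(Node node) {
    if (node.inputs.size() > 1) {
      return node; // multi-input operator: stop here
    }
    if (node.inputs.isEmpty()) {
      return new Node(node.name + "*"); // leaf reached: apply the rewrite
    }
    return new Node(node.name, rewrite(node.inputs.get(0)));
  }

  /** Renders a tree as name(input, input, ...). */
  static String show(Node n) {
    if (n.inputs.isEmpty()) {
      return n.name;
    }
    StringBuilder sb = new StringBuilder(n.name).append('(');
    for (int i = 0; i < n.inputs.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(show(n.inputs.get(i)));
    }
    return sb.append(')').toString();
  }

  public static void main(String[] args) {
    Node chain = new Node("filter", new Node("scan"));
    Node join = new Node("filter", new Node("join", new Node("a"), new Node("b")));
    System.out.println(show(rewrite(chain))); // filter(scan*)
    System.out.println(show(rewrite(join)));  // filter(join(a, b))
  }
}
```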

});
planVisitor.replaceTop(context.relBuilder, replacement);
}
if (subqueryExpression instanceof InSubquery) {
Collaborator

@qianheng-aws qianheng-aws Oct 11, 2025

SQL supports correlated conditions in IN subqueries and scalar subqueries, so Calcite should support them as well.
e.g.

SELECT * FROM EMPLOYEE WHERE location in (select location from DEPART where EMPLOYEE.dept = DEPART.name) limit 1

If an IN or scalar subsearch has a correlated condition, shall we apply the same operation as above?

Member Author

Added the same logic for correlated in-subquery. For correlated scalar-subquery, an aggregation is always performed in the subquery, so sysLimit is not necessary.

Signed-off-by: Lantao Jin <[email protected]>
@LantaoJin LantaoJin requested review from penghuo and yuancu October 13, 2025 12:37
@qianheng-aws qianheng-aws merged commit fddbb70 into opensearch-project:main Oct 14, 2025
33 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4501-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 fddbb705a6aeae138915e2174d5d7ea3ccbd3e9e
# Push it to GitHub
git push --set-upstream origin backport/backport-4501-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4501-to-2.19-dev.

LantaoJin added a commit to LantaoJin/search-plugins-sql that referenced this pull request Oct 14, 2025
…opensearch-project#4501)

* Add configurable sytem limitations for subsearch and join command

Signed-off-by: Lantao Jin <[email protected]>

* Fix IT

Signed-off-by: Lantao Jin <[email protected]>

* typo

Signed-off-by: Lantao Jin <[email protected]>

* fix IT

Signed-off-by: Lantao Jin <[email protected]>

* remove rollback in doc

Signed-off-by: Lantao Jin <[email protected]>

* address comments

Signed-off-by: Lantao Jin <[email protected]>

* fix typo

Signed-off-by: Lantao Jin <[email protected]>

* Fix IT

Signed-off-by: Lantao Jin <[email protected]>

---------

Signed-off-by: Lantao Jin <[email protected]>
(cherry picked from commit fddbb70)
Signed-off-by: Lantao Jin <[email protected]>
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Oct 14, 2025
LantaoJin added a commit that referenced this pull request Oct 15, 2025
…` and `join` command (#4501) (#4535)

* Add configurable sytem limitations for `subsearch` and `join` command (#4501)

* Add configurable sytem limitations for subsearch and join command

Signed-off-by: Lantao Jin <[email protected]>

* Fix IT

Signed-off-by: Lantao Jin <[email protected]>

* typo

Signed-off-by: Lantao Jin <[email protected]>

* fix IT

Signed-off-by: Lantao Jin <[email protected]>

* remove rollback in doc

Signed-off-by: Lantao Jin <[email protected]>

* address comments

Signed-off-by: Lantao Jin <[email protected]>

* fix typo

Signed-off-by: Lantao Jin <[email protected]>

* Fix IT

Signed-off-by: Lantao Jin <[email protected]>

---------

Signed-off-by: Lantao Jin <[email protected]>
(cherry picked from commit fddbb70)
Signed-off-by: Lantao Jin <[email protected]>

* migrate java 21 to 11

Signed-off-by: Lantao Jin <[email protected]>

* Fix conflicts

Signed-off-by: Lantao Jin <[email protected]>

* Fix IT

Signed-off-by: Lantao Jin <[email protected]>

---------

Signed-off-by: Lantao Jin <[email protected]>
ykmr1224 added a commit to ykmr1224/sql that referenced this pull request Oct 15, 2025
commit cba8d02
Author: Tomoyuki MORITA <[email protected]>
Date:   Wed Oct 15 13:08:05 2025 -0700

    Add MAP_APPEND internal function to Calcite PPL (opensearch-project#4515)

    * Add MAP_APPEND internal function to Calcite PPL

    Signed-off-by: Tomoyuki Morita <[email protected]>

    * Minor fix

    Signed-off-by: Tomoyuki Morita <[email protected]>

    * Address comment

    Signed-off-by: Tomoyuki Morita <[email protected]>

    * Rebase and fix IT issue

    Signed-off-by: Tomoyuki Morita <[email protected]>

    ---------

    Signed-off-by: Tomoyuki Morita <[email protected]>

commit 3388dc7
Author: Lantao Jin <[email protected]>
Date:   Thu Oct 16 01:45:29 2025 +0800

    Use `_doc` + `_shard_doc` as sort tiebreaker to get better performance (opensearch-project#4569)

    * Use _shard_doc as sort tiebreaker

    Signed-off-by: Lantao Jin <[email protected]>

    * _doc as a part of tie-breaker have better performance

    Signed-off-by: Lantao Jin <[email protected]>

    ---------

    Signed-off-by: Lantao Jin <[email protected]>

commit 5630119
Author: qianheng <[email protected]>
Date:   Wed Oct 15 16:40:41 2025 +0800

    Fix sort push down into agg after project already pushed (opensearch-project#4546)

    * Fix sort push down into agg

    Signed-off-by: Heng Qian <[email protected]>

    * Change some json files to yaml format

    Signed-off-by: Heng Qian <[email protected]>

    ---------

    Signed-off-by: Heng Qian <[email protected]>

commit 1e62fba
Author: Tomoyuki MORITA <[email protected]>
Date:   Tue Oct 14 17:20:38 2025 -0700

    Fix JsonExtractAllFunctionIT failure (opensearch-project#4556)

    Signed-off-by: Tomoyuki Morita <[email protected]>

commit 02ee33e
Author: Kai Huang <[email protected]>
Date:   Tue Oct 14 14:28:53 2025 -0700

    Add more examples to the `where` command doc (opensearch-project#4457)

    Co-authored-by: Manasvini B S <[email protected]>

commit 0b7e86c
Author: Jialiang Liang <[email protected]>
Date:   Tue Oct 14 10:46:01 2025 -0700

    [Enhancement] Error handling for illegal character usage in java regex named capture group (opensearch-project#4434)

    Co-authored-by: Simeon Widdis <[email protected]>

commit 9c97cfb
Author: Tomoyuki MORITA <[email protected]>
Date:   Tue Oct 14 08:36:43 2025 -0700

    Add JSON_EXTRACT_ALL internal function for Calcite PPL (opensearch-project#4489)

    * Add JSON_EXTRACT_ALL internal function for Calcite PPL

    Signed-off-by: Tomoyuki Morita <[email protected]>

    * Address comments

    Signed-off-by: Tomoyuki Morita <[email protected]>

    * Minor fix

    Signed-off-by: Tomoyuki Morita <[email protected]>

    ---------

    Signed-off-by: Tomoyuki Morita <[email protected]>

commit 89dbc31
Author: Lantao Jin <[email protected]>
Date:   Tue Oct 14 18:24:52 2025 +0800

    Check server status before starting Prometheus (opensearch-project#4537)

    * Check server status before starting Prometheus

    Signed-off-by: Lantao Jin <[email protected]>

    * Change to func call

    Signed-off-by: Lantao Jin <[email protected]>

    * Fix doc

    Signed-off-by: Lantao Jin <[email protected]>

    ---------

    Signed-off-by: Lantao Jin <[email protected]>

commit fe62472
Author: Lantao Jin <[email protected]>
Date:   Tue Oct 14 18:10:27 2025 +0800

    Update request builder after pushdown sort into agg buckets (opensearch-project#4541)

    Signed-off-by: Lantao Jin <[email protected]>

commit 42a415f
Author: qianheng <[email protected]>
Date:   Tue Oct 14 17:42:45 2025 +0800

    Including metadata fields type when doing agg/filter script push down (opensearch-project#4522)

    * Including metadata fields type when doing agg/filter script push down

    Signed-off-by: Heng Qian <[email protected]>

    * Fix IT

    Signed-off-by: Heng Qian <[email protected]>

    ---------

    Signed-off-by: Heng Qian <[email protected]>

commit 8de0386
Author: Xinyuan Lu <[email protected]>
Date:   Tue Oct 14 16:41:08 2025 +0800

    Fix percentile bug (opensearch-project#4539)

    * fix percentile bug

    Signed-off-by: xinyual <[email protected]>

    * add IT

    Signed-off-by: xinyual <[email protected]>

    * optimize it

    Signed-off-by: xinyual <[email protected]>

    ---------

    Signed-off-by: xinyual <[email protected]>

commit de2fdc8
Author: Lantao Jin <[email protected]>
Date:   Tue Oct 14 12:29:03 2025 +0800

    [FollowUp] Set 0 and negative value of subsearch.maxout as unlimited (opensearch-project#4534)

    * [FollowUp] Set 0 and negative value of subsearch.maxout as unlimited

    Signed-off-by: Lantao Jin <[email protected]>

    * fix doctest

    Signed-off-by: Lantao Jin <[email protected]>

    * Fix conflicts

    Signed-off-by: Lantao Jin <[email protected]>

    ---------

    Signed-off-by: Lantao Jin <[email protected]>

commit 977b7ab
Author: Simeon Widdis <[email protected]>
Date:   Mon Oct 13 20:23:10 2025 -0700

    Update stalled action (opensearch-project#4485)

commit fddbb70
Author: Lantao Jin <[email protected]>
Date:   Tue Oct 14 10:23:12 2025 +0800

    Add configurable sytem limitations for `subsearch` and `join` command (opensearch-project#4501)

    * Add configurable sytem limitations for subsearch and join command

    Signed-off-by: Lantao Jin <[email protected]>

    * Fix IT

    Signed-off-by: Lantao Jin <[email protected]>

    * typo

    Signed-off-by: Lantao Jin <[email protected]>

    * fix IT

    Signed-off-by: Lantao Jin <[email protected]>

    * remove rollback in doc

    Signed-off-by: Lantao Jin <[email protected]>

    * address comments

    Signed-off-by: Lantao Jin <[email protected]>

    * fix typo

    Signed-off-by: Lantao Jin <[email protected]>

    * Fix IT

    Signed-off-by: Lantao Jin <[email protected]>

    ---------

    Signed-off-by: Lantao Jin <[email protected]>

Signed-off-by: Tomoyuki Morita <[email protected]>
Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT] Set operator limitation for data-intensive operators

4 participants