Skip to content

Conversation

@seankao-az
Copy link
Collaborator

@seankao-az seankao-az commented Aug 1, 2022

Description

#703 for non aggregation queries.

With this feature, we can

  • run source=index and get back query.size_limit results
  • run source=index | head x where x <= index.max_result_window and get back x results by regular OpenSearch query request
  • run source=index | head x where x > index.max_result_window and get back x results by scrolling, bypassing the OpenSearch restriction, essentially

This can be combined with other commands that post-processes the data rows fetched, e.g.

  • source=index | head 100000 | top <field>
  • source=index | head 100000 | <ml command>

Similar also works for SQL LIMIT operator.

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

public Integer getMaxResultWindow() {
// system index doesn't need this function
// the magic number is the number of fields of the mapping table
return 27;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a better way to do this

@codecov-commenter
Copy link

codecov-commenter commented Aug 1, 2022

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.83%. Comparing base (792d3ff) to head (52ba001).
⚠️ Report is 1063 commits behind head on main.

❌ Your project status has failed because the head coverage (94.83%) is below the target coverage (99.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #716      +/-   ##
============================================
- Coverage     97.76%   94.83%   -2.94%     
- Complexity     2880     2898      +18     
============================================
  Files           276      287      +11     
  Lines          7077     7802     +725     
  Branches        447      568     +121     
============================================
+ Hits           6919     7399     +480     
- Misses          157      349     +192     
- Partials          1       54      +53     
Flag Coverage Δ
query-workbench 62.76% <ø> (?)
sql-engine 97.78% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

SearchSourceBuilder sourceBuilder = request.getSourceBuilder();
if (limit + offset > maxResultWindow) {
limit = maxResultWindow - offset;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for scrolling.
OpenSearch requires that from(offset) + size(limit) <= index.max_result_window. A limit operator with limit+offset > index.max_result_window will invoke scroll request multiple times, each with batch size index.max_result_window

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the if condition limit + offset > maxResultWindow is used in different place, could we simply it?

Comment on lines 150 to 155
Integer limit = node.getLimit();
Integer offset = node.getOffset();
PlanContext planContext = context.getPlanContext();
if (limit + offset > planContext.getMaxResultWindow()) {
planContext.setIndexScanType(PlanContext.IndexScanType.SCROLL);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to reduce duplicate code. visitLimit and visitHead

when(response.isEmpty()).thenReturn(true);
} else {
fail("Search request after empty response returned already");
when(response.isEmpty()).thenReturn(true);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we now fetch results in batches, additional search requests can be made when calling hasNext()

}

@Override
public Integer getIndexMaxResultWindow(String... indexExpression) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comments to explain the min(index...).

*/
Map<String, IndexMapping> getIndexMappings(String... indexExpression);

Integer getIndexMaxResultWindow(String... indexExpression);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add docs.

this.request = new OpenSearchQueryRequest(indexName,
settings.getSettingValue(Settings.Key.QUERY_SIZE_LIMIT), exprValueFactory);
this.maxResultWindow = context.getMaxResultWindow();
switch (context.getIndexScanType()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i not sure how the PlanContext help in this case. seems we could build OpenSerachRequest when visit the logical plan tree.


@Override
public LogicalPlan apply(LogicalLimit plan, Captures captures) {
public LogicalPlan apply(LogicalLimit plan, Captures captures, PlanContext context) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why keeping the Limit node?


@Override
public boolean hasNext() {
if (isAggregation) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregation should also support scroll, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I'll work on it next. Aggregation works quite different from non-agg queries. If there's a lot to change I'll consider making a separate PR for it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could open a issue if it is not covered in this PR?

@seankao-az seankao-az marked this pull request as ready for review August 8, 2022 23:10
@seankao-az seankao-az requested a review from a team as a code owner August 8, 2022 23:10
@seankao-az seankao-az requested a review from penghuo August 9, 2022 21:14
}

@Override
public Map<String, Integer> getIndexMaxResultWindows(String... indexExpression) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need mapping or just single maxResultWindow?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approach this in a way similar to how getIndexMappings handles it.
For the client, getIndexMappings and getIndexMaxResultWindows return a Mapping from the index name to the corresponding result, without extra business logic.
In OpenSearchDescribeIndexRequest, getMaxResultWindow chooses the minimum of these values. This is specific to our need: if there's multiple indices, get the minimum max_result_window. Similarly, getFieldTypes unions the mappings of multiple indices here.

Signed-off-by: Sean Kao <[email protected]>
Signed-off-by: Sean Kao <[email protected]>
Signed-off-by: Sean Kao <[email protected]>
@seankao-az seankao-az requested a review from penghuo August 11, 2022 18:50
Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@seankao-az seankao-az merged commit f3c9c29 into opensearch-project:main Aug 12, 2022
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-716-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f3c9c29538abdf93780e156edba3558fd43479da
# Push it to GitHub
git push --set-upstream origin backport/backport-716-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-716-to-2.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants