Skip to content

Conversation

@Yury-Fridlyand
Copy link

@Yury-Fridlyand Yury-Fridlyand commented Feb 14, 2023

Description

https://github.com/Bit-Quill/opensearch-project-sql/blob/61767f2b200b2a7f8c8b2df32de209b3c30caa61/docs/dev/Pagination-v2.md

All credits to @MaxKsyunz.

You can use attached script for testing as well. Command line: ./cursor_test.sh <table> <page size>. Requires jq. Rename it before use.
cursor_test.sh.txt

Scroll API usage doc:
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html

TODOs

Follow-up TODOs

Decided to do out of scope of this PR:

  • Remove duplicating entities SqlRequest and SQLQueryRequest or at least fix processing json content (if cursor detected, content isn't kept)
  • Rework cursor reading/writing:
    • Convert cursor parsing to deserialization to make it scalable
    • Move cursor serialization and deserialization to one place (see 1a83941)
    • Rework cursor serialization - binary objects are being converted to string twice - in DefaultExpressionSerializer and in compress function; keep cursor as a binary object.
    • Optional: Encode/encrypt cursor to avoid it being modified/hacked
  • Should we return cursor automatically for big tables?
curl -s -XPOST http://localhost:9200/_plugins/_sql -H 'Content-Type: application/json' -d '{"query": "SELECT * FROM online" }' | jq .total,.cursor

Issues Resolved

opensearch-project#656

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Bit-Quill Bit-Quill deleted a comment from codecov bot Feb 15, 2023
@codecov

This comment was marked as spam.

return Collections.emptyList();
}

default TableScanOperator getTableScan(String indexName, String scrollId) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why get TableScanOperator in StorageEngine?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used in cursor deserialization/parsing, this happens in :core module. All implementation of StorageEngine are defined outside of :core and unavailable and that point.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@penghuo does this design make sense? Would love to know what you think.

/**
* Create {@link LogicalPlanOptimizer} with pre-defined rules.
*/
public static LogicalPlanOptimizer paginationCreate() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

difference is CreatePagingTableScanBuilder()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PushPageSize and CreatePagingTableScanBuilder instead of CreateTableScanBuilder

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of overlap here. You could create a common function create() and private functions that push the transformation and push_down capabilities separately. Something like:

  public static LogicalPlanOptimizer create() {
    return create(false);
  }

  public static LogicalPlanOptimizer create(boolean isPagingRequired) {
    return new LogicalPlanOptimizer(Arrays.asList(
        /*
         * Phase 1: Transformations that rely on relational algebra equivalence
         */
        getFilterTransformations(),
        /*
         * Phase 2: Transformations that rely on data source push down capability
         */
        isPagingRequired ? getPaginatedTableScanBuilder() : getTableScanBuilder(),
        getPushDownTransformations(),
        getTableWriteBuilder()
    ));
  }

  private static List getFilterTransformations() {
    return Arrays.asList(
        new MergeFilterAndFilter(),
        new PushFilterUnderSort()
    );
  }

  private static List getTableScanBuilder() {
    return Arrays.asList(
        new CreateTableScanBuilder(),
        );
  }

  private static List getPaginatedTableScanBuilder() {
    return Arrays.asList(
        new PushPageSize(),
        new CreatePagingTableScanBuilder(),
        );
  }

MaxKsyunz and others added 15 commits March 3, 2023 14:35
Implement PaginatedPlanCache.convertToPlan for second page to work.

Signed-off-by: MaxKsyunz <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
* Added push down page size from `LogicalPaginate` to `LogicalRelation`.
* Improved cursor encoding and decoding.
* Added cursor compression.
* Fixed issuing `SearchScrollRequest`.
* Fixed returning last empty page.
* Minor code grooming/commenting.

Signed-off-by: Yury-Fridlyand <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
Copy link

@MaxKsyunz MaxKsyunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yury-Fridlyand please have a look at my comments so far. There will be more to come.

import org.opensearch.sql.planner.logical.LogicalRelation;
import org.opensearch.sql.planner.optimizer.Rule;

public class PushPageSize

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you implement this similar to PUSH_DOWN_* rules:

  • Add a pushDownPageSize method to TableScanBuilder
  • Add a PUSH_DOWN_PAGE_SIZE similar to others that matches LogicalPaginate of scanBuilder and calls pushDownPageSize.

Doing so would remove the need for PushPageSize and LogicalRelation.pageSize property, be less code, and follow a pattern already used in the code base, like PUSH_DOWN_LIMIT.

return Collections.emptyList();
}

default TableScanOperator getTableScan(String indexName, String scrollId) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@penghuo does this design make sense? Would love to know what you think.


/** Default scroll context timeout in minutes. */
public static final TimeValue DEFAULT_SCROLL_TIMEOUT = TimeValue.timeValueMinutes(1L);
public static final TimeValue DEFAULT_SCROLL_TIMEOUT = TimeValue.timeValueMinutes(100L);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is such a long time out necessary?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set it for debugging purposes, but didn't revert, because 1 min is too small.
TBD.
@penghuo, @dai-chen,
What should be the default scroll lifetime?
We're looking forward to make it configurable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

@Test
public void hasNext_a_page() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a test where hasNext returns true then false based on pagedSize and numReturned?

* Add javadocs
* Renames
* Cleaning up some comments
* Remove unused code
* Speed up IT

Signed-off-by: Yury-Fridlyand <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
SearchResponse openSearchResponse = scrollAction.apply(new SearchScrollRequest(initialScrollId)
.scroll(DEFAULT_SCROLL_TIMEOUT));

// TODO if terminated_early - something went wrong, e.g. no scroll returned.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO (and other TODOs in the PR) meant to be implemented in Phase 2?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good practice is to leave no TODOs behind.
I'll do my best to resolve them ASAP, but some of them are related to design of existing code and can't be fixed easily.
This very TODO related to a terminated search query - plugin doesn't handle such case for any queries.

/**
* Create {@link LogicalPlanOptimizer} with pre-defined rules.
*/
public static LogicalPlanOptimizer paginationCreate() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of overlap here. You could create a common function create() and private functions that push the transformation and push_down capabilities separately. Something like:

  public static LogicalPlanOptimizer create() {
    return create(false);
  }

  public static LogicalPlanOptimizer create(boolean isPagingRequired) {
    return new LogicalPlanOptimizer(Arrays.asList(
        /*
         * Phase 1: Transformations that rely on relational algebra equivalence
         */
        getFilterTransformations(),
        /*
         * Phase 2: Transformations that rely on data source push down capability
         */
        isPagingRequired ? getPaginatedTableScanBuilder() : getTableScanBuilder(),
        getPushDownTransformations(),
        getTableWriteBuilder()
    ));
  }

  private static List getFilterTransformations() {
    return Arrays.asList(
        new MergeFilterAndFilter(),
        new PushFilterUnderSort()
    );
  }

  private static List getTableScanBuilder() {
    return Arrays.asList(
        new CreateTableScanBuilder(),
        );
  }

  private static List getPaginatedTableScanBuilder() {
    return Arrays.asList(
        new PushPageSize(),
        new CreatePagingTableScanBuilder(),
        );
  }

@github-advanced-security
Copy link

You have successfully added a new CodeQL configuration .github/workflows/codeql-analysis.yml:analyze/language:java. As part of the setup process, we have scanned this repository and found 5 existing alerts. Please check the repository Security tab to see all alerts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants