Disable Approximation when dealing with multiple sorts #18763
Adding @kkewwei @msfroh @harshavamsi @andrross to take a look and share your thoughts (while I try to add some tests). I identified this while trying to implement approximation for
Signed-off-by: Prudhvi Godithi <[email protected]>
❌ Gradle check result for 649efd6: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
I don't think that's true, unless you're talking about the collection that's done during the approximation phase. (Lucene's collectors will just use a priority queue that applies the secondary sort as a tie-breaker after the first sort.) But if you are referring to the approximation collection, the other option is to keep collecting documents for the primary sort as long as the primary value remains the same as the last value. That is, if I'm sorting by timestamp descending and my 10,000th hit corresponds to a document with time You should be able to confirm with a unit test that fails without that fix. From a sequencing standpoint, I think it's easier to address the composite sort problem first, and then address the
Yes, Froh, I was referring to the approximation collection.
So we collect normally until we hit our target, say 10,000. When we reach document 10,000, we compare the next document: if it has the same primary value, we continue collecting until the value changes. So even if its
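The tie-extension idea discussed above can be sketched as a small simulation. This is plain Python, not Lucene or OpenSearch code; the function name `collect_with_tie_extension`, the `target` parameter, and the `(doc_id, timestamp, tiebreak)` tuple shape are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical document shape: (doc_id, timestamp, tiebreak).
Doc = Tuple[int, int, int]


def collect_with_tie_extension(docs_in_primary_order: Iterable[Doc], target: int) -> List[Doc]:
    """Collect `target` docs, then keep going while the primary value is unchanged."""
    collected: List[Doc] = []
    if target <= 0:
        return collected
    for doc in docs_in_primary_order:
        # Stop only once the target is met AND the primary value has moved on.
        if len(collected) >= target and doc[1] != collected[-1][1]:
            break
        collected.append(doc)
    return collected


# Docs already ordered by timestamp descending, as the approximate walk would visit them.
docs = [(0, 100, 9), (1, 100, 8), (2, 100, 1), (3, 90, 5)]
collected = collect_with_tie_extension(docs, target=2)

# The secondary sort (tiebreak ascending) is applied after collection.
top_2 = sorted(collected, key=lambda d: (-d[1], d[2]))[:2]
print(top_2)  # [(2, 100, 1), (1, 100, 8)]
```

Because collection continues through every document that ties with the last collected primary value, doc 2 is seen even though it comes after the target was reached. The cost is that a long run of equal primary values can push collection well past the target.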
Moving this to draft. I will change this PR so that the Approximation Framework supports multiple (tie-breaker) sorts https://docs.opensearch.org/docs/latest/search-plugins/searching-data/sort/ and then take up search_after support.
I was able to add logic and test it with multiple sorts, but I can see a performance impact. I updated the visit with iterator input to compare the value and if it's different return as Updated the
@msfroh am I missing anything here? Seeing this, I feel there is a performance impact when dealing with multiple sorts. In the flame graph I see too many tree traversals.
Signed-off-by: Prudhvi Godithi <[email protected]>
I have tried a few approaches, and it looks like there is some complexity in attempting to compare the
@msfroh I will create a separate issue with some proposals to support multiple sorts. Since we are close to the 3.2 release, let's get the bug fix that disables approximation for multiple sorts merged. WDYT? Thanks
@msfroh can I get your approval to move forward with this PR? Thanks
Signed-off-by: Prudhvi Godithi <[email protected]>
❌ Gradle check result for 22e4ae4: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 22e4ae4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
…oject#18763) * Approximation Framework bug fix Signed-off-by: Prudhvi Godithi <[email protected]> * Add tests Signed-off-by: Prudhvi Godithi <[email protected]> * Fix spacing Signed-off-by: Prudhvi Godithi <[email protected]> --------- Signed-off-by: Prudhvi Godithi <[email protected]> Signed-off-by: sunqijun.jun <[email protected]>
Description
Disable approximation when a query uses multiple sorts, because the secondary sort field is applied only after collection, not during the collection phase. With early termination at 10k documents, the query may miss documents with better secondary sort values that appear later in the index. Therefore, when multiple sorts are present, we must examine all documents to ensure correct results.
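A toy illustration of the problem, as a plain Python simulation rather than OpenSearch code. The names `exact_top_k` and `early_terminated_top_k`, the `limit` parameter (standing in for the 10k cutoff), and the `(doc_id, timestamp, tiebreak)` tuples are assumptions made for this sketch.

```python
from typing import List, Tuple

# Hypothetical document shape: (doc_id, timestamp, tiebreak).
Doc = Tuple[int, int, int]


def sort_key(doc: Doc) -> Tuple[int, int]:
    # Primary: timestamp descending. Secondary: tiebreak ascending.
    return (-doc[1], doc[2])


def exact_top_k(docs: List[Doc], k: int) -> List[Doc]:
    """Examine every document, then apply both sorts."""
    return sorted(docs, key=sort_key)[:k]


def early_terminated_top_k(docs: List[Doc], k: int, limit: int) -> List[Doc]:
    """Visit docs by primary value only, stop after `limit`, then apply both sorts."""
    # sorted() is stable, so docs with equal timestamps stay in index order.
    visited = sorted(docs, key=lambda d: -d[1])[:limit]
    return sorted(visited, key=sort_key)[:k]


docs = [(0, 100, 9), (1, 100, 8), (2, 100, 1), (3, 90, 5)]

print(exact_top_k(docs, k=2))                      # [(2, 100, 1), (1, 100, 8)]
print(early_terminated_top_k(docs, k=2, limit=2))  # [(1, 100, 8), (0, 100, 9)]
```

Doc 2 has the best secondary value but sits after the cutoff, so the early-terminated path never sees it. Examining all documents, which is what disabling approximation amounts to here, returns the correct result.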
Following the suggestion in #18763 (comment), it is worth adding support to the approximation framework for queries with multiple sorts. NOTE: More details on the attempt to support multiple sorts are in #18763 (comment).
Related Issues
Part of #18619
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.