
Conversation

@prudhvigodithi
Member

@prudhvigodithi prudhvigodithi commented Jul 15, 2025

Description

Disable approximation when multiple sorts are used, because the secondary sort field is only applied after collection, not during the collection phase. With early termination at 10k documents, the query may miss documents with better secondary sort values that appear later in the index. Therefore, when multiple sorts are present, we must examine all documents to ensure correct results.
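As an illustration, here is a minimal standalone sketch (hypothetical data and names, not the OpenSearch implementation; collection order is simplified to index order) of how early termination can drop a document that ties on the primary value but wins on the secondary value:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: early termination keeps the first K collected docs, which
// can drop a doc that ties on the primary value but has a better secondary value.
public class EarlyTerminationSketch {
    record Doc(int id, long primary, long secondary) {}

    static final Comparator<Doc> BY_SORT = Comparator.comparingLong(Doc::primary).reversed()
        .thenComparing(Comparator.comparingLong(Doc::secondary).reversed());

    // Correct top-K: rank every document by (primary desc, secondary desc).
    static List<Integer> exactTopK(List<Doc> docs, int k) {
        return docs.stream().sorted(BY_SORT).limit(k).map(Doc::id).toList();
    }

    // Approximate top-K: stop collecting after K docs, then rank only those.
    static List<Integer> approxTopK(List<Doc> docs, int k) {
        return docs.stream().limit(k).sorted(BY_SORT).map(Doc::id).toList();
    }

    // Doc 2 ties doc 0 on the primary value but has the better secondary value,
    // and it appears after the early-termination cutoff.
    static final List<Doc> SAMPLE = List.of(
        new Doc(0, 100, 1), new Doc(1, 50, 5), new Doc(2, 100, 9));

    public static void main(String[] args) {
        System.out.println(exactTopK(SAMPLE, 2));  // [2, 0]
        System.out.println(approxTopK(SAMPLE, 2)); // [0, 1] -- doc 2 was never collected
    }
}
```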

Following the suggestion in #18763 (comment), it is worth adding support to the approximation framework for queries that use multiple sorts.

NOTE: Here are more details on the attempt #18763 (comment) to support multiple sorts.

curl -X POST "localhost:9200/big5/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "@timestamp": "desc" },
    { "agent.name": "desc" }
  ]
}
'

Related Issues

Part of #18619

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@prudhvigodithi
Member Author

prudhvigodithi commented Jul 15, 2025

Adding @kkewwei @msfroh @harshavamsi @andrross to take a look and share your thoughts (while I try to add some tests). I identified this while trying to implement approximation for search_after queries in #18546. Essentially, there have to be two search_after values when there are two sorts. This PR attempts to disable approximation when there are two sorts.

Signed-off-by: Prudhvi Godithi <[email protected]>
@github-actions
Contributor

❌ Gradle check result for 649efd6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@msfroh
Contributor

msfroh commented Jul 21, 2025

because the secondary sort field is only applied after collection

I don't think that's true, unless you're talking about the collection that's done during the approximation phase. (Lucene's collectors will just use a priority queue that applies the secondary sort as a tie-breaker after the first sort.) But if you are referring to the approximation collection, the other option is to keep collecting documents for the primary sort as long as the primary value remains the same as the last value.

That is, if I'm sorting by timestamp descending and my 10,000th hit corresponds to a document with time t, then (if there's a composite sort) I should continue collecting hits until I find a document with timestamp less than t. So, maybe instead of collecting 10,000 hits, I potentially collect 11,000. The good news is that it's probably still cheaper than running the full query that may match millions of hits.
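The tie-extension rule above can be sketched as follows (a hypothetical helper with assumed names, operating on primary values already in descending collection order, not the actual BKD walk):

```java
// Sketch of the tie-extension idea (hypothetical helper, not the OpenSearch code):
// after `size` hits, keep collecting while the primary value still equals the
// value of the size-th hit, so the secondary sort can break ties correctly.
public class TieExtensionSketch {
    static int hitsToCollect(long[] descendingPrimary, int size) {
        if (descendingPrimary.length <= size) {
            return descendingPrimary.length;
        }
        long boundary = descendingPrimary[size - 1]; // value of the size-th hit
        int n = size;
        while (n < descendingPrimary.length && descendingPrimary[n] == boundary) {
            n++; // extend collection through the tied run
        }
        return n;
    }

    public static void main(String[] args) {
        long[] timestamps = {9, 8, 8, 8, 7, 6};
        // The cutoff lands on a tied value (8), so we collect 4 hits instead of 2.
        System.out.println(hitsToCollect(timestamps, 2)); // 4
        // No tie at the cutoff: collect exactly `size` hits.
        System.out.println(hitsToCollect(new long[] {9, 8, 7}, 2)); // 2
    }
}
```

This is why collecting might grow from 10,000 to roughly 11,000 hits in the example above, which is still far cheaper than scoring millions of matches.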

You should be able to confirm with a unit test that fails without that fix.

From a sequencing standpoint, I think it's easier to address the composite sort problem first, and then address the search_after problem, since (I think) most search_after use-cases use a composite sort to ensure a unique document ordering.

@prudhvigodithi
Member Author

I don't think that's true, unless you're talking about the collection that's done during the approximation phase.

Yes @msfroh, I was referring to the approximation collection.

That is, if I'm sorting by timestamp descending and my 10,000th hit corresponds to a document with time t, then (if there's a composite sort) I should continue collecting hits until I find a document with timestamp less than t. So, maybe instead of collecting 10,000 hits, I potentially collect 11,000. The good news is that it's probably still cheaper than running the full query that may match millions of hits.

So collect normally until we hit our target, say 10,000. Once we reach 10,000 documents, compare the next doc: if it has the same primary value, continue collecting until the value changes. So even if the cell is CELL_INSIDE_QUERY, we should continue to use visitDocValues in that case when the query has a tie-breaker sort.

@prudhvigodithi
Member Author

prudhvigodithi commented Jul 23, 2025

From a sequencing standpoint, I think it's easier to address the composite sort problem first, and then address the search_after problem, since (I think) most search_after use-cases use a composite sort to ensure a unique document ordering.

Moving this to draft, as I will change this PR to ensure the approximation framework supports multi (tie-breaker) sorts (https://docs.opensearch.org/docs/latest/search-plugins/searching-data/sort/) and then take up the search_after support.
Adding @getsaurabh02

@prudhvigodithi prudhvigodithi changed the title Disable Approximation when dealing with multiple sorts Support Approximation with multiple sorts Jul 23, 2025
@prudhvigodithi prudhvigodithi marked this pull request as draft July 23, 2025 16:58
@prudhvigodithi prudhvigodithi changed the title Support Approximation with multiple sorts Support Approximation framework with multiple sorts Jul 23, 2025
@prudhvigodithi prudhvigodithi moved this from Todo to In Progress in Performance Roadmap Jul 28, 2025
@prudhvigodithi
Member Author

prudhvigodithi commented Jul 29, 2025

  • I was able to reproduce the bug with multiple sorts: the current code does not honor the secondary sort, and the docID assertion in the following test fails.
 public void testApproximationBreaksWithMultiSort() throws Exception {
        try (Directory dir = newDirectory();
             RandomIndexWriter w = new RandomIndexWriter(random(), dir, new WhitespaceAnalyzer())) {
            String field = numericType.fieldName;          // primary numeric field
            String secondary = "secondary";                // secondary sort key
            /* 1. Unique docs 0-999 */
            for (int i = 0; i < 1_000; i++) {
                Document d = new Document();
                numericType.addField(d, field, i);
                numericType.addDocValuesField(d, field, i);
                d.add(new NumericDocValuesField(secondary, i % 100));
                w.addDocument(d);
            }

            /* 2. 500 ties, worst-first:
             * insert ties in descending secondary order (499 ... 0)
             * so the first 50 the walker meets are the "bad" ones.
             */
            for (int i = 0; i < 500; i++) {
                int sec = 499 - i;                         // 499, 498, ..., 0
                Document d = new Document();
                numericType.addField(d, field, 1000);
                numericType.addDocValuesField(d, field, 1000);
                d.add(new NumericDocValuesField(secondary, sec));
                w.addDocument(d);
            }
            w.flush();
            for (int i = 1001; i <= 1100; i++) {
                Document d = new Document();
                numericType.addField(d, field, i);
                numericType.addDocValuesField(d, field, i);
                d.add(new NumericDocValuesField(secondary, i % 100));
                w.addDocument(d);
            }
            w.forceMerge(1);
            try (IndexReader r = DirectoryReader.open(w.w)) {
                IndexSearcher s = new IndexSearcher(r);
                long lower = 900, upper = 1100;
                int size   = 150;
                Sort sort = new Sort(
                    new SortField(numericType.getSortFieldName(), numericType.getSortFieldType(), true), // DESC primary
                    new SortField(secondary, SortField.Type.LONG, true)                                  // DESC secondary
                );
                Query exact = numericType.rangeQuery(field, lower, upper);
                ApproximatePointRangeQuery approx = new ApproximatePointRangeQuery(
                    field,
                    numericType.encode(lower),
                    numericType.encode(upper),
                    /* dims */ 1,
                    /* size */ size,
                    SortOrder.DESC,
                    numericType.format
                );
                SearchContext mockContext = mock(SearchContext.class);
                ShardSearchRequest mockRequest = mock(ShardSearchRequest.class);
                SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
                sourceBuilder.sort(new FieldSortBuilder(field).order(SortOrder.DESC));
                sourceBuilder.sort(new FieldSortBuilder(secondary).order(SortOrder.DESC));
                when(mockContext.request()).thenReturn(mockRequest);
                when(mockRequest.source()).thenReturn(sourceBuilder);
                approx.setContext(mockContext);
                TopDocs exactHits  = s.search(exact, size, sort);
                TopDocs approxHits = s.search(approx, size, sort);
                for (int i = 0; i < size; i++) {
                    assertEquals("doc at pos=" + i + " differs!",
                        exactHits.scoreDocs[i].doc,
                        approxHits.scoreDocs[i].doc);
                }
            }
        }
    }
  • Now considering the following idea:

That is, if I'm sorting by timestamp descending and my 10,000th hit corresponds to a document with time t, then (if there's a composite sort) I should continue collecting hits until I find a document with timestamp less than t. So, maybe instead of collecting 10,000 hits, I potentially collect 11,000. The good news is that it's probably still cheaper than running the full query that may match millions of hits.

I was able to add the logic and a test for multiple sorts, but I can see a performance impact. I updated the visit method that takes an iterator input to compare the value and, if it differs, return CELL_OUTSIDE_QUERY:

                    @Override
                    public void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOException {
                        if (matches(packedValue)) {
                            if (hasMultipleSorts && docCount[0] >= size) {
                                if (tieBreakValue == null) {
                                    tieBreakValue = packedValue.clone();
                                } else {
                                    if (comparator.compare(packedValue, 0, tieBreakValue, 0) != 0) {
                                        foundDifferentValue = true;
                                        return;
                                    }
                                }
                            }
                            adder.add(iterator);
                        }
                    }

                    @Override
                    public PointValues.Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
                        if (foundDifferentValue) {
                            return PointValues.Relation.CELL_OUTSIDE_QUERY;
                        }
                        return relate(minPackedValue, maxPackedValue);
                    }

Updated intersectRight to test performance with desc sorts:

            // custom intersect visitor to walk the right of tree (from rightmost leaf going left)
            public void intersectRight(PointValues.IntersectVisitor visitor, PointValues.PointTree pointTree, long[] docCount)
                throws IOException {
                if (docCount[0] >= size && hasMultipleSorts) {
                    PointValues.Relation r = visitor.compare(pointTree.getMinPackedValue(), pointTree.getMaxPackedValue());
                    if (r == PointValues.Relation.CELL_OUTSIDE_QUERY) {
                        return;
                    }
                } else if (docCount[0] >= size) {
                    return;
                }
                PointValues.Relation r = visitor.compare(pointTree.getMinPackedValue(), pointTree.getMaxPackedValue());
                if (r == PointValues.Relation.CELL_OUTSIDE_QUERY) {
                    return;
                }
                // Handle leaf nodes
                if (pointTree.moveToChild() == false) {
                    if (r == PointValues.Relation.CELL_INSIDE_QUERY) {
                        pointTree.visitDocIDs(visitor);
                    } else {
                        // CELL_CROSSES_QUERY
                        pointTree.visitDocValues(visitor);
                    }
                    return;
                }
                // Internal node - get left child reference (we're at left child initially)
                PointValues.PointTree leftChild = pointTree.clone();
                // Move to right child if it exists
                boolean hasRightChild = pointTree.moveToSibling();
                // For CELL_INSIDE_QUERY, check if we can skip left child
                if (!hasMultipleSorts && r == PointValues.Relation.CELL_INSIDE_QUERY && hasRightChild) {
                    long rightSize = pointTree.size();
                    long needed = size - docCount[0];
                    if (rightSize >= needed) {
                        // Right child has all we need - only process right
                        intersectRight(visitor, pointTree, docCount);
                        pointTree.moveToParent();
                        return;
                    }
                }
                if (hasMultipleSorts && docCount[0] >= size) {
                    if (hasRightChild) {
                        PointValues.Relation rightRelation = visitor.compare(
                            pointTree.getMinPackedValue(),
                            pointTree.getMaxPackedValue()
                        );
                        if (rightRelation != PointValues.Relation.CELL_OUTSIDE_QUERY) {
                            intersectRight(visitor, pointTree, docCount);
                        }
                    }
                    // Then check left child
                    PointValues.Relation leftRelation = visitor.compare(
                        leftChild.getMinPackedValue(),
                        leftChild.getMaxPackedValue()
                    );
                    if (leftRelation != PointValues.Relation.CELL_OUTSIDE_QUERY) {
                        intersectRight(visitor, leftChild, docCount);
                    }
                } else {
                    // Normal processing when we haven't reached size yet
                    if (hasRightChild) {
                        intersectRight(visitor, pointTree, docCount);
                    }
                    // Process left child
                    if (hasMultipleSorts || docCount[0] < size) {
                        intersectRight(visitor, leftChild, docCount);
                    }
                }
                pointTree.moveToParent();
            }
  • Tested with the nyc_taxis dataset, and I can see the following curl command took 959ms (too much tree traversal was done) vs around 350ms without approximation:
curl -X POST "localhost:9200/nyc_taxis/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "pickup_datetime": "desc" },
    { "passenger_count": "desc" }
  ]
}
'

@msfroh, am I missing anything here? Seeing this, I feel there is a performance impact when dealing with multiple sorts. In the flame graph I see too many tree traversals.

Signed-off-by: Prudhvi Godithi <[email protected]>
@github-actions
Contributor

✅ Gradle check result for 25f5b8b: SUCCESS

@prudhvigodithi
Member Author

I have tested a few approaches, and it looks like there is some complexity in comparing the packedValue against the collected docs. The main issues are:

  • When handling multi-sort queries, the approximation needs to collect all documents with the same primary sort value as the last document in the result set. Document counting happens everywhere: docCount[0] is incremented in multiple visitor methods:

    • visit(int docID) - increments by 1
    • visit(IntsRef ref) - increments by ref.length
    • visit(DocIdSetIterator iterator) - doesn't increment (for a desc sort, even though we traverse from right to left, the docs inside a leaf are in asc order; to retain the highest docs (not just higher docs), and to avoid flipping and re-adding them backward, we don't increment and instead add the iterator directly)
  • Value access is limited: the actual document values (packedValue) needed to compare and set the tie-break value are only available in the following methods, which should only get called when we are collecting a full leaf:

    • visit(int docID, byte[] packedValue)
    • visit(DocIdSetIterator iterator, byte[] packedValue)
  • When docCount[0] reaches size through the basic visit methods (without packedValue), the traversal logic stops because:

    • docCount[0] >= size (we have "enough" docs)
    • tieBreakValue == null (we haven't seen a value to set it)
    • Now the condition docCount[0] < size || (tieBreakValue != null && !foundDifferentValue) evaluates to false and tree traversal stops before we can set the tie-break value.
  • Without a tie-break value we can't properly collect all tied documents, so the approximation either returns incorrect results for multi-sort queries (if we terminate early) or is forced to traverse the entire tree, which adds query latency and loses the early-termination advantage.
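The accounting gap described above can be reduced to a toy visitor (overload shapes borrowed from Lucene's IntersectVisitor; everything else here is illustrative, not the OpenSearch code): if the size threshold is crossed via the value-less overload, no tie-break value is ever observed.

```java
// Toy mirror of the visitor overloads (shape only; names are illustrative).
public class VisitorAccountingSketch {
    int docCount = 0;
    byte[] tieBreakValue = null;

    // CELL_INSIDE_QUERY path: counts the doc but never sees its sort value.
    void visit(int docID) {
        docCount++;
    }

    // CELL_CROSSES_QUERY path: the only place a tie-break value can be captured.
    void visit(int docID, byte[] packedValue) {
        docCount++;
        if (tieBreakValue == null) {
            tieBreakValue = packedValue.clone();
        }
    }

    public static void main(String[] args) {
        VisitorAccountingSketch v = new VisitorAccountingSketch();
        int size = 3;
        for (int d = 0; d < size; d++) {
            v.visit(d); // whole leaves matched: value-less overload only
        }
        // Threshold reached, yet no tie-break value exists to compare against.
        System.out.println(v.docCount >= size && v.tieBreakValue == null); // true
    }
}
```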

@msfroh, I will create a separate issue with some proposals to support multiple sorts. Since we are close to the 3.2 release, let's get the bug fix merged to disable approximation for multiple sorts. WDYT?

Thanks

@prudhvigodithi prudhvigodithi added the bug Something isn't working label Jul 30, 2025
@prudhvigodithi prudhvigodithi changed the title Support Approximation framework with multiple sorts Disable Approximation when dealing with multiple sorts Jul 30, 2025
@prudhvigodithi prudhvigodithi marked this pull request as ready for review July 30, 2025 00:11
@prudhvigodithi
Member Author

@msfroh can I get your approval to move forward with this PR? Thanks

Signed-off-by: Prudhvi Godithi <[email protected]>
@github-actions
Contributor

❌ Gradle check result for 22e4ae4: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 22e4ae4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

✅ Gradle check result for 22e4ae4: SUCCESS

@cwperks cwperks merged commit 4b90da1 into opensearch-project:main Jul 31, 2025
32 of 37 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Performance Roadmap Jul 31, 2025
sunqijun1 pushed a commit to sunqijun1/OpenSearch that referenced this pull request Aug 4, 2025
…oject#18763)

* Approximation Framework bug fix

Signed-off-by: Prudhvi Godithi <[email protected]>

* Add tests

Signed-off-by: Prudhvi Godithi <[email protected]>

* Fix spacing

Signed-off-by: Prudhvi Godithi <[email protected]>

---------

Signed-off-by: Prudhvi Godithi <[email protected]>
Signed-off-by: sunqijun.jun <[email protected]>
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request Aug 5, 2025
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025

Labels

bug Something isn't working v3.2.0

Projects

Status: Done

5 participants