
Speed up disjunctions by computing estimations of the score of the k-th top hit up-front.#12526

Closed
jpountz wants to merge 6 commits into apache:main from jpountz:bootstrap_min_score

Conversation

@jpountz
Contributor

@jpountz jpountz commented Aug 29, 2023

Currently, our dynamic pruning logic for disjunctions updates the minimum competitive score as it sees more and more competitive hits. However this process can take time if some of the high-scoring clauses don't have many hits, or are very sparse at the beginning of the doc ID space. It is possible to do better by trying to estimate a lower bound of the score of the k-th top hit up-front in order to bootstrap the minimum competitive score to a value that will immediately enable efficient dynamic pruning.

The proposed approach computes this initial minimum score by only using clauses that have not evaluated 2*k hits yet to drive iteration.
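As a rough illustration of the idea described above, here is a minimal toy sketch in Python. It is not the code from this PR and does not use any Lucene API: the function name `bootstrap_min_score`, the dict-based postings, and the simplification of taking each clause's first 2*k hits independently are all assumptions made for illustration. Every score it considers is the full score of a real matching document, so the k-th largest of them cannot exceed the true k-th top score.

```python
import heapq


def bootstrap_min_score(clauses, k):
    """Toy estimate of a lower bound on the score of the k-th top hit.

    `clauses` is a list of dicts mapping doc ID -> score contribution
    (a hypothetical stand-in for postings with per-document scores).
    """
    candidates = set()
    for postings in clauses:
        # Only the first 2*k hits of each clause (in doc ID order) are used
        # to pick candidate documents.
        candidates.update(sorted(postings)[: 2 * k])
    # Score each candidate against all clauses of the disjunction.
    scores = [sum(p.get(doc, 0.0) for p in clauses) for doc in candidates]
    if len(scores) < k:
        return 0.0  # not enough hits to say anything useful
    return heapq.nlargest(k, scores)[-1]


# A frequent, low-scoring clause and a rare, high-scoring clause whose
# matches only start late in the doc ID space.
high = {doc: 1.0 for doc in range(1000)}
low = {doc: 5.0 for doc in range(900, 910)}
estimate = bootstrap_min_score([high, low], k=5)
```

In this toy setup a collector that visits documents in doc ID order would only see scores of 1.0 until it reaches doc 900, while the up-front estimate already accounts for the documents matching the rare clause.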

…th top hit up-front.

Currently, our dynamic pruning logic for disjunctions updates the minimum
competitive score as it sees more and more competitive hits. However this
process can take time if some of the high-scoring clauses don't have many hits,
or are very sparse at the beginning of the doc ID space. It is possible to do
better by trying to estimate a lower bound of the score of the k-th top hit
up-front in order to bootstrap the minimum competitive score to a value that
will immediately enable efficient dynamic pruning.

The proposed approach computes this initial minimum score by only using clauses
that have not evaluated k hits yet to drive iteration.
@jpountz
Contributor Author

jpountz commented Aug 29, 2023

Here are results on wikimedium10m. OrHighHigh and OrHighMed don't get a speedup because their minimum competitive scores compute pretty quickly anyway, but OrHighHigh sees a major speedup:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                      AndHighLow     1452.84      (2.0%)     1410.03      (3.8%)   -2.9% (  -8% -    2%) 0.017
                          Fuzzy1       98.38      (1.6%)       96.51      (1.1%)   -1.9% (  -4% -    0%) 0.001
            HighIntervalsOrdered        6.24      (5.8%)        6.15      (4.4%)   -1.4% ( -11% -    9%) 0.494
                      OrHighHigh       61.69      (6.4%)       60.87      (5.4%)   -1.3% ( -12% -   11%) 0.585
             MedIntervalsOrdered       44.82      (5.1%)       44.23      (3.9%)   -1.3% (  -9% -    8%) 0.476
             LowIntervalsOrdered       57.23      (5.4%)       56.48      (4.0%)   -1.3% ( -10% -    8%) 0.497
                       OrHighMed      190.42      (3.8%)      188.10      (3.8%)   -1.2% (  -8% -    6%) 0.430
                      AndHighMed      236.92      (4.1%)      234.25      (4.1%)   -1.1% (  -8% -    7%) 0.500
                    OrHighNotMed      425.77      (6.6%)      421.99      (5.2%)   -0.9% ( -11% -   11%) 0.715
                         MedTerm      788.26      (7.2%)      781.68      (3.4%)   -0.8% ( -10% -   10%) 0.716
                   OrHighNotHigh      317.53      (6.6%)      314.90      (5.5%)   -0.8% ( -12% -   12%) 0.738
                        HighTerm      593.70      (7.6%)      589.22      (3.9%)   -0.8% ( -11% -   11%) 0.760
                          Fuzzy2       73.16      (1.3%)       72.68      (1.3%)   -0.7% (  -3% -    1%) 0.206
                   OrNotHighHigh      413.61      (6.0%)      411.20      (5.2%)   -0.6% ( -11% -   11%) 0.798
                       LowPhrase       43.15      (2.9%)       42.90      (1.4%)   -0.6% (  -4% -    3%) 0.526
                    OrNotHighMed      425.13      (4.4%)      422.86      (3.3%)   -0.5% (  -7% -    7%) 0.735
                HighSloppyPhrase       12.59      (4.7%)       12.53      (5.6%)   -0.5% ( -10% -   10%) 0.808
                     LowSpanNear       28.72      (2.1%)       28.57      (2.2%)   -0.5% (  -4% -    3%) 0.559
                    OrHighNotLow      475.44      (7.1%)      473.03      (5.2%)   -0.5% ( -11% -   12%) 0.841
                        PKLookup      245.49      (3.5%)      244.36      (3.8%)   -0.5% (  -7% -    7%) 0.759
                 LowSloppyPhrase       67.32      (2.7%)       67.06      (2.8%)   -0.4% (  -5% -    5%) 0.730
                         LowTerm     1124.64      (6.8%)     1120.58      (3.5%)   -0.4% (  -9% -   10%) 0.870
                        Wildcard      172.10      (2.7%)      171.49      (2.4%)   -0.4% (  -5% -    4%) 0.735
                       MedPhrase       59.34      (3.1%)       59.16      (1.4%)   -0.3% (  -4% -    4%) 0.765
           HighTermDayOfYearSort      457.23      (1.2%)      456.10      (1.2%)   -0.2% (  -2% -    2%) 0.611
                     MedSpanNear       29.71      (3.0%)       29.64      (2.6%)   -0.2% (  -5% -    5%) 0.859
                    OrNotHighLow     1283.05      (2.7%)     1282.59      (1.9%)   -0.0% (  -4% -    4%) 0.971
               HighTermMonthSort     4728.97      (2.8%)     4729.28      (1.9%)    0.0% (  -4% -    4%) 0.995
                     AndHighHigh       63.31      (4.8%)       63.31      (4.8%)    0.0% (  -9% -   10%) 0.997
                         Prefix3      346.29      (4.3%)      346.43      (3.9%)    0.0% (  -7% -    8%) 0.980
                      TermDTSort      192.60      (1.1%)      192.76      (0.9%)    0.1% (  -1% -    2%) 0.830
                         Respell       96.59      (1.6%)       96.73      (1.3%)    0.2% (  -2% -    3%) 0.798
               HighTermTitleSort      161.71      (3.7%)      162.22      (4.8%)    0.3% (  -7% -    9%) 0.860
            HighTermTitleBDVSort       15.55      (3.3%)       15.61      (2.5%)    0.4% (  -5% -    6%) 0.748
                      HighPhrase       97.18      (4.1%)       97.75      (2.1%)    0.6% (  -5% -    7%) 0.659
                    HighSpanNear        6.86      (6.0%)        6.91      (6.6%)    0.8% ( -11% -   14%) 0.754
                 MedSloppyPhrase       38.05      (5.0%)       38.38      (4.0%)    0.9% (  -7% -   10%) 0.646
                          IntNRQ       94.63     (21.2%)       98.52     (21.0%)    4.1% ( -31% -   58%) 0.634
                       OrHighLow      430.69      (5.4%)      617.92      (4.7%)   43.5% (  31% -   56%) 0.000

@jpountz
Contributor Author

jpountz commented Aug 30, 2023

I added a few tasks, listed here for reference, to see how it plays with disjunctions that have more terms or different document frequencies:

OrHighVeryLow: 2005 mousehole # freq=835460 freq=123
OrHighVeryLow: until motorboats # freq=425389 freq=128
OrHighVeryLow: made monceau # freq=742313 freq=126
OrHighVeryLow: do bush's # freq=511178 freq=2681
OrHighVeryLow: 10 mikup # freq=918339 freq=119
OrHighMedLow: international chris valois
OrHighMedLow: right million universalist
OrHighMedLow: known created forays
OrHighMedLow: its universal bush's
OrHighMedLow: 9 network racedetail.html
OrHighHighHigh: 2005 until made
OrHighHighHigh: do 10 international
OrHighHighHigh: right known its
OrHighHighHigh: until 10 known
OrHighHighHigh: made international its
OrHighMedMed: international chris million
OrHighMedMed: right million created
OrHighMedMed: known created universal
OrHighMedMed: its universal network
OrHighMedMed: 9 network chris
OrHighHighLow: several following valois
OrHighHighLow: publisher end universalist
OrHighHighLow: 2009 film forays
OrHighHighLow: http known bush's
OrHighHighLow: south county racedetail.html
OrHighHighMed: international right million
OrHighHighMed: right known created
OrHighMighMed: known its universal
OrHighHighMed: its 9 network
OrHighHighMed: 9 international chris
                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                    OrHighMedMed      158.53      (3.6%)      155.92      (4.4%)   -1.7% (  -9% -    6%) 0.193
                  OrHighHighHigh       53.97      (5.0%)       53.13      (4.9%)   -1.6% ( -10% -    8%) 0.324
                   OrHighHighMed      106.81      (4.0%)      105.37      (4.3%)   -1.3% (  -9% -    7%) 0.306
                      OrHighHigh       64.42      (5.6%)       63.64      (4.0%)   -1.2% ( -10% -    8%) 0.433
                   OrHighMighMed      201.12      (3.7%)      198.74      (3.5%)   -1.2% (  -8% -    6%) 0.298
                    OrHighMedLow      323.10      (3.7%)      319.32      (4.2%)   -1.2% (  -8% -    6%) 0.349
                       OrHighMed      227.13      (3.9%)      225.41      (3.0%)   -0.8% (  -7% -    6%) 0.487
                        HighTerm      652.70      (4.2%)      659.51      (5.3%)    1.0% (  -8% -   11%) 0.491
                        PKLookup      248.57      (3.4%)      251.38      (1.9%)    1.1% (  -4% -    6%) 0.198
                         MedTerm     1060.67      (4.5%)     1076.33      (5.4%)    1.5% (  -8% -   11%) 0.350
                         LowTerm     1639.65      (7.0%)     1667.48      (4.9%)    1.7% (  -9% -   14%) 0.377
                   OrHighVeryLow      172.35      (8.2%)      196.54      (8.4%)   14.0% (  -2% -   33%) 0.000
                   OrHighHighLow      449.76      (3.0%)      633.61      (3.5%)   40.9% (  33% -   48%) 0.000
                       OrHighLow      546.08      (5.4%)     1187.98      (5.1%)  117.5% ( 101% -  135%) 0.000

While it tends to help queries that are already fast, it also helped OrHighVeryLow above, which is not among the fastest. I also like that none of the queries is getting a major slowdown.

@msokolov
Contributor

OrHighHigh sees a major speedup:

I think you meant OrHighLow, which is indeed very nicely improved

@jpountz
Contributor Author

jpountz commented Aug 30, 2023

Oops, yes indeed OrHighLow.

@mikemccand
Member

Wow, impressive! Maybe we should add OrHighVeryLow to nightly benchy too?

@jpountz
Contributor Author

jpountz commented Sep 11, 2023

We could. These tasks are a bit malicious as the doc freq is slightly greater than the value of k=100 so it takes lots of collected matches to find k documents that have this term. I suspect that another interesting value for the document frequency is when it is a bit less than k.

I still need to figure out a way to avoid referencing readers in weight. I think we had issues with that in the past, though I can't remember exactly what the issue was.

@jpountz
Contributor Author

jpountz commented Sep 14, 2023

FYI there was an interesting observation on another benchmark that took advantage of recursive graph bisection: https://jpountz.github.io/lucene-9.7-vs-9.8/. One query (the incredibles) became more than 7x (!) slower because recursive graph bisection had moved matches of the term with the highest score weight towards the end of the doc ID space. This should get addressed by a change like this PR.

@jpountz
Contributor Author

jpountz commented Sep 22, 2023

Maybe we should add OrHighVeryLow to nightly benchy too?

@mikemccand I started looking into this, but my enwiki (enwiki-20120502-lines-with-random-label.txt) seems to have slightly different frequencies compared to the frequencies reported in wikinightly.tasks. Are nightly benchmarks using the same export or a different one? I think it could make sense to have two new tasks: OrHighLow110, where the low-frequency term always has a frequency of 110 > k, and OrHighLow90, where the low-frequency term always has a frequency of 90 < k. These two cases are interesting because in one case it takes very long to collect k matches of the highest scoring clause, and in the other case this never happens.
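To make the two cases concrete, here is a back-of-the-envelope sketch in Python. It assumes that the matches of the low-frequency term are spread uniformly over the doc ID space, which is an assumption for illustration and is not stated in this thread; the function name `fraction_scanned_for_k_matches` is hypothetical.

```python
def fraction_scanned_for_k_matches(doc_freq, k):
    """Rough fraction of the doc ID space that must be visited before k
    matches of a term with the given document frequency have been seen,
    assuming its matches are uniformly spread (illustrative assumption).

    Returns None when the term has fewer than k matches, in which case
    k matches of that term are never collected.
    """
    if doc_freq < k:
        return None
    return k / doc_freq


# The two proposed tasks, with k=100 as mentioned earlier in the thread.
or_high_low_110 = fraction_scanned_for_k_matches(110, 100)  # most of the index
or_high_low_90 = fraction_scanned_for_k_matches(90, 100)    # never reaches k
```

Under this assumption, a term with a frequency just above k forces the collector through most of the doc ID space before it has k hits containing that term, and a term with a frequency just below k never gets there at all.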

@jpountz jpountz marked this pull request as ready for review September 22, 2023 10:32
@jpountz
Contributor Author

jpountz commented Nov 1, 2023

@mikemccand FYI I gave a try at adding some interesting boolean queries to nightly benchmarks at mikemccand/luceneutil#240.

@github-actions
Contributor

github-actions bot commented Jan 8, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 8, 2024
@jpountz
Contributor Author

jpountz commented Jan 8, 2024

I'll reopen when I have time to get back to this. This could be a useful optimization, though the benefit has become smaller thanks to other optimizations to disjunctions.

@jpountz jpountz closed this Jan 8, 2024
@mikemccand
Member

Maybe we should add OrHighVeryLow to nightly benchy too?

@mikemccand I started looking into this, but my enwiki (enwiki-20120502-lines-with-random-label.txt) seems to have slightly different frequencies compared to the frequencies reported in wikinightly.tasks. Are nightly benchmarks using the same export or a different one? I think it could make sense to have two new tasks: OrHighLow110, where the low-frequency term always has a frequency of 110 > k, and OrHighLow90, where the low-frequency term always has a frequency of 90 < k. These two cases are interesting because in one case it takes very long to collect k matches of the highest scoring clause, and in the other case this never happens.

Very late answer (sorry!): hmm, indeed the frequencies reported in those task files (as comments) are likely from a different (older?) enwiki snapshot. It looks like you muscled through this and added the new tasks to nightly tasks, thanks!
