Remove k1+1 from the numerator of BM25Similarity [LUCENE-8563]

Our current implementation of BM25 computes

```java
boost * IDF * (k1+1) * tf / (tf + norm)
```

Since (k1+1) is a constant, it is the same for every term and doesn't change the ordering. It is often omitted, and I found that the paper "The Probabilistic Relevance Framework: BM25 and Beyond" by Robertson (BM25's author) and Zaragoza even describes adding (k1+1) to the numerator as a variant whose benefit is to be more comparable with Robertson/Sparck-Jones weighting, which we don't care about:
> A common variant is to add a (k1 + 1) component to the
> numerator of the saturation function. This is the same for all
> terms, and therefore does not affect the ranking produced.
> The reason for including it was to make the final formula
> more compatible with the RSJ weight used on its own

Should we remove it from BM25Similarity as well?
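
To make the ranking argument concrete, here is a minimal sketch that scores a few documents with and without the (k1+1) factor and compares the resulting order. It is not the BM25Similarity code: the class and method names, the IDF values, the term frequencies and the norms are all made up for illustration, and k1 = 1.2 is assumed.

```java
import java.util.Arrays;
import java.util.Comparator;

public class Bm25OrderingSketch {
    // Assumed value of k1 for this sketch.
    static final double K1 = 1.2;

    // Score of one term in one document, following the formula quoted in this issue:
    // boost * IDF * (k1+1) * tf / (tf + norm), with the (k1+1) factor optional.
    static double termScore(double boost, double idf, double tf, double norm, boolean keepFactor) {
        double saturation = tf / (tf + norm);
        double factor = keepFactor ? (K1 + 1) : 1.0;
        return boost * idf * factor * saturation;
    }

    // Sums the per-term scores of a multi-term query for every document.
    // tfs[doc][term] is the term frequency, norms[doc] the (hypothetical) length norm.
    static double[] queryScores(double[] idfs, double[][] tfs, double[] norms, boolean keepFactor) {
        double[] scores = new double[tfs.length];
        for (int doc = 0; doc < tfs.length; doc++) {
            for (int term = 0; term < idfs.length; term++) {
                scores[doc] += termScore(1.0, idfs[term], tfs[doc][term], norms[doc], keepFactor);
            }
        }
        return scores;
    }

    // Returns document indices sorted by descending score.
    static Integer[] rank(double[] scores) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
        return order;
    }

    public static void main(String[] args) {
        double[] idfs = {3.0, 1.5};                      // made-up IDFs for two query terms
        double[][] tfs = {{1, 0}, {2, 3}, {0, 5}};       // made-up term frequencies per document
        double[] norms = {1.2, 0.9, 1.5};                // made-up length norms per document

        double[] with = queryScores(idfs, tfs, norms, true);
        double[] without = queryScores(idfs, tfs, norms, false);

        System.out.println("with (k1+1):    " + Arrays.toString(rank(with)));
        System.out.println("without (k1+1): " + Arrays.toString(rank(without)));
    }
}
```

Because the factor multiplies every term's contribution, it also factors out of the sum over query terms, so the two rankings should match; only the absolute score values differ by a factor of (k1+1).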

A side effect that I'm interested in is that integrating other score contributions (e.g. via oal.document.FeatureField) would be a bit easier to reason about. For instance, a weight of 3 in FeatureField#newSaturationQuery would have a similar impact as a term whose IDF is 3 (and thus docFreq ~= 5%) rather than a term whose IDF is 3/(k1 + 1).
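
The arithmetic behind the 5% figure can be sketched as follows. This assumes the IDF is computed as ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), which for a large index is roughly ln(docCount / docFreq), so the document-frequency fraction is about e^(-IDF). The class and method names are hypothetical, and k1 = 1.2 is assumed.

```java
public class IdfWeightSketch {
    // Assumed IDF formula for this sketch.
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Approximate inverse for large collections: docFreq / docCount ~= e^(-idf).
    static double approxDocFreqFraction(double idf) {
        return Math.exp(-idf);
    }

    public static void main(String[] args) {
        double k1 = 1.2;      // assumed value of k1
        double weight = 3.0;  // the feature weight from the example above

        // Without (k1+1): a weight of 3 is comparable to a term whose IDF is 3.
        System.out.println("IDF = 3        -> docFreq fraction ~ "
            + approxDocFreqFraction(weight));

        // With (k1+1): the comparable term has IDF 3 / (k1 + 1), a much more common term.
        System.out.println("IDF = 3/(k1+1) -> docFreq fraction ~ "
            + approxDocFreqFraction(weight / (k1 + 1)));

        // Sanity check in the other direction: a term present in 5% of 1M documents.
        System.out.println("idf(50000, 1000000) = " + idf(50_000, 1_000_000));
    }
}
```

Under these assumptions, an IDF of 3 corresponds to a term in roughly 5% of documents, while 3/(k1 + 1) corresponds to a term in roughly a quarter of them, which is the less intuitive comparison that the current formula forces.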



---
Migrated from [LUCENE-8563](https://issues.apache.org/jira/browse/LUCENE-8563) by Adrien Grand (@jpountz), 1 vote, resolved Nov 30 2018
Linked issues:
 - [SOLR-13025](https://issues.apache.org/jira/browse/SOLR-13025)

Pull requests: https://github.com/apache/lucene-solr/pull/511

