Conversation

@harrisonengel
Contributor

Description

When querying across multiple text fields, for example a title and a body, you can often achieve better relevance by treating them as a single 'combined field' and scoring them by BM25F. Lucene provides the CombinedFieldQuery for this purpose. It effectively treats all fields as combined into one for matching and ranking purposes, where each field can be weighted higher or lower.
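To make the 'combined field' idea concrete, here is a minimal sketch of BM25F-style scoring for a single term. It is a simplified illustration, not Lucene's implementation: the function name `bm25f_term_score`, the field names, the statistics, and the `k1`/`b` defaults (common BM25 values) are all assumptions chosen for the example.

```python
import math


def bm25f_term_score(tf, length, avg_length, weights, doc_freq, doc_count,
                     k1=1.2, b=0.75):
    """Score one term against one document, treating weighted fields as one field.

    tf, length and avg_length map field name -> value; weights maps
    field name -> per-field weight. All names here are hypothetical.
    """
    # Weighted term frequency of the virtual combined field.
    combined_tf = sum(w * tf.get(f, 0) for f, w in weights.items())
    if combined_tf == 0:
        return 0.0
    # Weighted length of the combined field, and its average across documents.
    combined_len = sum(w * length.get(f, 0) for f, w in weights.items())
    combined_avg = sum(w * avg_length[f] for f, w in weights.items())
    # Standard BM25 idf and saturation, applied once to the combined statistics.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    norm = k1 * (1 - b + b * combined_len / combined_avg)
    return idf * combined_tf / (combined_tf + norm)


# Illustrative numbers only: the title is weighted twice as heavily as the body.
weights = {"title": 2.0, "body": 1.0}
avg_length = {"title": 5, "body": 100}
length = {"title": 5, "body": 100}

title_hit = bm25f_term_score({"title": 1}, length, avg_length, weights,
                             doc_freq=10, doc_count=100)
body_hit = bm25f_term_score({"body": 1}, length, avg_length, weights,
                            doc_freq=10, doc_count=100)
```

With these made-up numbers, a single match in the higher-weighted title outscores a single match in the body, because the weights are applied to the term frequencies before saturation rather than to separately computed per-field scores.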

Related Issues

Resolves #3996

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the
best of my knowledge, is covered under an appropriate open
source license and I have the right under that license to
submit that work with modifications, whether created in whole
or in part by me, under the same open source license (unless
I am permitted to submit under a different license), as
indicated in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including
all personal information I submit with it, including my
sign-off) is maintained indefinitely and may be redistributed
consistent with this project or the open source license(s)
involved.

Signed-off-by: Harrison Engel <[email protected], [email protected]>

@github-actions bot added the enhancement, feature, help wanted, and Search labels on Jul 10, 2025
@github-actions
Contributor

❌ Gradle check result for 3ea57e2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 7bbc948: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for ea81f54: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 74cd236: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❕ Gradle check result for 4189d76: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@prudhvigodithi self-requested a review July 16, 2025 22:26
Member

@mch2 left a comment

Thanks @harrisonengel, this is looking good to me.
@msfroh @prudhvigodithi would you like to take a pass here, since you had prior context on combined fields?

@github-actions
Contributor

❌ Gradle check result for b080ba3: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for b080ba3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@prudhvigodithi
Member

LGTM. While reviewing this, I was initially curious why CombinedFieldsQueryBuilder (part of this PR) uses a BooleanQuery internally rather than adding multiple terms directly to a single CombinedFieldQuery.Builder using addTerm, since Lucene's CombinedFieldQuery supports multiple terms. However, it now makes sense: creating a separate CombinedFieldQuery for each term and wrapping them in a BooleanQuery provides explicit operator control, making it easy to switch between AND and OR.

@harrisonengel, is another reason to wrap in a BooleanQuery to also get per-term boosting using the boost field, as well as support for minimum_should_match?

@github-actions
Contributor

✅ Gradle check result for b080ba3: SUCCESS

@harrisonengel
Contributor Author

@harrisonengel, is another reason to wrap in a BooleanQuery to also get per-term boosting using the boost field, as well as support for minimum_should_match?

Also, my hands were kind of tied: Lucene stopped supporting multi-term CombinedFieldQuery.Builder recently-ish, probably to promote this pattern, since a multi-term CombinedFieldQuery is just a less flexible version of a BooleanQuery wrapping single-term CombinedFieldQuery clauses.
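
The pattern discussed in this thread (one combined-field clause per term, joined by a boolean operator) can be sketched as a toy model. This is not the PR's code or Lucene's API: `combine_term_scores` is a hypothetical helper, the per-term scores are made-up inputs, and the `minimum_should_match` handling is an assumption about the semantics raised in the question above.

```python
from typing import Sequence


def combine_term_scores(term_scores: Sequence[float],
                        operator: str = "or",
                        minimum_should_match: int = 0) -> float:
    """Combine per-term scores the way a boolean query over single-term clauses might.

    A score of 0.0 means that term's clause did not match the document.
    """
    matched = [s for s in term_scores if s > 0]
    if operator == "and":
        # Every term clause is required.
        required = len(term_scores)
    else:
        # At least one clause must match, or more if requested.
        required = max(1, minimum_should_match)
    if len(matched) < required:
        return 0.0
    # Matching clauses contribute additively to the document score.
    return float(sum(matched))


# Hypothetical per-term scores for a two-term query where only one term matches.
or_score = combine_term_scores([0.5, 0.0], operator="or")
and_score = combine_term_scores([0.5, 0.0], operator="and")
```

Because each term is its own clause, switching between AND and OR only changes how many clauses are required; the per-term combined-field scoring stays the same.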

@mch2 merged commit 4c48e34 into opensearch-project:main Jul 17, 2025
33 of 37 checks passed
pranikum pushed a commit to pranikum/OpenSearch that referenced this pull request Jul 21, 2025
rgsriram pushed a commit to rgsriram/OpenSearch that referenced this pull request Jul 22, 2025
mch2 pushed a commit to mch2/OpenSearch that referenced this pull request Jul 22, 2025
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request Aug 5, 2025
atris added a commit to atris/OpenSearch that referenced this pull request Aug 8, 2025
Test was generating field patterns that matched raw.derived_keyword,
which doesnt support exists queries. Fixed by replacing problematic
patterns with TEXT_FIELD_NAME.

Fixes opensearch-project#18724

Signed-off-by: Atri Sharma <[email protected]>
rishabhmaurya pushed a commit that referenced this pull request Aug 11, 2025
* Fix flaky ExistsQueryBuilderTests.testToQuery

Test was generating field patterns that matched raw.derived_keyword,
which doesnt support exists queries. Fixed by replacing problematic
patterns with TEXT_FIELD_NAME.

Fixes #18724

Signed-off-by: Atri Sharma <[email protected]>

* Revert CHANGELOG changes

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 11, 2025
* Fix flaky ExistsQueryBuilderTests.testToQuery

Test was generating field patterns that matched raw.derived_keyword,
which doesnt support exists queries. Fixed by replacing problematic
patterns with TEXT_FIELD_NAME.

Fixes #18724

Signed-off-by: Atri Sharma <[email protected]>

* Revert CHANGELOG changes

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
(cherry picked from commit 86ac3ab)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
RajatGupta02 pushed a commit to RajatGupta02/OpenSearch that referenced this pull request Aug 18, 2025
* Fix flaky ExistsQueryBuilderTests.testToQuery

Test was generating field patterns that matched raw.derived_keyword,
which doesnt support exists queries. Fixed by replacing problematic
patterns with TEXT_FIELD_NAME.

Fixes opensearch-project#18724

Signed-off-by: Atri Sharma <[email protected]>

* Revert CHANGELOG changes

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
atris added a commit to atris/OpenSearch that referenced this pull request Aug 28, 2025
* Fix flaky ExistsQueryBuilderTests.testToQuery

Test was generating field patterns that matched raw.derived_keyword,
which doesnt support exists queries. Fixed by replacing problematic
patterns with TEXT_FIELD_NAME.

Fixes opensearch-project#18724

Signed-off-by: Atri Sharma <[email protected]>

* Revert CHANGELOG changes

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
kh3ra pushed a commit to kh3ra/OpenSearch that referenced this pull request Sep 5, 2025
* Fix flaky ExistsQueryBuilderTests.testToQuery

Test was generating field patterns that matched raw.derived_keyword,
which doesnt support exists queries. Fixed by replacing problematic
patterns with TEXT_FIELD_NAME.

Fixes opensearch-project#18724

Signed-off-by: Atri Sharma <[email protected]>

* Revert CHANGELOG changes

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
* Fix flaky ExistsQueryBuilderTests.testToQuery

Test was generating field patterns that matched raw.derived_keyword,
which doesnt support exists queries. Fixed by replacing problematic
patterns with TEXT_FIELD_NAME.

Fixes opensearch-project#18724

Signed-off-by: Atri Sharma <[email protected]>

* Revert CHANGELOG changes

Signed-off-by: Atri Sharma <[email protected]>

---------

Signed-off-by: Atri Sharma <[email protected]>

Labels

enhancement, feature, help wanted, lucene, Search

Development

Successfully merging this pull request may close these issues.

Add support for combined_fields (BM25F)

4 participants