-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add combined_fields query to utilize Lucene's CombinedField (BM25F Text) #18724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add combined_fields query to utilize Lucene's CombinedField (BM25F Text) #18724
Conversation
|
❌ Gradle check result for 3ea57e2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 7bbc948: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for ea81f54: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 74cd236: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Harrison Engel <[email protected]>
…om/harrisonengel/OpenSearch into dropbox/hengel/combinedFieldsQuery
|
❕ Gradle check result for 4189d76: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @harrisonengel, This is looking good to me.
@msfroh @prudhvigodithi would you guys like to take a pass here as you had prior context on combined fields.
Signed-off-by: Harrison Engel <[email protected]>
server/src/main/java/org/opensearch/index/query/CombinedFieldsQueryBuilder.java
Show resolved
Hide resolved
|
❌ Gradle check result for b080ba3: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for b080ba3: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
LGTM, while reviewing this example, I was initially curious why @harrisonengel also another reason to wrap in a |
Also my hands were kind of tied, Lucene stopped supporting multi-term CombinedFieldQuery.Builder recently-ish. Probably to promote this pattern since multi-term CombinedFieldsQuery is just a less flexible version of BooleanQuery with CombinedFieldQuery. |
…xt) (opensearch-project#18724) (cherry picked from commit 4c48e34)
Test was generating field patterns that matched raw.derived_keyword, which doesnt support exists queries. Fixed by replacing problematic patterns with TEXT_FIELD_NAME. Fixes opensearch-project#18724 Signed-off-by: Atri Sharma <[email protected]>
Test was generating field patterns that matched raw.derived_keyword, which doesnt support exists queries. Fixed by replacing problematic patterns with TEXT_FIELD_NAME. Fixes opensearch-project#18724 Signed-off-by: Atri Sharma <[email protected]>
* Fix flaky ExistsQueryBuilderTests.testToQuery Test was generating field patterns that matched raw.derived_keyword, which doesnt support exists queries. Fixed by replacing problematic patterns with TEXT_FIELD_NAME. Fixes #18724 Signed-off-by: Atri Sharma <[email protected]> * Revert CHANGELOG changes Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
* Fix flaky ExistsQueryBuilderTests.testToQuery Test was generating field patterns that matched raw.derived_keyword, which doesnt support exists queries. Fixed by replacing problematic patterns with TEXT_FIELD_NAME. Fixes #18724 Signed-off-by: Atri Sharma <[email protected]> * Revert CHANGELOG changes Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]> (cherry picked from commit 86ac3ab) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Fix flaky ExistsQueryBuilderTests.testToQuery Test was generating field patterns that matched raw.derived_keyword, which doesnt support exists queries. Fixed by replacing problematic patterns with TEXT_FIELD_NAME. Fixes opensearch-project#18724 Signed-off-by: Atri Sharma <[email protected]> * Revert CHANGELOG changes Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
* Fix flaky ExistsQueryBuilderTests.testToQuery Test was generating field patterns that matched raw.derived_keyword, which doesnt support exists queries. Fixed by replacing problematic patterns with TEXT_FIELD_NAME. Fixes opensearch-project#18724 Signed-off-by: Atri Sharma <[email protected]> * Revert CHANGELOG changes Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
* Fix flaky ExistsQueryBuilderTests.testToQuery Test was generating field patterns that matched raw.derived_keyword, which doesnt support exists queries. Fixed by replacing problematic patterns with TEXT_FIELD_NAME. Fixes opensearch-project#18724 Signed-off-by: Atri Sharma <[email protected]> * Revert CHANGELOG changes Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
* Fix flaky ExistsQueryBuilderTests.testToQuery Test was generating field patterns that matched raw.derived_keyword, which doesnt support exists queries. Fixed by replacing problematic patterns with TEXT_FIELD_NAME. Fixes opensearch-project#18724 Signed-off-by: Atri Sharma <[email protected]> * Revert CHANGELOG changes Signed-off-by: Atri Sharma <[email protected]> --------- Signed-off-by: Atri Sharma <[email protected]>
Description
When querying across multiple text fields, for example a title and a body, you can often achieve better relevance by treating them as a single 'combined field' and scoring them by BM25F. Lucene provides the CombinedFieldQuery for this purpose. It effectively treats all fields as combined into one for matching and ranking purposes, where each field can be weighted higher or lower.
Related Issues
Resolves #3996
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the
best of my knowledge, is covered under an appropriate open
source license and I have the right under that license to
submit that work with modifications, whether created in whole
or in part by me, under the same open source license (unless
I am permitted to submit under a different license), as
Indicated in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including
all personal information I submit with it, including my
sign-off) is maintained indefinitely and may be redistributed
consistent with this project or the open source license(s)
involved.
Signed-off-by: Harrison Engel <[email protected], [email protected]>