Support script score when doc value is disabled and fix misusing DISI #1696

Merged
jmazanec15 merged 3 commits into opensearch-project:main on May 20, 2024

Conversation

bugmakerrrrrr
Contributor

Description

This functionality was initially introduced in #1573 but was later reverted because of #1647. The root cause of the issue is described in this comment. This PR reintroduces the functionality and fixes the misuse of DISI (Lucene's DocIdSetIterator).

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v
Collaborator

@bugmakerrrrrr can you add an entry for this feature to the changelog?

@navneet1v
Collaborator

@bugmakerrrrrr a couple of things:

  1. Can you add a PR comment on the piece of code that you added to fix the bug? This will help me review the PR faster.
  2. Did we add an IT that replicates the bug scenario, to verify that the bug does not happen again? If possible, can you add a comment in the PR for that too?

@ryanbogan
Member

Code looks good to me if we can add that integration test. For the IT, the exact scenario was two vectors with data ingested, then run script score on either.

@navneet1v
Collaborator

> Code looks good to me if we can add that integration test. For the IT, the exact scenario was two vectors with data ingested, then run script score on either.

@ryanbogan can you paste the query and the index mapping for better understanding?

private final String fieldName;
@Getter
private final VectorDataType vectorDataType;
private boolean docExists = false;
private int lastDocID = -1;

@Override
public void setNextDocId(int docId) throws IOException {
Contributor Author

This method fixes the bug in #1647.
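
To make the kind of guard discussed here concrete, below is a minimal, self-contained sketch. It is not the plugin's code: `ForwardOnlyIterator` is a hypothetical stand-in for a Lucene DocIdSetIterator (DISI), and `DocTracker` is a hypothetical holder for the `docExists` and `lastDocID` fields shown in the diff context. The working assumption is that the iterator may only move forward, so `advance` is called only when the requested doc ID is ahead of the iterator's current position.

```java
// Sketch only: all class names here are hypothetical and not part of the k-NN plugin.
final class GuardedScoringSketch {

    // Stand-in for a DISI over the (sorted) doc IDs that actually have a vector.
    static final class ForwardOnlyIterator {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int index = -1;
        private int current = -1;

        ForwardOnlyIterator(int... docs) {
            this.docs = docs;
        }

        int docID() {
            return current;
        }

        // Moves to the first doc >= target; callers must pass target > docID().
        int advance(int target) {
            if (target <= current) {
                throw new IllegalStateException("advance(" + target + ") while positioned on " + current);
            }
            do {
                index++;
            } while (index < docs.length && docs[index] < target);
            current = index < docs.length ? docs[index] : NO_MORE_DOCS;
            return current;
        }
    }

    // Tracks whether the doc currently being scored has a vector.
    static final class DocTracker {
        private final ForwardOnlyIterator values;
        private boolean docExists = false;
        private int lastDocID = -1;

        DocTracker(ForwardOnlyIterator values) {
            this.values = values;
        }

        void setNextDocId(int docId) {
            if (docId < lastDocID) {
                throw new IllegalArgumentException("docs out of order: " + lastDocID + " then " + docId);
            }
            lastDocID = docId;
            int current = values.docID();
            // Guard: only advance when the iterator is still behind the requested doc.
            if (docId > current) {
                current = values.advance(docId);
            }
            docExists = current == docId;
        }

        boolean docExists() {
            return docExists;
        }
    }

    public static void main(String[] args) {
        // Only docs 1 and 4 have a vector; docs 0, 2, 3 and 5 do not.
        DocTracker tracker = new DocTracker(new ForwardOnlyIterator(1, 4));
        for (int docId = 0; docId <= 5; docId++) {
            tracker.setNextDocId(docId);
            System.out.println("doc " + docId + " has vector: " + tracker.docExists());
        }
    }
}
```

In this sketch, a doc without a vector leaves the iterator parked on the next doc that has one, and later calls for smaller doc IDs simply report that no vector exists instead of moving the iterator again.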

@@ -669,7 +689,8 @@ private void testKNNScriptScore(SpaceType spaceType) throws Exception {
         final float[] queryVector = randomVector(dims);
         final BiFunction<float[], float[], Float> scoreFunction = getScoreFunction(spaceType, queryVector);
         for (String mapper : createMappers(dims)) {
-            createIndexAndAssertScriptScore(mapper, spaceType, scoreFunction, dims, queryVector);
+            createIndexAndAssertScriptScore(mapper, spaceType, scoreFunction, dims, queryVector, true);
+            createIndexAndAssertScriptScore(mapper, spaceType, scoreFunction, dims, queryVector, false);
Contributor Author

The issue in #1647 can be replicated using the non-dense test case.
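
As a rough illustration of why a non-dense (sparse) case can expose an iterator misuse that a dense case hides, here is a small self-contained sketch. It makes two assumptions that are not confirmed by this thread: that the misuse amounted to calling `advance` with a target at or behind the iterator's current position, and that the unguarded check looked like `naiveDocExists` below. `StrictIterator` is a hypothetical stand-in for a DISI that throws on such a call; a real Lucene iterator leaves the behavior undefined rather than throwing.

```java
// Sketch only: all names here are hypothetical and not part of the k-NN plugin or its tests.
final class SparseDocsRepro {

    // Stand-in for a DISI that strictly enforces forward-only movement.
    static final class StrictIterator {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int index = -1;
        private int current = -1;

        StrictIterator(int... docs) {
            this.docs = docs;
        }

        int docID() {
            return current;
        }

        // Moves to the first doc >= target; rejects targets at or behind the current doc.
        int advance(int target) {
            if (target <= current) {
                throw new IllegalStateException("advance(" + target + ") while positioned on " + current);
            }
            do {
                index++;
            } while (index < docs.length && docs[index] < target);
            current = index < docs.length ? docs[index] : NO_MORE_DOCS;
            return current;
        }
    }

    // Assumed unguarded pattern: advances whenever the iterator is not exactly on docId.
    static boolean naiveDocExists(StrictIterator values, int docId) {
        return values.docID() == docId || values.advance(docId) == docId;
    }

    // Visits doc IDs 0..maxDoc-1 in order and reports whether the iterator contract was violated.
    static boolean scanViolatesContract(int maxDoc, int... docsWithVector) {
        StrictIterator values = new StrictIterator(docsWithVector);
        try {
            for (int docId = 0; docId < maxDoc; docId++) {
                naiveDocExists(values, docId);
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Dense: every doc has a vector, so the iterator never gets ahead of the scorer.
        System.out.println("dense violates contract:  " + scanViolatesContract(4, 0, 1, 2, 3));
        // Sparse: docs 1 and 2 have no vector, so the iterator jumps to 3 and is then asked for 2.
        System.out.println("sparse violates contract: " + scanViolatesContract(4, 0, 3));
    }
}
```

Under these assumptions the dense scan never triggers the problem, while two consecutive docs without a vector are enough to trigger it, which is consistent with testing both a dense and a non-dense variant.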

ryanbogan previously approved these changes May 13, 2024
Member

LGTM! Thanks for the contribution!

jmazanec15 previously approved these changes May 15, 2024
Member

LGTM

@jmazanec15
Member

@bugmakerrrrrr can you fix the conflict? Also, I have some good benchmarks for script scoring between Lucene and plugin doc values; I'll share those on the issue.

@bugmakerrrrrr
Contributor Author

> @bugmakerrrrrr can you fix the conflict? Also, I have some good benchmarks for script scoring between Lucene and plugin doc values; I'll share those on the issue.

Fixed. Looking forward to seeing those results.

Signed-off-by: panguixin <[email protected]>
@jmazanec15
Member

@bugmakerrrrrr Posted here: #1709.

@jmazanec15 jmazanec15 requested a review from ryanbogan May 20, 2024 21:29
@jmazanec15 jmazanec15 merged commit 4d59d4c into opensearch-project:main May 20, 2024
49 of 51 checks passed
@jmazanec15 jmazanec15 added the backport 2.x and Enhancements labels May 20, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 20, 2024
…#1696)

* Revert "Revert 'Support script score when doc value is disabled' (#1662)"

This reverts commit bd2f403.

Signed-off-by: panguixin <[email protected]>

* fix misusing doc value

Signed-off-by: panguixin <[email protected]>

* add changelog

Signed-off-by: panguixin <[email protected]>

---------

Signed-off-by: panguixin <[email protected]>
(cherry picked from commit 4d59d4c)
luyuncheng pushed a commit to luyuncheng/k-NN-1 that referenced this pull request May 22, 2024
…opensearch-project#1696)

* Revert "Revert 'Support script score when doc value is disabled' (opensearch-project#1662)"

This reverts commit bd2f403.

Signed-off-by: panguixin <[email protected]>

* fix misusing doc value

Signed-off-by: panguixin <[email protected]>

* add changelog

Signed-off-by: panguixin <[email protected]>

---------

Signed-off-by: panguixin <[email protected]>
navneet1v added a commit that referenced this pull request Jun 1, 2024
* Fix flaky test in Faiss JNI range search (#1705)

Signed-off-by: Junqiu Lei <[email protected]>

* Support script score when doc value is disabled and fix misusing DISI (#1696)

* Revert "Revert 'Support script score when doc value is disabled' (#1662)"

This reverts commit bd2f403.

Signed-off-by: panguixin <[email protected]>

* fix misusing doc value

Signed-off-by: panguixin <[email protected]>

* add changelog

Signed-off-by: panguixin <[email protected]>

---------

Signed-off-by: panguixin <[email protected]>

* --- (#1712)

updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update threshold value after new result is added (#1715)

Signed-off-by: Heemin Kim <[email protected]>

* Use the Lucene Distance Calculation Function in Script Scoring for doing exact search (#1699)

* Use the Lucene Distance Calculation Function in Script Scoring for doing exact search

Signed-off-by: Ryan Bogan <[email protected]>

* Add Changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

* Fix failing test

Signed-off-by: Ryan Bogan <[email protected]>

* fix test

Signed-off-by: Ryan Bogan <[email protected]>

* Fix test bug and remove unnecessary validation

Signed-off-by: Ryan Bogan <[email protected]>

* Remove cosineSimilOptimized

Signed-off-by: Ryan Bogan <[email protected]>

* Revert "Remove cosineSimilOptimized"

This reverts commit f872d83.

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>

* Add validation for pq m parameter before training starts (#1713)

* Add validation for pq code count before training starts

Signed-off-by: Ryan Bogan <[email protected]>

* Add integration test

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Clean up code

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unnecessary lines

Signed-off-by: Ryan Bogan <[email protected]>

* Add changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

* Change framework to add validation with data

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unused error message

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change space type check name for readability

Signed-off-by: Ryan Bogan <[email protected]>

* Add javadocs

Signed-off-by: Ryan Bogan <[email protected]>

* Modify validation error wording and add json structure to tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change TrainingDataSpec to VectorSpaceInfo

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>

* Updating the BWC test config after 2.14 release (#1724)

Signed-off-by: Navneet Verma <[email protected]>

---------

Signed-off-by: Junqiu Lei <[email protected]>
Signed-off-by: panguixin <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Heemin Kim <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Navneet Verma <[email protected]>
Co-authored-by: Junqiu Lei <[email protected]>
Co-authored-by: panguixin <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Heemin Kim <[email protected]>
Co-authored-by: Ryan Bogan <[email protected]>
Co-authored-by: Navneet Verma <[email protected]>