Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation for pq m parameter before training starts #1713

Merged
merged 15 commits into from
May 30, 2024

Conversation

ryanbogan
Copy link
Member

@ryanbogan ryanbogan commented May 21, 2024

Description

Currently, if the dimension of a model is not divisible by the pq m parameter (code count) provided in the training request, training fails with a detailed message in the logs. In addition, the model state is update to failed and the model error is set to "Failed to execute training. May be caused by an invalid method definition or not enough memory to perform training.".

This PR adds a validation check for the above case before training starts or any memory is allocated. The model state is still set to failed, but the error message is more specific.

Issues Resolved

#1075

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ryanbogan added 7 commits May 20, 2024 11:27
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
);
}

ValidationException methodValidation = methodComponent.validateWithData(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to validate further even if we don't support space type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though validation will ultimately fail if space type is not supported, more error messages can be added to the errors based on any potential problems with the method component context

Copy link

codecov bot commented May 28, 2024

Codecov Report

Attention: Patch coverage is 78.57143% with 24 lines in your changes are missing coverage. Please review.

Project coverage is 84.96%. Comparing base (9a52b2b) to head (844f1ef).
Report is 6 commits behind head on main.

Current head 844f1ef differs from pull request most recent head a12022b

Please upload reports for the commit a12022b to get more accurate results.

Files Patch % Lines
.../main/java/org/opensearch/knn/index/Parameter.java 77.61% 12 Missing and 3 partials ⚠️
.../main/java/org/opensearch/knn/index/KNNMethod.java 68.75% 4 Missing and 1 partial ⚠️
...java/org/opensearch/knn/index/MethodComponent.java 88.23% 1 Missing and 1 partial ⚠️
...main/java/org/opensearch/knn/index/util/Faiss.java 50.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1713      +/-   ##
============================================
- Coverage     85.02%   84.96%   -0.07%     
- Complexity     1463     1486      +23     
============================================
  Files           178      178              
  Lines          5898     6026     +128     
  Branches        598      626      +28     
============================================
+ Hits           5015     5120     +105     
- Misses          632      649      +17     
- Partials        251      257       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -109,9 +109,6 @@ class Faiss extends NativeLibrary {
.build()
);

// TODO: To think about in future: for PQ, if dimension is not divisible by code count, PQ will fail. Right now,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice - finally getting rid of it

@ryanbogan ryanbogan merged commit 3701d19 into opensearch-project:main May 30, 2024
54 of 57 checks passed
@ryanbogan ryanbogan deleted the pq_validation branch May 30, 2024 19:57
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 30, 2024
* Add validation for pq code count before training starts

Signed-off-by: Ryan Bogan <[email protected]>

* Add integration test

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Clean up code

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unnecessary lines

Signed-off-by: Ryan Bogan <[email protected]>

* Add changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

* Change framework to add validation with data

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unused error message

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change space type check name for readability

Signed-off-by: Ryan Bogan <[email protected]>

* Add javadocs

Signed-off-by: Ryan Bogan <[email protected]>

* Modify validation error wording and add json structure to tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change TrainingDataSpec to VectorSpaceInfo

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit 3701d19)
ryanbogan added a commit that referenced this pull request May 30, 2024
* Add validation for pq code count before training starts

Signed-off-by: Ryan Bogan <[email protected]>

* Add integration test

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Clean up code

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unnecessary lines

Signed-off-by: Ryan Bogan <[email protected]>

* Add changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

* Change framework to add validation with data

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unused error message

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change space type check name for readability

Signed-off-by: Ryan Bogan <[email protected]>

* Add javadocs

Signed-off-by: Ryan Bogan <[email protected]>

* Modify validation error wording and add json structure to tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change TrainingDataSpec to VectorSpaceInfo

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit 3701d19)

Co-authored-by: Ryan Bogan <[email protected]>
navneet1v added a commit that referenced this pull request Jun 1, 2024
* Fix flaky test in Faiss JNI range search (#1705)

Signed-off-by: Junqiu Lei <[email protected]>

* Support script score when doc value is disabled and fix misusing DISI (#1696)

* Revert "Revert 'Support script score when doc value is disabled' (#1662)"

This reverts commit bd2f403.

Signed-off-by: panguixin <[email protected]>

* fix misusing doc value

Signed-off-by: panguixin <[email protected]>

* add changelog

Signed-off-by: panguixin <[email protected]>

---------

Signed-off-by: panguixin <[email protected]>

* --- (#1712)

updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update threshold value after new result is added (#1715)

Signed-off-by: Heemin Kim <[email protected]>

* Use the Lucene Distance Calculation Function in Script Scoring for doing exact search (#1699)

* Use the Lucene Distance Calculation Function in Script Scoring for doing exact search

Signed-off-by: Ryan Bogan <[email protected]>

* Add Changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

* Fix failing test

Signed-off-by: Ryan Bogan <[email protected]>

* fix test

Signed-off-by: Ryan Bogan <[email protected]>

* Fix test bug and remove unnecessary validation

Signed-off-by: Ryan Bogan <[email protected]>

* Remove cosineSimilOptimized

Signed-off-by: Ryan Bogan <[email protected]>

* Revert "Remove cosineSimilOptimized"

This reverts commit f872d83.

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>

* Add validation for pq m parameter before training starts (#1713)

* Add validation for pq code count before training starts

Signed-off-by: Ryan Bogan <[email protected]>

* Add integration test

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Clean up code

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unnecessary lines

Signed-off-by: Ryan Bogan <[email protected]>

* Add changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

* Change framework to add validation with data

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unused error message

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change space type check name for readability

Signed-off-by: Ryan Bogan <[email protected]>

* Add javadocs

Signed-off-by: Ryan Bogan <[email protected]>

* Modify validation error wording and add json structure to tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change TrainingDataSpec to VectorSpaceInfo

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>

* Updating the BWC test config after 2.14 release (#1724)

Signed-off-by: Navneet Verma <[email protected]>

---------

Signed-off-by: Junqiu Lei <[email protected]>
Signed-off-by: panguixin <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Heemin Kim <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Navneet Verma <[email protected]>
Co-authored-by: Junqiu Lei <[email protected]>
Co-authored-by: panguixin <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Heemin Kim <[email protected]>
Co-authored-by: Ryan Bogan <[email protected]>
Co-authored-by: Navneet Verma <[email protected]>
shatejas pushed a commit to shatejas/k-NN that referenced this pull request Jun 26, 2024
…project#1713)

* Add validation for pq code count before training starts

Signed-off-by: Ryan Bogan <[email protected]>

* Add integration test

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Clean up code

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unnecessary lines

Signed-off-by: Ryan Bogan <[email protected]>

* Add changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

* Change framework to add validation with data

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unused error message

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change space type check name for readability

Signed-off-by: Ryan Bogan <[email protected]>

* Add javadocs

Signed-off-by: Ryan Bogan <[email protected]>

* Modify validation error wording and add json structure to tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change TrainingDataSpec to VectorSpaceInfo

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
luyuncheng pushed a commit to luyuncheng/k-NN-1 that referenced this pull request Jul 7, 2024
…project#1713)

* Add validation for pq code count before training starts

Signed-off-by: Ryan Bogan <[email protected]>

* Add integration test

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Clean up code

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unnecessary lines

Signed-off-by: Ryan Bogan <[email protected]>

* Add changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

* Change framework to add validation with data

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unused error message

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change space type check name for readability

Signed-off-by: Ryan Bogan <[email protected]>

* Add javadocs

Signed-off-by: Ryan Bogan <[email protected]>

* Modify validation error wording and add json structure to tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change TrainingDataSpec to VectorSpaceInfo

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
luyuncheng pushed a commit to luyuncheng/k-NN-1 that referenced this pull request Jul 7, 2024
…project#1713)

* Add validation for pq code count before training starts

Signed-off-by: Ryan Bogan <[email protected]>

* Add integration test

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Clean up code

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unnecessary lines

Signed-off-by: Ryan Bogan <[email protected]>

* Add changelog entry

Signed-off-by: Ryan Bogan <[email protected]>

* Change framework to add validation with data

Signed-off-by: Ryan Bogan <[email protected]>

* Remove unused error message

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change space type check name for readability

Signed-off-by: Ryan Bogan <[email protected]>

* Add javadocs

Signed-off-by: Ryan Bogan <[email protected]>

* Modify validation error wording and add json structure to tests

Signed-off-by: Ryan Bogan <[email protected]>

* Change TrainingDataSpec to VectorSpaceInfo

Signed-off-by: Ryan Bogan <[email protected]>

* Add unit tests

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants