[Backport Main] Added more detailed error messages for KNN model training #2440

anntians
Contributor

Description

A consistent piece of feedback we get about PQ and IVF is that there is limited visibility into failure cases. Part of the reason is that the errors are thrown on the Faiss side and we don't return stack traces in the REST response, which makes PQ and IVF difficult to use. This PR improves the error messages by adding explicit checks for the most common errors:

[ ] For PQ, explicitly check in OpenSearch for an invalid configuration where m does not divide the dimension
[ ] For PQ/IVF, check that the number of training points meets the minimum clustering criteria defined in Faiss
[ ] If there is not enough memory, explicitly say that there is not enough memory

Adding these 3 checks will cover 90% of the training failures that occur.
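
For illustration only, the three checks could be sketched as below. This is not the code in this PR (the plugin itself is written in Java); the function name, parameter names, and message wording are hypothetical. The minimum-point rule is also an assumption: it relies on Faiss clustering needing at least as many training points as clusters, and takes the cluster count to be `nlist` for IVF and `2^code_size` for PQ.

```python
from typing import List, Optional


def validate_training(
    dimension: int,
    training_points: int,
    pq_m: Optional[int] = None,          # PQ: number of sub-vectors (m)
    pq_code_size: Optional[int] = None,  # PQ: bits per sub-vector code
    ivf_nlist: Optional[int] = None,     # IVF: number of clusters
    estimated_bytes: int = 0,            # hypothetical estimate of memory needed
    available_bytes: int = 0,            # hypothetical remaining memory budget
) -> List[str]:
    """Return human-readable validation errors; an empty list means OK."""
    errors: List[str] = []

    # Check 1: for PQ, m must divide the vector dimension evenly.
    if pq_m is not None and dimension % pq_m != 0:
        errors.append(
            f"Invalid PQ configuration: dimension {dimension} "
            f"is not divisible by m={pq_m}"
        )

    # Check 2 (assumed rule): need at least as many training points
    # as clusters, for both the IVF and the PQ clustering steps.
    min_points = 0
    if ivf_nlist is not None:
        min_points = max(min_points, ivf_nlist)
    if pq_code_size is not None:
        min_points = max(min_points, 2 ** pq_code_size)
    if training_points < min_points:
        errors.append(
            f"Number of training points ({training_points}) is below "
            f"the minimum required for clustering ({min_points})"
        )

    # Check 3: report insufficient memory explicitly.
    if estimated_bytes > available_bytes:
        errors.append(
            f"Not enough memory for training: need {estimated_bytes} bytes, "
            f"only {available_bytes} bytes available"
        )

    return errors
```

Returning every violated check at once, rather than failing on the first, is a design choice of this sketch: it lets a single response tell the user everything that needs fixing before retrying training.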

Related Issues

Resolves #2268

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

AnnTian Shao and others added 7 commits December 20, 2024 16:00
Signed-off-by: Tommy Shao <[email protected]>
Signed-off-by: AnnTian Shao <[email protected]>
…-project#2378)

* Add more detailed error messages for KNN model training

Signed-off-by: AnnTian Shao <[email protected]>

* Add validation check for training parameters in engine method abstraction

Signed-off-by: AnnTian Shao <[email protected]>

* Fixes for bwc and IT tests

Signed-off-by: AnnTian Shao <[email protected]>

---------

Signed-off-by: AnnTian Shao <[email protected]>
Co-authored-by: AnnTian Shao <[email protected]>
(cherry picked from commit 4058a53)
@anntians anntians force-pushed the backport/backport-2378-to-main branch from 6d7d62a to 783eed7 Compare January 25, 2025 18:01
Signed-off-by: AnnTian Shao <[email protected]>
@anntians anntians closed this Jan 25, 2025
@anntians anntians deleted the backport/backport-2378-to-main branch January 25, 2025 18:55
Development

Successfully merging this pull request may close these issues.

Improve error messaging/validation for faiss training