Add index for storing template models #34
jmazanec15 merged 21 commits into opensearch-project:faiss-develop
Signed-off-by: John Mazanec <jmazane@amazon.com>
public static final String MODEL_BLOB_PARAMETER = "model_blob";

public static final String MODEL_INDEX_MAPPING_PATH = "mappings/model-index.json";
public static final String MODEL_INDEX_NAME = ".knn-model-index";
Should the name start with "opensearch"? Maybe confirm the naming convention that other plugins use for system indices after the OpenSearch changes.
In Conventions, the only requirement is that the name is prefixed with ".".
That being said, I think ".opensearch-knn-model-index" is good. I will update.
Setting.Property.NodeScope,
Setting.Property.Dynamic);

public static final Setting<Integer> MODEL_INDEX_NUMBER_OF_REPLICAS_SETTING = Setting.intSetting(
 * @throws IOException thrown when get mapping fails
 */
public void create(ActionListener<CreateIndexResponse> actionListener) throws IOException {
    if (isCreated()) {
Should we log a message here and return?
I am a little worried this may blow up the logs if isCreated is not called by the caller. I'd prefer not to log here.
private String getMapping() throws IOException {
    URL url = ModelIndex.class.getClassLoader().getResource(MODEL_INDEX_MAPPING_PATH);
    assert url != null;
Asserts are sometimes disabled on production hosts. Can we do a manual check and throw an exception?
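A minimal sketch of what the manual check could look like, assuming a standalone helper. `MappingLoader` and `readResource` are hypothetical names used only for illustration, and the exception type and message are assumptions rather than what the PR ended up using.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; it only mirrors the shape of getMapping() in the diff.
final class MappingLoader {

    // Reads a classpath resource as a UTF-8 string.
    // The explicit null check runs even when the JVM is started without -ea,
    // which is not true of an assert statement.
    static String readResource(String path) throws IOException {
        URL url = MappingLoader.class.getClassLoader().getResource(path);
        if (url == null) {
            // Assumed exception type and message, chosen for the example.
            throw new IllegalStateException("Unable to find mapping resource: " + path);
        }
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

With this shape, a missing resource fails fast with a descriptive message instead of a later NullPointerException.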
Object blob = getResponse.getSourceAsMap().get(KNNConstants.MODEL_BLOB_PARAMETER);

if (blob == null) {
    throw new IllegalArgumentException("ModelID: \"" + modelId + "\" is not present in index");
Should this message be more user focused? Something like "There is no model available with the provided Id"?
Signed-off-by: Jack Mazanec <jmazane1@nd.edu>
Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Description
This PR adds a hidden model index that will be used to store the serialized template indices used during index creation. Some of faiss's index types require a training step before indexing can begin. To support these index types, we need a way for a user to train a model index template and serialize it. This template is then retrieved during segment creation to initialize the faiss index.
The mapping is fairly straightforward:
The engine is required to identify which engine the model is intended to be used with. The model_blob stores the binary representation of the model.
One model maps to one OpenSearch document in this index. Models are identified by the OpenSearch id field. The index allows users to provide a custom id or to let OpenSearch generate one for them. Documents cannot be updated in this index; however, they can be removed.
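To make these semantics concrete, here is a rough in-memory sketch. It is not the ModelIndex implementation from this PR: `InMemoryModelStore` and its methods are hypothetical, a map stands in for the OpenSearch index, and a random UUID stands in for an OpenSearch-generated id.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the model index; illustration only.
final class InMemoryModelStore {

    // One entry per model: id -> serialized model blob.
    private final Map<String, byte[]> models = new ConcurrentHashMap<>();

    // Stores a model under the given id, or under a generated id when id is null.
    // An existing id is rejected, because documents cannot be updated.
    String put(String id, byte[] blob) {
        String key = (id != null) ? id : UUID.randomUUID().toString();
        if (models.putIfAbsent(key, blob.clone()) != null) {
            throw new IllegalArgumentException("Model \"" + key + "\" already exists and cannot be updated");
        }
        return key;
    }

    // Returns a copy of the blob, or empty if no model has this id.
    Optional<byte[]> get(String id) {
        return Optional.ofNullable(models.get(id)).map(b -> b.clone());
    }

    // Removal is allowed; returns true if a model was actually removed.
    boolean delete(String id) {
        return models.remove(id) != null;
    }
}
```

A caller that wants to change a model would delete the old entry and add a new one, which matches the "no updates, removals allowed" rule described above.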
For implementing the index, I referred to:
Additionally, I added test cases for each operation.
Note -- changes related to jni can be ignored as they are reviewed in #28. The files to be reviewed are:
Issues Resolved
#27
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.