Introduce compression and mode mapping parms#2019
Introduce compression and mode mapping parms#2019jmazanec15 merged 9 commits intoopensearch-project:mainfrom
Conversation
d0ae053 to
302006e
Compare
Introduces new params for mapping and training, called compression_level and mode. These parameters are high level parameters that give the plugin a hint as to what the user wants to configure their system like without exposing algorithmic details. This change just adds these parameters to the plugin as noops. In future change, we will add the functionality for parameter resolution. Along with this, I added a class to more easily manage the original parameters that a user passes. This will help ensure our mapper maintains good compatibility. Signed-off-by: John Mazanec <jmazane@amazon.com>
302006e to
722b7f1
Compare
src/main/java/org/opensearch/knn/index/mapper/CompressionLevel.java
Outdated
Show resolved
Hide resolved
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
src/main/java/org/opensearch/knn/index/mapper/CompressionLevel.java
Outdated
Show resolved
Hide resolved
| } | ||
| }); | ||
|
|
||
| protected final Parameter<String> mode = Parameter.restrictedStringParam( |
There was a problem hiding this comment.
Can we define these variable as static so that we don't need to create them for every instances?
There was a problem hiding this comment.
I don't think so it can be static. As these parameters are getting parsed. So for every mapper instance the values depend on the provided input.
There was a problem hiding this comment.
I thought @heemin32 meant just the last stream
| } | ||
|
|
||
| if (resolvedKNNMethodContext == null) { | ||
| if (originalParameters.getResolvedKnnMethodContext() == null) { |
There was a problem hiding this comment.
Just see if this make sense otherwise please ignore.
I think originalParameters is customer provided. Then, I would wrap it another another class.
InternalParameter internalParameters = new InternalParameter(originalParameter);
internalParameters.knnMethodContext();
Then, you can have some of separation between user provided parameters and internal parameters which might also provide better flexibility.
There was a problem hiding this comment.
I like this - but, might take in another review
| } | ||
| }); | ||
|
|
||
| protected final Parameter<String> mode = Parameter.restrictedStringParam( |
There was a problem hiding this comment.
I don't think so it can be static. As these parameters are getting parsed. So for every mapper instance the values depend on the provided input.
| put(KNNConstants.MODE_PARAMETER, modelMetadata.getMode().toString()); | ||
| } | ||
| if (CompressionLevel.isConfigured(modelMetadata.getCompressionLevel())) { | ||
| put(KNNConstants.COMPRESSION_LEVEL_PARAMETER, modelMetadata.getCompressionLevel().toString()); |
There was a problem hiding this comment.
@jmazanec15 if you are putting the enums in the objects, I would recommend going with value of the enum rather than toString. This will ensure that even when toString representation is changed value of enum is something that will be consistent and will ensure BWC.
There was a problem hiding this comment.
I was a little bit concerned here that we are ingesting the value into the system index and was concerned on having an object indexed. Do you think thats fine though?
| builder.field(KNNConstants.MODE_PARAMETER, mode.toString()); | ||
| } | ||
| if (CompressionLevel.isConfigured(compressionLevel)) { | ||
| builder.field(KNNConstants.COMPRESSION_LEVEL_PARAMETER, compressionLevel.toString()); |
There was a problem hiding this comment.
why not use ENUM.value here rather than toString.
There was a problem hiding this comment.
For toXContent method, this will be potentially exposed to user. So, I was thinking it would be better to return what they would put in
There was a problem hiding this comment.
In that case, we should have another parameter in enum which tells what is user provided value. having toString here is not what I would recommend
| * | ||
| * @return number of bits to represent a float at this compression level | ||
| */ | ||
| public int numBitsForFloat() { |
There was a problem hiding this comment.
?
| public int numBitsForFloat() { | |
| public int numBitsForFloat32() { |
| */ | ||
| @AllArgsConstructor | ||
| public enum CompressionLevel { | ||
| NOT_CONFIGURED(-1), |
There was a problem hiding this comment.
nit: be careful with -1 here, as long as its just for syntactic sugar and not being used in any computation we should be good
There was a problem hiding this comment.
Right. The value isnt exposed outside of this class and is handled below
| this.mode = Mode.NOT_CONFIGURED; | ||
| this.compressionLevel = CompressionLevel.NOT_CONFIGURED; |
There was a problem hiding this comment.
nit: could have defaulted to in_memory and 1x here. They are still valid values for before 2.17
There was a problem hiding this comment.
I prefer NOT_CONFIGURED just because I want to differentiate when someone actually supplies as config "1x" or "in_memory" and handle that case a bit differently.
There was a problem hiding this comment.
handle that case a bit differently
Would appreciate more context on this part. I was of the opinion it will execute the same code path as the current one
| + "\"<KNNEngine>,<SpaceType>,<Dimension>,<ModelState>,<Timestamp>,<Description>,<Error>,<NodeAssignment>,<MethodContext>\" or " | ||
| + "\"<KNNEngine>,<SpaceType>,<Dimension>,<ModelState>,<Timestamp>,<Description>,<Error>,<NodeAssignment>,<MethodContext>,<VectorDataType>\"." | ||
| + "\"<KNNEngine>,<SpaceType>,<Dimension>,<ModelState>,<Timestamp>,<Description>,<Error>,<NodeAssignment>,<MethodContext>,<VectorDataType>\". or " | ||
| + "\"<KNNEngine>,<SpaceType>,<Dimension>,<ModelState>,<Timestamp>,<Description>,<Error>,<NodeAssignment>,<MethodContext>,<VectorDataType>,<WorkloadConfig>,<CompressionConfig>\"." |
There was a problem hiding this comment.
nit: <Mode>,<CompressionLevel>
Yes. By having NotConfigured, we should be able to assume that the enum won't be a null for it to be useful. |
@navneet1v jmazanec15#1 (comment) Ideally we shouldn't have placeholders, I think we can get away with it unless I am missing something. We can revisit removing as long as its not a one way door (make sure we are not passing |
shatejas
left a comment
There was a problem hiding this comment.
Overall no concerns with functionality
- Please make sure NOT_CONFIGURED is not passed in streams for bwc in any case
ee6d75c to
c0e84bd
Compare
Signed-off-by: John Mazanec <jmazane@amazon.com>
c0e84bd to
f0896f2
Compare
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
d31ad23 to
f85826d
Compare
Signed-off-by: John Mazanec <jmazane@amazon.com>
9912899 to
a7c09f8
Compare
Introduces new params for mapping and training, called compression_level and mode. These parameters are high level parameters that give the plugin a hint as to what the user wants to configure their system like without exposing algorithmic details. This change just adds these parameters to the plugin as noops. In future change, we will add the functionality for parameter resolution. Along with this, I added a class to more easily manage the original parameters that a user passes. This will help ensure our mapper maintains good compatibility. Signed-off-by: John Mazanec <jmazane@amazon.com> (cherry picked from commit 920c819)
Introduces new params for mapping and training, called compression_level and mode. These parameters are high level parameters that give the plugin a hint as to what the user wants to configure their system like without exposing algorithmic details. This change just adds these parameters to the plugin as noops. In future change, we will add the functionality for parameter resolution. Along with this, I added a class to more easily manage the original parameters that a user passes. This will help ensure our mapper maintains good compatibility. Signed-off-by: John Mazanec <jmazane@amazon.com> (cherry picked from commit 920c819)
Introduces new params for mapping and training, called compression_level and mode. These parameters are high level parameters that give the plugin a hint as to what the user wants to configure their system like without exposing algorithmic details. This change just adds these parameters to the plugin as noops. In future change, we will add the functionality for parameter resolution. Along with this, I added a class to more easily manage the original parameters that a user passes. This will help ensure our mapper maintains good compatibility. Signed-off-by: John Mazanec <jmazane@amazon.com> (cherry picked from commit 920c819)
Introduces new params for mapping and training, called compression_level and mode. These parameters are high level parameters that give the plugin a hint as to what the user wants to configure their system like without exposing algorithmic details. This change just adds these parameters to the plugin as noops. In future change, we will add the functionality for parameter resolution. Along with this, I added a class to more easily manage the original parameters that a user passes. This will help ensure our mapper maintains good compatibility. Signed-off-by: John Mazanec <jmazane@amazon.com> Signed-off-by: Akash Shankaran <akash.shankaran@intel.com>
Introduces new params for mapping and training, called compression_level and mode. These parameters are high level parameters that give the plugin a hint as to what the user wants to configure their system like without exposing algorithmic details. This change just adds these parameters to the plugin as noops. In future change, we will add the functionality for parameter resolution. Along with this, I added a class to more easily manage the original parameters that a user passes. This will help ensure our mapper maintains good compatibility. Signed-off-by: John Mazanec <jmazane@amazon.com>
Description
Introduces new params for mapping and training, called compression_level and mode. These parameters are high level parameters that give the plugin a hint as to what the user wants to configure their system like without exposing algorithmic details. This change just adds these parameters to the plugin as noops. In future change, we will add the functionality for parameter resolution.
Along with this, I added a class to more easily manage the original parameters that a user passes. This will help ensure our mapper maintains good compatibility.
Adding tests now, but wanted to raise PR to get early feedback.
Related Issues
#1949
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.