Add faiss support in jni#28
Conversation
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
dblock
left a comment
There was a problem hiding this comment.
I did a cursory look paying more attention to the C++ code. I think you want to cleanup the allocation story to make sure nothing is leaking on exceptions, then we can re-review in detail.
Lack of C++ tests is a major problem. Not sure what people use today, Google tests, CPPUNIT or something, but feels like at least a beginning of something is needed, even if it's to test utility functions.
|
|
||
| // Train index if needed -- check if there is a case where index needs part trained but is trained | ||
| if(!indexWriter->is_trained) { | ||
| //TODO: What needs to be freed??? |
There was a problem hiding this comment.
Everything allocated should be wrapped in an auto_ptr or similar smart pointer construct to do things like env->DeleteLocalRef. I'd ensure this before this is merged.
There was a problem hiding this comment.
Makes sense. I will ensure all native allocations are wrapped in smart pointers. For calls to free references in jenv, I will work on some kind of wrapper to ensure everything is freed.
There was a problem hiding this comment.
So, DeleteLocalRef is more of an optimization. Local references will automatically be freed when jni call returns.
|
|
||
| // Write the index to disk | ||
| std::string indexPathCpp(ConvertJavaStringToCppString(env, indexPathJ)); | ||
| faiss::write_index(&idMap, indexPathCpp.c_str()); |
There was a problem hiding this comment.
Does this need CatchCppExceptionAndThrowJava(env); around the whole function?
There was a problem hiding this comment.
With the future goal of separating java specific functionality from native only functionality, I left this wrapper to the jni functions here: https://github.com/jmazanec15/k-NN-1/blob/faiss-initial/jni/src/org_opensearch_knn_index_JNIService.cpp
| for (int i = 0; i < indexBytesCount; i++) { | ||
| vectorIoReader.data.push_back((uint8_t) indexBytesJ[i]); | ||
| } | ||
| env->ReleaseByteArrayElements(templateIndexJ, indexBytesJ, 0); |
There was a problem hiding this comment.
Leaks if any of the above fails? indexBytesJ needs to be inside a smart pointer that calls ReleaseByteArrayElements when it's deleted.
There was a problem hiding this comment.
For this, I am not sure smart pointer is necessary. ReleaseByteArrayElements requires the Java object to be passed as the first argument. So, we would need to construct a wrapper that includes a reference to the Java object. Because the only code between allocation and release is appending to a vector, I think it is okay to keep this as a raw pointer.
|
|
||
| // if result is not enough, padded with -1s | ||
| int resultSize = kJ; | ||
| for(int i = 0; i < kJ; i++) { |
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
Sounds good. I will work on that today.
Makes sense. I was debating whether or not to try to include C++ tests on this PR. I was thinking that because this is a development branch, this can be pushed back to a future PR so as not to add too many changes in one PR. What are your thoughts @dblock? |
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
| /** | ||
| * Service to interact with plugin's jni layer. Class dependencies should be minimal | ||
| * | ||
| * In order to compile C++ header file, run: |
There was a problem hiding this comment.
i remember that we have to comment KNNConstants to generate. is that still the case?
There was a problem hiding this comment.
No need to comment out. It will work as is.
| // loadIndex from different library | ||
| String engineName; | ||
| Map<String, Object> parameters; | ||
| if (indexPathUrl.endsWith(KNNEngine.NMSLIB.getExtension()) |
There was a problem hiding this comment.
shall we create a method "KNNEngine getEngineNameFromPath()", so that it will be easy to extend later? what do you think?
There was a problem hiding this comment.
I think that makes sense to add. Will add.
| throw new IllegalArgumentException("[KNN] Invalid engine type for path: " + indexPathUrl); | ||
| } | ||
|
|
||
| final long indexPointer = JNIService.loadIndex(indexPathUrl, parameters, engineName); |
There was a problem hiding this comment.
may be indexReference? i feel pointer is more of cpp jargon.
There was a problem hiding this comment.
I use pointer here because it is generated in the C++ layer and used to access meemory exclusively in C++ layer. So, the value it contains is in fact a valid pointer in C++ and cannot be used to access memory in Java.
|
|
||
| private KNNIndex getKnnIndex() { | ||
| return knnIndex; | ||
| private boolean isClosed() { |
There was a problem hiding this comment.
Do we need to have private getter method?
There was a problem hiding this comment.
This call is used exclusively in the cache. So private is fine. I guess I prefer to use getters as opposed to direct access to the members so that variables dont accidentally get changed, but I understand that this does not prevent a section of code in the cache from accessing directly a value and changing it, its more for convention. That being said, what are your thoughts?
There was a problem hiding this comment.
I understand. I prefer method than variable mostly outside the class. I guess it comes in preference category, you can keep it as it is.
| indices = knnIndexCache.getIndices(getHNSWPaths(searcher.getIndexReader()), getIndexName()); | ||
| } finally { | ||
| searcher.close(); | ||
| try (Engine.Searcher searcher = indexShard.acquireSearcher("knn-warmup")) { |
There was a problem hiding this comment.
FAR: shall we create constant?
There was a problem hiding this comment.
What does FAR stand for?
| if (idsJ == nullptr) { | ||
| throw std::runtime_error("IDs cannot be null"); | ||
| } | ||
|
|
||
| if (vectorsJ == nullptr) { | ||
| throw std::runtime_error("Vectors cannot be null"); | ||
| } | ||
|
|
||
| if (indexPathJ == nullptr) { | ||
| throw std::runtime_error("Index path cannot be null"); | ||
| } | ||
|
|
||
| if (parametersJ == nullptr) { | ||
| throw std::runtime_error("Parameters cannot be null"); | ||
| } | ||
|
|
There was a problem hiding this comment.
Will these exceptions crash the jvm as these are cpp Exceptions. My understanding Java does not understand these exceptions and crash? If this crashes JVM, then we may need to type case to Java Exception. Same comment for all other places where we throw cpp Exceptions.
There was a problem hiding this comment.
We wrap all of the calls to wrappers in https://github.com/jmazanec15/k-NN-1/blob/faiss-initial/jni/src/org_opensearch_knn_index_JNIService.cpp#L20. So, this will ensure that exception is handled correctly. Additionally, some of the tests test exceptions.
| } | ||
| ); | ||
| } catch (Exception ex) { | ||
| throw new RuntimeException("Unable to query the index: " + ex); |
There was a problem hiding this comment.
Minor: Add indexName to the Exception message to help in debugging.
| KNNConstants.HNSW_ALGO_EF_SEARCH, KNNSettings.getEfSearchParam(indexName) | ||
| ); | ||
| } else { | ||
| throw new IllegalArgumentException("[KNN] Invalid engine type for path: " + indexPathUrl); |
There was a problem hiding this comment.
indexPathUrl should be hidden to the users. May be just have indexName in the message. We can
indexPathUrl to a debug log for debugging.
| readLock.lock(); | ||
| try { | ||
| if (knnIndexCacheEntry.isClosed()) { | ||
| throw new IOException("Index is already closed"); |
There was a problem hiding this comment.
Minor: Add indexName, engineName to the message. If possible add spaceType as well. You may want to check other Exception places as well and see if we can add information to the error messages to help debug.
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
|
I decided to refactor jni utils into an interface so that neither wrappers make calls directly to the JNIEnv. This will make it easier to add unit tests because we will just be able to mock the interface. This comes at a minimal performance cost of 1 extra function call, but, when compared to the other operations, this cost is negligible. |
Signed-off-by: Jack Mazanec <jmazane1@nd.edu>
Signed-off-by: Jack Mazanec <jmazane1@nd.edu>
Signed-off-by: Jack Mazanec <jmazane1@nd.edu>
Signed-off-by: Jack Mazanec <jmazane1@nd.edu> Signed-off-by: Martin Gaievski <gaievski@amazon.com>
Signed-off-by: Jack Mazanec <jmazane1@nd.edu>
Signed-off-by: Jack Mazanec <jmazane1@nd.edu>
Description
This change refactors the jni so that it can support both nmslib and faiss in a single library. The interface revolves around a new class, JNIService, which includes several methods to make calls to the jni for creating indices, training indices, and searching indices. The interface is intended to be future proof (i.e. it will not change if we add a new library).
At the moment, we do not have the functionality to test the native code directly (i.e. for memory leaks). This can be done in the future using Google tests. However, mocking JNI specific components is very tricky. We may consider separating jni components from the rest of the native code so that we can use Google tests to test the rest of the native code.
Included changes in this PR:
JNIServiceThis PR DOES NOT add faiss support to the OpenSearch interface. This will be done in a future PR.
Issues Resolved
#29
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.