Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]The exception is unfriendly(Mapper parsing exception) when k-NN circuit breaker is triggered #1424

Closed
heemin32 opened this issue Jan 26, 2024 · 4 comments
Assignees
Labels
bug Something isn't working v2.15.0

Comments

@heemin32
Copy link
Collaborator

heemin32 commented Jan 26, 2024

What is the bug?
During bulk indexing, when k-NN circuit breaker is triggered, client receive mapper_parsing_exception. This makes user hard to understand the actual root cause of the exception until they look in to a log file.

How can one reproduce the bug?

"type": "knn_vector",
"dimension": 256,
"method": {
    "name": "hnsw",
    "engine": "faiss",
    "space_type": "innerproduct",
    "parameters": {
        "ef_construction": 100,
        "m": 16
    }
}

Do bulk indexing of documents

Error message from client side
failed to parse field [passageVector] of type [knn_vector] in document with id 'asdfasdf'. mapper_parsing_exception'
Preview of fields's value: '[1.23232, 2.123232. ...]', statusCode: 400, type: mapper_parsing_exception
Log
MapperParsingException[failed to parse field [passageVector] of type [knn_vector] in document with id 'asdfasdf'. Preview of field's value: '[1.23232, 2.123232, ...]']; nested: IllegalStateException[Indexing knn vector fields is rejected as circuit breaker triggered. Check _opendistro/_knn/stats for detailed state];

What is the expected behavior?
It should provide the root cause of the exception, k-NN circuit breaker in client side error message

@navneet1v
Copy link
Collaborator

@heemin32 how this is bug? I would consider this as an enhancement where messaging is not proper.

@heemin32
Copy link
Collaborator Author

heemin32 commented Feb 6, 2024

It is a bug because the error message is not just vague but incorrect. If the exception is InternalException or ThrottlingException, it is not a bug. However, mapper parsing exception does not encompass circuit breaker exception but those two are entirely different exceptions.

@vamshin vamshin added v2.14.0 and removed untriaged labels Mar 27, 2024
@vamshin vamshin moved this from Backlog to 2.14.0 in Vector Search RoadMap Apr 1, 2024
@jmazanec15 jmazanec15 moved this from 2.14.0 to 2.15.0 in Vector Search RoadMap Apr 29, 2024
@ryanbogan
Copy link
Member

ryanbogan commented May 2, 2024

Following a deep dive, we throw the nested circuit breaker exception in KNNVectorFieldMapper here. This method is called within a parseCreateField method. Tracing the method calls, we eventually arrive at the parse method in the FieldMapper class of the OpenSearch repository, which calls the method in a try/catch and wraps any exception that is caught. The parsing action fails because the circuit breaker is triggered, so the exception is technically correct. There are other instances of mappers checking conditions in this method and throwing exceptions if necessary, such as ParentIdFieldMapper and DocCountFieldMapper.

We have to add the circuit breaker check in this method to stop the bulk indexing from occurring. Our access to the bulk ingestion API/process is limited due to most of the logic being implemented in the core repository. The parseCreateField method allows us to check on the circuit breaker before the indexing begins, which prevents additional memory from being used when resources are limited.

That being said, I think we should create a new type of exception to throw instead of an illegal state: KNNCircuitBreakerException. I also think we can improve the wording of the nested exception message to more clearly describe to the user what action is failing and what it means for their cluster. For example, we could clarify the significance of the circuit breaker being triggered (lack of memory) and clarify that the parse action could not be run with the current status of the circuit breaker.

@nwatab
Copy link

nwatab commented Jun 25, 2024

Does that mean simply reducing bulk batch size could avoid the error?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working v2.15.0
Projects
Status: Done
Development

No branches or pull requests

6 participants