[Inference API] Fix flaky AuthorizationTaskExecutorIT tests #139978
jonathan-buttner merged 6 commits into elastic:main from
Pinging @elastic/search-inference-team (Team:Search - Inference)
The test failure happened when calling
Hmm, looking at the logs closer...
That part looks ok so far. The
We get 3x of those because we retry that exception. Then we get:
It's odd that we don't see the all shards failed exception in those logs; it's further up in the output.
I wonder if the issue is that we're performing a search while we're trying to create the indices 🤔. That's what is different between the first call to
I think it's still worth a shot trying to ensure that the indices are created ahead of time to see if that helps here. In this PR, the first call to
So maybe that'll help 🤷‍♂️.
The other thing this reveals is that we could handle updating the cluster state a little better. Specifically this means that we're passing an empty set of ids to the update cluster state. I think this is happening because when the test completes we delete all the indices, so there's probably a race condition when a failure occurs. We can probably skip the cluster state update when no endpoints were created successfully.
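The "skip the update when nothing was created" idea could look roughly like the sketch below. This is only an illustration: EndpointStateUpdater, updateIfNeeded and submittedUpdates are hypothetical names, not the real Elasticsearch classes, and the "cluster state update" is just recorded in a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for whatever component submits the cluster state update;
// it is NOT the real Elasticsearch implementation.
class EndpointStateUpdater {
    // Records every update that was actually submitted so the behaviour is observable.
    private final List<Set<String>> submittedUpdates = new ArrayList<>();

    /**
     * Submits an update only when at least one endpoint id is present.
     * Returns true if an update was submitted, false if it was skipped.
     */
    boolean updateIfNeeded(Set<String> createdEndpointIds) {
        if (createdEndpointIds == null || createdEndpointIds.isEmpty()) {
            // Nothing was created successfully, so there is nothing to publish.
            return false;
        }
        submittedUpdates.add(Set.copyOf(createdEndpointIds));
        return true;
    }

    // Read-only view of the updates submitted so far.
    List<Set<String>> submittedUpdates() {
        return List.copyOf(submittedUpdates);
    }
}
```

With a guard like this, a failed authorization round that created no endpoints would not produce an update with an empty id set.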
muted-tests.yml
- class: org.elasticsearch.xpack.esql.ccq.MultiClusterSpecIT
  method: test {csv-spec:spatial.ConvertFromStringParseError}
  issue: https://github.com/elastic/elasticsearch/issues/139213
Is this test mute being added intentionally?
Oops nope, I'll remove it. Merge conflict issue.
The test queues a response in the web server in the
Agreed, making sure the indices get created seems like a sensible choice here. I do wonder if this race between inference index creation and running a query on the index is something a customer could hit and what action they should take if they do, though.
Yeah good point, I don't know why I put that in the initClass() method 😆
💔 Backport failed
You can use sqren/backport to manually backport by running
* upstream/main: (191 commits)
  Overall Decision for Deciders prioritizes THROTTLE (elastic#140237)
  Apply group by all logic not only to top-level aggregates (elastic#140248)
  [ES|QL] Refactor MV_UNION and MV_INTERSECTION to use shared set operation helper (elastic#139982)
  Avoid reading entire bloom filter file on reader open (elastic#139374)
  Mark bloom filter files for random access (elastic#139375)
  Ensure that the buffer used for ES93BloomFilterStoredFieldsFormat is zeroed (elastic#139034)
  Add busy assertion to avoid race condition for testStalledShardMigrationProperlyDetected (elastic#140230)
  Remove line number check for testTransitiveFindsDeepCallChain (elastic#140228)
  Allow a slight difference in rescored docs (elastic#139931)
  Mute org.elasticsearch.xpack.inference.integration.AuthorizationTaskExecutorIT testCreatesEisChatCompletion_DoesNotRemoveEndpointWhenNoLongerAuthorized elastic#138480
  Start exchange sink fetchers concurrently (elastic#140196)
  Allow allocation to replacement target node on vacate completion (elastic#140150)
  Ignore JNA cleaner threads in SecureHdfsRepositoryAnalysisRestIT (elastic#139925)
  DeterministicQueue refactor and enhancement (elastic#140151)
  Always error out if CCS expression shows up when CCS is not supported (elastic#139009)
  Use IllegalArgumentException over RepositoryException for readonly-repository checks (elastic#140200)
  Guard promql capabilities in AnalyzerTests (elastic#140232)
  [Inference API] Fix flaky AuthorizationTaskExecutorIT tests (elastic#139978)
  Cleaning up exitable vector value impls (elastic#140190)
  [Inference API] Fix auth exception listener not called bug (elastic#139966)
  ...
💚 All backports created successfully
…139978)
* Fix flaky with no shards available exception
* Fixing merge and adding empty response before tests
This PR tries to address an org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed exception.
I think the "all shards failed" message indicates that the indices do not exist yet.
Passing true in modelRegistry.getAllModels(true, listener); should make the ModelRegistry persist the default endpoints and therefore create the inference indices.
Failure issue: #138012
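The "create the indices before the test runs" approach can be sketched as follows. This is a simplified illustration under loud assumptions: FakeModelRegistry is a hypothetical in-memory stand-in that only mimics the shape of the getAllModels(true, listener) call, and ensureIndicesCreated is an invented helper, not part of the real test.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in; the real ModelRegistry API is not reproduced here.
class FakeModelRegistry {
    private final AtomicBoolean indicesCreated = new AtomicBoolean(false);

    // Assumption: when persistDefaults is true, persisting the default endpoints
    // creates the backing indices as a side effect.
    void getAllModels(boolean persistDefaults, Consumer<List<String>> listener) {
        if (persistDefaults) {
            indicesCreated.set(true);
        }
        listener.accept(List.of("default-endpoint"));
    }

    boolean indicesExist() {
        return indicesCreated.get();
    }

    // Calls getAllModels(true, ...) and blocks until the listener fires,
    // so later searches in the test do not race with index creation.
    static boolean ensureIndicesCreated(FakeModelRegistry registry, long timeoutSeconds) {
        CountDownLatch done = new CountDownLatch(1);
        registry.getAllModels(true, models -> done.countDown());
        try {
            return done.await(timeoutSeconds, TimeUnit.SECONDS) && registry.indicesExist();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The point of the pattern is ordering: the setup step finishes, and the indices exist, before any test code issues a search.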
Stack trace