Do not report failure after connections are made #99117
elasticsearchmachine merged 5 commits into elastic:main
This PR removes a bogus assertion from ProxyConnectionStrategy. The statement tries to assert that all connection exceptions are caught in the if branch, so that at least one connection exists when the code reaches the else branch. This is simply not true, because checking whether we should open more connections and subsequently opening them is *not* atomic. The number of connections can change in between, e.g. when the remote end drops a connection, which sends the code flow to the else branch and trips the bogus assertion. Relates: elastic#94998 Resolves: elastic#99113
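The interleaving described above can be replayed deterministically. The following is a minimal sketch, not the Elasticsearch code: `FakeConnectionManager`, `CheckThenActRace`, and every method name in it are hypothetical and exist only to illustrate why a count read after the connection attempt cannot be trusted.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a connection manager; not the Elasticsearch class.
class FakeConnectionManager {
    private final Set<String> open = ConcurrentHashMap.newKeySet();

    void connect(String id) { open.add(id); }
    void remoteDrop(String id) { open.remove(id); }
    int size() { return open.size(); }
}

class CheckThenActRace {
    // Replays one interleaving and returns the count seen by the final check.
    static int countSeenByFinalCheck(boolean remoteDropsInBetween) {
        FakeConnectionManager manager = new FakeConnectionManager();

        // Step 1: the connection is established without any exception.
        manager.connect("conn-1");

        // Step 2: before the next check runs, the remote end may close it.
        if (remoteDropsInBetween) {
            manager.remoteDrop("conn-1");
        }

        // Step 3: the final check reads the count only now.
        return manager.size();
    }

    public static void main(String[] args) {
        System.out.println("no drop:   " + countSeenByFinalCheck(false));
        System.out.println("with drop: " + countSeenByFinalCheck(true));
    }
}
```

An assertion of the form "count is greater than zero" placed at step 3 would trip in the second case, even though no connection attempt failed.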
|
Pinging @elastic/es-distributed (Team:Distributed) |
|
@DaveCTurner Could you please help review this PR since you were involved in the PR (#94998) that introduced the bogus assertion? Thanks! It is a rare failure that only happens if connections are dropped quickly enough. For the failed test, it reaches the end of the test and starts to clean up connections once the addressResolver is called. This can cause connections to be closed when the I considered whether we could check But it does not help the overall situation, where the number of connections can change right after we check it. Since the |
|
@DaveCTurner Ping for awareness. Thank you! |
|
Ah yes this is not a valid assertion. But why check the number of open connections at all? We may as well just report success at this point, because it makes no difference if the connections have closed before or after we read |
|
Checking the number of open connections is existing behaviour that predates the assertion. I can see the argument that we don't need to check the connection again in the That said, can I address it in a separate PR, because it requires changing production code and writing tests? For this PR, I'd prefer to resolve the CI failure first. Thanks! |
|
Can we just mute the test until it's fixed? IMO the behaviour we have today is a genuine bug. |
|
Hi @ywangd, I've created a changelog YAML for you. |
|
@DaveCTurner I have turned this PR into an actual fix as you suggested. I did not mute the failed test because (1) it is a very rare failure and (2) the failure can in theory happen to any test that uses the proxy connection strategy, and it is rather inconvenient to mute all of them. So hopefully we can get this PR merged before the next failure happens. Thanks! |
Today, when the number of attempts is exhausted, ProxyConnectionStrategy checks the number of connections before returning. It reports connection failure if the number of connections is zero at the time of checking. However, this behaviour is incorrect. In rare cases, a connection can be dropped right after it is initially established and before the number is checked. From the perspective of the openConnections method, it should not matter whether or when opened connections are subsequently closed. As long as connections have been initially established, it should report success instead of failure.
This PR adjusts the code to report success in the above situation.
Relates: #94998
Resolves: #99113
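The change in reporting can be summarised with a small sketch. It is an illustration under stated assumptions, not the code from this PR: `OpenConnectionsSketch`, `Result`, `oldBehaviour`, and `newBehaviour` are hypothetical names, and a single connection stands in for the whole set.

```java
class OpenConnectionsSketch {
    enum Result { SUCCESS, FAILURE }

    // Old behaviour as described above: the count is read at reporting time,
    // so a connection dropped after being established looks like a failure.
    static Result oldBehaviour(boolean connectSucceeded, boolean droppedAfterwards) {
        if (!connectSucceeded) {
            return Result.FAILURE;              // the connect error is reported directly
        }
        int openNow = droppedAfterwards ? 0 : 1;
        return openNow == 0 ? Result.FAILURE : Result.SUCCESS;
    }

    // New behaviour as described above: once the connection was established,
    // later drops do not change the reported outcome.
    static Result newBehaviour(boolean connectSucceeded, boolean droppedAfterwards) {
        if (!connectSucceeded) {
            return Result.FAILURE;              // genuine connect failures still fail
        }
        return Result.SUCCESS;
    }

    public static void main(String[] args) {
        System.out.println("old, dropped afterwards: " + oldBehaviour(true, true));
        System.out.println("new, dropped afterwards: " + newBehaviour(true, true));
    }
}
```

The two policies differ only when a connection is established and then dropped before the outcome is reported. Genuine connect failures are reported as failures in both.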