Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition causing SSH_MSG_UNIMPLEMENTED occasionally during key exchange #851

Merged
merged 5 commits into from
Sep 1, 2023

Conversation

rasantel
Copy link
Contributor

@rasantel rasantel commented Mar 30, 2023

The problem and diagnosis are documented in issue #850.

At KeyExchanger.startKex, the main thread should skip the key exchange if the reader thread is already doing an exchange (as per KeyExchanger.kexOngoing) or an exchange has already completed (as per KeyExchanger.done). Currently, the main thread checks this earlier in SSHClient.onConnect, but the result can change between then and KeyExchanger.startKex.

Thankfully, KeyExchanger.startKex checks kexOngoing again and atomically "acquires" it (by checking it and switching it to true) but there is no new check for done, which by now might have been set by the reader or keepalive threads. If that has happened, the main thread will not know and it will proceed with its own key exchange, which will result in a SSH_MSG_UNIMPLEMENTED from the server.

Note that I found this to happen with the OpenSSH server with the default configuration provided by Ubuntu Linux 22.04, as well as the OpenSSH server used by the company where this issue was first encountered. The Apache sshd fixture in the unit test suite does not complain when receiving extra key exchanges under the same conditions. I'm no expert in the ssh protocol though; I don't know if this is a behavioral difference between the two servers or if it has to do with configuration differences.

The proposed fix is to check that done is false after acquiring kexOngoing. By checking this while owning kexOngoing, there is no risk of done being set by another thread doing its own key exchange at the same time simply because only one thread can own kexOngoing at a time. If done was already set by some other thread before the main thread acquired kexOngoing, then the solution "releases" kexOngoing (sets it back to false) and avoids a new key exchange.

With this solution alone, however, additional key exchanges won't be possible again, but they must be allowed when it's time to re-key. My experiments with my own OpenSSH 8.9p1-3ubuntu0.1 server show that the server will reject with SSH_MSG_UNIMPLEMENTED any extra key exchange before authentication completes. After authentication, new key exchanges are allowed by it. Therefore, after kexOngoing is acquired, the solution adds a second check to see if authentication has completed.

If the approach looks good to reviewers, I'll add unit tests. Although the issue is not reproducible with the current sshd fixture, unit tests can show that duplicate key exchanges are avoided before authentication.

@rasantel rasantel requested a review from hierynomus as a code owner March 30, 2023 15:38
@hierynomus
Copy link
Owner

Hi @rasantel, sorry for coming back somewhat later to this. A few additional unit tests would be great, then I think this is in a mergeable state!

If you could add those, I'll include the PR in the next release.

@codecov-commenter
Copy link

codecov-commenter commented Jul 20, 2023

Codecov Report

Merging #851 (f493dec) into master (a5fdb29) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master     #851      +/-   ##
============================================
+ Coverage     68.34%   68.36%   +0.02%     
- Complexity     1394     1397       +3     
============================================
  Files           207      207              
  Lines          7423     7423              
  Branches        629      629              
============================================
+ Hits           5073     5075       +2     
  Misses         2014     2014              
+ Partials        336      334       -2     
Files Changed Coverage Δ
...java/net/schmizz/sshj/transport/TransportImpl.java 65.56% <ø> (-0.15%) ⬇️
src/main/java/net/schmizz/sshj/SSHClient.java 58.95% <100.00%> (-0.70%) ⬇️
.../java/net/schmizz/sshj/transport/KeyExchanger.java 78.72% <100.00%> (+1.00%) ⬆️

... and 2 files with indirect coverage changes

@rasantel
Copy link
Contributor Author

@hierynomus Thanks for checking. I added unit tests. Let me know what you think!

@hierynomus
Copy link
Owner

Looks good, consider it merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants