The server response does not contain an SSH identification string. #1107

Open
rkreisel opened this issue Mar 31, 2023 · 5 comments


@rkreisel

When testing from a developer machine, the connection to the remote server is successful. But when deployed to an Azure function I get this error upon executing the Connect() method. The referenced ietf document is "greek" to me.

Renci.SshNet.Common.SshConnectionException: The server response does not contain an SSH identification string. The connection to the remote server was closed before any data was received. More information on the Protocol Version Exchange is available here: https://tools.ietf.org/html/rfc4253#section-4.2
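For readers who find the RFC hard going: section 4.2 says that, right after the TCP connection is established, each side sends a single line of the form `SSH-protoversion-softwareversion SP comments CR LF`, and the server may send other lines before it. The exception above means the client never saw that line. Below is a minimal Python sketch of how such a line could be recognised. It is only an illustration of the format, not SSH.NET's actual implementation, and the function name `parse_identification` is made up for this example.

```python
import re

# RFC 4253 section 4.2: "SSH-protoversion-softwareversion SP comments CR LF".
# The comments part (and the space before it) is optional.
_IDENT_RE = re.compile(
    rb"^SSH-(?P<proto>[^-\s]+)-(?P<software>[^\s]+)(?: (?P<comments>.*))?$"
)


def parse_identification(data: bytes):
    """Return (protoversion, softwareversion, comments) for the first line
    that starts with 'SSH-', or None if the data contains no such line.

    Hypothetical helper for illustration only.
    """
    for raw_line in data.split(b"\n"):
        line = raw_line.rstrip(b"\r")
        match = _IDENT_RE.match(line)
        if match:
            comments = match.group("comments")
            return (
                match.group("proto").decode("ascii"),
                match.group("software").decode("ascii"),
                comments.decode("utf-8", "replace") if comments is not None else None,
            )
    return None
```

An empty response (the situation described by "closed before any data was received") gives `None`, which is roughly the condition the exception reports.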

@WojciechNagorski
Collaborator

I've reproduced this problem here:
https://ci.appveyor.com/project/drieseng/ssh-net/builds/48584754

@Rob-Hague do you have any idea what might have happened?

@Rob-Hague
Collaborator

#1250 looks like the same problem as #1220 (comment).

I had guessed that that was related to the connections being re-established too quickly. I started looking at SO_REUSEADDR and SO_REUSEPORT, but I have quite a lot of learning to do there. I'm not sure whether @rkreisel's problem would have the same cause or whether it is something different.

Probably the best thing would be to get a packet capture by running tcpdump on the docker instance... but I wouldn't know how to do that either 🙂

@raimana

raimana commented Jan 15, 2024

FWIW I've been troubleshooting a similar problem, i.e. it works without issues locally and fails intermittently in Azure.

The problem was that Azure Functions or App Services (except the ASE/Isolated tier, which costs an arm and a leg) have a list of outbound IP addresses that Azure can "pick" from for outbound connections.
The IP selected by Azure can change between Function executions. The actual issue was that some of these IPs were allowed and some were blacklisted (the company hosting the SFTP server was unaware that some Azure IPs were blocked until I showed them the packet capture).
These IPs could have been used by other tenants engaging in "suspicious" activities before being assigned to your Function App.

Because the issue was intermittent and similar tickets pointed to SSH.NET potentially not handling connections properly(?), I initially looked into the SSH.NET code. After debugging it extensively I concluded that it had nothing to do with the problem, and only then did I start looking at the network (I should have started there).

The symptom was a FIN/ACK packet sent by the remote site immediately after the TCP handshake.
The client starts the SSH protocol version exchange unaware that the server is already terminating the TCP connection (screenshot below).
That is why the server never returns its identification string: it is closing the connection.
[Screenshot: azure_sshnet]
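A quick way to check for this symptom without a full packet capture is to open a plain TCP connection and see whether the peer sends an `SSH-` line or closes first. The following is a rough Python sketch under some assumptions: the function name `read_ssh_banner` and the 64 KiB limit are made up for this example, and unlike a real SSH client it never sends its own identification string, it only listens.

```python
import socket


def read_ssh_banner(host: str, port: int = 22, timeout: float = 10.0,
                    max_bytes: int = 65536):
    """Connect and return the server's identification line, or None if the
    peer closed (FIN) or reset the connection before sending one.

    Hypothetical diagnostic helper; a timeout raises socket.timeout.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        buf = b""
        received = 0
        while True:
            try:
                chunk = sock.recv(4096)
            except ConnectionResetError:
                return None  # peer sent RST
            if not chunk:
                return None  # EOF: peer sent FIN before any SSH- line
            received += len(chunk)
            if received > max_bytes:
                raise ValueError("no SSH identification line within limit")
            buf += chunk
            # Lines before the identification string are allowed; skip them.
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                if line.startswith(b"SSH-"):
                    return line.rstrip(b"\r").decode("ascii", "replace")
```

Running this repeatedly from the affected environment and from a machine that works should show whether `None` only appears from certain source addresses, which would point at the network rather than the client library.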

You can run a packet capture from Azure by upgrading to a premium plan temporarily (if running on a consumption plan).

  1. Go to "Change App Service Plan" -> Select "Function Premium"
  2. Go to "Diagnose and solve problems" -> "Collect Network Trace"
  3. Use Wireshark or similar to review the packet capture

A few options to solve this particular problem are:

  1. Whitelist the function apps' data center IPs, if you control the SFTP server or can convince the vendor (it's a long list)

https://learn.microsoft.com/en-us/azure/azure-functions/ip-addresses?tabs=portal#data-center-outbound-ip-addresses

  2. Route traffic from your Function App to a network appliance with a static IP (NAT gateway, outbound load balancer etc.)

https://learn.microsoft.com/en-us/azure/nat-gateway/nat-overview

  3. others...
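If you go the whitelisting route, it can help to compare the app's possible outbound addresses against what the SFTP host actually allows. Here is a small Python sketch using the standard `ipaddress` module. The function names are made up for this example, and the addresses in the usage note are reserved documentation ranges, not real Azure IPs.

```python
import ipaddress


def is_allowed(ip: str, allowed_ranges) -> bool:
    """True if ip falls inside any of the CIDR ranges in allowed_ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


def blocked_addresses(outbound_ips, allowed_ranges):
    """Return the outbound IPs that are NOT covered by the allow list."""
    return [ip for ip in outbound_ips if not is_allowed(ip, allowed_ranges)]


if __name__ == "__main__":
    # Placeholder values only (RFC 5737 documentation ranges).
    allowed = ["203.0.113.0/24", "198.51.100.7/32"]
    outbound = ["203.0.113.10", "198.51.100.7", "192.0.2.55"]
    print(blocked_addresses(outbound, allowed))
```

Any address this reports is one that the app could be assigned but the server would reject, which would match the intermittent failures described above.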

@sundman

sundman commented Oct 31, 2024

After updating to 2024.1 we started to notice the same error message when trying to connect to an AWS SFTP server, but only about 50% of the time, seemingly at random.

After a lot of digging in stuff I really don't understand that well, my current understanding is that it seems like there is some race condition when the protocol version exchange is sent very close in time after an ACK related to the initial connection:

[Image: Wireshark capture]

Here we see Wireshark logs of a failed connection attempt at about 11:53:50, which ends up in a loop of retransmission requests until the server gives up on us. I know too little about the insides of TCP connections to know who is to blame for the two sides no longer agreeing, but somehow they end up talking past each other.

The connection attempt at 11:58:50 ends up working though, and everything is fine...

After some tinkering I found that this small change "fixes" the problem:
[Image: screenshot of the code change]

I'm not suggesting that this is a long-term solution for anyone, but perhaps it sheds enough light on the problem that someone can actually fix the bug before we need to upgrade to a new version.

Until then this hack seems to have resolved our immediate issues.

@Rob-Hague
Collaborator

No. 7906 in your trace seems strange to me: the Ack number from the server is 100 less than it should be, i.e. it is 969_232_960 but I would have thought it should be 969_233_060. No idea how that could happen.
