Configurable Keep-alives #2334

Closed
russjones opened this issue Oct 31, 2018 · 9 comments

@russjones
Contributor

Support configurable client and server keep-alive messages for users whose network disconnects SSH connections after some idle period. This would allow users/admins to configure clients/servers to send keep-alive messages at an interval shorter than the idle period, to prevent the network (VPN) from tearing down the connection.

See Zendesk #377.

@klizhentas
Contributor

klizhentas commented Oct 31, 2018

can we just make the keepalives more frequent by default?

@russjones
Contributor Author

Sure, but what value counts as frequent enough? It would really vary with how your VPN is configured, right?

For the forwarding proxy, if I recall correctly, we heartbeat every 5 minutes.

@elg0ch0

elg0ch0 commented Nov 1, 2018

Hi @russjones, @klizhentas,

I think it's more of a firewall-related setting (sometimes it's at the load balancer level). As a reference, you can consider how some related applications/actors behave:

  • AWS: 1min
  • Azure: 4min
  • SSHD: default 0 (use ClientAliveInterval to customize it)

I'd say (to be as flexible as possible):

  • default: no keep-alive (it's aligned with your traffic reduction policy)
  • provide an optional setting (in teleport.yaml?) which allows users to customize it
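
To illustrate the point behind these numbers: a keep-alive interval only helps if it is shorter than the smallest idle timeout on the path. The sketch below uses the AWS and Azure figures listed above. The function name `max_safe_interval` and the `margin` safety factor are assumptions made up for this example, not anything Teleport implements.

```python
# Idle timeouts quoted in the list above, in seconds.
IDLE_TIMEOUTS = {"AWS": 60, "Azure": 240}


def max_safe_interval(idle_timeouts, margin=0.5):
    """Return a keep-alive interval (seconds) below the shortest idle timeout.

    `margin` is a hypothetical safety factor: 0.5 means a keep-alive is
    sent at half the shortest idle period.
    """
    if not idle_timeouts:
        raise ValueError("need at least one idle timeout")
    if not 0 < margin < 1:
        raise ValueError("margin must be strictly between 0 and 1")
    return min(idle_timeouts.values()) * margin


print(max_safe_interval(IDLE_TIMEOUTS))  # 30.0
```

With a margin of 0.5, the shortest timeout (60 seconds) gives an interval of 30 seconds; a different margin would shift that value accordingly.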

@russjones
Contributor Author

@klizhentas One thing we can do is add keep_alive to the options on roles to control how often the server heartbeats to client connections. That's probably the easiest way to solve this problem and make the keep-alives configurable.

@klizhentas
Contributor

this will be hard to manage/troubleshoot per cluster, don't you think? Let's have a cluster global setting instead that applies to every single client for now.

@russjones
Contributor Author

@kontsevoy What do you think about adding a cluster-level keep-alive that will be sent by Teleport nodes to clients? It can be configured with the following syntax:

teleport:
  auth_service:
    server_keep_alive: 5m

@kontsevoy
Contributor

I would drop "server" because it's obvious and add "interval" (because we use this word elsewhere in the config to express periodically happening actions).

How about keep_alive_interval?
Also, I would set the default to 30 seconds. IIRC it's a cheap operation which only happens on active sessions, so it wouldn't increase overall Teleport overhead in terms of network traffic.

@russjones
Contributor Author

Talked with @ev offline about this; we'll add support for keep_alive_count_max as well. Then Teleport will mirror the sshd parameters ClientAliveCountMax and ClientAliveInterval:

     ClientAliveCountMax
             Sets the number of client alive messages (see below) which may
             be sent without sshd(8) receiving any messages back from the
             client.  If this threshold is reached while client alive
             messages are being sent, sshd will disconnect the client,
             terminating the session.  It is important to note that the use
             of client alive messages is very different from TCPKeepAlive
             (below).  The client alive messages are sent through the
             encrypted channel and therefore will not be spoofable.  The TCP
             keepalive option enabled by TCPKeepAlive is spoofable.  The
             client alive mechanism is valuable when the client or server
             depend on knowing when a connection has become inactive.

             The default value is 3.  If ClientAliveInterval (see below) is
             set to 15, and ClientAliveCountMax is left at the default,
             unresponsive SSH clients will be disconnected after
             approximately 45 seconds.

     ClientAliveInterval
             Sets a timeout interval in seconds after which if no data has
             been received from the client, sshd(8) will send a message
             through the encrypted channel to request a response from the
             client.  The default is 0, indicating that these messages will
             not be sent to the client.

Teleport configuration will look like:

teleport:
  auth_service:
    keep_alive_interval: 15m
    keep_alive_count_max: 3
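
For a sense of what these two values mean together, here is a rough sketch of the arithmetic from the sshd man page (interval times count), assuming Teleport follows the same rule. The helpers `parse_duration` and `disconnect_after` are hypothetical and written only for this example; the parser handles just simple values such as `15s`, `15m` or `1h`, which may be narrower than what Teleport actually accepts.

```python
import re

# Seconds per unit for the simple duration strings handled here.
_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Parse a duration like '15s', '15m' or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def disconnect_after(keep_alive_interval, keep_alive_count_max):
    """Approximate seconds until an unresponsive client is dropped."""
    return parse_duration(keep_alive_interval) * keep_alive_count_max


print(disconnect_after("15s", 3))  # 45, the sshd man page example
print(disconnect_after("15m", 3))  # 2700 seconds, i.e. 45 minutes
```

Under that assumption, the configuration above would drop an unresponsive client after roughly 45 minutes, compared with about 45 seconds in the sshd example.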

@kontsevoy
Contributor

Looks good to me.
