mfa: per-session MFA certs for SSH and Kubernetes #5564

Merged: 13 commits, Mar 10, 2021

Contributor

@awly awly commented Feb 12, 2021

This is client-side support for requesting single-use certs with an MFA
check.

The client doesn't know whether it needs an MFA check when accessing a
resource; this is decided during an RBAC check on the server. So the
client always tries to get a single-use cert, and the server
responds with NotNeeded if MFA is not required. This adds an extra
round-trip to every session, which causes a ~20% slowdown in SSH logins:

$ hyperfine '/tmp/tsh-old ssh talos date' '/tmp/tsh-new ssh talos date'
Benchmark #1: /tmp/tsh-old ssh talos date
  Time (mean ± σ):      49.9 ms ±   1.0 ms    [User: 15.1 ms, System: 7.4 ms]
  Range (min … max):    48.4 ms …  54.1 ms    59 runs

Benchmark #2: /tmp/tsh-new ssh talos date
  Time (mean ± σ):      60.2 ms ±   1.6 ms    [User: 19.1 ms, System: 8.3 ms]
  Range (min … max):    59.0 ms …  69.7 ms    50 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  '/tmp/tsh-old ssh talos date' ran
    1.21 ± 0.04 times faster than '/tmp/tsh-new ssh talos date'

A few other internal changes:

  • client.LocalKeyAgent will now always have a non-nil LocalKeyStore.
    Previously, it could be nil (e.g. in a web UI handler or when using an
    identity file), which easily caused panics. I added a noLocalKeyStore
    type instead that returns errors from all methods.

  • Requesting a user cert with a TTL < 1min will now succeed and return a
    1min cert instead of failing.

@awly awly force-pushed the andrew/session-2fa-cli branch 8 times, most recently from 5f4fba5 to 5580e2a Compare February 18, 2021 20:44
@awly awly changed the title WIP: per-session certificate client support mfa: per-session MFA certs for SSH and Kubernetes Feb 18, 2021
@awly awly marked this pull request as ready for review February 18, 2021 20:46
Contributor

@a-palchikov a-palchikov left a comment

I'm not certain about other ramifications of the MFA check, but if the 20% slowdown is a concern, the connect/port forward APIs could attempt to connect with the local agent's SSH key first (w/o the embedded node name) before requesting the certificate with an MFA check if RBAC requires it.

@awly awly force-pushed the andrew/session-2fa-cli branch from 5580e2a to 4c251dd Compare February 22, 2021 21:37
@russjones
Contributor

@fspmarshall Can you review this?

Comment on lines 1652 to 1725
// Match NodeName to UUID, hostname or self-reported server address.
if n.GetName() == req.NodeName || n.GetHostname() == req.NodeName || addr == req.NodeName {
	node = n
	break
}
Contributor

Some folks have multiple nodes with the same hostname and rely on UUID-based dialing to access them. This loop should probably check all nodes for UUID matches first, and only fall back to hostname/addr matches if none are found.

Contributor Author

n.GetName() returns the UUID, so the first condition in this if statement matches nodes by UUID

Contributor

Yeah, what I mean is that iteration shouldn't halt if a hostname/addr match is found because there might still be a node which matches UUID and that would need to take priority.

Contributor Author

If some node has a hostname matching another node's UUID (which seems very unlikely), then we have multiple matches and no way to resolve the conflict, or for the user to pick one of them.

I think the more likely issue is when a user passes a hostname and multiple nodes match.
Changed the logic to catch that scenario: if there are multiple matches, return an error.

@awly awly force-pushed the andrew/session-2fa-cli branch from 4c251dd to 45dfb57 Compare February 26, 2021 19:16
@awly awly requested a review from fspmarshall February 26, 2021 19:24
@awly awly force-pushed the andrew/session-2fa-cli branch from 45dfb57 to 096fe3a Compare March 2, 2021 23:43
@awly
Contributor Author

awly commented Mar 2, 2021

PTAL @fspmarshall

}
// Errors other than ErrSessionMFARequired mean something else is wrong,
// most likely access denied.
if noMFAAccessErr != services.ErrSessionMFARequired {
Contributor

nit: if any CheckAccessToXXX APIs get inadvertently updated to wrap the returned error, this will start to behave unexpectedly, so using either errors.Is or trace.Unwrap would be safer.

Contributor Author

Good point.
I used trace.Unwrap originally and had trouble (because I think trace.AccessDenied unwraps into the underlying string error). But errors.Is works well; updated.

@awly awly force-pushed the andrew/session-2fa-cli branch from b05a4f8 to c8acbe1 Compare March 3, 2021 18:29
@awly
Contributor Author

awly commented Mar 3, 2021

@r0mant @russjones please review

@awly awly added this to the 6.1 milestone Mar 3, 2021
@@ -42,7 +42,7 @@ func PromptMFAChallenge(ctx context.Context, proxyAddr string, c *proto.MFAAuthe
 		return &proto.MFAAuthenticateResponse{}, nil
 	// TOTP only.
 	case c.TOTP != nil && len(c.U2F) == 0:
-		totpCode, err := prompt.Input(os.Stdout, os.Stdin, fmt.Sprintf("Enter an OTP code from a %sdevice", promptDevicePrefix))
+		totpCode, err := prompt.Input(os.Stderr, os.Stdin, fmt.Sprintf("Enter an OTP code from a %sdevice", promptDevicePrefix))
Collaborator

Just curious why this change to stderr?

Contributor Author

Because of kubectl's exec plugins.
We configure kubectl to exec a tsh subcommand every time it needs to get credentials to connect.
kubectl expects to read the credentials from stdout of the execed process.
Anything written to stderr will be forwarded to kubectl's own stderr to allow interactive commands like this.

@@ -126,6 +126,7 @@ func promptU2FChallenges(ctx context.Context, proxyAddr string, challenges []*pr
 		})
 	}

+	log.Debugf("prompting U2F devices with facet %q", facet)
Collaborator

Feels like a debug leftover, need this?

Contributor Author

This is left on purpose.
If a user misconfigures U2F facets (proxy addresses, basically) on the auth_service, it's really hard to debug why U2F auth fails when they use a different value in tsh --proxy=...

@awly awly force-pushed the andrew/session-2fa-cli branch from c8acbe1 to 282a31a Compare March 4, 2021 17:20
@awly awly force-pushed the andrew/session-2fa-cli branch 2 times, most recently from 262a22d to e49a15a Compare March 10, 2021 00:30
@awly
Contributor Author

awly commented Mar 10, 2021

Added IsMFARequired RPC on the auth server.
The client now uses this against the appropriate auth server (root or leaf) to check if MFA is required.
If it is, the client sends the UserSingleUseCerts RPC to the root cluster, which should always be the one issuing credentials. UserSingleUseCerts is back to always forcing the MFA check and not checking RBAC to see if it's needed.

@r0mant @fspmarshall PTAL

@awly awly force-pushed the andrew/session-2fa-cli branch from c76e250 to 4ee0366 Compare March 10, 2021 01:53
if hasLocalUserRole(a.context.Checker) && username == a.context.Identity.GetIdentity().Username {
return nil
}
if hasRemoteUserRole(a.context.Checker) && username == a.context.UnmappedIdentity.GetIdentity().Username {
Collaborator

What's the implication of this change? Wouldn't this mean that, for example, a remote user has permissions to change the local user's password as long as they have the same username? I may be missing something.

Contributor Author

Nice catch, didn't think about that.
It also turns out I was confusing "username" with "ssh login name", so I renamed that request field and it's no longer compared to the caller username.
Changes to currentUserAction reverted.

}

var noMFAAccessErr, notFoundErr error
switch t := req.Target.(type) {
Collaborator

More of a meta comment, but don't we want to keep the ACL layer slim without much business logic? Any downsides of moving this switch downstream, e.g. to an auth server method and invoke it here similar to how other methods here do? You can pass checker to it too.

Contributor Author

All of this code felt like pure authz logic, which is why I put it here.
But I think you're right, moved all of this code into auth.Server.

@awly awly force-pushed the andrew/session-2fa-cli branch 2 times, most recently from 013402e to 4d96b0d Compare March 10, 2021 18:31
Andrew Lytvynov added 10 commits March 10, 2021 11:34
An unknown node could be an OpenSSH node set up via
https://goteleport.com/teleport/docs/openssh-teleport/

In this case, we shouldn't prevent the user from connecting.

There's a small risk of authz bypass - an attacker might know a
different name/IP for a registered node which Teleport doesn't know
about. But a Teleport node will still check RBAC and reject the
connection.

IssueUserCertsWithMFA is called on the leaf auth server in case of
trusted clusters. Username in the request object will be that of the
original unmapped caller.

The IsMFARequired RPC is run before every connection to check whether MFA is
required. If a connection is against the leaf cluster, this request is
forwarded from root to leaf for evaluation.

Also, move the logic into auth.Server out of ServerWithRoles.
@awly awly force-pushed the andrew/session-2fa-cli branch from e7c417f to e1c31bb Compare March 10, 2021 19:34
Collaborator

@r0mant r0mant left a comment

Couple more questions but otherwise lgtm.

Comment on lines +263 to +270
tlsCert, err := key.TeleportTLSCertificate()
if err != nil {
	return nil, trace.Wrap(err)
}
rootClusterName, err := tlsca.ClusterName(tlsCert.Issuer)
if err != nil {
	return nil, trace.Wrap(err)
}
Collaborator

Nit: There's RootClusterName() method on the ProxyClient which you could probably just use instead of extracting from the TLS cert, although the outcome would probably be the same.

Contributor Author

RootClusterName does the exact same thing as this code, including loading the key from localAgent.
I need the key below too, so I chose to manually extract rootClusterName to avoid loading the key twice.

@awly awly merged commit 3d02ae6 into master Mar 10, 2021
@awly awly deleted the andrew/session-2fa-cli branch March 10, 2021 23:42
awly pushed a commit that referenced this pull request Mar 25, 2021
* mfa: per-session MFA certs for SSH and Kubernetes

* Capture access approvals on MFA-issued certs

* Address review feedback

* Address review feedback

* mfa: accept unknown nodes during short-term MFA cert creation

An unknown node could be an OpenSSH node set up via
https://goteleport.com/teleport/docs/openssh-teleport/

In this case, we shouldn't prevent the user from connecting.

There's a small risk of authz bypass - an attacker might know a
different name/IP for a registered node which Teleport doesn't know
about. But a Teleport node will still check RBAC and reject the
connection.

* Validate username against unmapped user identity

IssueUserCertsWithMFA is called on the leaf auth server in case of
trusted clusters. Username in the request object will be that of the
original unmapped caller.

* mfa: add IsMFARequired RPC

This RPC is run before every connection to check whether MFA is
required. If a connection is against the leaf cluster, this request is
forwarded from root to leaf for evaluation.

* Fix integration tests

* Correctly treat "Username" as login name in IsMFARequired

Also, move the logic into auth.Server out of ServerWithRoles.

* Fix TestHA

* Address review feedback
awly pushed a commit that referenced this pull request Mar 29, 2021
awly pushed a commit that referenced this pull request Mar 29, 2021