Skip to content

RFD 64: Bot for certificate renewals (Machine ID)#7986

Merged
timothyb89 merged 16 commits intomasterfrom
zmb3/cerbot-rfd
Apr 15, 2022
Merged

RFD 64: Bot for certificate renewals (Machine ID)#7986
timothyb89 merged 16 commits intomasterfrom
zmb3/cerbot-rfd

Conversation

@zmb3
Copy link
Copy Markdown
Collaborator

@zmb3 zmb3 commented Aug 19, 2021

No description provided.

@zmb3 zmb3 marked this pull request as draft August 19, 2021 15:47
Copy link
Copy Markdown
Contributor

@klizhentas klizhentas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of early comments

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
First, a token will be generated for the bot to join the cluster.

```
$ tctl tokens add --type=bot
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's consider scoping the permissions of each individual bot. It seems like the bot should be able to only issue credentials for limited set of roles. So when adding bot we should ask for roles the bot should assume:

tctl tokens add --type=bot --roles=Node,Proxy,my-role
# behind the scenes, create an impersonator bot role that is allowed to impersonate the set above
# read here about impersonation: https://goteleport.com/docs/access-controls/guides/impersonation/

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
```
$ tctl tokens add --type=bot
The invite token: 13b74f49d27536dd5c514073097c197b
This token will expire in 60 minutes
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that simplified node joining will make it possible to join without short-lived token, consider it in the design, what would change?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! The biggest issue I see here is the simplified node joining is looking for the node to join the cluster within a 5 minute TTL from when the instance is launched. This seems fine if you've got a packaged AMI with everything already pre-installed and configured, but I'm not sure it makes sense when you want to join an existing non-Teleport instance like an OpenSSH server. You'd have to copy the software down, set up a system service, and then reboot the instance in order to get it to join - is that much of an improvement?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

packaged AMI with everything already pre-installed and configured

I agree, but that's surprisingly popular use-case on AWS. I'd guess that 80% of our users follow it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timothyb89 add a note on future support for simplified tbot joining, so we don't forget to resolve this comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: with the IAM join method there will be no TTL from when the instance is launched, which should make this nicer

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
0123-4567 foo
```

TODO: what else might we want to see? public addr? number (and kind) of certs?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, would be cool to see certs and roles under management for this bot

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated

In the event of an incident, the bot can be locked with the new `tctl lock`
functionality. A locked bot will be unable to renew certificates and will have
its existing certificates revoked. (TODO: this is how it works, right?)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The certs won't be revoked, that's why it's important to set aggressive default validity of the certificate. For example, by default the certbot will have a certificate that is scoped. It should be able to only request very short lived certificate using impersonation https://goteleport.com/docs/access-controls/guides/impersonation/.

Then once you block the bot's impersonator role, it won't be able to renew the certificates that will expire in a couple of hours.

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated

TODO:

- do we need a mode equivalant to `tctl auth sign`'s `--compat` flag for openssh
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, that should eventually go away, because it was targeting an ancient OpenSSH version that treated optional SSH cert extensions as critical by mistake.

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated

#### RBAC

The new system role for the bot will need: (verify this)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the impersonation point above. It seems like there is no need for a hardcoded bot role, but instead each bot will have it's own unique impersonator role limiting it's scope.

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
functionality. A locked bot will be unable to renew certificates and will have
its existing certificates revoked. (TODO: this is how it works, right?)

Note that a bot can manage multiple sets of certificates, and locking a
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a separate section on attack vector tree and possible compromise scenarios. E.g. machine with certbot was taken over. How would lock mitigate that and what is the blast radius? I assume with impersonation, the impersonator user will be blocked from issuing new certificates and the existing certificates will expire within 5 minute window. Any other possible attack vectors?

@tcsc
Copy link
Copy Markdown
Contributor

tcsc commented Aug 23, 2021

What's the behaviour for trusted clusters? My (admittedly limited) experience with OpenSSH nodes is that you needed to manually install CA certs from all the upstream clusters that you want to be able to access the OpenSSH node.

Does the cert bot handle this for you, or is it out of scope for rev 1? I suppose you could always run a new bot for every upstream cluster.

@russjones russjones mentioned this pull request Oct 8, 2021
2 tasks
Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
##### Security
In the above example, as no additional client configuration has been specified,
the bot will generate only a single set of end-user certificates with access to
the sum of all roles specified. Users with additional security requirements may
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would say the "union" of all roles.

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
durations) but this is out of scope for the initial implementation.

*TODO (Tim):* I think the renewal interval should be some fraction of the total
TTL (at most 1/2) rather than 75%. For example, given a 4 hour TTL, we should
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree. The 75% was a totally arbitrary number I came up with, so I would just propose whatever you want for the group to discuss.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, I'll make the change.

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated

*TODO* (Tim): Non-filesystem destinations are likely to be very difficult to
encode as CLI flags (e.g. k8s secrets). Perhaps the CLI should only support
local FS and other sinks should defer to `tbot.yaml`?
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we avoid the need for tbot.yaml entirely if we made the flag only work for filesystem sinks, and then add a config file in the future if/when we decide to support other options?

If we can keep the CLI simple enough, I like the idea of not needing another config file.

Edit: should have read further. Looks like you're proposing exactly this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. This is itself sort of a second-order effect stemming from moving multiple certs to a YAML file. When we do that, it makes sense to then also move "advanced" destinations to the file too (in my opinion, at least).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yaml has it's advantages. I would keep the yaml file fully-functional. People can commit file to their repository and use it to deploy new versions for example.

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
* (TODO) Additional artifacts may be written if we decide to write app config
files at this stage.

#### Destination
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you've been preferring the term "sink" over destination. I would just pick one and be consistent.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they're both still distinct and important:

  • A destination contains files in a backend-agnostic way (with some configuration)
  • A sink is a destination + certificates and their configuration (which certificate types, config files to generate, etc)

I think the distinction is important because we'll also store the bot's own data (the renewable cert and other state) in a destination, and the configuration for this will share that configuration syntax. For example, I could see Kubernetes secrets as a useful backend both for the renewable cert and end-user certs. (Or no-op to keep it only in memory, etc)

That's not to say I haven't been using the two terms somewhat interchangeably, I'll see about consolidating terminology where it makes sense.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Zac, the difference is not that obvious to me. I'd say destination is pretty clear - target for a cert. Tbot's own state directory can be a completely separate thing.

Copy link
Copy Markdown
Contributor

@klizhentas klizhentas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timothyb89 added several comments after the first review pass

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
```
$ tctl tokens add --type=bot
The invite token: 13b74f49d27536dd5c514073097c197b
This token will expire in 60 minutes
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timothyb89 add a note on future support for simplified tbot joining, so we don't forget to resolve this comment

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
...
```

Note that both direct dialing or connecting through a Teleport proxy are
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: if you remove "Note that" the sentence will read just as well.

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
0123-4567 foo false dev,ssh
```

When the bot starts, it generates a set of keys, and uses the provided join
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generates a private key and certificate pair to be more precise?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll reword this to be more accurate.

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
secretName: example-secret
roles: [c]

- webhook: http://localhost:8001
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would webhook do?

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
- directory: /home/alice/.tbot
roles: [a, b, c]

# TODO: probably need to specify permissions (this will be ugly)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not supply secure defaults? (bot is only allowed to write, not read)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We most likely will not be able to enforce secure defaults from the bot's process (unless it runs as root, which I'd prefer to avoid). I'll remove the TODO here since it just adds confusion.

This relates back to filesystem permissions from before, if we go the ACL route users would need to create the files using their target Unix user and then grant the bot's Unix user write-only access via ACLs (we could provide a helper subcommand for this). There's still more research to do here, I intend test this out and will update the doc.

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
secretName: example-secret
roles: [c]

# Webhooks support only store
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove webhooks for now

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated

```
$ tbot start ...
$ tbot config --ssh
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most likely this will be many preset commands, so subcommand will make more sense. There will be one for client and another for server, for example. The advantage of the subcommand is that each subcommand can have separate parameter flags that will let users tweak base folder for example.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there's 2 sections on the configuration assistant, so I'll merge the relevant details into this section and remove the other one.

Anyway, I also prefer the subcommand approach, it will more closely match tsh config too.

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
When the backend receives the request to generate new user credentials, it validates
the token, ensuring that the token is of the correct type and has not expired.

*TODO (Tim): Investigate adding a new gRPC endpoint for this functionality instead
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this should be a separate GRPC endpoint, probably integrated with AWS node joining? Can we modify the existing GRPC endpoint to support multiple use-cases?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now migrated this to a pure gRPC endpoint on the tbot branch (see the timothyb89/tbot branch).

As for AWS node joining, I see no reason why the endpoint (specifically the RenewableCertsRequest message) couldn't be extended to accept an EC2 identity document as we do with existing node joining.

@klizhentas
Copy link
Copy Markdown
Contributor

@timothyb89 let's cut the scope a bit, remove the webhooks and other parts we won't include in the first version. This will allow us to focus on the security core - file permissions, signals, renewals, generations, GRPC endpoints and clients.

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated

When the bot starts, it generates a set of keys, and uses the provided join
token to request a signed, renewable certificate from the auth server. These
certificates are stored in a restricted location, such as `/var/lib/tbot`, and
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zmb3 @klizhentas What do you guys think about putting the bots certificates in /var/lib/teleport/bot instead? I imagine customers are already locking down /var/lib/teleport and they can just get those protections for free.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an easy win to me so I'll just change it (but can revert back if desired)

Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
Comment thread rfd/xxxx-bot-for-cert-renewals.md Outdated
```
$ tbot start \
--name=foo \
--output=file:/home/ubuntu/.tbot \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is see we use --output but in file configuration it's directory and other words. I think it will be more consistent if the word matches. @klizhentas what do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should match too. Since we're standardizing on only supporting directory destinations via CLI, it should just be --directory.

timothyb89 added a commit that referenced this pull request Dec 24, 2021
This adds the ability to impersonate one or more roles without
impersonating a particular user.

In Teleport today, when creating an impersonator role, both users and
roles must be specified as impersonation is fundamentally tied to an
existing Teleport user:
```yaml
allow:
  impersonate:
    users: ['jenkins']
    roles: ['jenkins']
```

This is inconvenient for two reasons:
 1. A user must exist for each set of roles you'd like to
    impersonate, creating a UX burden.
 2. It makes it difficult to use impersonation to reduce one's
    permissions as you always inherit all of the roles granted to the
    target user.

For the [certificate bot][bot] we'd instead like to use impersonation
to generate end-user (impersonated) certificates with a reduced set
of permissions. For example, given the following role:
```yaml
allow:
  impersonate:
    roles: ['jenkins', 'deploy']
```

We can then use `GenerateUserCerts` to issue certifices for a subset
of the allowed roles, e.g. one set of certificates with only the
`jenkins` role attached, and another with only `deploy`.

To that end, this patch:
 1. Removes the requirement that roles define both `users` and
    `roles` in impersonate conditions
 2. Introduces a new `RoleRequests` field in `UserCertsRequest`
 3. Modifies `generateUserCerts` to gather `roles` from
    `RoleRequests` if allowed by an `allow` (with no `users`)

[bot]: #7986

First, register a bot with the new `tctl bots add` subcommand, giving the bot a
name and one or more roles the bot is allowed to
[impersonate](https://goteleport.com/docs/access-controls/guides/impersonation/).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a bot have a role with an allow rule for role, auth_connector, oidc, saml, token? Is this a legit use case or it should be prevented against escalations?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some customers interested in using the bot to manage access to Teleport's APIs for automation purposes, like issuing node join tokens on demand. I could see it enabling a lot of automation use-cases so I'm wary of locking out whole classes of functionality.

We could have a sanity check / blacklist in tctl bots add ... but that could be bypassed if users were motivated enough to modify the roles after the fact.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of a check warning the users about potential escalations after issuing a tctl bots add ... command. Is assigning roles with allowed writes/updates for role also a legit use-case?

| TLS client certificate | `tlscert` | Signed by Teleport `UserCA` | TLS | Always

In the case of `directory` destinations, we encourage the use of POSIX ACLs.
This is meant to ensure that the primary (renewable) credentials are
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actively checking this when writing to them and throwing a warning/event is advisable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a note to warn about this. I'd like to make it a harder error but unfortunately we can't depend on ACLs being supported on any given filesystem / OS.

| TLS client certificate | `tlscert` | Signed by Teleport `UserCA` |
| Private key (TLS) | `key` | |

The bot then generates a set of non-renewable credentials for each configured
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bot should check if the destination is a symlink and warn the user about it (e.g. if the symlink points to the bot directory or another directory outside the targeted user path, indicating an exploitation attempt). In order to achieve this securely and avoid race conditions, the destination should be locked first.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely like warning about symlinks, I'll add a note on that.

As for locking, I'm not sure I fully understand the recommendation. Are Linux locks really suitable as a security control? flock and friends seem to be advisory only, and I'm not sure what lock we'd hold to prevent a rename-style attack even if we could depend on mandatory locks being available.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to a general locking (e.g. w ownership/permission change of the destination directory first), or using one of the two common strategies to avoid these attacks (either recreating the implementation of os.Create or use os.Rename to move it to the destination)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a challenge since we don't expect to have anything beyond minimal write privileges (via ACLs) to the specific destination files, they'll be (ideally) owned by an entirely separate user and created in advance by that user with tbot init .... I don't expect we'll be able to change permissions from the bot's end, and I assume the usual file moving tricks would erase our ACLs.

My read of the linked article makes me think the O_NOFOLLOW method should still work for our case, does that seem sane to you?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the O_NOFOLLOW flag causes the open to fail if the final component in the name is a symbolic link, the RESOLVE_NO_SYMLINKS should block symbolic-link traversal at all stages of pathname traversal, rather than just for the final component, so it's desirable (source).


#### Audit Log

The auth server will emit new events to the audit log when:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an event to report when a bot is created

The auth server will emit new events to the audit log when:

- a new renewable certificate is issued for the first time
- a certificate is renewed
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This event should contain:

  • The final destination path that was set at the time
  • A flag indicating the ACL set on the destination (if insecure)
  • A flag indicating if the destination was a symlink, and what it resolved to
  • Which certs were generated (key/key.pub/sshcert/known_hosts/ssh_config...) and eventually their pins/hashes

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As useful as information this info would be, there is currently no mechanism for relaying that sort of information to generateUserCerts(). The auth server also generates both cert types (TLS and SSH) for all cert requests, bot or otherwise.

I suppose we could extend the existing UserCertsRequest to add these as (optional) fields? It feels a bit odd to add fields expressly for audit logging purposes, but it is an option.

for signalling or otherwise executing user code when a certificate rotation
occurs. Instead, end users are encouraged to use filesystem watches (e.g.
`inotify`) to watch for changes directly using a least-privileged system user.
We will include a helper command, `tbot watch`, to perform simple actions on
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the command be just part of a “standalone” utility? What are the privileges that the launched script/command will have?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbot watch will be effectively standalone and will just provide a nice UX on top of inotify. It'll inherit privileges of the Unix user that runs it, but otherwise does nothing you couldn't also do with other tools, e.g. inotifywatch.


#### User Experience

First, register a bot with the new `tctl bots add` subcommand, giving the bot a
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The enrollment endpoint to add bots (through tctl bots add) using invite codes should be protected by appropriate rate-limiting mechanisms, just like the other token-based invitations.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an appendix detailing the new gRPC functions to be added and noted the need for ratelimiting on GenerateInitialRenewableUserCerts(), which is where the admin-created token is exchanged for renewable certificates.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small change here, we've removed the GenerateInitialRenewableUserCerts() endpoint entirely in favor of adding support for what we need in Teleport's standard auth.Register() flow.

@timothyb89 timothyb89 linked an issue Jan 13, 2022 that may be closed by this pull request
2 tasks
@timothyb89 timothyb89 removed a link to an issue Jan 13, 2022
2 tasks
timothyb89 added a commit that referenced this pull request Jan 14, 2022
* Allow impersonation of roles without users

This adds the ability to impersonate one or more roles without
impersonating a particular user.

In Teleport today, when creating an impersonator role, both users and
roles must be specified as impersonation is fundamentally tied to an
existing Teleport user:
```yaml
allow:
  impersonate:
    users: ['jenkins']
    roles: ['jenkins']
```

This is inconvenient for two reasons:
 1. A user must exist for each set of roles you'd like to
    impersonate, creating a UX burden.
 2. It makes it difficult to use impersonation to reduce one's
    permissions as you always inherit all of the roles granted to the
    target user.

For the [certificate bot][bot] we'd instead like to use impersonation
to generate end-user (impersonated) certificates with a reduced set
of permissions. For example, given the following role:
```yaml
allow:
  impersonate:
    roles: ['jenkins', 'deploy']
```

We can then use `GenerateUserCerts` to issue certifices for a subset
of the allowed roles, e.g. one set of certificates with only the
`jenkins` role attached, and another with only `deploy`.

To that end, this patch:
 1. Removes the requirement that roles define both `users` and
    `roles` in impersonate conditions
 2. Introduces a new `RoleRequests` field in `UserCertsRequest`
 3. Modifies `generateUserCerts` to gather `roles` from
    `RoleRequests` if allowed by an `allow` (with no `users`)

[bot]: #7986

* Add `determineDesiredRolesAndTraits`; audit log on role impersonation

This splits initial role and trait determination into a new function,
`determineDesiredRolesAndTraits`, to improve control flow and clarity
given the new branches introduced for role impersonation.

Additionally, this moves the call to `CheckRoleImpersonation` down
to match regular user impersonation's flow, and emits an audit log
event on failure.

* Formatting fix

* Unit testing for role requests

This adds a new set of unit tests for role requests.

Also discovered the `impersonator` field wasn't being set for
role impersonation, so it's now set to the user's own username.
In other words, role impersonation will appear (in the audit log and
elsewhere) as self-impersonation.

* Clean up testing users between runs

* Deny most reimpersonation cases and add tests

This attempts to deny most cases of reimpersonation, where an
impersonated certificate might be used to generate certificates for
other roles the user is allowed to impersonate.

One test case is currently failing pending a solution.

* Add new DisallowReissue certificate extension

This adds a new DisallowReissue certificate extension that, if set,
prevents that identity from interacting with `GenerateUserCerts`.

This flag is always set when RoleRequests are used to prevent
unintended privilege escalation (while avoiding breaking changes to
Teleport's existing certificate generation behavior).

* Fix test lints

* Fix typo

* Fix test doc typo, add testcase for user impersonation misuse

* Apply suggestions from code review

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* Accept context in CreateRole per review feedback

* Fix misleading comment

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
@timothyb89 timothyb89 marked this pull request as ready for review February 25, 2022 17:58
@github-actions github-actions Bot added the rfd Request for Discussion label Feb 25, 2022
Comment thread rfd/0056-bot-for-cert-renewals.md Outdated
```

In future iterations, additional output destination backends may be considered.
A few ideas include:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we able to use MachineID to obtain AWS creds via Teleport Application Access?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough about our AWS app access story today to say for certain, but if AWS can be configured to the trust certs issued by Teleport it should be possible.

This initial iteration of Machine ID only issues barebones certs so at the moment app / db / kubernetes access aren't expected to work. We'll be looking into supporting these as soon as possible, though.

Comment thread rfd/0056-bot-for-cert-renewals.md
@benarent benarent changed the title First draft RFD for cert renewal bot First draft RFD for Machine ID Mar 9, 2022
@benarent
Copy link
Copy Markdown
Contributor

benarent commented Apr 1, 2022

The RFD is a handy reference, is now a good time to merge?

@russjones
Copy link
Copy Markdown
Contributor

@timothyb89 Yeah, I think we can merge this now.

@timothyb89 timothyb89 changed the title First draft RFD for Machine ID RFD 64: Bot for certificate renewals (Machine ID) Apr 4, 2022
zmb3 and others added 15 commits April 4, 2022 15:16
 * Detailed new UX with config file fallback for complex cases
 * Added significantly more detail on how + why we'll use
   impersonation
 * Detailed plans for preventing certificate propagation
 * Added several notes on security plans (and several unsolved
   issues)
 * Updated several TODOs
 * Added more TODOs than removed, oops
* standardize on `destinations`
* use only directory destinations on the CLI, remove unneeded section
  on CLI syntax
* only generate one type of cert by default, allow generating both
* document new config assistant approach (auto templates + helper CLI)
* document memory-only storage backend
* add detail to config example
* use `/var/lib/teleport/bot` instead of `/var/lib/tbot`
* mostly remove discussion of webhooks and k8s secrets (still
  referenced in a "future ideas" section)
* several wording improvements
* Document plans on ACL use (`tbot init`)
* Clarify plans on automatic bot locking on cert generation conflict
* Add notes about EC2 identity documents in the future
* Add note on disallow-reissue
* Simplify plan for reload events (FS watch only + `tbot watch`
  helper)
* Remove Bot resource
* Use only gRPC endpoints
Also, remove "API Client Refresh" section
Aiming for merge readiness.
Co-authored-by: Ben Arent <ben@goteleport.com>
@timothyb89 timothyb89 enabled auto-merge (squash) April 15, 2022 19:15
@timothyb89 timothyb89 merged commit 8c887bb into master Apr 15, 2022
@timothyb89 timothyb89 deleted the zmb3/cerbot-rfd branch April 15, 2022 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rfd Request for Discussion

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants