Skip to content

Add renewable certificate generation checks#10098

Merged
timothyb89 merged 15 commits intotimothyb89/tbotfrom
timothyb89/tbot-generation-counter
Feb 18, 2022
Merged

Add renewable certificate generation checks#10098
timothyb89 merged 15 commits intotimothyb89/tbotfrom
timothyb89/tbot-generation-counter

Conversation

@timothyb89
Copy link
Copy Markdown
Contributor

@timothyb89 timothyb89 commented Feb 2, 2022

This adds a new validation check for renewable certificates that maintains a renewal counter as both a certificate extension and a user label. This counter is used to ensure only a single certificate lineage can exist: for example, if a renewable certificate is stolen, only one copy of the certificate can be renewed as the generation counter will not match.

When renewing a certificate, first the generation counter presented by the user (via their TLS identity) is compared to a value stored with the associated user (in a new teleport.dev/bot-generation label field). If they aren't equal, the renewal attempt fails. Otherwise, the generation counter is incremented by 1, stored to the database using a CompareAndSwap() to ensure atomicity, and set on the generated certificate for use in future renewals.

Summary of changes:

  1. Adds a new Generation extension to TLS and SSH certificates, with associated fields in Identity, IdentityContext, UserCertParams, certRequest, etc.
  2. Adds a new CompareAndSwapUser method for performing compare-and-swap operation on User objects.
  3. Adds a new validateGenerationLabel() method, called by generateUserCerts(), to check and update the generation field + label. If the counters do not match, a lock is created on the bot user and an audit warning is emitted.

For more information about the intent behind this, refer to the "Preventing Certificate Propagation" section in the certificate bot RFD.

Reliability considerations

  • CompareAndSwap() doesn't seem to be hugely battle-tested and until now has only been used during CA rotations. A couple of bugs were found during the implementation (see also Enable map key sorting in utils.FastMarshal #10070)
  • Adds a modest amount of complexity to the already-complex generateUserCerts() process, particularly:
    • Adds a cache-bypassing GetUser() read and a CompareAndSwap() write to an API call that was otherwise (mostly) read-only (I think the only writes previously were to the audit log).
  • In general this change does severely increase the failure potential for certificate renewal attempts. Most of the biggest failure points have been smoothed out but there are some that probably can't be (e.g. now the bot simply breaks if it fails to save a set of freshly-renewed certs)

Security considerations

  • The CompareAndSwapUser() call in validateGenerationLabel() bypasses RBAC controls to update the User. It's a very narrow update so I don't think this should pose a problem by itself.
  • We're relying on CompareAndSwapUser() to act as a security control and prevent race conditions if e.g. an attacker steals a renewable certificate.
  • This mechanism is mainly designed to limit abuse potential in case certificates are stolen, rather than eliminate it. There is still a fairly wide window of time where an attacker could steal and use a certificate before being discovered.

Outstanding TODOs

  • Certificate generation mismatch should produce an audit log event and lock the associated bot user (per the bot RFD)
  • Thorough unit testing

This adds a new validation check for renewable certificates that
maintains a renewal counter as both a certificate extension and a
user label. This counter is used to ensure only a single certificate
lineage can exist: for example, if a renewable certificate is stolen,
only one copy of the certificate can be renewed as the generation
counter will not match

When renewing a certificate, first the generation counter presented
by the user (via their TLS identity) is compared to a value stored
with the associated user (in a new `teleport.dev/bot-generation`
label field). If they aren't equal, the renewal attempt fails.
Otherwise, the generation counter is incremented by 1, stored to the
database using a `CompareAndSwap()` to ensure atomicity, and set on
the generated certificate for use in future renewals.
@timothyb89 timothyb89 requested a review from zmb3 February 2, 2022 23:56
This was referenced Feb 2, 2022
@timothyb89
Copy link
Copy Markdown
Contributor Author

Just hit (I think) gogo/protobuf#693 due to user traits now being set, so now we just fetch the user from the database a bunch of times to copy it. Which is insane, but I'm not sure there's a better way. We can't marshal/unmarshal it because importing the code to do that creates an import cycle 🙃

This adds new unit tests to exercise the generation counter checks.

Additionally, it fixes two other renewable cert tests that were
failing.
@zmb3
Copy link
Copy Markdown
Collaborator

zmb3 commented Feb 14, 2022

Food for thought:

I wonder if we need a new label prefix. Thus far, I believe teleport.dev has been used for things that are set by teleport but potentially useful for users. For example:

  • teleport.dev/origin indicates whether a resource was defined in a config file or discovered dynamically
  • teleport.dev/os Contains the OS of a Windows desktop discovered via LDAP

I'm not aware of any instances where teleport.dev is necessary for teleport itself to function correctly. This would be the first. If a user changes or removes this label, things break, right?

Maybe a teleport.internal/ prefix for things that are required by Teleport and should never be touched by users makes sense. We could always hide these in the UI and CLIs unless some sort of --all-labels flag was present.

@timothyb89
Copy link
Copy Markdown
Contributor Author

Maybe a teleport.internal/ prefix for things that are required by Teleport and should never be touched by users makes sense. We could always hide these in the UI and CLIs unless some sort of --all-labels flag was present.

I like that, it clarifies the intent nicely.

I've also debated whether or not some of these fields could just be added to the User struct directly. Realistically though I don't think I see that happening before v9.

@timothyb89
Copy link
Copy Markdown
Contributor Author

Now that locking is working, I figure this is ready enough to at least merge into the main PR (#10099), pending review of course.

@timothyb89 timothyb89 marked this pull request as ready for review February 16, 2022 01:40
@github-actions github-actions Bot added audit-log Issues related to Teleports Audit Log tctl tctl - Teleport admin tool labels Feb 16, 2022
Comment thread lib/auth/auth_with_roles.go Outdated
Comment thread lib/auth/auth_with_roles.go Outdated
Comment thread lib/auth/auth_with_roles.go Outdated
Comment thread lib/services/local/users.go Outdated
@timothyb89 timothyb89 force-pushed the timothyb89/tbot-generation-counter branch from 3beed61 to 2416ef8 Compare February 18, 2022 20:30
* backend changes

* add token permission check

* pass ctx from caller

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* fix comment typo

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* use UserMetadata instead of Identity in RenewableCertificateGenerationMismatch event

* Client changes for tbot IoT joining (#10397)

* client changes

* delete replaced APIs

* delete unused tbot/auth.go

* add license header

* don't unecessarily fetch host CA

* log fixes

* s/tunnelling/tunneling/

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* auth server addresses may be proxies

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* comment typo fix

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* move *Server methods out of auth_with_roles.go (#10416)

Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
@timothyb89
Copy link
Copy Markdown
Contributor Author

I'm going to force squash+merge this one into the parent PR (#10099) for the sake of reviewers

@timothyb89 timothyb89 merged commit 004b25c into timothyb89/tbot Feb 18, 2022
@timothyb89 timothyb89 deleted the timothyb89/tbot-generation-counter branch February 18, 2022 21:00
timothyb89 added a commit that referenced this pull request Feb 19, 2022
* Add certificate renewal bot

This adds a new `tbot` tool to continuously renew a set of
certificates after registering with a Teleport cluster using a
similar process to standard node joining.

This makes some modifications to user certificate generation to allow
for certificates that can be renewed beyond their original TTL, and
exposes new gRPC endpoints:
 * `CreateBotJoinToken` creates a join token for a bot user
 * `GenerateInitialRenewableUserCerts` exchanges a token for a set of
   certificates with a new `renewable` flag set

A new `tctl` command, `tctl bots add`, creates a bot user and calls
`CreateBotJoinToken` to issue a token. A bot instance can then be
started using a provided command.

* Cert bot refactoring pass

* Use role requests to split renewable certs from end-user certs
* Add bot configuration file
* Use `teleport.dev/bot` label
* Remove `impersonator` flag on initial bot certs
* Remove unnecessary `renew` package
* Misc other cleanup

* Do not pass through `renewable` flag when role requests are set

This adds additional restrictions on when a certificate's `renewable`
flag is carried over to a new certificate. In particular, it now also
denies the flag when either role requests are present, or the
`disallowReissue` flag has been previously set.

In practice `disallow-reissue` would have prevented any undesired
behavior but this improves consistency and resolves a TODO.

* Various tbot UX improvements; render SSH config

* Fully flesh out config template rendering
* Fix rendering for SSH configuration templates
* Added `String()` impls for destination types
* Improve certificate renewal logging; show more detail
* Properly fall back to default (all) roles
* Add mode hints for files
* Add/update copyright headers

* Add stubs for tbot init and watch commands

* Add gRPC endpoints for managing bots

* Add `CreateBot`, `DeleteBot`, and `GetBotUsers` gRPC endpoints
* Replace `tctl bot (add|rm|ls)` implementations with gRPC calls
* Define a few new constants, `DefaultBotJoinTTL`, `BotLabel`,
  `BotGenerationLabel`

* Fix outdated destination flag in example tbot command

* Bugfix pass for demo

* Fixed a few nil pointer derefs when using config from CLI args
* Properly create destination if `--destination-dir` flag is used
* Remove improper default on CLI flag
* `DestinationConfig` is now a list of pointers

* Address first wave of review feedback

Fixes the majority of smaller issues caught by reviewers, thanks all!

* Add doc comments for bot.go functions

* Return the token TTL from CreateBot

* Split initial user cert issuance from `generateUserCerts()`

Issuing initial renewable certificate ended up requiring a lot of
hacks to skip checks that prevented anonymous bots from getting
certs even though we'd verified their identity elsewhere (via token).

This reverts all those hacks and splits initial bot cert logic into a
dedicated `generateInitialRenewableUserCerts()` function which should
make the whole process much easier to follow.

* Set bot traits to silence log messages

* tbot log message consistency pass

* Resolve lints

* Add config tests

* Remove CreateBotJoinToken endpoint

Users should instead use the CreateBot/DeleteBot endpoints.

* Create a fresh private key for every impersonated identity renewal

* Hide `config` subcommand

* Rename bot label prefix to `teleport.internal/`

* Use types.NewRole() to create bot roles

* Clean up error handling in custom YAML unmarshallers

Also, add notes about the supported YAML shapes.

* Fetch proxy host via gRPC Ping() instead of GetProxies()

* Update lib/auth/bot.go

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* Fix some review comments

* Add renewable certificate generation checks (#10098)

* Add renewable certificate generation checks

This adds a new validation check for renewable certificates that
maintains a renewal counter as both a certificate extension and a
user label. This counter is used to ensure only a single certificate
lineage can exist: for example, if a renewable certificate is stolen,
only one copy of the certificate can be renewed as the generation
counter will not match

When renewing a certificate, first the generation counter presented
by the user (via their TLS identity) is compared to a value stored
with the associated user (in a new `teleport.dev/bot-generation`
label field). If they aren't equal, the renewal attempt fails.
Otherwise, the generation counter is incremented by 1, stored to the
database using a `CompareAndSwap()` to ensure atomicity, and set on
the generated certificate for use in future renewals.

* Add unit tests for the generation counter

This adds new unit tests to exercise the generation counter checks.

Additionally, it fixes two other renewable cert tests that were
failing.

* Remove certRequestGeneration() function

* Emit audit event when cert generations don't match

* Fully implement `tctl bots lock`

* Show bot name in `tctl bots ls`

* Lock bots when a cert generation mismatch is found

* Make CompareFailed respones from validateGenerationLabel() more actionable

* Update lib/services/local/users.go

Co-authored-by: Nic Klaassen <nic@goteleport.com>

* Backend changes for tbot IoT and AWS joining (#10360)

* backend changes

* add token permission check

* pass ctx from caller

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* fix comment typo

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* use UserMetadata instead of Identity in RenewableCertificateGenerationMismatch event

* Client changes for tbot IoT joining (#10397)

* client changes

* delete replaced APIs

* delete unused tbot/auth.go

* add license header

* don't unecessarily fetch host CA

* log fixes

* s/tunnelling/tunneling/

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* auth server addresses may be proxies

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* comment typo fix

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* move *Server methods out of auth_with_roles.go (#10416)

Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* Address another batch of review feedback

* Addres another batch of review feedback

Add `Role.SetMetadata()`, simplify more `trace.WrapWithMessage()`
calls, clear some TODOs and lints, and address other misc feedback
items.

* Fix lint

* Add missing doc comments to SaveIdentity / LoadIdentity

* Remove pam tag from tbot build

* Update note about bot lock deletion

* Another pass of review feedback

Ensure all requestable roles exist when creating a bot, adjust the
default renewable cert TTL down to 1 hour, and check types during
`CompareAndSwapUser()`

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
timothyb89 added a commit that referenced this pull request Mar 10, 2022
* Add certificate renewal bot

This adds a new `tbot` tool to continuously renew a set of
certificates after registering with a Teleport cluster using a
similar process to standard node joining.

This makes some modifications to user certificate generation to allow
for certificates that can be renewed beyond their original TTL, and
exposes new gRPC endpoints:
 * `CreateBotJoinToken` creates a join token for a bot user
 * `GenerateInitialRenewableUserCerts` exchanges a token for a set of
   certificates with a new `renewable` flag set

A new `tctl` command, `tctl bots add`, creates a bot user and calls
`CreateBotJoinToken` to issue a token. A bot instance can then be
started using a provided command.

* Cert bot refactoring pass

* Use role requests to split renewable certs from end-user certs
* Add bot configuration file
* Use `teleport.dev/bot` label
* Remove `impersonator` flag on initial bot certs
* Remove unnecessary `renew` package
* Misc other cleanup

* Do not pass through `renewable` flag when role requests are set

This adds additional restrictions on when a certificate's `renewable`
flag is carried over to a new certificate. In particular, it now also
denies the flag when either role requests are present, or the
`disallowReissue` flag has been previously set.

In practice `disallow-reissue` would have prevented any undesired
behavior but this improves consistency and resolves a TODO.

* Various tbot UX improvements; render SSH config

* Fully flesh out config template rendering
* Fix rendering for SSH configuration templates
* Added `String()` impls for destination types
* Improve certificate renewal logging; show more detail
* Properly fall back to default (all) roles
* Add mode hints for files
* Add/update copyright headers

* Add stubs for tbot init and watch commands

* Add gRPC endpoints for managing bots

* Add `CreateBot`, `DeleteBot`, and `GetBotUsers` gRPC endpoints
* Replace `tctl bot (add|rm|ls)` implementations with gRPC calls
* Define a few new constants, `DefaultBotJoinTTL`, `BotLabel`,
  `BotGenerationLabel`

* Fix outdated destination flag in example tbot command

* Bugfix pass for demo

* Fixed a few nil pointer derefs when using config from CLI args
* Properly create destination if `--destination-dir` flag is used
* Remove improper default on CLI flag
* `DestinationConfig` is now a list of pointers

* Address first wave of review feedback

Fixes the majority of smaller issues caught by reviewers, thanks all!

* Add doc comments for bot.go functions

* Return the token TTL from CreateBot

* Split initial user cert issuance from `generateUserCerts()`

Issuing initial renewable certificate ended up requiring a lot of
hacks to skip checks that prevented anonymous bots from getting
certs even though we'd verified their identity elsewhere (via token).

This reverts all those hacks and splits initial bot cert logic into a
dedicated `generateInitialRenewableUserCerts()` function which should
make the whole process much easier to follow.

* Set bot traits to silence log messages

* tbot log message consistency pass

* Implement `tbot init` subcommand

This adds a new CLI subcommand to initialize a tbot destination
directory by creating required files ahead of time and assigning
proper permissions (and ACLs, where possible).

* Resolve lints

* Add config tests

* Remove CreateBotJoinToken endpoint

Users should instead use the CreateBot/DeleteBot endpoints.

* Create a fresh private key for every impersonated identity renewal

* Hide `config` subcommand

* Rename bot label prefix to `teleport.internal/`

* Use types.NewRole() to create bot roles

* Clean up error handling in custom YAML unmarshallers

Also, add notes about the supported YAML shapes.

* Fetch proxy host via gRPC Ping() instead of GetProxies()

* Update lib/auth/bot.go

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* Fix some review comments

* Add renewable certificate generation checks (#10098)

* Add renewable certificate generation checks

This adds a new validation check for renewable certificates that
maintains a renewal counter as both a certificate extension and a
user label. This counter is used to ensure only a single certificate
lineage can exist: for example, if a renewable certificate is stolen,
only one copy of the certificate can be renewed as the generation
counter will not match

When renewing a certificate, first the generation counter presented
by the user (via their TLS identity) is compared to a value stored
with the associated user (in a new `teleport.dev/bot-generation`
label field). If they aren't equal, the renewal attempt fails.
Otherwise, the generation counter is incremented by 1, stored to the
database using a `CompareAndSwap()` to ensure atomicity, and set on
the generated certificate for use in future renewals.

* Add unit tests for the generation counter

This adds new unit tests to exercise the generation counter checks.

Additionally, it fixes two other renewable cert tests that were
failing.

* Remove certRequestGeneration() function

* Emit audit event when cert generations don't match

* Fully implement `tctl bots lock`

* Show bot name in `tctl bots ls`

* Lock bots when a cert generation mismatch is found

* Make CompareFailed respones from validateGenerationLabel() more actionable

* Update lib/services/local/users.go

Co-authored-by: Nic Klaassen <nic@goteleport.com>

* Backend changes for tbot IoT and AWS joining (#10360)

* backend changes

* add token permission check

* pass ctx from caller

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* fix comment typo

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* use UserMetadata instead of Identity in RenewableCertificateGenerationMismatch event

* Client changes for tbot IoT joining (#10397)

* client changes

* delete replaced APIs

* delete unused tbot/auth.go

* add license header

* don't unecessarily fetch host CA

* log fixes

* s/tunnelling/tunneling/

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* auth server addresses may be proxies

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* comment typo fix

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* move *Server methods out of auth_with_roles.go (#10416)

Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* Address another batch of review feedback

* Addres another batch of review feedback

Add `Role.SetMetadata()`, simplify more `trace.WrapWithMessage()`
calls, clear some TODOs and lints, and address other misc feedback
items.

* Fix lint

* Add missing doc comments to SaveIdentity / LoadIdentity

* Remove pam tag from tbot build

* Update note about bot lock deletion

* Another pass of review feedback

Ensure all requestable roles exist when creating a bot, adjust the
default renewable cert TTL down to 1 hour, and check types during
`CompareAndSwapUser()`

* Remove ModeHint

* Rename Identity.Cert and Identity.XCert

* Add `symlinks` flag to tbot config

The optional symlinks flag for directory destinations allows users to
opt in / out of whichever symlink attack hardening mode is selected
by default.

* Add mostly-working secure implementation of botfs.Create/Write

This adds symlink mode selection (secure, try-secure, insecure) and
Linux `Create()`/`Write()` implementations to open files safely.

* Add configurable ACL modes and verify ACL support in tbot init

* Initialize destinations at startup and test before renewal

This initializes destinations at startup (to create directories if
not using `tbot init`) and tests them to ensure the bot can write
_before_ attempting to renew certificates; this should prevent most
accidental generation counter locks.

* Hide watch for now

* Issue a new identity if a token change is detected

* Warn if identity appears to be expired on startup

* Fully implement ACL Verify and Configure

 - Fully implements ACL support for Linux
 - Adds bot-side verification support to ensure ACLs are configured
   properly at runtime.
 - Gracefully falls back to no ACLs if the platform / filesystem
   doesn't support them
 - Clear up outstanding lints

* Make `tbot init` work without a config file

* Show init instructions in tctl bots add

Also:
 - Make --bot-user a flag in init (the tctl instructions were
   confusing otherwise)
 - Handle IsOwnedBy sanely on unsupported platforms
 - Add Bold colorizing support

* Clear some TODOs and rephrase tctl help

* Fix typo

* Fix token hash detection bug

* Actually read and write certs with symlink enforcement

Also, fix a config loading bug where CheckAndSetDefaults() wasn't
being called in all cases with CLI destinations.

* Add workaround for OpenSSH permissions check with ACLs

OpenSSH has an overly-paranoid permissions check that forces key
files to be exclusively owner-readable. Unfortunately, for POSIX
compatibility purposes, when ACLs are set, the ACL mask is set as
the group permissions. This effectively makes any ACL incompatible
with OpenSSH.

However, OpenSSH's check does have an escape hatch: it only applies
if the current user is the owner of the file. Therefore, this change
tweaks the `tbot init` flow to create files as root, owned by a
separate user (either `nobody` or even the bot user), with ACL
permissions granting both the bot and reader user access to the
certificates. This effectively bypasses OpenSSH's permissions check
and should preserve our security boundaries.

* Fix lints

* Fix an improper directory chmod to 0600 if ACL test fails

* First pass of tbot init unit tests

* Add symlink tests and fix bug with resolving the default owner

* Fix err misuse

* Fix an ACL error if the bot or reader user is the owner.

* Fix typo

* Fix missing error case in VerifyACL causing unreadable directories

* Address review feedback

- Rename ACLOn -> ACLRequired
- Simplify fs_linux.Read()
- Add missing fs_other.Read()
- Hoist renewal loop logic into its own function
- A few misc bugfixes

* Apply suggestions from code review

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* Address review feedback

- Only log syscall warning once
- Formatting and wording changes
- Improve error handling for `--clean`

* Fix lint error

* Fix imports in fs_other

* Fix possible nil pointer deref if storage is unset

* Use the bot user as default owner

This is more likely to be a safe owner choice than `nobody:nobody`.

* Apply suggestions from code review

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* Code review fixes

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
timothyb89 added a commit that referenced this pull request Mar 10, 2022
---

Implement `tbot init` subcommand and ACL management (#10289)

* Add certificate renewal bot

This adds a new `tbot` tool to continuously renew a set of
certificates after registering with a Teleport cluster using a
similar process to standard node joining.

This makes some modifications to user certificate generation to allow
for certificates that can be renewed beyond their original TTL, and
exposes new gRPC endpoints:
 * `CreateBotJoinToken` creates a join token for a bot user
 * `GenerateInitialRenewableUserCerts` exchanges a token for a set of
   certificates with a new `renewable` flag set

A new `tctl` command, `tctl bots add`, creates a bot user and calls
`CreateBotJoinToken` to issue a token. A bot instance can then be
started using a provided command.

* Cert bot refactoring pass

* Use role requests to split renewable certs from end-user certs
* Add bot configuration file
* Use `teleport.dev/bot` label
* Remove `impersonator` flag on initial bot certs
* Remove unnecessary `renew` package
* Misc other cleanup

* Do not pass through `renewable` flag when role requests are set

This adds additional restrictions on when a certificate's `renewable`
flag is carried over to a new certificate. In particular, it now also
denies the flag when either role requests are present, or the
`disallowReissue` flag has been previously set.

In practice `disallow-reissue` would have prevented any undesired
behavior but this improves consistency and resolves a TODO.

* Various tbot UX improvements; render SSH config

* Fully flesh out config template rendering
* Fix rendering for SSH configuration templates
* Added `String()` impls for destination types
* Improve certificate renewal logging; show more detail
* Properly fall back to default (all) roles
* Add mode hints for files
* Add/update copyright headers

* Add stubs for tbot init and watch commands

* Add gRPC endpoints for managing bots

* Add `CreateBot`, `DeleteBot`, and `GetBotUsers` gRPC endpoints
* Replace `tctl bot (add|rm|ls)` implementations with gRPC calls
* Define a few new constants, `DefaultBotJoinTTL`, `BotLabel`,
  `BotGenerationLabel`

* Fix outdated destination flag in example tbot command

* Bugfix pass for demo

* Fixed a few nil pointer derefs when using config from CLI args
* Properly create destination if `--destination-dir` flag is used
* Remove improper default on CLI flag
* `DestinationConfig` is now a list of pointers

* Address first wave of review feedback

Fixes the majority of smaller issues caught by reviewers, thanks all!

* Add doc comments for bot.go functions

* Return the token TTL from CreateBot

* Split initial user cert issuance from `generateUserCerts()`

Issuing initial renewable certificate ended up requiring a lot of
hacks to skip checks that prevented anonymous bots from getting
certs even though we'd verified their identity elsewhere (via token).

This reverts all those hacks and splits initial bot cert logic into a
dedicated `generateInitialRenewableUserCerts()` function which should
make the whole process much easier to follow.

* Set bot traits to silence log messages

* tbot log message consistency pass

* Implement `tbot init` subcommand

This adds a new CLI subcommand to initialize a tbot destination
directory by creating required files ahead of time and assigning
proper permissions (and ACLs, where possible).

* Resolve lints

* Add config tests

* Remove CreateBotJoinToken endpoint

Users should instead use the CreateBot/DeleteBot endpoints.

* Create a fresh private key for every impersonated identity renewal

* Hide `config` subcommand

* Rename bot label prefix to `teleport.internal/`

* Use types.NewRole() to create bot roles

* Clean up error handling in custom YAML unmarshallers

Also, add notes about the supported YAML shapes.

* Fetch proxy host via gRPC Ping() instead of GetProxies()

* Update lib/auth/bot.go

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* Fix some review comments

* Add renewable certificate generation checks (#10098)

* Add renewable certificate generation checks

This adds a new validation check for renewable certificates that
maintains a renewal counter as both a certificate extension and a
user label. This counter is used to ensure only a single certificate
lineage can exist: for example, if a renewable certificate is stolen,
only one copy of the certificate can be renewed as the generation
counter will not match

When renewing a certificate, first the generation counter presented
by the user (via their TLS identity) is compared to a value stored
with the associated user (in a new `teleport.dev/bot-generation`
label field). If they aren't equal, the renewal attempt fails.
Otherwise, the generation counter is incremented by 1, stored to the
database using a `CompareAndSwap()` to ensure atomicity, and set on
the generated certificate for use in future renewals.

* Add unit tests for the generation counter

This adds new unit tests to exercise the generation counter checks.

Additionally, it fixes two other renewable cert tests that were
failing.

* Remove certRequestGeneration() function

* Emit audit event when cert generations don't match

* Fully implement `tctl bots lock`

* Show bot name in `tctl bots ls`

* Lock bots when a cert generation mismatch is found

* Make CompareFailed respones from validateGenerationLabel() more actionable

* Update lib/services/local/users.go

Co-authored-by: Nic Klaassen <nic@goteleport.com>

* Backend changes for tbot IoT and AWS joining (#10360)

* backend changes

* add token permission check

* pass ctx from caller

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* fix comment typo

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* use UserMetadata instead of Identity in RenewableCertificateGenerationMismatch event

* Client changes for tbot IoT joining (#10397)

* client changes

* delete replaced APIs

* delete unused tbot/auth.go

* add license header

* don't unecessarily fetch host CA

* log fixes

* s/tunnelling/tunneling/

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* auth server addresses may be proxies

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* comment typo fix

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* move *Server methods out of auth_with_roles.go (#10416)

Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* Address another batch of review feedback

* Addres another batch of review feedback

Add `Role.SetMetadata()`, simplify more `trace.WrapWithMessage()`
calls, clear some TODOs and lints, and address other misc feedback
items.

* Fix lint

* Add missing doc comments to SaveIdentity / LoadIdentity

* Remove pam tag from tbot build

* Update note about bot lock deletion

* Another pass of review feedback

Ensure all requestable roles exist when creating a bot, adjust the
default renewable cert TTL down to 1 hour, and check types during
`CompareAndSwapUser()`

* Remove ModeHint

* Rename Identity.Cert and Identity.XCert

* Add `symlinks` flag to tbot config

The optional symlinks flag for directory destinations allows users to
opt in / out of whichever symlink attack hardening mode is selected
by default.

* Add mostly-working secure implementation of botfs.Create/Write

This adds symlink mode selection (secure, try-secure, insecure) and
Linux `Create()`/`Write()` implementations to open files safely.

* Add configurable ACL modes and verify ACL support in tbot init

* Initialize destinations at startup and test before renewal

This initializes destinations at startup (to create directories if
not using `tbot init`) and tests them to ensure the bot can write
_before_ attempting to renew certificates; this should prevent most
accidental generation counter locks.

* Hide watch for now

* Issue a new identity if a token change is detected

* Warn if identity appears to be expired on startup

* Fully implement ACL Verify and Configure

 - Fully implements ACL support for Linux
 - Adds bot-side verification support to ensure ACLs are configured
   properly at runtime.
 - Gracefully falls back to no ACLs if the platform / filesystem
   doesn't support them
 - Clear up outstanding lints

* Make `tbot init` work without a config file

* Show init instructions in tctl bots add

Also:
 - Make --bot-user a flag in init (the tctl instructions were
   confusing otherwise)
 - Handle IsOwnedBy sanely on unsupported platforms
 - Add Bold colorizing support

* Clear some TODOs and rephrase tctl help

* Fix typo

* Fix token hash detection bug

* Actually read and write certs with symlink enforcement

Also, fix a config loading bug where CheckAndSetDefaults() wasn't
being called in all cases with CLI destinations.

* Add workaround for OpenSSH permissions check with ACLs

OpenSSH has an overly-paranoid permissions check that forces key
files to be exclusively owner-readable. Unfortunately, for POSIX
compatibility purposes, when ACLs are set, the ACL mask is set as
the group permissions. This effectively makes any ACL incompatible
with OpenSSH.

However, OpenSSH's check does have an escape hatch: it only applies
if the current user is the owner of the file. Therefore, this change
tweaks the `tbot init` flow to create files as root, owned by a
separate user (either `nobody` or even the bot user), with ACL
permissions granting both the bot and reader user access to the
certificates. This effectively bypasses OpenSSH's permissions check
and should preserve our security boundaries.

* Fix lints

* Fix an improper directory chmod to 0600 if ACL test fails

* First pass of tbot init unit tests

* Add symlink tests and fix bug with resolving the default owner

* Fix err misuse

* Fix an ACL error if the bot or reader user is the owner.

* Fix typo

* Fix missing error case in VerifyACL causing unreadable directories

* Address review feedback

- Rename ACLOn -> ACLRequired
- Simplify fs_linux.Read()
- Add missing fs_other.Read()
- Hoist renewal loop logic into its own function
- A few misc bugfixes

* Apply suggestions from code review

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* Address review feedback

- Only log syscall warning once
- Formatting and wording changes
- Improve error handling for `--clean`

* Fix lint error

* Fix imports in fs_other

* Fix possible nil pointer deref if storage is unset

* Use the bot user as default owner

This is more likely to be a safe owner choice than `nobody:nobody`.

* Apply suggestions from code review

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* Code review fixes

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
timothyb89 added a commit that referenced this pull request Mar 10, 2022
---

Implement `tbot init` subcommand and ACL management (#10289)

* Add certificate renewal bot

This adds a new `tbot` tool to continuously renew a set of
certificates after registering with a Teleport cluster using a
similar process to standard node joining.

This makes some modifications to user certificate generation to allow
for certificates that can be renewed beyond their original TTL, and
exposes new gRPC endpoints:
 * `CreateBotJoinToken` creates a join token for a bot user
 * `GenerateInitialRenewableUserCerts` exchanges a token for a set of
   certificates with a new `renewable` flag set

A new `tctl` command, `tctl bots add`, creates a bot user and calls
`CreateBotJoinToken` to issue a token. A bot instance can then be
started using a provided command.

* Cert bot refactoring pass

* Use role requests to split renewable certs from end-user certs
* Add bot configuration file
* Use `teleport.dev/bot` label
* Remove `impersonator` flag on initial bot certs
* Remove unnecessary `renew` package
* Misc other cleanup

* Do not pass through `renewable` flag when role requests are set

This adds additional restrictions on when a certificate's `renewable`
flag is carried over to a new certificate. In particular, it now also
denies the flag when either role requests are present, or the
`disallowReissue` flag has been previously set.

In practice `disallow-reissue` would have prevented any undesired
behavior but this improves consistency and resolves a TODO.

* Various tbot UX improvements; render SSH config

* Fully flesh out config template rendering
* Fix rendering for SSH configuration templates
* Added `String()` impls for destination types
* Improve certificate renewal logging; show more detail
* Properly fall back to default (all) roles
* Add mode hints for files
* Add/update copyright headers

* Add stubs for tbot init and watch commands

* Add gRPC endpoints for managing bots

* Add `CreateBot`, `DeleteBot`, and `GetBotUsers` gRPC endpoints
* Replace `tctl bot (add|rm|ls)` implementations with gRPC calls
* Define a few new constants, `DefaultBotJoinTTL`, `BotLabel`,
  `BotGenerationLabel`

* Fix outdated destination flag in example tbot command

* Bugfix pass for demo

* Fixed a few nil pointer derefs when using config from CLI args
* Properly create destination if `--destination-dir` flag is used
* Remove improper default on CLI flag
* `DestinationConfig` is now a list of pointers

* Address first wave of review feedback

Fixes the majority of smaller issues caught by reviewers, thanks all!

* Add doc comments for bot.go functions

* Return the token TTL from CreateBot

* Split initial user cert issuance from `generateUserCerts()`

Issuing initial renewable certificate ended up requiring a lot of
hacks to skip checks that prevented anonymous bots from getting
certs even though we'd verified their identity elsewhere (via token).

This reverts all those hacks and splits initial bot cert logic into a
dedicated `generateInitialRenewableUserCerts()` function which should
make the whole process much easier to follow.

* Set bot traits to silence log messages

* tbot log message consistency pass

* Implement `tbot init` subcommand

This adds a new CLI subcommand to initialize a tbot destination
directory by creating required files ahead of time and assigning
proper permissions (and ACLs, where possible).

* Resolve lints

* Add config tests

* Remove CreateBotJoinToken endpoint

Users should instead use the CreateBot/DeleteBot endpoints.

* Create a fresh private key for every impersonated identity renewal

* Hide `config` subcommand

* Rename bot label prefix to `teleport.internal/`

* Use types.NewRole() to create bot roles

* Clean up error handling in custom YAML unmarshallers

Also, add notes about the supported YAML shapes.

* Fetch proxy host via gRPC Ping() instead of GetProxies()

* Update lib/auth/bot.go

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* Fix some review comments

* Add renewable certificate generation checks (#10098)

* Add renewable certificate generation checks

This adds a new validation check for renewable certificates that
maintains a renewal counter as both a certificate extension and a
user label. This counter is used to ensure only a single certificate
lineage can exist: for example, if a renewable certificate is stolen,
only one copy of the certificate can be renewed as the generation
counter will not match

When renewing a certificate, first the generation counter presented
by the user (via their TLS identity) is compared to a value stored
with the associated user (in a new `teleport.dev/bot-generation`
label field). If they aren't equal, the renewal attempt fails.
Otherwise, the generation counter is incremented by 1, stored to the
database using a `CompareAndSwap()` to ensure atomicity, and set on
the generated certificate for use in future renewals.

* Add unit tests for the generation counter

This adds new unit tests to exercise the generation counter checks.

Additionally, it fixes two other renewable cert tests that were
failing.

* Remove certRequestGeneration() function

* Emit audit event when cert generations don't match

* Fully implement `tctl bots lock`

* Show bot name in `tctl bots ls`

* Lock bots when a cert generation mismatch is found

* Make CompareFailed respones from validateGenerationLabel() more actionable

* Update lib/services/local/users.go

Co-authored-by: Nic Klaassen <nic@goteleport.com>

* Backend changes for tbot IoT and AWS joining (#10360)

* backend changes

* add token permission check

* pass ctx from caller

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* fix comment typo

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* use UserMetadata instead of Identity in RenewableCertificateGenerationMismatch event

* Client changes for tbot IoT joining (#10397)

* client changes

* delete replaced APIs

* delete unused tbot/auth.go

* add license header

* don't unecessarily fetch host CA

* log fixes

* s/tunnelling/tunneling/

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* auth server addresses may be proxies

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* comment typo fix

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* move *Server methods out of auth_with_roles.go (#10416)

Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* Address another batch of review feedback

* Addres another batch of review feedback

Add `Role.SetMetadata()`, simplify more `trace.WrapWithMessage()`
calls, clear some TODOs and lints, and address other misc feedback
items.

* Fix lint

* Add missing doc comments to SaveIdentity / LoadIdentity

* Remove pam tag from tbot build

* Update note about bot lock deletion

* Another pass of review feedback

Ensure all requestable roles exist when creating a bot, adjust the
default renewable cert TTL down to 1 hour, and check types during
`CompareAndSwapUser()`

* Remove ModeHint

* Rename Identity.Cert and Identity.XCert

* Add `symlinks` flag to tbot config

The optional symlinks flag for directory destinations allows users to
opt in / out of whichever symlink attack hardening mode is selected
by default.

* Add mostly-working secure implementation of botfs.Create/Write

This adds symlink mode selection (secure, try-secure, insecure) and
Linux `Create()`/`Write()` implementations to open files safely.

* Add configurable ACL modes and verify ACL support in tbot init

* Initialize destinations at startup and test before renewal

This initializes destinations at startup (to create directories if
not using `tbot init`) and tests them to ensure the bot can write
_before_ attempting to renew certificates; this should prevent most
accidental generation counter locks.

* Hide watch for now

* Issue a new identity if a token change is detected

* Warn if identity appears to be expired on startup

* Fully implement ACL Verify and Configure

 - Fully implements ACL support for Linux
 - Adds bot-side verification support to ensure ACLs are configured
   properly at runtime.
 - Gracefully falls back to no ACLs if the platform / filesystem
   doesn't support them
 - Clear up outstanding lints

* Make `tbot init` work without a config file

* Show init instructions in tctl bots add

Also:
 - Make --bot-user a flag in init (the tctl instructions were
   confusing otherwise)
 - Handle IsOwnedBy sanely on unsupported platforms
 - Add Bold colorizing support

* Clear some TODOs and rephrase tctl help

* Fix typo

* Fix token hash detection bug

* Actually read and write certs with symlink enforcement

Also, fix a config loading bug where CheckAndSetDefaults() wasn't
being called in all cases with CLI destinations.

* Add workaround for OpenSSH permissions check with ACLs

OpenSSH has an overly-paranoid permissions check that forces key
files to be exclusively owner-readable. Unfortunately, for POSIX
compatibility purposes, when ACLs are set, the ACL mask is set as
the group permissions. This effectively makes any ACL incompatible
with OpenSSH.

However, OpenSSH's check does have an escape hatch: it only applies
if the current user is the owner of the file. Therefore, this change
tweaks the `tbot init` flow to create files as root, owned by a
separate user (either `nobody` or even the bot user), with ACL
permissions granting both the bot and reader user access to the
certificates. This effectively bypasses OpenSSH's permissions check
and should preserve our security boundaries.

* Fix lints

* Fix an improper directory chmod to 0600 if ACL test fails

* First pass of tbot init unit tests

* Add symlink tests and fix bug with resolving the default owner

* Fix err misuse

* Fix an ACL error if the bot or reader user is the owner.

* Fix typo

* Fix missing error case in VerifyACL causing unreadable directories

* Address review feedback

- Rename ACLOn -> ACLRequired
- Simplify fs_linux.Read()
- Add missing fs_other.Read()
- Hoist renewal loop logic into its own function
- A few misc bugfixes

* Apply suggestions from code review

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* Address review feedback

- Only log syscall warning once
- Formatting and wording changes
- Improve error handling for `--clean`

* Fix lint error

* Fix imports in fs_other

* Fix possible nil pointer deref if storage is unset

* Use the bot user as default owner

This is more likely to be a safe owner choice than `nobody:nobody`.

* Apply suggestions from code review

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* Code review fixes

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

audit-log Issues related to Teleports Audit Log tctl tctl - Teleport admin tool

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants