Certificate renewal bot#10099

Merged
timothyb89 merged 39 commits into master from timothyb89/tbot
Feb 19, 2022

Conversation

Contributor

@timothyb89 timothyb89 commented Feb 2, 2022

This adds a new tbot tool to continuously renew a set of certificates after registering with a Teleport cluster using a similar process to standard node joining.

This makes some modifications to user certificate generation to allow for certificates that can be renewed beyond their original TTL, and exposes new gRPC endpoints:

  • CreateBot creates a bot user, role, and a join token
  • DeleteBot removes an existing bot user and its resources
  • GetBotUsers fetches all Users with bot fields set
  • GenerateInitialRenewableUserCerts exchanges a token for a set of certificates with a new renewable flag set

A new tctl command, tctl bots add, calls CreateBot to create a bot user and issue a join token. A bot instance can then be started using a provided command.

Refer to the RFD for more details: #7986

Security Considerations

  • Renewable certificates: certificates that can be renewed indefinitely (until the backing User is locked or deleted)
  • Bot management
    • CreateBot and DeleteBot gRPC calls create roles which allow role impersonation (i.e. they allow arbitrary privilege escalation). The endpoints do check that the calling user has relevant RBAC permissions to create/delete Roles and Users. We assume that the "create Role" RBAC privilege is effectively root access in Teleport today.

Outstanding TODOs

This adds a new `tbot` tool to continuously renew a set of
certificates after registering with a Teleport cluster using a
similar process to standard node joining.

This makes some modifications to user certificate generation to allow
for certificates that can be renewed beyond their original TTL, and
exposes new gRPC endpoints:
 * `CreateBotJoinToken` creates a join token for a bot user
 * `GenerateInitialRenewableUserCerts` exchanges a token for a set of
   certificates with a new `renewable` flag set

A new `tctl` command, `tctl bots add`, creates a bot user and calls
`CreateBotJoinToken` to issue a token. A bot instance can then be
started using a provided command.

* Use role requests to split renewable certs from end-user certs
* Add bot configuration file
* Use `teleport.dev/bot` label
* Remove `impersonator` flag on initial bot certs
* Remove unnecessary `renew` package
* Misc other cleanup

This adds additional restrictions on when a certificate's `renewable`
flag is carried over to a new certificate. In particular, it now also
denies the flag when either role requests are present, or the
`disallowReissue` flag has been previously set.

In practice `disallow-reissue` would have prevented any undesired
behavior but this improves consistency and resolves a TODO.
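
The carry-over rule in the commit message above can be sketched as a small predicate. This is a minimal illustration, not the code from this PR: the `certRequest` type, its fields, and `carryRenewable` are hypothetical names chosen for the example.

```go
package main

import "fmt"

// certRequest is a hypothetical, simplified stand-in for the state the auth
// server inspects when reissuing a user certificate.
type certRequest struct {
	Renewable       bool     // flag on the certificate presented by the caller
	DisallowReissue bool     // set on certificates that must not be reissued
	RoleRequests    []string // roles requested for the new certificate
}

// carryRenewable reports whether the renewable flag should be copied onto the
// newly issued certificate. The flag is denied when role requests are present
// or when disallow-reissue was previously set.
func carryRenewable(req certRequest) bool {
	if !req.Renewable {
		return false
	}
	if req.DisallowReissue {
		return false
	}
	if len(req.RoleRequests) > 0 {
		return false
	}
	return true
}

func main() {
	fmt.Println(carryRenewable(certRequest{Renewable: true}))
	fmt.Println(carryRenewable(certRequest{Renewable: true, RoleRequests: []string{"access"}}))
	fmt.Println(carryRenewable(certRequest{Renewable: true, DisallowReissue: true}))
}
```

Only the first call keeps the flag; the other two drop it, which matches the restriction described above.
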
* Fully flesh out config template rendering
* Fix rendering for SSH configuration templates
* Added `String()` impls for destination types
* Improve certificate renewal logging; show more detail
* Properly fall back to default (all) roles
* Add mode hints for files
* Add/update copyright headers
* Add `CreateBot`, `DeleteBot`, and `GetBotUsers` gRPC endpoints
* Replace `tctl bot (add|rm|ls)` implementations with gRPC calls
* Define a few new constants, `DefaultBotJoinTTL`, `BotLabel`,
  `BotGenerationLabel`
* Fixed a few nil pointer derefs when using config from CLI args
* Properly create destination if `--destination-dir` flag is used
* Remove improper default on CLI flag
* `DestinationConfig` is now a list of pointers
@timothyb89 timothyb89 requested a review from zmb3 February 2, 2022 23:56
Contributor Author

timothyb89 commented Feb 2, 2022

Contributor

@nklaassen nklaassen left a comment

did a first pass on this

Comment thread lib/auth/auth.go
Comment thread lib/auth/auth_with_roles.go Outdated
Comment thread lib/auth/auth_with_roles.go Outdated
Comment thread lib/auth/bot.go Outdated
Comment thread lib/auth/bot.go Outdated
Comment thread lib/auth/usertoken.go Outdated
Comment thread tool/tbot/main.go Outdated
Comment thread tool/tbot/main.go Outdated
Comment thread tool/tbot/main.go
Comment thread tool/tbot/main.go
Collaborator

@zmb3 zmb3 left a comment

First pass.

Comment thread api/client/proto/authservice.proto Outdated
Comment thread api/client/proto/authservice.proto
Comment thread api/client/proto/authservice.proto Outdated
Comment thread api/client/proto/authservice.proto Outdated
Comment thread api/client/proto/authservice.proto Outdated
Comment thread tool/tbot/config/config_destination.go Outdated
Comment thread tool/tbot/config/config_storage.go Outdated
Comment thread tool/tbot/config/configtemplate_ssh.go Outdated
Comment thread tool/tbot/utils/destination.go Outdated
Comment thread tool/tctl/common/bots_command.go Outdated
@timothyb89 timothyb89 mentioned this pull request Feb 3, 2022
2 tasks
Fixes the majority of smaller issues caught by reviewers, thanks all!

Issuing the initial renewable certificate ended up requiring a lot of
hacks to skip checks that prevented anonymous bots from getting
certs even though we'd verified their identity elsewhere (via token).

This reverts all those hacks and splits initial bot cert logic into a
dedicated `generateInitialRenewableUserCerts()` function which should
make the whole process much easier to follow.

Users should instead use the CreateBot/DeleteBot endpoints.
Comment thread lib/auth/bot.go Outdated

// createBotRole creates a role from a bot template with the given parameters.
func createBotRole(ctx context.Context, s *Server, botName string, resourceName string, roleRequests []string) error {
return s.UpsertRole(ctx, &types.RoleV4{
Contributor

Can you use NewRole which will call CheckAndSetDefaults

Contributor Author

Fixed, looks like Role is missing SetMetadata() so setting our labels is a bit ugly, oh well.

Contributor

you could implement SetMetadata(), it's probably just not there because there was no caller

Contributor Author

I've gone ahead and implemented SetMetadata()

Comment thread tool/tbot/config/configtemplate.go Outdated
Comment thread tool/tbot/config/configtemplate_ssh.go Outdated
Comment thread tool/tbot/config/destination_directory.go Outdated
Comment thread tool/tbot/config/destination_directory.go
Comment thread tool/tbot/config/destination_memory.go Outdated
Comment thread tool/tbot/identity/identity.go Outdated
Contributor Author

I think this is about as ready as it's going to get for review, though it's a bit of a monster so apologies in advance to the reviewers.

A rough review/merge plan that hopefully makes sense:

@timothyb89 timothyb89 marked this pull request as ready for review February 17, 2022 00:01
@github-actions github-actions bot added the audit-log and tctl labels Feb 17, 2022
Comment thread Makefile Outdated

.PHONY: $(BUILDDIR)/tbot
$(BUILDDIR)/tbot:
GOOS=$(OS) GOARCH=$(ARCH) $(CGOFLAG) go build -tags "$(PAM_TAG) $(FIPS_TAG)" -o $(BUILDDIR)/tbot $(BUILDFLAGS) ./tool/tbot
Collaborator

Do you get different behavior if you omit the PAM/FIPS tags? If not, can we do without them?

Contributor Author

I was about to say "we can just drop them" but I suppose it's possible the FIPS tag is honored in some of the teleport libraries we use. PAM is almost certainly unnecessary but I included it since tsh does for some reason as well.

Contributor Author

@timothyb89 timothyb89 Feb 18, 2022

I've removed the PAM tag. I'm wary of touching the FIPS tag since I'm not sure where it's read (no +build seems to check for it). My gut feeling is that if tsh cares, we should too.

Comment thread api/client/client.go Outdated
// DeleteBot deletes a bot user.
rpc DeleteBot(DeleteBotRequest) returns (google.protobuf.Empty);
// GetBotUsers gets all users with bot labels.
rpc GetBotUsers(GetBotUsersRequest) returns (stream types.UserV2);
Collaborator

Curious, why does this leverage streaming?

Contributor Author

GetUsers() does and I just cloned its declaration

Comment thread lib/auth/bot.go Outdated
Comment thread lib/auth/bot.go Outdated
Comment thread lib/auth/bot.go Outdated

func (s *Server) deleteBot(ctx context.Context, botName string) error {
// TODO:
// remove any locks for the bot's impersonator role?
Collaborator

Do we have a separate issue where we're tracking locking of bots?

Contributor Author

@timothyb89 timothyb89 Feb 18, 2022

I've mostly resolved bot locking in #10098, I think this is the only remaining open question. I'm inclined to not delete any locks automatically since that would be new (and maybe unexpected) behavior - I don't think we delete locks when e.g. removing a user or role today.

If you agree I can just remove the TODO.

Contributor Author

I've replaced the TODO here with a note explaining why locks aren't currently deleted. Let me know if you'd prefer to see a behavior change instead!

Comment thread lib/auth/bot.go Outdated
Comment thread lib/auth/bot.go Outdated
Comment thread tool/tbot/config/configtemplate.go Outdated
Comment thread tool/tbot/config/configtemplate_ssh.go Outdated
Comment thread api/client/client.go Outdated
Comment thread api/types/constants.go
Comment thread lib/auth/auth_with_roles.go Outdated
Comment on lines +2095 to +2096
// roles listed in the request and doesn't attempt to verify that the
// current user has permissions for those embedded roles. We assume that
Collaborator

Couldn't this result in a privilege escalation? So you can create a bot that would issue certs with roles you don't have access to?

Contributor Author

Yes, it can. From what I can tell, though, VerbCreate on Role is effectively root-level in Teleport today for exactly this reason, so at least I don't think we're introducing a new path to privilege escalation.

We do check for VerbCreate on both User and Role here so I think we have our bases covered but definitely let me know if I've missed something, this is a pretty scary bit of code IMO.

Collaborator

@timothyb89 I'm curious if you have considered making a bot a "resource" like any other object in the cluster. Then it could just be managed by usual tctl create/get/rm commands, we could have a types.KindBot which would grant users explicit permissions to manage them, and would be easy to reason about "bot" as a "resource" in general. Probably a bit too late to ask design questions at this point but I'm curious if this has ever been considered.

Contributor Author

@timothyb89 timothyb89 Feb 18, 2022

That was part of Zac's original design, actually, and I ended up removing it.

Ultimately we needed to reuse so much of Teleport's traditional auth infrastructure (necessitating a backing User and Role) that the entire Bot resource existed solely to be created and removed, alongside the other resources it still needed. Removing the resource was a surprisingly large negative diff and didn't end up breaking any functionality because it just wasn't used.

I wouldn't be surprised to see it come back if we eventually add more bot-specific metadata. This whole "stuffing bot data into User labels" strategy is probably not great in the long term.

Comment thread lib/auth/bot.go Outdated
Comment thread tool/tbot/main.go Outdated
Comment thread tool/tbot/main.go Outdated
Comment thread tool/tbot/main.go Outdated
Comment thread tool/tbot/main.go Outdated
Comment thread tool/tbot/main.go Outdated
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
timothyb89 and others added 10 commits February 18, 2022 13:25
* Add renewable certificate generation checks

This adds a new validation check for renewable certificates that
maintains a renewal counter as both a certificate extension and a
user label. This counter is used to ensure only a single certificate
lineage can exist: for example, if a renewable certificate is stolen,
only one copy of the certificate can be renewed as the generation
counter will not match.

When renewing a certificate, first the generation counter presented
by the user (via their TLS identity) is compared to a value stored
with the associated user (in a new `teleport.dev/bot-generation`
label field). If they aren't equal, the renewal attempt fails.
Otherwise, the generation counter is incremented by 1, stored to the
database using a `CompareAndSwap()` to ensure atomicity, and set on
the generated certificate for use in future renewals.

* Add unit tests for the generation counter

This adds new unit tests to exercise the generation counter checks.

Additionally, it fixes two other renewable cert tests that were
failing.

* Remove certRequestGeneration() function

* Emit audit event when cert generations don't match

* Fully implement `tctl bots lock`

* Show bot name in `tctl bots ls`

* Lock bots when a cert generation mismatch is found

* Make CompareFailed responses from validateGenerationLabel() more actionable

* Update lib/services/local/users.go

Co-authored-by: Nic Klaassen <nic@goteleport.com>

* Backend changes for tbot IoT and AWS joining (#10360)

* backend changes

* add token permission check

* pass ctx from caller

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* fix comment typo

Co-authored-by: Roman Tkachenko <roman@goteleport.com>

* use UserMetadata instead of Identity in RenewableCertificateGenerationMismatch event

* Client changes for tbot IoT joining (#10397)

* client changes

* delete replaced APIs

* delete unused tbot/auth.go

* add license header

* don't unnecessarily fetch host CA

* log fixes

* s/tunnelling/tunneling/

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* auth server addresses may be proxies

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* comment typo fix

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

* move *Server methods out of auth_with_roles.go (#10416)

Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>

Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Tim Buckley <tim@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>

Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Add `Role.SetMetadata()`, simplify more `trace.WrapWithMessage()`
calls, clear some TODOs and lints, and address other misc feedback
items.
Comment thread lib/auth/authclient/authclient.go Outdated
Comment thread lib/auth/bot.go
}

// createBot creates a new certificate renewal bot from a bot request.
func (s *Server) createBot(ctx context.Context, req *proto.CreateBotRequest) (*proto.CreateBotResponse, error) {
Collaborator

Should this function also validate that all roles specified in the CreateBotRequest are valid and exist in the system? Ignore if it's already validated somewhere and I just missed it.

Contributor Author

It doesn't, I'll add a quick check

Comment thread lib/auth/bot.go Outdated
// If this fails it's likely to be some miscellaneous competing
// write. The request should be tried again - if it's malicious,
// someone will get a generation mismatch and trigger a lock.
return trace.WrapWithMessage(err, "Database comparison failed, try the request again")
Collaborator

Would trace.CompareFailed be a more appropriate error to return?

Comment thread lib/auth/bot.go Outdated
if err != nil {
return trace.Wrap(err)
}
if err := s.UpsertLock(context.Background(), lock); err != nil {
Collaborator

Suggested change
- if err := s.UpsertLock(context.Background(), lock); err != nil {
+ if err := s.UpsertLock(ctx, lock); err != nil {

Comment thread lib/auth/bot.go Outdated
// If this fails it's likely to be some miscellaneous competing
// write. The request should be tried again - if it's malicious,
// someone will get a generation mismatch and trigger a lock.
return trace.WrapWithMessage(err, "Database comparison failed, try the request again")
Collaborator

trace.CompareFailed

Comment thread lib/auth/join.go Outdated
default:
// delete bot join tokens so they can't be re-used
if err := a.DeleteToken(ctx, provisionToken.GetName()); err != nil {
log.WithError(err).Warnf("could not delete bot provision token %q after generating certs",
Collaborator

Nit:

Suggested change
- log.WithError(err).Warnf("could not delete bot provision token %q after generating certs",
+ log.WithError(err).Warnf("Could not delete bot provision token %q after generating certs.",

Comment thread lib/auth/join.go
// bots use this endpoint but get a user cert
// botResourceName must be set, enforced in CheckAndSetDefaults
botResourceName := provisionToken.GetBotName()
expires := a.GetClock().Now().Add(defaults.DefaultRenewableCertTTL)
Collaborator

Why does this generate the cert with default cert TTL instead of the one the bot is configured with?

Contributor Author

It's somewhat arbitrary. We don't pass along a desired TTL for the initial certs via the token request here which complicates things. That said, in practice bots immediately attempt to renew certs after joining, and do request a user-configurable TTL then. So for however long these initial certs last, in practice they are disposed of within seconds.

That being said, the DefaultRenewableCertTTL is way too long, I'll set it to 1 hour.

Collaborator

Yeah, even if it's disposed of quickly, there's always a chance that a potential attacker can snatch it, so shortening the default TTL is probably a good idea.

Comment thread lib/services/local/users.go Outdated
if err := services.ValidateUser(new); err != nil {
return trace.Wrap(err)
}
newValue, err := services.MarshalUser(new.WithoutSecrets().(types.User))
Collaborator

Do a safe type assertion to avoid panics (just in case).

Comment thread lib/services/local/users.go Outdated
ID: new.GetResourceID(),
}

existingValue, err := services.MarshalUser(existing.WithoutSecrets().(types.User))
Collaborator

Do a safe type assertion to avoid panics (just in case).
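
A safe (comma-ok) type assertion of the kind requested in these two comments might look like the sketch below. The `resource`, `user`, `userV2`, and `lockV2` types are simplified, hypothetical stand-ins for the real `types` package, and `asUser` is an illustrative helper name.

```go
package main

import "fmt"

// resource and user are simplified stand-ins for the real interfaces.
type resource interface{ GetName() string }

type user interface {
	resource
	GetRoles() []string
}

type userV2 struct {
	name  string
	roles []string
}

func (u userV2) GetName() string    { return u.name }
func (u userV2) GetRoles() []string { return u.roles }

// lockV2 is a resource that is not a user.
type lockV2 struct{ name string }

func (l lockV2) GetName() string { return l.name }

// asUser uses the comma-ok form so that an unexpected type yields an error
// instead of a panic.
func asUser(r resource) (user, error) {
	u, ok := r.(user)
	if !ok {
		return nil, fmt.Errorf("expected a user, got %T", r)
	}
	return u, nil
}

func main() {
	if u, err := asUser(userV2{name: "bot-example"}); err == nil {
		fmt.Println("user:", u.GetName())
	}
	if _, err := asUser(lockV2{name: "not-a-user"}); err != nil {
		fmt.Println("error:", err)
	}
}
```

The single-value form `r.(user)` would panic on the second call, which is the failure mode the reviewer is asking to avoid.
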

Comment thread tool/tbot/identity/identity.go
Ensure all requestable roles exist when creating a bot, adjust the
default renewable cert TTL down to 1 hour, and check types during
`CompareAndSwapUser()`
@timothyb89 timothyb89 enabled auto-merge (squash) February 19, 2022 02:21
@timothyb89 timothyb89 merged commit bb121d7 into master Feb 19, 2022
@timothyb89 timothyb89 deleted the timothyb89/tbot branch February 19, 2022 02:41
Labels

audit-log, tctl

4 participants