Machine ID: Introduce BotService#35441
Conversation
480dd85 to
eea5e1d
Compare
c6b2863 to
1cbbe8c
Compare
|
The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with |
|
@strideynet - this PR will require admin approval to merge due to its size. Consider breaking it up into a series smaller changes. |
codingllama
left a comment
There was a problem hiding this comment.
Drive-by review, just regarding that one comment.
ravicious
left a comment
There was a problem hiding this comment.
Gave it just a cursory look as I'm not the best person to review this, but I see that you already requested a review from Tim.
| // CreateBot creates a new bot user. | ||
| rpc CreateBot(CreateBotRequest) returns (CreateBotResponse); | ||
| // | ||
| // Deprecated: Use [teleport.machineid.v1.BotService] instead. |
There was a problem hiding this comment.
How can I ensure that a doc link like this is valid?
This comment ends up in api/client/proto/authservice.pb.go. I was under the impression that if machineid.v1 is not directly imported in that file, then a doc link must use the full import path, which I assume would be github.com/gravitational/teleport/…. Isaiah pointed it out in one of my PRs but I've never tried to actually verify if my doc links are correct.
https://tip.golang.org/doc/comment#doclinks
I know it doesn't matter much as most of the time those "doc links" will not end up in any docs and are just for internal reference. Still, it's been bugging me since it was pointed out to me so I figured maybe you know the answer.
There was a problem hiding this comment.
Ah - I suppose it's a little odd here since my intention was to "comment on the proto" and therefore here I've referred to the proto service rather than the generated code.
There was a problem hiding this comment.
You can:
$ go install golang.org/x/pkgsite/cmd/pkgsite@latest
$ pkgsite .
And then navigate to http://localhost:8080/github.com/gravitational/teleport#section-directories to see the docs rendered locally.
| return bot, nil | ||
| } | ||
|
|
||
| func (bs *BotService) UpdateBot( |
There was a problem hiding this comment.
Throughout the codebase we have these UpdateX functions which semantically mean "if this resource already exists, overwrite it completely (a-la Upsert), otherwise error". AFAICT it's very unusual to need this semantic:
If you're calling UpdateX it almost always means that you've just called GetX, changed a field or two, and are now updating the object. This makes the check for "if this resource already exists" superfluous -- you already know it exists because you just got it with GetX.
Note that HTTP doesn't have any such semantic:
-
CREATE:CreateX -
GET:GetX -
DELETE:DeleteX -
PUT:UpsertX -
PATCH: no equivalent except an oddUpdateXlikeUpdateRemoteCluster -
POST: according to the RFC:The POST method requests that the target resource process the
representation enclosed in the request according to the resource's
own specific semantics.Arguably our
UpdateXis this miscellaneous type, however as I pointed out previously it's an odd semantic to implement.
Given this line of reasoning, I think we should aim to avoid adding these UpdateX by default. Usually an Upsert seems to do the job.
One loose end I have on this topic is: how do Upserts behave with the new optimistic locking behavior? If a resource has a Revision, will it be checked and rejected if it doesn't match what's in the database on an Upsert? If yes, then Upsert already implements the current Update semantic (and Updates can be replaced with Upserts). If that's not how it works, and Upserts just ignore Revision, then the current Updates could be seen as means of distributed synchronization to prevent a case like
- My code calls
GetX - Some other part of the system calls
DeleteXon that object - My code updates a field and calls
UpdateXon that object
However I don't think this was an intentional design pattern.
There was a problem hiding this comment.
Fwiw: my implementation here is following the recently proposed RFD 153 on the design of resource APIs. So it may be worth following up with some of these concerns there - #34103 .
This makes the check for "if this resource already exists" superfluous
I don't think this is true - we're operating in a distributed system. I can think of a few scenarios where writing to a resource which has been modified or deleted since you read it risks harming consistency and whilst we don't have "real" locking this seems like the most sensible alternative.
Given this line of reasoning, I think we should aim to avoid adding these UpdateX by default. Usually an Upsert seems to do the job.
I believe if anything long term goal is to do the opposite of what you suggest, and eliminate upsert in favour of create and update. Right now, several things rely on the ability to upsert (e.g tctl create -f / the terraform provider / the kubernetes operator). Once we reach a point where all of those no longer rely on this behaviour, then I think we can almost globally deprecate upsert RPCs.
One loose end I have on this topic is: how do Upserts behave with the new optimistic locking behavior?
IIRC the behaviour of upsert RPCs has not been changed. Optimistic locking is entirely ignored.
However I don't think this was an intentional design pattern.
It was my understanding that this was the entire motivation behind adding optimistic locking to the update RPC.
In addition to this,update RPCs are encouraged to support update masks where appropriate. At least in this case, the UpdateBot RPC implements this to allow selective modification of the roles or traits associated with the Bot without knowledge of the state of other fields.
There was a problem hiding this comment.
Upsert is by definition unconditional, and it's required (at least as a primitive operation, it doesn't have to be exposed through grpc necessarily) for the cache to work.
The old Update primitive backend operation has no right to exist, and it has the semantics that @ibeckermayer mentioned. ConditionalUpdate is the one that you want to use, and you use exactly like that. You get the resource, you make some decisions and calculate the new resource, you ConditionalUpdate the resource with a constraint on the revision you got, and your update succeeds if and only if nothing has written the underlying backend item since you read it.
Whether the revision should be exposed to the client over grpc is a choice that should be done depending on the resource type and the semantics we want to give to each operation; it might make sense sometimes to expose an "update" function that lets the caller update some fields with no regard to what was there in the first place, but even in that case the auth server should 100% use the conditional update operation internally.
There was a problem hiding this comment.
(and yeah, "i just read the item successfully so the item exists" is 100% bogus and it's a pattern that we should start removing from Teleport whenever possible)
There was a problem hiding this comment.
@strideynet
It was my understanding that this was the entire motivation behind adding optimistic locking to the update RPC.
I was referring to the state of things pre-optimistic-locking.
I believe if anything long term goal is to do the opposite of what you suggest, and eliminate upsert in favour of create and update. Right now, several things rely on the ability to upsert (e.g tctl create -f / the terraform provider / the kubernetes operator). Once we reach a point where all of those no longer rely on this behaviour, then I think we can almost globally deprecate upsert RPCs.
Seems like an odd goal... Is create -f not the user explicitly asking for an upsert? I can imagine implementing it with a get + update, but I'm not currently seeing what the point of that would be.
@espadolini
Upsert is by definition unconditional
Granted, dumb question.
The old Update primitive backend operation has no right to exist, and it has the semantics that @ibeckermayer mentioned. ConditionalUpdate is the one that you want to use, and you use exactly like that. You get the resource, you make some decisions and calculate the new resource, you ConditionalUpdate the resource with a constraint on the revision you got, and your update succeeds if and only if nothing has written the underlying backend item since you read it.
Is this not essentially what Update has become, presuming it now uses the optimistic locking mechanism?
I can see several plausible update-like semantics:
- update the entire object irrespective of revision (error if object dne)
- update the entire object on a specific revision (error if revision/object dne)
- patch a few fields irrespective of revision (error if object dne)
- patch a few fields on a specific revision (error if revision/object dne)
- write an entire object irrespective of revision/existence (upsert)
analysis:
- is the status quo
- is presumably the primary reason we implemented optimistic locking
- we have in a few places (HTTP patch semantic)
- seems practically useless -- if we have the object's revision, that typically means we have the object, so we may as well just use 2
- seems useful to me, although there seems to be some discussion of getting rid of it
Feel free to enlighten me on where I might be confused about how our optimistic locking system works (highly plausible), what we need it to enforce (I'm mostly ignorant except in very general terms) or what I'm missing about upsert.
There was a problem hiding this comment.
Thanks for pointing out that PR @strideynet, this whole discussion is more suited to that RFD so let's continue it over there: #34103 (comment)
There was a problem hiding this comment.
@ibeckermayer do you think this blocks this PR moving forward ? I'm a little concerned about how long discussing this will run and I have some time pressure to get this merged as it's blocking other work.
There was a problem hiding this comment.
Not necessarily, I can review the rest and we can just consider this potentially-pending-a-future-PR.
Since you did go through the trouble of adding a field mask, it's worth following this line of inquiry through to its conclusion.
timothyb89
left a comment
There was a problem hiding this comment.
Just a few comments after playing with the branch for a while. The implementation turned out pretty nice! The separate service and RFD 153 resources make this much tidier than I remember.
I didn't notice any compatibility issues but haven't yet tested the Terraform provider integration (which probably could use an update anyway, now that we have sane bot objects).
Co-authored-by: Tim Buckley <tim@goteleport.com>
|
FYI @timothyb89 - I've addressed your comments if you'd like to re-review. |
timothyb89
left a comment
There was a problem hiding this comment.
Looks good, I'm excited to finally see this!
|
Flaky Test Detector is failing due to test length - it looks like the |
|
@strideynet This test will be skipped by the flaky test detector momentarily - #35882 |
|
Merged |
| // CreateBot creates a new bot user. | ||
| rpc CreateBot(CreateBotRequest) returns (CreateBotResponse); | ||
| // | ||
| // Deprecated: Use [teleport.machineid.v1.BotService] instead. |
There was a problem hiding this comment.
You can:
$ go install golang.org/x/pkgsite/cmd/pkgsite@latest
$ pkgsite .
And then navigate to http://localhost:8080/github.com/gravitational/teleport#section-directories to see the docs rendered locally.
| // IsBot returns true if the user is a bot. | ||
| func (u UserV2) IsBot() bool { | ||
| _, ok := u.GetMetadata().Labels[BotGenerationLabel] | ||
| _, ok := u.GetMetadata().Labels[BotLabel] |
There was a problem hiding this comment.
Does creating the new bot resource still create a user and role in the backend, or are those replaced by the bot resource?
There was a problem hiding this comment.
The Bot resource is still just transformed into a User/Role - eventually we'll remove this but that'll involve delving into the core of Teleport's RBAC 😓
This change here is just to make this a little less fragile. I'm not quite sure why we used the Generation label here - it stays at "0" for bots using delegated joining and could be removed in future. The BotLabel is a much better indicator of a user/role being linked to a bot.
| // BotResourceName returns the default name for resources associated with the | ||
| // given named bot. | ||
| func BotResourceName(botName string) string { | ||
| return "bot-" + strings.ReplaceAll(botName, " ", "-") |
There was a problem hiding this comment.
Do we need to remove / as well?
There was a problem hiding this comment.
At the moment, this has just moved the previous implementation of BotResourceName from lib/auth into lib/auth/machineid/machineidv1 - I do agree that maybe we ought to replace / but I feel it's out of scope for this PR and I don't want to introduce additional risk by changing this.
Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>
|
@strideynet See the table below for backport results.
|
Closes #33808
Closes #10477
Quite a few deprecations in this one. Deprecated RPCs and perms now emit a warning log message which should encourage most folks to migrate prior to v16.0.0.
Fairly large diff - in part due to about 1800 lines of tests that expand the coverage of bot related code significantly. Some of these tests are for the legacy/deprecated endpoints that weren't covered previously but I wanted to make sure they behaved properly in their deprecated state.
changelog: SEE PR
Machine ID Bots can now be managed as yaml with the typical
tctlcommands, e.g:tctl get bot footctl get botstctl create -f bot.yamlThe ability to specifically control access to managing Bots has been added to Teleport's RBAC engine. Users managing Bots should now be explicitly granted permissions to manage the
botresource. eg:The previous required permissions, a combination of access to
userandrole, will no longer grant access to Bot RPCs in Teleport 16.0.0.Additionally, three new audit events are now emitted:
bot.createbot.updatebot.deleteFor those using the Teleport API, the following RPCs are deprecated and will stop functioning in Teleport 16.0.0:
proto.AuthService.CreateBot- switch tomachineid.v1.BotService.CreateBotproto.AuthService.DeleteBot- switch tomachineid.v1.BotService.DeleteBotproto.AuthService.GetBotUsers- switch tomachineid.v1.BotService.ListBots