[connect] access requests for Teleport Connect #16694
Conversation
@avatus Did you consider submitting separate PRs for advanced search & access requests? It feels like one could build the access requests stuff on top of advanced search, so I wonder if there were any inherent problems that prevented you from doing so. Edit: nvm, I thought that your PRs added both advanced search & access requests, but they add access requests alone.
ravicious
left a comment
I managed to take only a glimpse at the code; I'll continue the review first thing in the morning.
```go
)

func TestGetSortByFromString(t *testing.T) {
	t.Parallel()
```
IIRC, in the old PR Zac mentioned that t.Parallel() can be removed, it looks like this one here is a leftover.
```proto
option go_package = "github.com/gravitational/teleport/lib/teleterm/v1";

// Database describes a database
```
```go
for _, tt := range tests {
	tt := tt
	t.Run(tt.in, func(t *testing.T) {
```
```go
// GetSortByFromString expects a string in format `<fieldName>:<asc|desc>` where
// index 0 is fieldName and index 1 is direction.
// If a direction is not set, or is not recognized, it defaults to ASC.
func GetSortByFromString(sortStr string) (sortBy SortBy) {
```

Suggested change:

```diff
-func GetSortByFromString(sortStr string) (sortBy SortBy) {
+func GetSortByFromString(sortStr string) SortBy {
```

no need for named returns here, since there's only a single return value.
I recommended the named return in the other PR (#15934 (comment)). This function used to have `var sortBy SortBy` immediately after the signature, so I figured this could be made more compact.
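For illustration, a minimal sketch of the function without the named return. The shape of `SortBy` (a field name plus a direction flag) is an assumption here, not necessarily the PR's exact type:

```go
package types

import "strings"

// SortBy is assumed to pair a field name with a direction flag; the exact
// shape in the PR may differ.
type SortBy struct {
	Field  string
	IsDesc bool
}

// GetSortByFromString parses `<fieldName>:<asc|desc>`, defaulting to
// ascending when the direction is missing or unrecognized.
func GetSortByFromString(sortStr string) SortBy {
	if sortStr == "" {
		return SortBy{}
	}
	parts := strings.Split(sortStr, ":")
	return SortBy{
		Field:  parts[0],
		IsDesc: len(parts) > 1 && parts[1] == "desc",
	}
}
```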
```proto
  PASSWORDLESS_PROMPT_CREDENTIAL = 3;
}

// RequestState represents the state of a request for escalated privilege.
```
I assume this all has to be kept in sync with the existing proto types we already have for access requests? If so, maybe a comment saying "make sure to also update xyz if you change these" would be helpful.
Longer term (as we've discussed), I'm not sure I love plumbing extra RPCs through when we already have RPCs for all of this stuff, but that's out of scope for now.
| "context" | ||
| "time" | ||
|
|
||
| // "github.com/gravitational/teleport/api/client/proto" |
There was a problem hiding this comment.
If these imports aren't needed, remove them.
```diff
@@ -44,7 +45,7 @@ type Database struct {
 func (c *Cluster) GetDatabase(ctx context.Context, dbURI string) (*Database, error) {
 	// TODO(ravicious): Fetch a single db instead of filtering the response from GetDatabases.
```
It seems we can actually address this todo now that we have GetDatabases, right?
Yeah, it's coming in a separate PR
ravicious
left a comment
First round of reviews for today; I'll try to make another one after I finish my lunch.
ravicious
left a comment
I looked at the proto files mostly since they dictate everything else. I'll look at the rest of the code next week.
```proto
string created = 7;
string expires = 8;
```
What's the advantage of storing this as string over Timestamp?
I don't think Connect had to deal with dates & times in protobufs before, so I'd advise making sure this is a good way to handle them before we set a precedent (and also checking how the rest of our Go codebase handles them).
Switched to timestamp. Converting Time to timestamp was straightforward.
The frontend is a bit annoying because times come through as, for example:

```
{
  seconds: 16666000000,
  nanos: 6778000
}
```

so we have to account for that, but it's not too bad.
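For reference, a minimal sketch of the Go side of that conversion using the well-known `timestamppb` helpers (the variable names are illustrative):

```go
package main

import (
	"fmt"
	"time"

	"google.golang.org/protobuf/types/known/timestamppb"
)

func main() {
	created := time.Now()

	// time.Time -> *timestamppb.Timestamp for the proto message.
	ts := timestamppb.New(created)

	// On the wire this becomes {seconds, nanos}, which is the shape the
	// frontend sees and has to reassemble into a Date.
	fmt.Println(ts.GetSeconds(), ts.GetNanos())

	// *timestamppb.Timestamp -> time.Time when reading it back.
	roundTripped := ts.AsTime()
	fmt.Println(roundTripped.Equal(created)) // true: same instant
}
```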
```proto
string expires = 8;
repeated AccessRequestReview reviews = 9;
repeated string suggested_reviewers = 10;
repeated string threshold_names = 11;
```
Could you add comments to some of the fields? Some of them have clear names but I don't know what a threshold name is or what the user field designates – I suspect it's the user who made the request but it could mean many different things.
```proto
// ListDatabases lists databases
rpc ListDatabases(ListDatabasesRequest) returns (ListDatabasesResponse);
rpc GetAllDatabases(GetAllDatabasesRequest) returns (GetAllDatabasesResponse);

// GetAccessRequests lists filtered AccessRequests
rpc GetAccessRequests(GetAccessRequestsRequest) returns (GetAccessRequestsResponse);
// GetAccessRequest retrieves a single Access Request
rpc GetAccessRequest(GetAccessRequestsRequest) returns (GetAccessRequestResponse);
```
Do they have to share the same request type? Is there a situation where we want to call GetAccessRequests with a specific request ID?
This is because authClient only has a plural GetAccessRequests. I figured clusters_access_request.go could have a single Get like the authclient and treat the ID as essentially a filter. handler_access_requests has a distinct method to return just one request by ID.
I'd be for keeping the list of invalid states to a minimum. Any situation where we allow a data structure with invalid data to exist is an opportunity to introduce bugs.
Having a single method like Cluster.GetAccessRequests is fine but could you refactor these RPCs so that invalid states are not possible? So either keep a single RPC for the Terminal service as well or change the request for GetAccessRequests so that passing ID is not possible. I suppose the first one would be easier to implement.
I ended up moving these into separate RPCs. A couple of other things: I removed the user and RequestState params, because we don't do server-side searching for AccessRequests on web or Connect, so they were dead params anyway. GetAccessRequest now accepts an ID/cluster_uri and GetAccessRequests only accepts the cluster_uri. The service now checks for the existence of req.Id instead of the cluster method (also in DeleteRequests). So now the service has two RPCs with separate requests that are used to just pass types.AccessRequestsFilter to the cluster method.
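A rough sketch of the resulting shape, with message names and field numbers assumed for illustration (they may differ from the PR):

```proto
// GetAccessRequests lists access requests; its request message has no ID
// field, so "a list filtered to a single ID" is no longer representable.
rpc GetAccessRequests(GetAccessRequestsRequest) returns (GetAccessRequestsResponse);
// GetAccessRequest retrieves a single access request by ID.
rpc GetAccessRequest(GetAccessRequestRequest) returns (GetAccessRequestResponse);

message GetAccessRequestsRequest {
  string cluster_uri = 1;
}

message GetAccessRequestRequest {
  string cluster_uri = 1;
  string id = 2;
}
```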
```go
}
defer proxyClient.Close()

authClient, err = proxyClient.ConnectToCluster(ctx, c.clusterClient.SiteName)
```
Currently access requests are always created/reviewed/assumed in the root cluster, and we rely on the cluster role mapping for access to leaf clusters. I'm not sure if this will always be the root cluster here, just want to make sure you're aware. Have you tested with a trusted cluster setup, and with access requests created from tsh/web ui and reviewed in connect?
Yes, in Connect we pass the rootClusterUri when creating/reviewing/assuming. The local cluster is essentially used only when fetching requestable resources.
Perhaps we could add a comment in service.proto above the cluster_uri field of CreateAccessRequestRequest stating that it needs to be a root cluster URI? Or even change the name of the field to root_cluster_uri.
I think changing the name of the field is better, as it flows through to the frontend as well with setRootClusterUri.
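Something like the following, where everything besides root_cluster_uri is an assumption for illustration:

```proto
message CreateAccessRequestRequest {
  // root_cluster_uri is the URI of the root cluster. Access requests are
  // always created in the root cluster, even when they target leaf resources.
  string root_cluster_uri = 1;
  repeated string roles = 2;
  string reason = 3;
  repeated string suggested_reviewers = 4;
}
```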
```go
requestedResourceIDs := make([]*api.ResourceID, 0, len(req.GetRequestedResourceIDs()))
for _, r := range req.GetRequestedResourceIDs() {
```
I recently added some logic for the Web UI to get extra details about requested resources, currently just SSH server hostnames; in the future we'll want resource labels as well: https://github.com/gravitational/teleport.e/pull/530/files
I don't think you should add it to this PR, but it will probably be desirable to have in Connect as well. It would be nice if we could share code between the web and connect backends for this.
@avatus could you file an issue to add support for this?
@avatus Before merging this PR, could you make a tag build to make sure everything works fine? It's a big change, so I'd like to double check that it doesn't break the build. Last time we merged that big initial PR with Connect, it turned out we were pulling stuff into lib/teleterm that was supposed to stay only in teleport and tctl, and it broke the build on Windows.
```go
)

func (s *Handler) GetRequestableRoles(ctx context.Context, req *api.GetRequestableRolesRequest) (*api.GetRequestableRolesResponse, error) {
	roles, err := s.DaemonService.GetRequestableRoles(ctx, req)
```
So far we haven't really been passing proto messages around; we'd always repack them into a struct specific to the given layer (see teleport/lib/teleterm/apiserver/handler/handler_gateways.go, lines 28 to 36 at bd0ae38).
However, maybe passing proto messages around isn't necessarily a bad idea as long as they don't inhibit good design. One example would be a situation where daemon.Service could try to parse some values and prevent invalid data from ever entering the rest of the system. But instead of doing that, we delegate it to a layer below, because it's more convenient to just pass that single proto message through multiple layers without constructing any types along the way.
@codingllama once said that "Passing proto messages around is fine, protos are meant to carry data after all". I think I agree with that here. Any part of lib/teleterm can import the protogen stuff and trivially create a proto message so using them shouldn't get in the way.
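To make the two styles concrete, a hypothetical sketch loosely modeled on the handlers referenced above; the names are approximations for illustration, not the PR's exact code:

```go
// Repacking style: convert the proto message into a layer-specific struct at
// the boundary, so inner layers never import protogen types and bad data can
// be rejected early.
func (s *Handler) CreateGateway(ctx context.Context, req *api.CreateGatewayRequest) (*api.Gateway, error) {
	gateway, err := s.DaemonService.CreateGateway(ctx, daemon.CreateGatewayParams{
		TargetURI: req.TargetUri,
		LocalPort: req.LocalPort,
	})
	if err != nil {
		return nil, trace.Wrap(err)
	}
	return newAPIGateway(gateway), nil
}

// Pass-through style: hand the proto message itself to the next layer. Less
// ceremony, but validation happens wherever someone decides to do it.
func (s *Handler) GetRequestableRoles(ctx context.Context, req *api.GetRequestableRolesRequest) (*api.GetRequestableRolesResponse, error) {
	roles, err := s.DaemonService.GetRequestableRoles(ctx, req)
	if err != nil {
		return nil, trace.Wrap(err)
	}
	return roles, nil
}
```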
```go
if len(req.Roles) > 0 {
	request, err = services.NewAccessRequest(c.status.Username, req.Roles...)
} else {
	request, err = services.NewAccessRequestWithResources(c.status.Username, nil, resourceIDs)
}
if err != nil {
	return trace.Wrap(err)
}

request.SetRequestReason(req.Reason)
request.SetSuggestedReviewers(req.SuggestedReviewers)
```
Do these calls need to be made within addMetadataToRetryableError? I'd try to keep the function that's passed to addMetadataToRetryableError as small as possible. If possible, only calls that make requests to the proxy should be kept within the function.
Otherwise it just introduces a bit of noise because another person reading the code might wonder why the calls need to be included in the function. If a call needs to be within the callback and it's not immediately clear that it makes a request to the proxy/cluster, it should be noted in a comment.
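A sketch of what that separation might look like, assuming addMetadataToRetryableError takes a plain func() error callback (the surrounding code is approximated from the quoted diff):

```go
// Build and mutate the request up front; none of this talks to the cluster.
request, err := services.NewAccessRequest(c.status.Username, req.Roles...)
if err != nil {
	return trace.Wrap(err)
}
request.SetRequestReason(req.Reason)
request.SetSuggestedReviewers(req.SuggestedReviewers)

// Only the actual network call stays inside the retry-metadata wrapper.
return addMetadataToRetryableError(ctx, func() error {
	return trace.Wrap(c.clusterClient.CreateAccessRequest(ctx, request))
})
```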
```go
if req.RequestId == "" {
	return trace.BadParameter("missing request id")
}
```
That's one of the issues with passing protos around that I mentioned. What would you say about doing this validation in daemon.Service.DeleteAccessRequest? This way daemon.Service acts as the outermost layer (I guess a controller in MVC architecture) and can validate data before passing it to the rest of the system. This way other layers can assume that api.DeleteAccessRequestRequest is valid and don't need to perform defensive checks.
Really like this idea!
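A sketch of that daemon-layer validation; the method and field names are assumptions based on the discussion:

```go
// DeleteAccessRequest validates input at the outermost layer so the cluster
// layer can assume the request message is well-formed.
func (s *Service) DeleteAccessRequest(ctx context.Context, req *api.DeleteAccessRequestRequest) error {
	if req.GetRequestId() == "" {
		return trace.BadParameter("missing request id")
	}
	cluster, err := s.ResolveCluster(req.GetRootClusterUri())
	if err != nil {
		return trace.Wrap(err)
	}
	return trace.Wrap(cluster.DeleteAccessRequest(ctx, req))
}
```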
```go
if err := c.clusterClient.SaveProfile(c.dir, true); err != nil {
	return trace.Wrap(err)
}
```
This call doesn't make a request to the proxy, so I think we could move it outside of the function passed to addMetadataToRetryableError.
```proto
}

message AssumeRoleResponse {
  string success = 1;
```
Is this for compatibility with webapps.e stuff? If so, let's capture that in a comment please.
Nah, I just left it in during testing and forgot to change it to EmptyResponse. Fixing now.
ravicious
left a comment
Ok, I think we're getting close to being finished here; those are probably my remaining comments.
Renaming that field to root_cluster_uri was a good idea; it's immediately clear what the expectations of the given RPC are.
```go
if len(requests) < 1 {
	return nil, trace.NotFound("access request %q not found", req.Id)
}
```
Oh, but this was a good check, wasn't it? Now the GetAccessRequest will return a successful response if the request wasn't found. I suppose we could still do this check within daemon.Service.GetAccessRequest?
I moved this into daemon.Service.GetAccessRequest in the previous push.
> I moved this into daemon.Service.GetAccessRequest in the previous push.

I don't understand, I don't see this logic being moved to daemon.Service.GetAccessRequest. 🤔
Weird, I also don't see it when clicking on the file name from this comment. I don't see the GetAccessRequest at all. But I see it in the diff here
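For clarity, a sketch of how the check could live in daemon.Service.GetAccessRequest; names are assumed, with types.AccessRequestFilter standing in as the existing filter type:

```go
func (s *Service) GetAccessRequest(ctx context.Context, req *api.GetAccessRequestRequest) (*clusters.AccessRequest, error) {
	cluster, err := s.ResolveCluster(req.GetClusterUri())
	if err != nil {
		return nil, trace.Wrap(err)
	}
	requests, err := cluster.GetAccessRequests(ctx, types.AccessRequestFilter{ID: req.GetAccessRequestId()})
	if err != nil {
		return nil, trace.Wrap(err)
	}
	// Keep the not-found check here so a missing request can't surface as an
	// empty success.
	if len(requests) < 1 {
		return nil, trace.NotFound("access request %q not found", req.GetAccessRequestId())
	}
	return &requests[0], nil
}
```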
```diff
 }

-	return nil
+	return err
```
This can be replaced with return nil I think? We already handle the err != nil case above.
You're right, but I might as well just change it to

```go
return c.clusterClient.CreateAccessRequest(ctx, request)
```

I think it's a bit easier to read. Thoughts?
```diff
 }

-	return nil
+	return err
```
In this case, there are multiple calls that can return err within a single function passed to addMetadataToRetryableError. It'd be best to wrap each of them so that we know exactly which line returned the error.
Suggested change:

```diff
-return err
+return trace.Wrap(err)
```
```proto
// cluster_uri is the cluster uri
string cluster_uri = 1;
// specifies a specific request id
string id = 2;
```
Access request RPCs are not consistent when it comes to naming the ID field in the request message. Some use id, some use request_id, others access_request_ids.
Since for request messages related to gateways we use gateway_uri, I'd recommend sticking with either request_id or access_request_id. access_request_id strikes me as the better choice because it's more clear.
The id field on the AccessRequest message can stay as it is IMHO, as it's grouped within a specific message.
Thanks for keeping me accountable here. I agree with access_request_id 👍
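So the request messages would presumably end up looking something like this (field numbers and the exact set of messages are assumptions):

```proto
message GetAccessRequestRequest {
  string cluster_uri = 1;
  // access_request_id is the ID of the access request to fetch.
  string access_request_id = 2;
}

message DeleteAccessRequestRequest {
  string root_cluster_uri = 1;
  // access_request_id is the ID of the access request to delete.
  string access_request_id = 2;
}
```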
@zmb3 @nklaassen if you guys get a chance, can I get some 👀 on this? @ravicious is OOO this week and I got the green checkmarks on the UI side. Thanks!
nklaassen
left a comment
I don't want to block this, but a couple suggestions.
The PR is quite large. It could have been split up into PRs like:
- view access requests
- review access requests
- create access requests
- assume roles
It seems like a lot of extra work, but smaller PRs are so much easier to review and get merged.
I don't know what the state of testing is for Connect, but it would definitely be nice to have some unit or integration tests to confirm that this all actually works. If it is not feasible to do it in this PR, could you file an issue to add testing?
I will approve, but I would like to see the drone changes pulled into their own PR and likely backported.
```go
Servers []Server
// StartKey is the next key to use as a starting point.
StartKey string
// // TotalCount is the total number of resources available as a whole.
```

Suggested change:

```diff
-// // TotalCount is the total number of resources available as a whole.
+// TotalCount is the total number of resources available as a whole.
```
```yaml
- git submodule update --init e
- git submodule update --init --recursive webassets
- cd $WebappsSrc
- git submodule update --init packages/webapps.e
```
I would like to see the drone changes get their own PR so that any potential issue can more easily be pinpointed or rolled back, and the backport decision can be independent.
Definitely my biggest regret with this PR. I went back and looked at the search-based access requests PRs to learn a bit more about how to structure this the next time I have a larger project like this. I appreciate your insight on ways to split it up in the future.
```proto
rpc ListLeafClusters(ListLeafClustersRequest) returns (ListClustersResponse);
// ListDatabases lists databases
rpc ListDatabases(ListDatabasesRequest) returns (ListDatabasesResponse);
// GetDatabases lists all databases without pagination
```

Suggested change:

```diff
-// GetDatabases lists all databases without pagination
+// GetAllDatabases lists all databases without pagination
```
```proto
rpc GetRequestableRoles(GetRequestableRolesRequest) returns (GetRequestableRolesResponse);
// AssumeRole assumes the role of the given access request
rpc AssumeRole(AssumeRoleRequest) returns (EmptyResponse);
// GetAllKubes list kubes
```
The doc for GetAllKubes and GetKubes is identical. Do both RPCs do the same exact thing or is there a difference? If there's a difference, can we make it clear what the difference is?
```proto
}

message GetAccessRequestRequest {
  // cluster_uri is the cluster uri
```
I don't know that this comment is adding much value. Up to you, but my vote would be to simply remove it if we don't have anything useful to add.
```diff
+  repeated string active_requests = 4;
   // acl is the user acl
-  ACL acl = 4;
+  ACL acl = 5;
```
The whole point of numbered fields is so that you can add/remove fields without breaking compatibility. By changing acl from 4 to 5 you've broken compatibility.
Granted, that may not matter in this case, since this gRPC interface is only between Connect and its embedded tsh binary, but I still don't see a need to break things; we could just make active_requests field 5 instead.
This was brought up previously when changing ListServers to GetAllServers as any older client would then have ListServers return an unimplemented error. You are correct in that it won't affect us here because of the embedded binary, but it's best to keep this practice and not lose its value. Thanks for pointing this one out. Will change active_requests to field 5 instead.
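In other words, keep acl at its original number and give the new field the next free one (a sketch, with the rest of the message elided):

```proto
// acl keeps field number 4 so older clients can still decode it.
ACL acl = 4;
// active_requests takes the next free number instead of renumbering acl.
repeated string active_requests = 5;
```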
* access requests for teleterm
* removed unused imports and named returns
* remove comment
* using timestamppb instead of string for access requests
* updated proto with some more comments
* updated protos with comments
* using clusterClient, comments, and moving validation to daemon for access request delete
* separated GetAccessRequests into separate RPCs
* protobuf updates
* moved requestid check before resolving cluster
* fullstops in comments
* used standard access_request_id through rpc messages
* updated protofiles
* updated daemon service types to match grpc
* added kube advanced search support
* updated protos for kubes in access requests
* testing tag build
* fix detached head
* new tag build
* protobuf update
* lint fixes
* allow drone windows Connect build to include webapps.e
* protobuf files
* remove drone changes and updated comment
* proto changes with comment fixes and changed field order
* protobuf updates
This PR adds functionality to tshd to support Access Requests in Connect. It also adds search functionality to our current clusters_servers and clusters_databases to allow "search_as_roles" searches when browsing for resources. We will have to update this for kube support shortly after, but I didn't want to keep blowing this PR up and would rather get the current functionality reviewed and add kube support right after.
Most of this was created in a "we already do it this way" style, so please let me know if I've missed context or misinterpreted the existing code.
I included the updated `make grpc-teleterm` files as well, but if those should be in a different PR I can remove them (although that doesn't seem right?).

Closes #15170
webapps PR
webapps.e PR