RFD 129: Avoid Discovery Resource Name Collisions#27258
Conversation
Expand on UX when user references unsupported template var
| aws: | ||
| - types: ["ec2", "rds"] | ||
| regions: ["us-west-1"] | ||
| resource_name_template: "{{.Name}}-{{.Region}}-{{.AWS.AccountID}}" |
There was a problem hiding this comment.
Why do we need to give users such a flexibility? Why can't we always import resource names with name, region and account ID in mind as the default, making sure the names are unique?
Do we really need such flexibility and complexity? To me that looks like opportunity for all sorts of issues @GavinFrazar @r0mant
There was a problem hiding this comment.
We could do that, there were a couple of reasons we wanted to go with the templating approach, mostly UX and compatibility related:
- If we always import with name/region/accountID, imported database names will become very long/unwieldy and users might have a hard time using them in e.g.
tsh db connect. - By default we'll still default to just "name" like now so only users who have the name conflict problem will need to use this.
IIRC we discussed this on the product sync a while ago but we can reconsider if you still think this is a bad idea.
There was a problem hiding this comment.
I remember this conversation. But now seeing the proposal, I understand how needlessly complicated the whole setup would be.
We already practice database with long names, see for example my setup:
Name Description Allowed Users Labels Connect
-------------------------------- -------------------------------- ------------------- --------------- --------------------------------
> sales-center-prod-reader-ek... Salescenter AWS RDS PostgreSQ... [platform_readonly] app=sales-ce... tsh db connect sales-center-p...
sales-center-staging-reader-e... Salescenter AWS RDS PostgreSQ... [platform_readonly] app=sales-ce...
This is a pretty long name, but it does not create any issues for me.
I wonder if there is easier way to solve this problem using just minor UX tweaks.
For example, what if we set name to the unique name at all times, but also show a short name, derived from the original resource name and allow connecting by labels as well:
Name Full name
-------------------------------- -------------------
mydatabase mydatabase-us-east-1-1234-567-8910
Then resource name will be long name nobody will read, and if there is no collision, then tsh db connect mydatabase will work exactly as tsh db connect mydatabase-us-east-1-1234-567-8910 If there is a collision we will ask users to provide fully qualified name.
We can also let users tsh db connect label=value as well.
What do you think?
There was a problem hiding this comment.
@klizhentas I like the idea to default to a long name and print a short name instead of templating, since teleport config is already very complex.
I think supporting tsh db connect label=value would be useful, in and of itself, for executing statements on multiple databases at once non-interactively, which is something I wanted before for setting up Azure MySQL IAM users in multiple databases at once:
bash-5.1$ cat <<'EOF' | tsh db connect --db-user=database-access 'engine="Microsoft.DBforMySQL/servers"'
SET aad_auth_validate_oids_in_tenant = OFF;
CREATE AADUSER 'SomeNewUser' IDENTIFIED BY '64ca60b1-5b77-4dc1-9ad3-79ac7fb1fd09'; # this is the ID of the service principal
GRANT ALL ON `%`.* TO 'SomeNewUser'@'%';
EOF
There was a problem hiding this comment.
I like the idea of allowing tsh db connect to connect using substring of a name and labels 👍 This should also be backwards compatible since "instance name" will be a part of the whole name as well so users will still be able to connect using it unless there's ambiguity.
Then it sounds like we don't need any config changes, just these 3 UX tweaks:
- When importing databases, include account ID and region in the resource name.
- Update
tsh db connectandtsh proxy dbto treat provided "name" as a substring of the resource name. - Update
tsh db connectandtsh proxy dbto select database by labels.
For "2" and "3", we return error in case of ambiguity asking user to specify "unambiguated" (is that a word lol?) name, and maybe including a list of names that matched in the message.
Does that sound reasonable? @GavinFrazar @klizhentas @smallinsky
There was a problem hiding this comment.
@r0mant @klizhentas
I did some investigation into using label selectors, and I think my proposal in the comments above to support executing commands on multiple matching databases at once will require a lot more consideration to actually implement.
Therefore, to limit the scope of this RFD to just what is necessary to avoid name collisions in discovered resources, I've rewritten the RFD to just support prefixes and label matching to unambiguously select a single resource for now.
Besides, we need to decide if we need to keep the compatibility of cached certificates. For example, tsh keeps Kubernetes certs cached locally and the certificate contains the Kubernetes cluster name. If we going to continue with this change and since discovered clusters are dynamically loaded, users might end up with cached certificates (for a long time) that cannot be used for anything because the embedded cluster name does not match with the server's clusters names.
Could you expand on that? I don't understand how the scenario you've described applies - the cert will have the full kube cluster name in it, we'll just support short names for identifying kube clusters when using tsh kube login
There was a problem hiding this comment.
My concern is that:
- A user has an auto-discovery config in place and working
- calls
tsh kube login <cluster_name> - Accesses the cluster and caches the certificate locally
kubernetes_cluster=<cluster_name> - Someone upgrades
discovery_serviceto a version that implements this RFD and the cluster names change. What wascluster_namenow becomes{account_id}-{region}-{cluster_name}. - The next call to kubectl will fail because the cluster embedded in the certificate -
<cluster_name>- no longer exists. To fix it, it requires changes to the kubeconfig or callingtsh kube loginagain
Since tsh kube credentials does not validate if the cluster exists when the certificate is cached to reduce the execution time, the error returned will be not found from the kubernetes proxy which is sub-optimal.
As a result, we might end up breaking access to clusters with sub-optimal errors for the time users have cached credentials.
There was a problem hiding this comment.
proposal in the comments above to support executing commands on multiple matching databases at once will require a lot more consideration to actually implement.
@GavinFrazar I might have missed it, but why do we need to execute commands on multiple databases? People have asked about something like this before but this is out of scope of this issue. Let's skip it for now.
Edit: Just re-read your comment, I agree with "just support prefixes and label matching to unambiguously select a single resource for now" assessment. To clarify though: it's not just prefixes, it's any substring of the name, correct?
@tigrato I think it's ok to reserve this change for Teleport 14 and not complicate implementation with trying to preserve compatibility for certificates. People will have to relogin, yes, but that should be fine and we can mention it in the T14 release notes. We've done things that require relogin before. WDYT?
There was a problem hiding this comment.
To clarify though: it's not just prefixes, it's any substring of the name, correct?
I think prefix matching will be more intuitive than substrings in tsh, so my intent was to go with prefixes. Consider this example if we do substrings:
# foo-rds-us-west-1-0123456789012 # this one made by discovery
# rds-prod-something # this one made by static/dynamic config
$ tsh db connect rds
error: ambiguous ...*snip*...But I think substring search in Connect/WebUI makes sense because the user first searches and then interactively selects the resource to use
There was a problem hiding this comment.
@tigrato I think it's ok to reserve this change for Teleport 14 and not complicate implementation with trying to preserve compatibility for certificates. People will have to relogin, yes, but that should be fine and we can mention it in the T14 release notes. We've done things that require relogin before. WDYT?
I am ok with this as long as we are clear about the change and include it in the breaking changes notes.
Handling the compatibility of certificates would be a nightmare for security reasons and we shouldn't pursue that path.
* remove config template discussion * explain a discovery naming convention approach
|
|
||
| # ambiguous prefix name is an error | ||
| $ tsh db connect --db-user=alice --db-name-postgres bar | ||
| error: ambiguous database name could match multiple databases: |
There was a problem hiding this comment.
Here is the output that is easy to copy paste:
Multiple databases match the name "bar". Please specify the full database name. You can find it by running
$tsh db ls -v, or run the following commands:
$ tsh db connect bar-rds-us-west-1-0123456789012 # instance in account-id=0123456789012,region=us-west-1,env=dev`
$ tsh db connect bar-rds-us-east-2-0123456789012 # instance in ...There was a problem hiding this comment.
initially, I thought copy/paste examples would be a good idea, but I'm not so sure anymore - the commands don't include the flags the user used previously. As a user I think i'm more likely to press the up arrow instead to modify my last command than copy/paste a new one. Also, printing example commands is bug prone, and it's frustrating to copy/paste a suggested command that doesn't work or doesn't do what I intended.
I prefer the error output from tsh ssh that just provides hints about how to resolve the ambiguity, with a generic example Hint: try addressing the node by unique id (ex: tsh ssh user@node-id)
| error: ambiguous database name could match multiple databases: | ||
| Name Description Allowed Users Labels Connect | ||
| ------ ------------------- ------------------- --------------------------- ------- | ||
| bar-rds-us-west-1-0123456789012 RDS instance in ... [*] account-id=0123456789012,region=us-west-1,env=dev,... |
There was a problem hiding this comment.
why not show the full output then?
There was a problem hiding this comment.
I actually dislike tsh db ls -v because my terminal wraps long lines and breaks the table formatting, outputting a jumbled mess, although i'm not sure how common line wrapping is for other terminals.
It's not really an issue with tsh ls -v for ssh servers because the output isn't as verbose.
I like the hint to just suggest using tsh db ls -v, or maybe tsh db ls --format=yaml, as well as suggested commands to copy/paste
klizhentas
left a comment
There was a problem hiding this comment.
Almost there, we should add the same story for UI and Connect too.
* fixup example formatting, aws account ID length, sort order
| argument. These commands shall support | ||
| `tsh <sub-command> [name | prefix] [key1=value1,key2=value2,...]` syntax: | ||
|
|
||
| - `tsh db login` |
There was a problem hiding this comment.
Can we include tsh app xxx commands in the list?
There was a problem hiding this comment.
For UX consistency, yes. I only excluded it from the list because we don't have app auto-discovery
* consistent naming scheme * helps avoid collisions in rare cases of invalid resource group chars
* support --query and --labels flags instead of positional labels arg * clarify how prefix resource name matching will be implemented * update examples
Rendered:
I have prototyped this idea for AWS databases and the implementation was straightforward (and works to solve the name collision issue). This RFD is mostly to discuss:1. should we use Go templates instead of some custom templating like we do in Teleport RBAC or a simpler alternative like name rewriting to just add a prefix/suffix?2. if we like the Go template solution, what template variables/functionality should we expose?Reworked RFD to discuss a discovery resource naming convention and
tshUX to make working with longer resource names less tedious.