RFD 129: Avoid Discovery Resource Name Collisions by GavinFrazar · Pull Request #27258 · gravitational/teleport

GavinFrazar · 2023-06-02T01:23:21Z

Rendered:

RFD 129 - Discovery Resource Name Templates

~~I have prototyped this idea for AWS databases and the implementation was straightforward (and works to solve the name collision issue). This RFD is mostly to discuss:~~

~~1. should we use Go templates instead of some custom templating like we do in Teleport RBAC or a simpler alternative like name rewriting to just add a prefix/suffix?~~
~~2. if we like the Go template solution, what template variables/functionality should we expose?~~

Reworked RFD to discuss a discovery resource naming convention and tsh UX to make working with longer resource names less tedious.


klizhentas · 2023-06-03T01:45:11Z

+  aws:
+    - types: ["ec2", "rds"]
+      regions: ["us-west-1"]
+      resource_name_template: "{{.Name}}-{{.Region}}-{{.AWS.AccountID}}"
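For reference, here is roughly what rendering such a template involves. This is a minimal sketch using Go's standard text/template package. The ResourceInfo and AWSInfo types are hypothetical stand-ins for whatever data the Discovery Service would expose, not actual Teleport types. It also shows where an unsupported template variable would surface: text/template reports an unknown field only when the template is executed, not when it is parsed.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical template data: not real Teleport types.
type AWSInfo struct {
	AccountID string
}

type ResourceInfo struct {
	Name   string
	Region string
	AWS    AWSInfo
}

// renderName parses a resource_name_template and executes it against the
// metadata of one discovered resource.
func renderName(tmpl string, info ResourceInfo) (string, error) {
	t, err := template.New("resource_name_template").Parse(tmpl)
	if err != nil {
		return "", err // syntax errors such as "{{.Name" are caught here
	}
	var sb strings.Builder
	if err := t.Execute(&sb, info); err != nil {
		return "", err // unknown fields such as {{.Azure}} are caught here
	}
	return sb.String(), nil
}

func main() {
	info := ResourceInfo{Name: "mydatabase", Region: "us-east-1", AWS: AWSInfo{AccountID: "123456789012"}}

	name, err := renderName("{{.Name}}-{{.Region}}-{{.AWS.AccountID}}", info)
	fmt.Println(name, err) // mydatabase-us-east-1-123456789012 <nil>

	_, err = renderName("{{.Name}}-{{.Azure.ResourceGroup}}", info)
	fmt.Println(err != nil) // true
}
```

The late failure is one reason templating pushes validation work onto the config loader: a typo in the template is only detected once a resource is actually rendered.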


Why do we need to give users such flexibility? Why can't we always import resource names with name, region and account ID in mind as the default, making sure the names are unique?

Do we really need such flexibility and complexity? To me that looks like an opportunity for all sorts of issues @GavinFrazar @r0mant

We could do that. There were a couple of reasons we wanted to go with the templating approach, mostly related to UX and compatibility:

If we always import with name/region/accountID, imported database names will become very long/unwieldy and users might have a hard time using them in e.g. tsh db connect.

By default, we'll still use just "name" like now, so only users who have the name conflict problem will need to use this.

IIRC we discussed this on the product sync a while ago, but we can reconsider if you still think this is a bad idea.

I remember this conversation. But now seeing the proposal, I understand how needlessly complicated the whole setup would be.

We already use databases with long names; see, for example, my setup:

    Name                             Description                      Allowed Users       Labels          Connect
    -------------------------------- -------------------------------- ------------------- --------------- --------------------------------
    > sales-center-prod-reader-ek... Salescenter AWS RDS PostgreSQ... [platform_readonly] app=sales-ce... tsh db connect sales-center-p...
    sales-center-staging-reader-e... Salescenter AWS RDS PostgreSQ... [platform_readonly] app=sales-ce...

This is a pretty long name, but it does not create any issues for me.

I wonder if there is an easier way to solve this problem using just minor UX tweaks.

For example, what if we always set the name to the unique name, but also show a short name derived from the original resource name, and allow connecting by labels as well:

    Name       Full name
    ---------- ----------------------------------
    mydatabase mydatabase-us-east-1-1234-567-8910

Then the resource name will be a long name that nobody reads, and if there is no collision, tsh db connect mydatabase will work exactly like tsh db connect mydatabase-us-east-1-1234-567-8910. If there is a collision, we will ask users to provide the fully qualified name.
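A minimal Go sketch of the lookup described above, assuming each database keeps both its unique full name and the original cloud name as a short name. The Database type and resolve function are hypothetical. An exact full-name match always wins, a short name works only when it is unique, and a collision produces an error asking for the full name.

```go
package main

import (
	"fmt"
	"sort"
)

// Database is a hypothetical, minimal view of a registered database.
type Database struct {
	Name      string // unique full name, e.g. including region and account
	ShortName string // original cloud resource name
}

// resolve maps user input to exactly one database.
func resolve(input string, dbs []Database) (Database, error) {
	var matches []Database
	for _, db := range dbs {
		if db.Name == input {
			return db, nil // an exact full-name match is never ambiguous
		}
		if db.ShortName == input {
			matches = append(matches, db)
		}
	}
	switch len(matches) {
	case 0:
		return Database{}, fmt.Errorf("database %q not found", input)
	case 1:
		return matches[0], nil
	}
	names := make([]string, 0, len(matches))
	for _, m := range matches {
		names = append(names, m.Name)
	}
	sort.Strings(names)
	return Database{}, fmt.Errorf("%q matches multiple databases, please use a full name: %v", input, names)
}

func main() {
	dbs := []Database{
		{Name: "mydatabase-us-east-1-1234-567-8910", ShortName: "mydatabase"},
		{Name: "mydatabase-us-west-2-1234-567-8910", ShortName: "mydatabase"},
	}
	// The short name collides, so the user is asked for a full name.
	if _, err := resolve("mydatabase", dbs); err != nil {
		fmt.Println("error:", err)
	}
	db, err := resolve("mydatabase-us-west-2-1234-567-8910", dbs)
	fmt.Println(db.Name, err)
}
```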

We can also let users run tsh db connect label=value.
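As a rough illustration of label-based selection, here is a Go sketch that parses a key1=value1,key2=value2 selector and checks it against a resource's labels. The parseLabels and matchesLabels names are hypothetical, and the parser is deliberately simple: it strips surrounding double quotes from values (as in the engine="..." example in this thread) but does not handle commas inside values.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabels turns "key1=value1,key2=value2" into a map.
func parseLabels(selector string) (map[string]string, error) {
	labels := make(map[string]string)
	for _, pair := range strings.Split(selector, ",") {
		key, value, ok := strings.Cut(pair, "=") // split at the first '='
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid label %q, expected key=value", pair)
		}
		// Accept shell-friendly quoting such as engine="Microsoft.DBforMySQL/servers".
		labels[key] = strings.Trim(strings.TrimSpace(value), `"`)
	}
	return labels, nil
}

// matchesLabels reports whether every selector label is present on the
// resource with exactly the same value; extra resource labels are ignored.
func matchesLabels(selector, resource map[string]string) bool {
	for k, want := range selector {
		if got, ok := resource[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector, err := parseLabels(`engine="Microsoft.DBforMySQL/servers",env=dev`)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Illustrative labels for one database.
	db := map[string]string{"engine": "Microsoft.DBforMySQL/servers", "env": "dev", "region": "eastus"}
	fmt.Println(matchesLabels(selector, db)) // true
}
```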

What do you think?

@klizhentas I like the idea to default to a long name and print a short name instead of templating, since Teleport config is already very complex.

I think supporting tsh db connect label=value would be useful, in and of itself, for executing statements on multiple databases at once non-interactively, which is something I wanted before for setting up Azure MySQL IAM users in multiple databases at once:

    bash-5.1$ cat <<'EOF' | tsh db connect --db-user=database-access 'engine="Microsoft.DBforMySQL/servers"'
    SET aad_auth_validate_oids_in_tenant = OFF;
    CREATE AADUSER 'SomeNewUser' IDENTIFIED BY '64ca60b1-5b77-4dc1-9ad3-79ac7fb1fd09'; # this is the ID of the service principal
    GRANT ALL ON `%`.* TO 'SomeNewUser'@'%';
    EOF

I like the idea of allowing tsh db connect to connect using a substring of a name and labels 👍 This should also be backwards compatible, since the "instance name" will be part of the full name as well, so users will still be able to connect using it unless there's ambiguity.

Then it sounds like we don't need any config changes, just these 3 UX tweaks:

1. When importing databases, include the account ID and region in the resource name.
2. Update tsh db connect and tsh proxy db to treat the provided "name" as a substring of the resource name.
3. Update tsh db connect and tsh proxy db to select databases by labels.
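The first tweak could be as simple as joining the identifying fields. The sketch below assumes a name-type-region-account ID order, inferred from example names used later in this thread such as bar-rds-us-west-1-...; discoveredName is a hypothetical helper, and the actual format is whatever the RFD settles on. It shows two same-named RDS instances in different regions no longer colliding.

```go
package main

import (
	"fmt"
	"strings"
)

// discoveredName is a hypothetical helper that joins the cloud resource name,
// resource type, region and account ID into one unique resource name.
func discoveredName(name, resourceType, region, accountID string) string {
	return strings.Join([]string{name, resourceType, region, accountID}, "-")
}

func main() {
	// Two RDS instances that share a name but live in different regions:
	// with name-only import they would both be registered as "bar".
	instances := []struct{ name, region, accountID string }{
		{"bar", "us-west-1", "123456789012"},
		{"bar", "us-east-2", "123456789012"},
	}
	unique := make(map[string]bool)
	for _, inst := range instances {
		unique[discoveredName(inst.name, "rds", inst.region, inst.accountID)] = true
	}
	fmt.Println(len(unique)) // 2
}
```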

For 2 and 3, we return an error in case of ambiguity, asking the user to specify an "unambiguated" (is that a word lol?) name and maybe including a list of the names that matched in the message.

Does that sound reasonable? @GavinFrazar @klizhentas @smallinsky

@r0mant @klizhentas
I did some investigation into using label selectors, and I think my proposal in the comments above to support executing commands on multiple matching databases at once will require a lot more consideration to actually implement.

Therefore, to limit the scope of this RFD to just what is necessary to avoid name collisions in discovered resources, I've rewritten the RFD to just support prefixes and label matching to unambiguously select a single resource for now.

@tigrato

Besides, we need to decide whether we need to keep compatibility with cached certificates. For example, tsh keeps Kubernetes certs cached locally, and the certificate contains the Kubernetes cluster name. If we are going to continue with this change, since discovered clusters are dynamically loaded, users might end up with cached certificates (for a long time) that cannot be used for anything because the embedded cluster name does not match the server's cluster names.

Could you expand on that? I don't understand how the scenario you've described applies - the cert will have the full kube cluster name in it, we'll just support short names for identifying kube clusters when using tsh kube login

My concern is the following scenario:

1. A user has an auto-discovery config in place and working.
2. The user calls tsh kube login <cluster_name>.
3. The user accesses the cluster, and tsh caches the certificate locally with kubernetes_cluster=<cluster_name>.
4. Someone upgrades discovery_service to a version that implements this RFD, and the cluster names change: what was cluster_name now becomes {account_id}-{region}-{cluster_name}.
5. The next call to kubectl fails because the cluster embedded in the certificate, <cluster_name>, no longer exists. Fixing it requires changes to the kubeconfig or calling tsh kube login again.

Since tsh kube credentials does not validate whether the cluster exists when the certificate is cached (to reduce execution time), the error returned will be a "not found" error from the Kubernetes proxy, which is sub-optimal.

As a result, we might end up breaking access to clusters, with sub-optimal errors, for as long as users have cached credentials.
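To make the failure mode concrete, here is a toy Go model of it. The lookupCluster function and the cluster names are hypothetical, and the real proxy lookup is more involved: the cluster name embedded in the cached certificate is registered before the upgrade but not after the rename.

```go
package main

import "fmt"

// lookupCluster is a toy stand-in for the proxy resolving the cluster name
// embedded in a client certificate against the registered clusters.
func lookupCluster(certCluster string, registered map[string]bool) error {
	if !registered[certCluster] {
		return fmt.Errorf("kubernetes cluster %q not found", certCluster)
	}
	return nil
}

func main() {
	cached := "prod" // cluster name embedded in the cert cached by tsh kube login

	before := map[string]bool{"prod": true}                           // names before the upgrade
	after := map[string]bool{"prod-eks-us-west-1-123456789012": true} // hypothetical names after the rename

	fmt.Println(lookupCluster(cached, before)) // <nil>
	fmt.Println(lookupCluster(cached, after))  // kubernetes cluster "prod" not found
}
```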

> proposal in the comments above to support executing commands on multiple matching databases at once will require a lot more consideration to actually implement.

@GavinFrazar I might have missed it, but why do we need to execute commands on multiple databases? People have asked about something like this before, but it is out of scope for this issue. Let's skip it for now.

Edit: Just re-read your comment, I agree with "just support prefixes and label matching to unambiguously select a single resource for now" assessment. To clarify though: it's not just prefixes, it's any substring of the name, correct?

@tigrato I think it's ok to reserve this change for Teleport 14 and not complicate implementation with trying to preserve compatibility for certificates. People will have to relogin, yes, but that should be fine and we can mention it in the T14 release notes. We've done things that require relogin before. WDYT?

> To clarify though: it's not just prefixes, it's any substring of the name, correct?

I think prefix matching will be more intuitive than substrings in tsh, so my intent was to go with prefixes. Consider this example if we do substrings:

    # foo-rds-us-west-1-0123456789012   # this one made by discovery
    # rds-prod-something                # this one made by static/dynamic config
    $ tsh db connect rds
    error: ambiguous ...*snip*...

But I think substring search in Connect/Web UI makes sense, because the user first searches and then interactively selects the resource to use.

> @tigrato I think it's ok to reserve this change for Teleport 14 and not complicate implementation with trying to preserve compatibility for certificates. People will have to relogin, yes, but that should be fine and we can mention it in the T14 release notes. We've done things that require relogin before. WDYT?

I am ok with this as long as we are clear about the change and include it in the breaking changes notes.

Handling the compatibility of certificates would be a nightmare for security reasons and we shouldn't pursue that path.


klizhentas · 2023-06-07T01:45:56Z

+
+# ambiguous prefix name is an error
+$ tsh db connect --db-user=alice --db-name=postgres bar
+error: ambiguous database name could match multiple databases:


Here is output that is easy to copy-paste:

Multiple databases match the name "bar". Please specify the full database name. You can find it by running $ tsh db ls -v, or run one of the following commands:

    $ tsh db connect bar-rds-us-west-1-0123456789012 # instance in account-id=0123456789012,region=us-west-1,env=dev
    $ tsh db connect bar-rds-us-east-2-0123456789012 # instance in ...

Initially, I thought copy/paste examples would be a good idea, but I'm not so sure anymore: the commands don't include the flags the user used previously. As a user, I think I'm more likely to press the up arrow to modify my last command than to copy/paste a new one. Also, printing example commands is bug-prone, and it's frustrating to copy/paste a suggested command that doesn't work or doesn't do what I intended.

I prefer the error output from tsh ssh, which just provides hints about how to resolve the ambiguity with a generic example: "Hint: try addressing the node by unique id (ex: tsh ssh user@node-id)".

klizhentas · 2023-06-07T01:46:12Z

+error: ambiguous database name could match multiple databases:
+Name   Description         Allowed Users       Labels                      Connect 
+------ ------------------- ------------------- --------------------------- ------- 
+bar-rds-us-west-1-0123456789012 RDS instance in ... [*] account-id=0123456789012,region=us-west-1,env=dev,...


why not show the full output then?

I actually dislike tsh db ls -v because my terminal wraps long lines and breaks the table formatting, producing a jumbled mess, although I'm not sure how common line wrapping is in other terminals.
It's not really an issue with tsh ls -v for ssh servers because the output isn't as verbose.

I like a hint that just suggests using tsh db ls -v, or maybe tsh db ls --format=yaml, along with suggested commands to copy/paste.

klizhentas

Almost there, we should add the same story for UI and Connect too.


r0mant · 2023-06-14T21:16:49Z

+argument. These commands shall support
+`tsh <sub-command> [name | prefix] [key1=value1,key2=value2,...]` syntax:
+
+- `tsh db login`


Can we include tsh app xxx commands in the list?

For UX consistency, yes. I only excluded it from the list because we don't have app auto-discovery.



RFD: Auto-Discovery Resource Name Templates

73c5a7c

github-actions Bot requested review from AntonAM and probakowski June 2, 2023 01:23

github-actions Bot added rfd Request for Discussion size/md labels Jun 2, 2023

GavinFrazar requested review from jentfoo, klizhentas, r0mant, reedloden, smallinsky and xinding33 and removed request for AntonAM and probakowski June 2, 2023 01:24

Update 0129-discovery-name-templating.md

feea5fd

GavinFrazar changed the title ~~RFD: Auto-Discovery Resource Name Templates~~ RFD 129: Auto-Discovery Resource Name Templates Jun 2, 2023

GavinFrazar requested a review from tigrato June 2, 2023 01:29

GavinFrazar added 2 commits June 1, 2023 18:39

Update 0129-discovery-name-templating.md

4c2031a

Expand on UX when user references unsupported template var

Update 0129-discovery-name-templating.md

479accb

klizhentas requested changes Jun 2, 2023


GavinFrazar added 3 commits June 2, 2023 13:16

Add dynamic config examples

e8304f6

Show proto message updates needed

ba31553

Fixup error message example for tctl

d49d13b

jentfoo reviewed Jun 2, 2023


klizhentas reviewed Jun 3, 2023

rework RFD

57512c6

* remove config template discussion * explain a discovery naming convention approach

GavinFrazar changed the title ~~RFD 129: Auto-Discovery Resource Name Templates~~ RFD 129: Avoid Discovery Resource Name Collisions Jun 7, 2023

remove tsh proxy app entry

8f967d6

klizhentas reviewed Jun 7, 2023


klizhentas reviewed Jun 7, 2023


klizhentas reviewed Jun 7, 2023


klizhentas reviewed Jun 7, 2023

GavinFrazar added 3 commits June 8, 2023 12:46

add web ui and Teleport Connect UX

cf1e183

expand full detail on ambiguous tsh error

a07ed8d

* fixup example formatting, aws account ID length, sort order

fix formatting of table

bcfbd6f

smallinsky requested a review from klizhentas June 14, 2023 15:34

r0mant reviewed Jun 14, 2023

smallinsky reviewed Jun 15, 2023


include azure region

7b21853

* consistent naming scheme * helps avoid collisions in rare cases of invalid resource group chars

smallinsky approved these changes Jun 16, 2023

GavinFrazar added 3 commits June 16, 2023 14:13

update tsh UX

c0e5a5b

* support --query and --labels flags instead of positional labels arg * clarify how prefix resource name matching will be implemented * update examples

address backward compat

3718bc7

update subcommands to include apps and db logout

2a3bd6e

klizhentas approved these changes Jun 16, 2023

r0mant approved these changes Jun 16, 2023


public-teleport-github-review-bot Bot removed request for reedloden and xinding33 June 16, 2023 23:26

GavinFrazar added this pull request to the merge queue Jun 20, 2023

Merged via the queue into master with commit 2dacc4d Jun 20, 2023

GavinFrazar deleted the rfd/0129-discovery-resource-name-templating branch June 20, 2023 18:16

GavinFrazar mentioned this pull request Jun 30, 2023

update tsh db resource selection #28505

Merged

GavinFrazar mentioned this pull request Jul 19, 2023

differentiate discovered resource names #28845

Merged

Conversation

GavinFrazar commented Jun 2, 2023 (edited)

klizhentas Jun 3, 2023 (edited)

klizhentas Jun 3, 2023 (edited)

r0mant Jun 3, 2023 (edited)

tigrato Jun 7, 2023 (edited)

r0mant Jun 7, 2023 (edited)

GavinFrazar Jun 7, 2023 (edited)

klizhentas Jun 7, 2023 (edited)

klizhentas left a comment

6 participants
