Skip to content

RFD 129: Avoid Discovery Resource Name Collisions#27258

Merged
GavinFrazar merged 16 commits intomasterfrom
rfd/0129-discovery-resource-name-templating
Jun 20, 2023
Merged

RFD 129: Avoid Discovery Resource Name Collisions#27258
GavinFrazar merged 16 commits intomasterfrom
rfd/0129-discovery-resource-name-templating

Conversation

@GavinFrazar
Copy link
Copy Markdown
Contributor

@GavinFrazar GavinFrazar commented Jun 2, 2023

Rendered:

I have prototyped this idea for AWS databases and the implementation was straightforward (and works to solve the name collision issue). This RFD is mostly to discuss:

1. should we use Go templates instead of some custom templating like we do in Teleport RBAC or a simpler alternative like name rewriting to just add a prefix/suffix?
2. if we like the Go template solution, what template variables/functionality should we expose?

Reworked RFD to discuss a discovery resource naming convention and tsh UX to make working with longer resource names less tedious.

@github-actions github-actions Bot requested review from AntonAM and probakowski June 2, 2023 01:23
@github-actions github-actions Bot added rfd Request for Discussion size/md labels Jun 2, 2023
@GavinFrazar GavinFrazar requested review from jentfoo, klizhentas, r0mant, reedloden, smallinsky and xinding33 and removed request for AntonAM and probakowski June 2, 2023 01:24
@GavinFrazar GavinFrazar changed the title RFD: Auto-Discovery Resource Name Templates RFD 129: Auto-Discovery Resource Name Templates Jun 2, 2023
@GavinFrazar GavinFrazar requested a review from tigrato June 2, 2023 01:29
Expand on UX when user references unsupported template var
Comment thread rfd/0129-discovery-name-templating.md Outdated
Comment thread rfd/0129-discovery-name-templating.md Outdated
Comment thread rfd/0129-discovery-name-templating.md Outdated
aws:
- types: ["ec2", "rds"]
regions: ["us-west-1"]
resource_name_template: "{{.Name}}-{{.Region}}-{{.AWS.AccountID}}"
Copy link
Copy Markdown
Contributor

@klizhentas klizhentas Jun 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to give users such a flexibility? Why can't we always import resource names with name, region and account ID in mind as the default, making sure the names are unique?

Do we really need such flexibility and complexity? To me that looks like opportunity for all sorts of issues @GavinFrazar @r0mant

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do that, there were a couple of reasons we wanted to go with the templating approach, mostly UX and compatibility related:

  • If we always import with name/region/accountID, imported database names will become very long/unwieldy and users might have a hard time using them in e.g. tsh db connect.
  • By default we'll still default to just "name" like now so only users who have the name conflict problem will need to use this.

IIRC we discussed this on the product sync a while ago but we can reconsider if you still think this is a bad idea.

Copy link
Copy Markdown
Contributor

@klizhentas klizhentas Jun 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember this conversation. But now seeing the proposal, I understand how needlessly complicated the whole setup would be.

We already practice database with long names, see for example my setup:

Name                             Description                      Allowed Users       Labels          Connect                          
-------------------------------- -------------------------------- ------------------- --------------- -------------------------------- 
> sales-center-prod-reader-ek... Salescenter AWS RDS PostgreSQ... [platform_readonly] app=sales-ce... tsh db connect sales-center-p... 
sales-center-staging-reader-e... Salescenter AWS RDS PostgreSQ... [platform_readonly] app=sales-ce...             

This is a pretty long name, but it does not create any issues for me.

I wonder if there is easier way to solve this problem using just minor UX tweaks.

For example, what if we set name to the unique name at all times, but also show a short name, derived from the original resource name and allow connecting by labels as well:

Name                                                                     Full name                          
--------------------------------                        ------------------- 
mydatabase                                                           mydatabase-us-east-1-1234-567-8910

Then resource name will be long name nobody will read, and if there is no collision, then tsh db connect mydatabase will work exactly as tsh db connect mydatabase-us-east-1-1234-567-8910 If there is a collision we will ask users to provide fully qualified name.

We can also let users tsh db connect label=value as well.

What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klizhentas I like the idea to default to a long name and print a short name instead of templating, since teleport config is already very complex.

I think supporting tsh db connect label=value would be useful, in and of itself, for executing statements on multiple databases at once non-interactively, which is something I wanted before for setting up Azure MySQL IAM users in multiple databases at once:

bash-5.1$ cat <<'EOF' | tsh db connect --db-user=database-access 'engine="Microsoft.DBforMySQL/servers"'
SET aad_auth_validate_oids_in_tenant = OFF;
CREATE AADUSER 'SomeNewUser' IDENTIFIED BY '64ca60b1-5b77-4dc1-9ad3-79ac7fb1fd09'; # this is the ID of the service principal
GRANT ALL ON `%`.* TO 'SomeNewUser'@'%';
EOF           

Copy link
Copy Markdown
Collaborator

@r0mant r0mant Jun 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of allowing tsh db connect to connect using substring of a name and labels 👍 This should also be backwards compatible since "instance name" will be a part of the whole name as well so users will still be able to connect using it unless there's ambiguity.

Then it sounds like we don't need any config changes, just these 3 UX tweaks:

  1. When importing databases, include account ID and region in the resource name.
  2. Update tsh db connect and tsh proxy db to treat provided "name" as a substring of the resource name.
  3. Update tsh db connect and tsh proxy db to select database by labels.

For "2" and "3", we return error in case of ambiguity asking user to specify "unambiguated" (is that a word lol?) name, and maybe including a list of names that matched in the message.

Does that sound reasonable? @GavinFrazar @klizhentas @smallinsky

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r0mant @klizhentas
I did some investigation into using label selectors, and I think my proposal in the comments above to support executing commands on multiple matching databases at once will require a lot more consideration to actually implement.

Therefore, to limit the scope of this RFD to just what is necessary to avoid name collisions in discovered resources, I've rewritten the RFD to just support prefixes and label matching to unambiguously select a single resource for now.

@tigrato

Besides, we need to decide if we need to keep the compatibility of cached certificates. For example, tsh keeps Kubernetes certs cached locally and the certificate contains the Kubernetes cluster name. If we going to continue with this change and since discovered clusters are dynamically loaded, users might end up with cached certificates (for a long time) that cannot be used for anything because the embedded cluster name does not match with the server's clusters names.

Could you expand on that? I don't understand how the scenario you've described applies - the cert will have the full kube cluster name in it, we'll just support short names for identifying kube clusters when using tsh kube login

Copy link
Copy Markdown
Contributor

@tigrato tigrato Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that:

  1. A user has an auto-discovery config in place and working
  2. calls tsh kube login <cluster_name>
  3. Accesses the cluster and caches the certificate locally kubernetes_cluster=<cluster_name>
  4. Someone upgrades discovery_service to a version that implements this RFD and the cluster names change. What was cluster_name now becomes {account_id}-{region}-{cluster_name}.
  5. The next call to kubectl will fail because the cluster embedded in the certificate - <cluster_name> - no longer exists. To fix it, it requires changes to the kubeconfig or calling tsh kube login again

Since tsh kube credentials does not validate if the cluster exists when the certificate is cached to reduce the execution time, the error returned will be not found from the kubernetes proxy which is sub-optimal.

As a result, we might end up breaking access to clusters with sub-optimal errors for the time users have cached credentials.

Copy link
Copy Markdown
Collaborator

@r0mant r0mant Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proposal in the comments above to support executing commands on multiple matching databases at once will require a lot more consideration to actually implement.

@GavinFrazar I might have missed it, but why do we need to execute commands on multiple databases? People have asked about something like this before but this is out of scope of this issue. Let's skip it for now.

Edit: Just re-read your comment, I agree with "just support prefixes and label matching to unambiguously select a single resource for now" assessment. To clarify though: it's not just prefixes, it's any substring of the name, correct?

@tigrato I think it's ok to reserve this change for Teleport 14 and not complicate implementation with trying to preserve compatibility for certificates. People will have to relogin, yes, but that should be fine and we can mention it in the T14 release notes. We've done things that require relogin before. WDYT?

Copy link
Copy Markdown
Contributor Author

@GavinFrazar GavinFrazar Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify though: it's not just prefixes, it's any substring of the name, correct?

I think prefix matching will be more intuitive than substrings in tsh, so my intent was to go with prefixes. Consider this example if we do substrings:

# foo-rds-us-west-1-0123456789012 # this one made by discovery
# rds-prod-something # this one made by static/dynamic config

$ tsh db connect rds
error: ambiguous ...*snip*...

But I think substring search in Connect/WebUI makes sense because the user first searches and then interactively selects the resource to use

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tigrato I think it's ok to reserve this change for Teleport 14 and not complicate implementation with trying to preserve compatibility for certificates. People will have to relogin, yes, but that should be fine and we can mention it in the T14 release notes. We've done things that require relogin before. WDYT?

I am ok with this as long as we are clear about the change and include it in the breaking changes notes.

Handling the compatibility of certificates would be a nightmare for security reasons and we shouldn't pursue that path.

* remove config template discussion
* explain a discovery naming convention approach
@GavinFrazar GavinFrazar changed the title RFD 129: Auto-Discovery Resource Name Templates RFD 129: Avoid Discovery Resource Name Collisions Jun 7, 2023
Comment thread rfd/0129-discovery-name-templating.md Outdated
Comment thread rfd/0129-discovery-name-templating.md Outdated
Comment thread rfd/0129-discovery-name-templating.md Outdated

# ambiguous prefix name is an error
$ tsh db connect --db-user=alice --db-name-postgres bar
error: ambiguous database name could match multiple databases:
Copy link
Copy Markdown
Contributor

@klizhentas klizhentas Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the output that is easy to copy paste:

Multiple databases match the name "bar". Please specify the full database name. You can find it by running 

$tsh db ls -v, or run the following commands:

$ tsh db connect bar-rds-us-west-1-0123456789012 # instance in account-id=0123456789012,region=us-west-1,env=dev`
$ tsh db connect bar-rds-us-east-2-0123456789012 # instance in ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initially, I thought copy/paste examples would be a good idea, but I'm not so sure anymore - the commands don't include the flags the user used previously. As a user I think i'm more likely to press the up arrow instead to modify my last command than copy/paste a new one. Also, printing example commands is bug prone, and it's frustrating to copy/paste a suggested command that doesn't work or doesn't do what I intended.

I prefer the error output from tsh ssh that just provides hints about how to resolve the ambiguity, with a generic example Hint: try addressing the node by unique id (ex: tsh ssh user@node-id)

Comment thread rfd/0129-discovery-name-templating.md Outdated
error: ambiguous database name could match multiple databases:
Name Description Allowed Users Labels Connect
------ ------------------- ------------------- --------------------------- -------
bar-rds-us-west-1-0123456789012 RDS instance in ... [*] account-id=0123456789012,region=us-west-1,env=dev,...
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not show the full output then?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually dislike tsh db ls -v because my terminal wraps long lines and breaks the table formatting, outputting a jumbled mess, although i'm not sure how common line wrapping is for other terminals.
It's not really an issue with tsh ls -v for ssh servers because the output isn't as verbose.

I like the hint to just suggest using tsh db ls -v, or maybe tsh db ls --format=yaml, as well as suggested commands to copy/paste

Copy link
Copy Markdown
Contributor

@klizhentas klizhentas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there, we should add the same story for UI and Connect too.

@smallinsky smallinsky requested a review from klizhentas June 14, 2023 15:34
Comment thread rfd/0129-discovery-name-templating.md Outdated
argument. These commands shall support
`tsh <sub-command> [name | prefix] [key1=value1,key2=value2,...]` syntax:

- `tsh db login`
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we include tsh app xxx commands in the list?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For UX consistency, yes. I only excluded it from the list because we don't have app auto-discovery

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread rfd/0129-discovery-name-templating.md Outdated
Comment thread rfd/0129-discovery-name-templating.md Outdated
Comment thread rfd/0129-discovery-name-templating.md
Comment thread rfd/0129-discovery-name-templating.md Outdated
* consistent naming scheme
* helps avoid collisions in rare cases of invalid resource group chars
* support --query and --labels flags instead of positional labels arg
* clarify how prefix resource name matching will be implemented
* update examples
Comment thread rfd/0129-discovery-name-templating.md Outdated
@GavinFrazar GavinFrazar added this pull request to the merge queue Jun 20, 2023
Merged via the queue into master with commit 2dacc4d Jun 20, 2023
@GavinFrazar GavinFrazar deleted the rfd/0129-discovery-resource-name-templating branch June 20, 2023 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rfd Request for Discussion size/md

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants