validate node hostnames when they are being created or updated#46892
validate node hostnames when they are being created or updated#46892capnspacehook wants to merge 7 commits intomasterfrom
Conversation
b6945ce to
886e7ec
Compare
|
@marcoandredinis do you think this could interfere/break EC2 discovery? I don't think so as the DNS names that are assigned to EC2 nodes always seem to be valid hostnames. It looks like when creating an EC2 instance with |
I don't think this will break EC2 discovery. |
886e7ec to
62857a7
Compare
62857a7 to
6808937
Compare
cfb23de to
31350e2
Compare
31350e2 to
1475874
Compare
8394fe5 to
897a528
Compare
897a528 to
04aead8
Compare
04aead8 to
b41998e
Compare
|
It turns out setting node hostnames to UUIDs is pretty common among our tests and codebase in general, so I added a function that will specifically validate node hostnames, as |
rosstimothy
left a comment
There was a problem hiding this comment.
What impact will this have on any existing nodes with a now invalid hostname? Will they fail to heartbeat and disappear from the inventory altogether? How will cluster admins be able to identify these invalid hosts? What recourse will they have to resolve the problem if Teleport is their only means to access the now offline hosts?
|
The checks were added such that nodes with hostnames that were previously allowed shouldn't be affected at all, the checks are only done at the I tested this before, but in testing again found a check at the heartbeat input point triggered errors for existing nodes @rosstimothy. |
|
I'm not convinced the current state of this PR addresses all the possible ways the linked issue could be exploited. If you want to merge as is, that's fine, but I don't think we should close the linked issue. |
|
+1 to current state not fully resolving the issue since it still permits malformed hostnames to be added by either token registration or changing the hostname of an extant agent. I don't see us solving those cases in a simple/performant manner. A reasonable compromise might be to backport the current state (no new openssh/discovery nodes), along with a notification that goes something like: Then, merge hard rejection of all invalid hostnames for master/v18+. The linked issue is (IMO) fairly low severity, so I don't think giving folks a major version cycle to migrate their existing agents would be unreasonable. |
|
Got approval from @r0mant issue a warning in v17 and hard reject invalid hostnames in v18. Added a warning here, will open a PR to add to the v17 changelog after this is merged as well and another PR to master that will hard reject invalid hostnames. |
fspmarshall
left a comment
There was a problem hiding this comment.
I think it would be nice to have a periodic that checks for invalid hostnames and generates a notification to improve visibility in affected clusters. That could easily be part of followup work though.
rosstimothy
left a comment
There was a problem hiding this comment.
It would be nice to have test coverage outside of the utils package to verify invalid hostnames are forbidden going forward and prevent any regressions.
|
Sorry for throwing yet another wrench into this process, but @rosstimothy and I were chatting about this elsewhere and had some more thoughts on how we might make this a bit of a "safer" transition: The whole evicting/rejecting nodes thing is scary. Even with notifications/warnings and a full major version, someone who is on an older version and upgrading fast might be blindsided by nodes dropping out unexpectedly. What if we don't evict/reject invalid hostnames at all, and instead come up with a scheme for keeping the agents dialable while also refusing to advertise invalid hostnames? Just ignoring/omitting the invalid hostname is sketchy, as it leaves blank entries in the UI/ With the above (or something like it), we should be able to fully address invalid hostnames with no loss of access or information. @capnspacehook what do you think? I know we've had a lot of back and forth about this issue, and I'm very sorry about that, but I think this is likely a much better path forward than anything we've considered thus far. |
5748aee to
d5fb2fe
Compare
|
Thanks for the idea @fspmarshall, that's a much better approach than evicting nodes. I added a periodic global notification sent to users that can update nodes, let me know if I need to change anything. |
|
@rosstimothy added regression test |
| err := utils.ValidateNodeHostname(eiceNode.GetHostname()) | ||
| if err != nil { |
There was a problem hiding this comment.
Suggestion: reduce the error scope
| err := utils.ValidateNodeHostname(eiceNode.GetHostname()) | |
| if err != nil { | |
| if err := utils.ValidateNodeHostname(eiceNode.GetHostname()); err != nil { |
| return nil, trace.BadParameter("cannot place node in namespace %q, custom namespaces are deprecated", n) | ||
| } | ||
| if err := utils.ValidateNodeHostname(server.GetHostname()); err != nil { | ||
| s.log.Warnf( |
There was a problem hiding this comment.
This will be misleading if this hostname violates the newly enforced length restriction. Perhaps the message here should be a bit more vague and include the error message instead?
| if err := utils.ValidateNodeHostname(srv.GetHostname()); err != nil { | ||
| invalidNodeHostnames = append(invalidNodeHostnames, srv.GetHostname()) |
There was a problem hiding this comment.
We should cap the number of invalid nodes added here to protect against consuming too much memory(what if all my nodes now have an invalid hostname 😨?) and to reduce overwhelming consumers of the notification. If the notification content is too large, we may be unable to persist the notification, and would also be hard for a user to digest. We probably want some happy medium since we also don't want one notification for each invalid hostname.
| } | ||
| if hostname := s.GetHostname(); hostname != "" { | ||
| if err := utils.ValidateNodeHostname(hostname); err != nil { | ||
| return nil, trace.Wrap(err) |
There was a problem hiding this comment.
I lean toward never hard rejecting any invalid hostnames, and always permitting access, but obfuscating the hostname. Imagine that Teleport is the only means by which an admin has access to the host to change the hostname and we've now prevented them from being able to modify the hostname by rejecting here.
rosstimothy
left a comment
There was a problem hiding this comment.
As mentioned in a few comments, I don't think we should proceed with an approach where we are rejecting heartbeats because of an invalid hostname. I've opened #48988 with the approach that we discussed about moving the invalid hostnames to a label and replacing the hostname with a valid alternative.
|
Superseded by #48988. |
See #46892 (comment) on why the validation logic was changed from previously using
utils.IsValidHostname.Updates https://github.com/gravitational/teleport-private/issues/1676.
changelog: validate node hostnames when they are being created or updated