Skip to content

Edwarddowling/ec2 labels#21079

Closed
EdwardDowling wants to merge 17 commits intomasterfrom
edwarddowling/ec2-labels
Closed

Edwarddowling/ec2 labels#21079
EdwardDowling wants to merge 17 commits intomasterfrom
edwarddowling/ec2-labels

Conversation

@EdwardDowling
Copy link
Copy Markdown
Contributor

@EdwardDowling EdwardDowling commented Feb 1, 2023

Part of #18329

Teleport nodes running on EC2 instances can automatically import labels using metadata endpoint but some users disable the metadata endpoint but still would like to import tags.

This PR introduces a discovered server resource to store the labels retrieved from the describeInstances call we use when discovering the ec2 instances.

Comment thread lib/srv/heartbeat.go Outdated
@EdwardDowling EdwardDowling marked this pull request as ready for review February 3, 2023 12:46
@EdwardDowling EdwardDowling marked this pull request as draft February 3, 2023 13:12
@EdwardDowling EdwardDowling marked this pull request as ready for review February 3, 2023 15:50
@rosstimothy
Copy link
Copy Markdown
Contributor

I think we should hold off on implementation until the RFD is approved and merged

Copy link
Copy Markdown
Collaborator

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't done full review yet, just one point.

Comment thread lib/services/local/presence.go Outdated
Copy link
Copy Markdown
Collaborator

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@espadolini @fspmarshall Mind taking a look at this again since you looked at it before?

Comment thread lib/auth/api.go Outdated
UpsertDiscoveredServer(context.Context, types.DiscoveredServer) (*types.KeepAlive, error)

// GetDiscoveredServer gets a DiscoveredServer.
GetDiscoveredServer(context.Context, string, string) (*types.DiscoveredServerV1, error)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think GetDiscoveredServer should be a part of the Announcer interface.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would access point be a better spot for them? Or was there somewhere more appropriate you were thinking of?

if err := a.action(apidefaults.Namespace, types.KindDiscoveredServer, types.VerbRead); err != nil {
return nil, trace.Wrap(err)
}
return a.authServer.GetDiscoveredServer(ctx, instanceID, accountID)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@espadolini @fspmarshall This goes through cache first, correct? Since authServer embeds Cache.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it's preferable to explicitly invoke the method against the cache for something like this (e.g. a.authServer.Cache.GetDiscoveredServer(...)), to better communicate the fact that hitting the cache isn't just preferred, it is required in order to prevent an outage.

Comment thread lib/srv/discovery/discovery.go Outdated
Comment thread lib/srv/discovery/discovery.go Outdated
Comment thread lib/services/local/presence.go Outdated
Comment thread lib/labels/cloud.go
Comment thread lib/auth/auth_with_roles.go Outdated
Comment thread lib/auth/auth_with_roles.go
Comment thread lib/auth/auth_with_roles.go Outdated
Comment on lines +948 to +952
// Use labels from discovered resources instead of ones reported by discovered ec2 instances
s, err := a.filterEC2Labels(ctx, s)
if err != nil {
return nil, trace.Wrap(err)
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this logic belongs in the ACL layer tbh. Wouldn't GRPC server be a better place for it, we already update parts of the received Server resource there.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other way around, needs to be on the level of auth.Server to ensure it always gets called no matter where the UpsertNode call is coming from (see #21079 (comment)).

Comment thread lib/services/local/presence.go Outdated
Also fix some unsafe casting and move some values to constants
Copy link
Copy Markdown
Contributor

@espadolini espadolini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node needs to be aware of its labels to make RBAC decisions, and having the node heartbeat resource (and thus the auth) disagree with what the node thinks its labels are is going to be confusing at best, and a security issue at worst.

Comment thread lib/auth/grpcserver.go
Comment on lines 3096 to 3111
}
node.SetAddr(utils.ReplaceLocalhost(node.GetAddr(), p.Addr.String()))

// Use labels from discovered resources instead of ones reported by discovered ec2 instances
filteredNode, err := auth.ServerWithRoles.FilterEC2Labels(ctx, node)
if err != nil {
return nil, trace.Wrap(err)
}
node, ok = filteredNode.(*types.ServerV2)
if !ok {
return nil, trace.BadParameter("unexpected type %T", filteredNode)
}

keepAlive, err := auth.ServerWithRoles.UpsertNode(ctx, node)
if err != nil {
return nil, trace.Wrap(err)
Copy link
Copy Markdown
Contributor

@fspmarshall fspmarshall Feb 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will have no effect in most cases. UpsertNode is only called by the node itself if it doesn't have a healthy control stream. When healthy, its heartbeats are injected by the auth server directly:

lease, err := c.auth.UpsertNode(c.closeContext, sshServer)

If you want to always intercept a node heartbeat, this should be happening at the level of auth.Server, since that will affect all calls to UpsertNode, not just calls that get routed through the GRPC server.

@r0mant
Copy link
Copy Markdown
Collaborator

r0mant commented Feb 10, 2023

@espadolini Good point. Should we force apply RBAC for auto-discovered nodes on the proxy then, or have the node sync the "actual" labels somehow? What do you think would be a good approach?

@fspmarshall
Copy link
Copy Markdown
Contributor

Should we force apply RBAC for auto-discovered nodes on the proxy then, or have the node sync the "actual" labels somehow? What do you think would be a good approach?

@r0mant Forcing-applying RBAC on the proxy isn't really an option for direct-dial nodes. Short of completely deprecating the concept of node-side RBAC, I'd be very hesitant to rely on proxy-side RBAC actually being enforced consistently...

Honestly, having a node's RBAC state be partially defined by an external authority is a totally new concept in teleport... I'm betting any solution we come up with is going to need pretty rigorous verification to be certain it works in a consistent manner. Even if we have the node sync labels, nodes allow incoming connections even if they don't have a healthy connection to auth... at the very least, I'd say we'd want to disallow the use of ec2 labels in deny rules, just to ensure that if the system fails it fails in the direction of reduced privilege rather than escalated privilege.

@r0mant
Copy link
Copy Markdown
Collaborator

r0mant commented Feb 10, 2023

@fspmarshall @espadolini What if we did the following:

  • Node would get back its "real" labels in the response to heartbeat/keep-alive.
  • We could potentially reject direct dials if node doesn't have a healthy connection to auth. I don't think direct dial would be a common use-case for auto-discovery.

What do you think? If this works, we will update the RFD to clarify this approach.

@rosstimothy
Copy link
Copy Markdown
Contributor

rosstimothy commented Feb 10, 2023

@fspmarshall @espadolini What if we did the following:

  • Node would get back its "real" labels in the response to heartbeat/keep-alive.
  • We could potentially reject direct dials if node doesn't have a healthy connection to auth. I don't think direct dial would be a common use-case for auto-discovery.

What do you think? If this works, we will update the RFD to clarify this approach.

We just did a bunch of work to ensure that connections to nodes don't require auth (RFD). It would be a shame if we walked this back just to support this feature.

@rosstimothy rosstimothy reopened this Feb 10, 2023
@fspmarshall
Copy link
Copy Markdown
Contributor

Node would get back its "real" labels in the response to heartbeat/keep-alive.
We could potentially reject direct dials if node doesn't have a healthy connection to auth. I don't think direct dial would be a common use-case for auto-discovery.

IIUC as currently implemented, ec2 labels are "discovered" for any instance that matches the discovery selector, meaning that it might be discovering labels for instances that were added by another means. I think we'd need to be careful to make sure individual nodes know statically wether or not they are supposed to be taking on this behavior, and ensure that labels never get applied to nodes that aren't statically configured to take on this behavior. Likely, this means that we'd want the node's join token to force it into a special mode where it accepts externally applied labels, and have the label discovery applied only to nodes that exhibit this setting.

Even then, since these externally applied labels are dynamically discovered rather than static, it feels brittle. Any disruption of label discovery is effectively an outage for all discovery nodes since we cannot evaluate RBAC sanely for them.

IMO we'd be much better off accepting that discovered labels are populated lazily and treating them as ineligible for matching in deny rules... the failure case is much more graceful in that case.

@r0mant
Copy link
Copy Markdown
Collaborator

r0mant commented Feb 10, 2023

@fspmarshall @rosstimothy @espadolini Thanks guys. Let me put this PR to draft for now, we're probably better off moving this discussion back to the RFD.

@r0mant r0mant marked this pull request as draft February 10, 2023 19:18
@fspmarshall
Copy link
Copy Markdown
Contributor

Side note: we've talked for a long time about the idea of statically associating labels with join tokens s.t. the labels are guaranteed to exist for the lifetime of the instance's credentials. That's much easier to enforce since it's not dynamically propagated... could we add a feature that let users statically assign labels based on the instance labels observed at time of initial discovery instead? That'd make all of this much easier I think.

@r0mant
Copy link
Copy Markdown
Collaborator

r0mant commented Apr 26, 2023

We're gonna be implementing a different approach described in #22033.

@r0mant r0mant closed this Apr 26, 2023
@r0mant r0mant deleted the edwarddowling/ec2-labels branch April 26, 2023 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants