Edwarddowling/ec2 labels by EdwardDowling · Pull Request #21079 · gravitational/teleport

EdwardDowling · 2023-02-01T16:35:10Z

Teleport nodes running on EC2 instances can automatically import labels from the instance metadata endpoint. Some users disable that endpoint but would still like to import tags.

This PR introduces a discovered server resource that stores the labels retrieved from the DescribeInstances call we use when discovering EC2 instances.

label grabbing from describe instance not metadata api

rosstimothy · 2023-02-06T18:43:27Z

I think we should hold off on implementation until the RFD is approved and merged

r0mant

Haven't done full review yet, just one point.

r0mant

@espadolini @fspmarshall Mind taking a look at this again since you looked at it before?

r0mant · 2023-02-09T01:38:49Z

+	UpsertDiscoveredServer(context.Context, types.DiscoveredServer) (*types.KeepAlive, error)
+
+	// GetDiscoveredServer gets a DiscoveredServer.
+	GetDiscoveredServer(context.Context, string, string) (*types.DiscoveredServerV1, error)


I don't think GetDiscoveredServer should be a part of the Announcer interface.

Would access point be a better spot for them? Or was there somewhere more appropriate you were thinking of?

r0mant · 2023-02-09T01:40:13Z

+	if err := a.action(apidefaults.Namespace, types.KindDiscoveredServer, types.VerbRead); err != nil {
+		return nil, trace.Wrap(err)
+	}
+	return a.authServer.GetDiscoveredServer(ctx, instanceID, accountID)


@espadolini @fspmarshall This goes through cache first, correct? Since authServer embeds Cache.

Yes, but it's preferable to explicitly invoke the method against the cache for something like this (e.g. a.authServer.Cache.GetDiscoveredServer(...)), to better communicate the fact that hitting the cache isn't just preferred, it is required in order to prevent an outage.
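A minimal sketch of the distinction drawn above, assuming simplified stand-in types (`Cache`, `Server`, and `DiscoveredServer` here are hypothetical and are not Teleport's real definitions). Because `Server` embeds `*Cache`, the method is promoted and both call forms reach the cache today; spelling out `srv.Cache` makes the dependency visible and keeps working as intended even if `Server` later gains its own method with the same name.

```go
package main

import "fmt"

// DiscoveredServer is a hypothetical stand-in for the resource added in this PR.
type DiscoveredServer struct {
	InstanceID string
	AccountID  string
	Labels     map[string]string
}

// Cache is a hypothetical in-memory read path keyed by "accountID/instanceID".
type Cache struct {
	servers map[string]DiscoveredServer
}

// GetDiscoveredServer reads only from the in-memory cache.
func (c *Cache) GetDiscoveredServer(instanceID, accountID string) (DiscoveredServer, error) {
	s, ok := c.servers[accountID+"/"+instanceID]
	if !ok {
		return DiscoveredServer{}, fmt.Errorf("discovered server %s/%s not found", accountID, instanceID)
	}
	return s, nil
}

// Server embeds *Cache, so Cache's methods are promoted onto Server.
type Server struct {
	*Cache
}

func main() {
	srv := &Server{Cache: &Cache{servers: map[string]DiscoveredServer{
		"123456789012/i-0abc": {
			InstanceID: "i-0abc",
			AccountID:  "123456789012",
			Labels:     map[string]string{"env": "prod"},
		},
	}}}

	// srv.GetDiscoveredServer(...) would also compile via promotion, but the
	// explicit form documents that the cache is the required read path.
	s, err := srv.Cache.GetDiscoveredServer("i-0abc", "123456789012")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(s.Labels["env"])
}
```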

r0mant · 2023-02-09T01:59:01Z

+	// Use labels from discovered resources instead of ones reported by discovered ec2 instances
+	s, err := a.filterEC2Labels(ctx, s)
+	if err != nil {
+		return nil, trace.Wrap(err)
+	}


I don't know if this logic belongs in the ACL layer tbh. Wouldn't the gRPC server be a better place for it? We already update parts of the received Server resource there.

Other way around, needs to be on the level of auth.Server to ensure it always gets called no matter where the UpsertNode call is coming from (see #21079 (comment)).

espadolini

The node needs to be aware of its labels to make RBAC decisions, and having the node heartbeat resource (and thus the auth) disagree with what the node thinks its labels are is going to be confusing at best, and a security issue at worst.

fspmarshall · 2023-02-10T17:18:18Z

 	}
 	node.SetAddr(utils.ReplaceLocalhost(node.GetAddr(), p.Addr.String()))

+	// Use labels from discovered resources instead of ones reported by discovered ec2 instances
+	filteredNode, err := auth.ServerWithRoles.FilterEC2Labels(ctx, node)
+	if err != nil {
+		return nil, trace.Wrap(err)
+	}
+	node, ok = filteredNode.(*types.ServerV2)
+	if !ok {
+		return nil, trace.BadParameter("unexpected type %T", filteredNode)
+	}
+
 	keepAlive, err := auth.ServerWithRoles.UpsertNode(ctx, node)
 	if err != nil {
 		return nil, trace.Wrap(err)


This will have no effect in most cases. UpsertNode is only called by the node itself if it doesn't have a healthy control stream. When healthy, its heartbeats are injected by the auth server directly:

teleport/lib/inventory/controller.go

Line 462 in 70833b6

lease, err := c.auth.UpsertNode(c.closeContext, sshServer)

If you want to always intercept a node heartbeat, this should be happening at the level of auth.Server, since that will affect all calls to UpsertNode, not just calls that get routed through the GRPC server.
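The routing argument above can be sketched as follows. All names here (`AuthServer`, `GRPCUpsertNode`, `InventoryHeartbeat`) are hypothetical simplifications and not Teleport's actual types; the label overlay is also an assumed stand-in for the PR's filtering logic. The point is only that when the overlay lives inside the one method every path calls, neither entry point can bypass it.

```go
package main

import "fmt"

// Node is a hypothetical, simplified node heartbeat.
type Node struct {
	Name   string
	Labels map[string]string
}

// AuthServer is a hypothetical stand-in for auth.Server.
type AuthServer struct {
	discovered map[string]map[string]string // node name -> labels found by discovery
	stored     map[string]Node
}

// NewAuthServer builds an AuthServer with the given discovered labels.
func NewAuthServer(discovered map[string]map[string]string) *AuthServer {
	return &AuthServer{discovered: discovered, stored: map[string]Node{}}
}

// UpsertNode is the single choke point: discovered labels are overlaid here,
// so every caller gets the same treatment.
func (a *AuthServer) UpsertNode(n Node) {
	merged := make(map[string]string, len(n.Labels))
	for k, v := range n.Labels {
		merged[k] = v
	}
	for k, v := range a.discovered[n.Name] {
		merged[k] = v // discovered values win over self-reported ones
	}
	n.Labels = merged
	a.stored[n.Name] = n
}

// GRPCUpsertNode models the API path, used when a node has no healthy control stream.
func GRPCUpsertNode(a *AuthServer, n Node) { a.UpsertNode(n) }

// InventoryHeartbeat models the control-stream path, where auth injects the heartbeat itself.
func InventoryHeartbeat(a *AuthServer, n Node) { a.UpsertNode(n) }

func main() {
	a := NewAuthServer(map[string]map[string]string{
		"node-1": {"env": "prod"},
	})
	InventoryHeartbeat(a, Node{Name: "node-1", Labels: map[string]string{"env": "self-reported"}})
	fmt.Println(a.stored["node-1"].Labels["env"])
}
```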

r0mant · 2023-02-10T17:42:35Z

@espadolini Good point. Should we force apply RBAC for auto-discovered nodes on the proxy then, or have the node sync the "actual" labels somehow? What do you think would be a good approach?

fspmarshall · 2023-02-10T18:10:21Z

> Should we force apply RBAC for auto-discovered nodes on the proxy then, or have the node sync the "actual" labels somehow? What do you think would be a good approach?

@r0mant Force-applying RBAC on the proxy isn't really an option for direct-dial nodes. Short of completely deprecating the concept of node-side RBAC, I'd be very hesitant to rely on proxy-side RBAC actually being enforced consistently...

Honestly, having a node's RBAC state be partially defined by an external authority is a totally new concept in Teleport... I'm betting any solution we come up with is going to need pretty rigorous verification to be certain it works in a consistent manner. Even if we have the node sync labels, nodes allow incoming connections even if they don't have a healthy connection to auth... at the very least, I'd say we'd want to disallow the use of EC2 labels in deny rules, just to ensure that if the system fails it fails in the direction of reduced privilege rather than escalated privilege.

r0mant · 2023-02-10T19:00:10Z

@fspmarshall @espadolini What if we did the following:

Node would get back its "real" labels in the response to heartbeat/keep-alive.
We could potentially reject direct dials if node doesn't have a healthy connection to auth. I don't think direct dial would be a common use-case for auto-discovery.

What do you think? If this works, we will update the RFD to clarify this approach.

rosstimothy · 2023-02-10T19:02:18Z

> @fspmarshall @espadolini What if we did the following:
>
> Node would get back its "real" labels in the response to heartbeat/keep-alive.
>
> We could potentially reject direct dials if node doesn't have a healthy connection to auth. I don't think direct dial would be a common use-case for auto-discovery.
>
> What do you think? If this works, we will update the RFD to clarify this approach.

We just did a bunch of work to ensure that connections to nodes don't require auth (RFD). It would be a shame if we walked this back just to support this feature.

fspmarshall · 2023-02-10T19:15:41Z

> Node would get back its "real" labels in the response to heartbeat/keep-alive.
> We could potentially reject direct dials if node doesn't have a healthy connection to auth. I don't think direct dial would be a common use-case for auto-discovery.

IIUC as currently implemented, EC2 labels are "discovered" for any instance that matches the discovery selector, meaning that it might be discovering labels for instances that were added by another means. I think we'd need to be careful to make sure individual nodes know statically whether or not they are supposed to be taking on this behavior, and ensure that labels never get applied to nodes that aren't statically configured to take on this behavior. Likely, this means that we'd want the node's join token to force it into a special mode where it accepts externally applied labels, and have the label discovery applied only to nodes that exhibit this setting.

Even then, since these externally applied labels are dynamically discovered rather than static, it feels brittle. Any disruption of label discovery is effectively an outage for all discovery nodes since we cannot evaluate RBAC sanely for them.

IMO we'd be much better off accepting that discovered labels are populated lazily and treating them as ineligible for matching in deny rules... the failure case is much more graceful in that case.
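A rough sketch of the "ineligible for deny rules" idea, under stated assumptions: the `Role` type, `ValidateRole`, `CheckAccess`, and the `aws/` key prefix used to recognize discovered labels are all hypothetical and chosen only for illustration. With allow-only matching, a node whose labels have not been populated yet simply matches fewer roles, so a discovery outage reduces privilege instead of silently disabling a deny.

```go
package main

import (
	"fmt"
	"strings"
)

// discoveredPrefix marks lazily populated labels. The prefix is an assumption
// made for this sketch.
const discoveredPrefix = "aws/"

// Role is a hypothetical, simplified role with label selectors.
type Role struct {
	Name        string
	AllowLabels map[string]string
	DenyLabels  map[string]string
}

// ValidateRole rejects roles whose deny rules depend on discovered labels,
// since a missing label would otherwise turn a deny into an accidental allow.
func ValidateRole(r Role) error {
	for key := range r.DenyLabels {
		if strings.HasPrefix(key, discoveredPrefix) {
			return fmt.Errorf("role %q: deny rule may not match discovered label %q", r.Name, key)
		}
	}
	return nil
}

// matches reports whether every selector entry is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// CheckAccess applies deny first, then requires a non-empty allow selector to match.
func CheckAccess(r Role, labels map[string]string) bool {
	if len(r.DenyLabels) > 0 && matches(r.DenyLabels, labels) {
		return false
	}
	return len(r.AllowLabels) > 0 && matches(r.AllowLabels, labels)
}

func main() {
	role := Role{Name: "prod-access", AllowLabels: map[string]string{"aws/env": "prod"}}
	fmt.Println(ValidateRole(role) == nil)

	// Labels not discovered yet: access is withheld rather than granted.
	fmt.Println(CheckAccess(role, map[string]string{}))
	// Labels discovered: access is granted.
	fmt.Println(CheckAccess(role, map[string]string{"aws/env": "prod"}))
}
```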

r0mant · 2023-02-10T19:18:07Z

@fspmarshall @rosstimothy @espadolini Thanks guys. Let me put this PR to draft for now, we're probably better off moving this discussion back to the RFD.

fspmarshall · 2023-02-10T19:19:36Z

Side note: we've talked for a long time about the idea of statically associating labels with join tokens s.t. the labels are guaranteed to exist for the lifetime of the instance's credentials. That's much easier to enforce since it's not dynamically propagated... could we add a feature that let users statically assign labels based on the instance labels observed at time of initial discovery instead? That'd make all of this much easier I think.
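One possible shape for that suggestion, purely as an illustration: the `JoinToken` type and `NewTokenFromDiscovery` helper below are hypothetical, and nothing in this thread says the feature was built this way. The sketch copies the instance tags once, at discovery time, so later tag changes on the instance never propagate to the labels tied to the token.

```go
package main

import "fmt"

// JoinToken is a hypothetical token carrying labels that stay fixed for its lifetime.
type JoinToken struct {
	Name   string
	Labels map[string]string
}

// NewTokenFromDiscovery snapshots the tags observed at initial discovery.
// The map is copied so the token does not alias the caller's data.
func NewTokenFromDiscovery(name string, instanceTags map[string]string) JoinToken {
	snapshot := make(map[string]string, len(instanceTags))
	for k, v := range instanceTags {
		snapshot[k] = v
	}
	return JoinToken{Name: name, Labels: snapshot}
}

func main() {
	tags := map[string]string{"env": "prod"}
	token := NewTokenFromDiscovery("ec2-discovery-token", tags)

	// A later change to the instance tags does not affect the token's labels.
	tags["env"] = "staging"
	fmt.Println(token.Labels["env"])
}
```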

r0mant · 2023-04-26T15:37:46Z

We're gonna be implementing a different approach described in #22033.

EdwardDowling added 6 commits January 30, 2023 15:20

Add discoveredServer resource

8eb0caf

Add GetDiscoveredServer funcs and initial label grabbing

5644c91

label grabbing from describe instance not metadata api

Change handleEC2Instances so existing discoveredServers are updated

e0582d8

Add mock func for describeinstances to test

89d8ac2

Update discovery test to ensure discovered isntances are recorded

7c62c2c

Fix roles for discovered server

a3fbd54

r0mant reviewed Feb 1, 2023

Comment thread lib/srv/heartbeat.go Outdated

EdwardDowling added 4 commits February 2, 2023 13:14

Move ec2 label replacing from client side ot auth side

0543b18

Prevent ec2 instance and account id labels being overwritten

45b2c80

Remove unused deleteDiscoveredServer func

b352f45

Prevent metadata endpoint being used for ec2 instances

8c901e8

EdwardDowling marked this pull request as ready for review February 3, 2023 12:46

github-actions Bot added the size/md label Feb 3, 2023

github-actions Bot requested review from strideynet and xacrimon February 3, 2023 12:47

Add un/marshal tests for discovered server resource

3cc372c

EdwardDowling marked this pull request as draft February 3, 2023 13:12

Fix some tests broken by new discovered server label method

22c9033

EdwardDowling marked this pull request as ready for review February 3, 2023 15:50

github-actions Bot requested review from espadolini and greedy52 February 3, 2023 15:50

github-actions Bot added application-access size/lg labels Feb 3, 2023

fspmarshall mentioned this pull request Feb 6, 2023

RFD 57: Add agentless mode section and AWS tags forwarding section #18676

Merged

r0mant requested changes Feb 6, 2023

Comment thread lib/services/local/presence.go Outdated

Add discovered server resource to the cache

cf3c4f3

EdwardDowling requested a review from fspmarshall February 8, 2023 17:35

r0mant requested changes Feb 9, 2023

Swap account and instance id in discovered server key

4d458ef

Also fix some unsafe casting and move some values to constants

EdwardDowling added 3 commits February 9, 2023 17:40

Move ec2 label filtering to grpc server

9791e47

Add back in ec2 cloud importer setup

d6a78e9

Move discoveredNode funcs out of announcer interface

bf7370e

espadolini requested changes Feb 10, 2023

fspmarshall reviewed Feb 10, 2023

rosstimothy closed this Feb 10, 2023

rosstimothy reopened this Feb 10, 2023

r0mant marked this pull request as draft February 10, 2023 19:18

r0mant closed this Apr 26, 2023

r0mant deleted the edwarddowling/ec2-labels branch April 26, 2023 15:37

fspmarshall mentioned this pull request Nov 28, 2023

Add docs for resource-based labels #34475

Merged
