RFD 57: Add agentless mode section and AWS tags forwarding section by lxea · Pull Request #18676 · gravitational/teleport

lxea · 2022-11-22T13:18:09Z

This adds sections to the EC2 discovery RFD describing how agentless mode will be configured and implemented, and how AWS tags will be set as Teleport node labels.

r0mant · 2022-11-30T23:43:17Z

@jakule @capnspacehook Can you take a look as well when you get a chance? This is going to rely on the "OpenSSH inventory management" functionality you're working on.

zmb3 · 2022-12-01T15:02:28Z

+will be created, each secret will have the following contents
+
+```json
+{


What about CA rotations?

In order to not break access to these agentless nodes, the discovery service would need to watch for CA rotations and push updated certs to the agentless nodes.

Are we including this in scope, or are we okay with the fact that CA rotations will break things for now?

Yes, @lxea and I talked about it - we should address this in the RFD but we probably won't include this in the scope of the initial implementation. I wonder if we'll be able to reuse some of the automatic upgrade functionality @fspmarshall is working on for this.

I don't see why automatic upgrades apply. That's about updating the Teleport binary, not getting new certs.

I'm no expert here, but I would think discovery service could set up a watcher for CA rotations just like Machine ID does, and push the new certs to the instances when this happens.

I didn't mean automatic upgrades specifically but rather machinery that's part of script-based upgrades for example, basically the machinery that will run some script on the nodes based on some conditions (new version availability is just one of the conditions in my mind). But yeah, it could be done in discovery service too.
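For illustration, the watcher idea discussed in this thread could be sketched roughly as follows. This is not Teleport code: `RotationEvent`, `Pusher`, `recordingPusher`, and `watchRotations` are hypothetical names, the phase strings are placeholders, and which rotation phase should trigger a push is an assumption left as a parameter.

```go
package main

import (
	"errors"
	"fmt"
)

// RotationEvent is a hypothetical stand-in for a CA rotation notification.
type RotationEvent struct {
	Phase string // illustrative phase name, not a confirmed Teleport value
}

// Pusher is a hypothetical interface for delivering fresh certs to a node.
type Pusher interface {
	PushCerts(instanceID string) error
}

// recordingPusher is a test double that records successful pushes and
// fails for any instance ID listed in failFor.
type recordingPusher struct {
	pushed  []string
	failFor map[string]bool
}

func (p *recordingPusher) PushCerts(instanceID string) error {
	if p.failFor[instanceID] {
		return errors.New("push failed for " + instanceID)
	}
	p.pushed = append(p.pushed, instanceID)
	return nil
}

// watchRotations consumes events until the channel is closed. Each time an
// event with the given pushPhase arrives, it pushes certs to every instance
// and collects the IDs whose push failed so the caller can retry them.
func watchRotations(events <-chan RotationEvent, instances []string, p Pusher, pushPhase string) []string {
	var failed []string
	for ev := range events {
		if ev.Phase != pushPhase {
			continue
		}
		for _, id := range instances {
			if err := p.PushCerts(id); err != nil {
				failed = append(failed, id)
			}
		}
	}
	return failed
}

func main() {
	events := make(chan RotationEvent, 2)
	events <- RotationEvent{Phase: "init"}
	events <- RotationEvent{Phase: "update_servers"}
	close(events)

	p := &recordingPusher{}
	failed := watchRotations(events, []string{"i-0001", "i-0002"}, p, "update_servers")
	fmt.Println("pushed:", p.pushed, "failed:", failed)
}
```

Returning the failed IDs rather than aborting on the first error keeps one unreachable instance from blocking cert delivery to the rest.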

capnspacehook

LGTM as long as cert rotation is addressed

fspmarshall · 2023-02-06T20:00:28Z

+### Including AWS Tags as Teleport labels
+
+The AWS tags on discovered EC2 instances will be included as Teleport labels on the
+discovered Nodes.
+
+In order to achieve this a helper resource named `DiscoveredServer` will be
+introduced which will store metadata about discovered nodes that was retrieved via the
+AWS API.
+
+When Teleport is installed and registers the EC2 instance, the Auth server will check
+for a corresponding `DiscoveredServer` resource by matching on `instance-id` and
+`account-id` labels. If there is a matching `DiscoveredServer`, it will create a
+`Server` resource using the metadata from the `DiscoveredServer` and ignore labels
+sent via heartbeat from the node.
+
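The matching rule in the quoted RFD text can be sketched as below. This is an assumption-laden illustration, not the RFD's implementation: the `DiscoveredServer` struct shape, `instanceKey`, `indexDiscovered`, and `resolveLabels` are hypothetical, and only the `instance-id` and `account-id` label names come from the text.

```go
package main

import "fmt"

// Label names taken from the quoted RFD text.
const (
	instanceIDLabel = "instance-id"
	accountIDLabel  = "account-id"
)

// DiscoveredServer is a hypothetical shape for the helper resource: it only
// carries the labels retrieved via the AWS API.
type DiscoveredServer struct {
	Labels map[string]string
}

// instanceKey identifies an EC2 instance by account and instance ID.
type instanceKey struct {
	accountID  string
	instanceID string
}

// indexDiscovered builds a lookup table keyed by account and instance ID.
func indexDiscovered(servers []DiscoveredServer) map[instanceKey]DiscoveredServer {
	index := make(map[instanceKey]DiscoveredServer, len(servers))
	for _, s := range servers {
		k := instanceKey{
			accountID:  s.Labels[accountIDLabel],
			instanceID: s.Labels[instanceIDLabel],
		}
		index[k] = s
	}
	return index
}

// resolveLabels returns the discovered labels when a matching
// DiscoveredServer exists, ignoring the labels sent via heartbeat.
// Without a match, the heartbeat labels are used unchanged.
func resolveLabels(index map[instanceKey]DiscoveredServer, heartbeat map[string]string) map[string]string {
	k := instanceKey{
		accountID:  heartbeat[accountIDLabel],
		instanceID: heartbeat[instanceIDLabel],
	}
	if k.accountID == "" || k.instanceID == "" {
		return heartbeat
	}
	if s, ok := index[k]; ok {
		return s.Labels
	}
	return heartbeat
}

func main() {
	index := indexDiscovered([]DiscoveredServer{
		{Labels: map[string]string{accountIDLabel: "123", instanceIDLabel: "i-1", "env": "prod"}},
	})
	hb := map[string]string{accountIDLabel: "123", instanceIDLabel: "i-1", "env": "self-reported"}
	fmt.Println(resolveLabels(index, hb)["env"])
}
```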


Sorry for jumping into this late, got here via #21079

I think this needs some more consideration from a scalability perspective.

As currently implemented in #21079, this induces an additional backend read for every node heartbeat (regardless of whether this feature is in use), and has the potential to induce an additional backend write per instance in the worst case. This is very problematic.

For context, I just got out of an S1 induced by a change that increased reads/writes by about 30%. IIUC this solution has the potential to be far more impactful. The 30% change took care to use very aggressive jittering s.t. the added operations were evenly distributed across a ~3 min interval. As currently implemented in #21079, this change performs writes in a tight loop once per minute, so I'd hazard that this change would be more than 3x as impactful.

In general, adding any new resource that has the potential to scale with cluster size is a pretty big deal.

At the very least, I think we need to modify the implementation s.t. we don't incur the additional read on node heartbeat creation if this feature is not in use, so that people can opt into the significantly increased backend load. E.g. if the cert of the node encoded whether or not it was subject to discovered labels, then heartbeat logic could selectively decide when it was appropriate to incur the added load.

Ideally, we'd find a way to implement this without significantly changing the current load characteristics. E.g. if it were acceptable for there to be a short (1-6 min) delay between node joining and full label population, I think we could work this feature into the existing Instance resource model without significantly changing load characteristics (tho it'd be tricky).
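The opt-in suggestion above (having the node's cert encode whether it is subject to discovered labels) can be sketched as follows. Everything here is hypothetical: `NodeIdentity`, its `DiscoveredLabels` flag, the `Backend` interface, and `countingBackend` are illustrative stand-ins, not Teleport types.

```go
package main

import "fmt"

// NodeIdentity is a hypothetical view of what the node's cert encodes.
type NodeIdentity struct {
	DiscoveredLabels bool // true if the node is subject to discovered labels
}

// Backend is a hypothetical read interface for discovered label data.
type Backend interface {
	GetDiscovered(instanceID string) (map[string]string, bool)
}

// countingBackend counts reads so the added load is observable.
type countingBackend struct {
	reads int
	data  map[string]map[string]string
}

func (b *countingBackend) GetDiscovered(instanceID string) (map[string]string, bool) {
	b.reads++
	labels, ok := b.data[instanceID]
	return labels, ok
}

// labelsForHeartbeat performs the extra backend read only when the node's
// identity opts in. Nodes without the flag never touch the backend.
func labelsForHeartbeat(id NodeIdentity, instanceID string, heartbeat map[string]string, b Backend) map[string]string {
	if !id.DiscoveredLabels {
		return heartbeat
	}
	if labels, ok := b.GetDiscovered(instanceID); ok {
		return labels
	}
	return heartbeat
}

func main() {
	b := &countingBackend{data: map[string]map[string]string{"i-1": {"env": "prod"}}}
	hb := map[string]string{"env": "self-reported"}

	labelsForHeartbeat(NodeIdentity{DiscoveredLabels: false}, "i-1", hb, b)
	fmt.Println("reads after plain node:", b.reads)

	labelsForHeartbeat(NodeIdentity{DiscoveredLabels: true}, "i-1", hb, b)
	fmt.Println("reads after discovered node:", b.reads)
}
```

With this shape, clusters that do not use the feature keep their current heartbeat cost, and the added reads scale only with the number of discovered nodes.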

@fspmarshall Yep, I agree, we shouldn't be placing this check in the local backend. We'll move it up the stack so that the auth server maintains a local cache of discovered servers, as it does for other resources, and checks that cache on node heartbeats.
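A minimal sketch of the cache approach described in this reply, assuming an in-memory map kept current by upsert and delete events. `discoveredCache` and its methods are hypothetical and do not reflect how Teleport's auth server cache is actually built.

```go
package main

import (
	"fmt"
	"sync"
)

// discoveredCache is a hypothetical in-memory cache of discovered labels,
// keyed by instance ID. Heartbeat handling reads from it instead of
// reading the backend.
type discoveredCache struct {
	mu     sync.RWMutex
	labels map[string]map[string]string
}

func newDiscoveredCache() *discoveredCache {
	return &discoveredCache{labels: make(map[string]map[string]string)}
}

// copyLabels returns an independent copy so callers cannot mutate the cache.
func copyLabels(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// Upsert stores or replaces the labels for an instance.
func (c *discoveredCache) Upsert(instanceID string, labels map[string]string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.labels[instanceID] = copyLabels(labels)
}

// Delete removes an instance from the cache.
func (c *discoveredCache) Delete(instanceID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.labels, instanceID)
}

// Lookup returns a copy of the cached labels, if any, without any backend I/O.
func (c *discoveredCache) Lookup(instanceID string) (map[string]string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	labels, ok := c.labels[instanceID]
	if !ok {
		return nil, false
	}
	return copyLabels(labels), true
}

func main() {
	c := newDiscoveredCache()
	c.Upsert("i-1", map[string]string{"env": "prod"})
	labels, ok := c.Lookup("i-1")
	fmt.Println(labels["env"], ok)
}
```

The read lock keeps concurrent heartbeat lookups cheap, while writes happen only when the set of discovered servers changes.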

klizhentas

@r0mant lgtm, assuming you will address @fspmarshall's scalability concerns.

r0mant · 2023-02-17T21:23:22Z

@lxea I've moved the part about tags import into a separate RFD: #22033.

Can you remove it from here so we can merge this one?

lxea force-pushed the lxea/rfd-update-agentless branch 2 times, most recently from caca05c to 517d966 Compare November 23, 2022 16:36

lxea marked this pull request as ready for review November 23, 2022 16:38

github-actions Bot requested review from greedy52 and r0mant November 23, 2022 16:38

github-actions Bot added the rfd Request for Discussion label Nov 23, 2022

r0mant reviewed Nov 30, 2022


Comment thread rfd/0057-automatic-aws-server-discovery.md Outdated

r0mant reviewed Nov 30, 2022


Comment thread rfd/0057-automatic-aws-server-discovery.md Outdated

Comment thread rfd/0057-automatic-aws-server-discovery.md

r0mant requested review from capnspacehook and jakule and removed request for greedy52 November 30, 2022 23:42

lxea force-pushed the lxea/rfd-update-agentless branch from 517d966 to 9c8f8a1 Compare December 1, 2022 14:35

zmb3 reviewed Dec 1, 2022


capnspacehook approved these changes Dec 6, 2022


r0mant reviewed Dec 6, 2022


r0mant requested review from klizhentas and xinding33 December 6, 2022 19:50

klizhentas requested changes Dec 7, 2022


Comment thread rfd/0057-automatic-aws-server-discovery.md Outdated

Comment thread rfd/0057-automatic-aws-server-discovery.md Outdated

Comment thread rfd/0057-automatic-aws-server-discovery.md Outdated

Comment thread rfd/0057-automatic-aws-server-discovery.md Outdated

capnspacehook mentioned this pull request Dec 12, 2022

RFD 98: Registered OpenSSH Nodes #19261

Merged

lxea force-pushed the lxea/rfd-update-agentless branch 4 times, most recently from 3b8a21a to 47490e1 Compare December 23, 2022 15:08

lxea mentioned this pull request Dec 29, 2022

Add agentless installer in the teleport discovery service #19648

Merged

lxea force-pushed the lxea/rfd-update-agentless branch from 47490e1 to 0f3d07b Compare January 18, 2023 13:27

r0mant approved these changes Feb 6, 2023


fspmarshall reviewed Feb 6, 2023


This was referenced Feb 6, 2023

Edwarddowling/ec2 labels #21079

Closed

Implement automatic CA rotation for EC2 agentless nodes #21406

Closed

This was referenced Feb 7, 2023

Make agentless EC2 discovery the default #20407

Closed

Agentless enrollment and discovery #21408

Closed

r0mant changed the title ~~Add agentless mode section and AWS tags forwarding section to EC2 discovery RFD~~ RFD 57: Add agentless mode section and AWS tags forwarding section Feb 7, 2023

klizhentas approved these changes Feb 10, 2023


r0mant mentioned this pull request Feb 17, 2023

Add RFD for fetching EC2 tags via API #22033

Merged

Alex McGrath added 6 commits February 20, 2023 11:00

Add agentless mode section to ec2 discovery rfd

2d321b0

Update the labels section

07f1c58

use teleport join command instead of secret-manager

aaafcd0

update to include a full teleport join command example

6fd6adb

Add cert rotation section

865c9f5

remove AWS Tags section

adbb574

lxea force-pushed the lxea/rfd-update-agentless branch from 0f3d07b to adbb574 Compare February 20, 2023 11:01

lxea enabled auto-merge February 20, 2023 11:01

lxea added this pull request to the merge queue Feb 20, 2023

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Feb 20, 2023

lxea added this pull request to the merge queue Feb 20, 2023

Merged via the queue into master with commit a15e987 Feb 20, 2023

lxea deleted the lxea/rfd-update-agentless branch February 20, 2023 13:09

