RFD 0119 - Integrate with AWS using OIDC (#22556)
Conversation
Force-pushed from b2bcaa9 to f94fc24
> ## What
>
> Discover AWS RDS Databases without manual configuration.
>
> #### Goals
>
> Easily discover AWS RDS Databases of an account.
Isn't the goal to just have a generic way to create an AWS integration in Teleport? RDS discovery/configuration would be just one of potential users of this integration.
I was targeting something more specific to avoid the "where are we going to use this?" question.
Let me rephrase the goal and RFD title to "Integrate with AWS using OIDC".
> Our current discovery process for AWS resources requires multiple steps before the user gets to the "aha moment".
> Some of those steps are a little complicated and require the user to know some Teleport specific internals.
>
> We can reduce the number of steps and remove most of the Teleport specific configuration by creating an OIDC IdP in Teleport.
Be specific: instead of saying "some steps", "a little complicated" and "some specific internals", say that IAM setup/configuration is the challenge, and that using the AWS OIDC integration will allow the Teleport proxy/auth services to gain some limited privileges in users' AWS accounts to simplify these steps. And that it will work with Cloud.
👍
Mentioning cloud here seems out of place 🤔
I'll add it to the goals instead, if that's ok
> ```yaml
> kind: integration
> subkind: aws-oidc
> ```
Nit: Do we need -oidc suffix?
I'm trying to leave space/pattern to have other types of AWS integrations (using secrets or using an ec2 instance).
Should I remove the -oidc anyway?
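For context, a fuller sketch of what such a resource might look like; only `kind` and `subkind` appear in the RFD excerpt above, so every other field name here is an illustrative assumption, not part of the proposal:

```yaml
# Hypothetical sketch of an AWS OIDC integration resource.
# Fields other than kind/subkind are assumptions for illustration.
kind: integration
subkind: aws-oidc
version: v1
metadata:
  name: my-aws-account
spec:
  aws_oidc:
    # ARN of the IAM role that trusts the Teleport OIDC IdP (assumed field)
    role_arn: arn:aws:iam::123456789012:role/TeleportDiscover
```

Keeping AWS-specific settings under a per-subkind key would leave room for the other AWS integration types (secrets, EC2 instance) mentioned above.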
> #### AWS Calls from unauthorized Teleport Users
> We'll be able to call `DescribeDBInstances` AWS endpoint, however we must ensure that this is protected by RBAC.
>
> In order to do so, we'll re-use the `db` resource with the `list` verb to allow listing RDS Databases.
This part is not very clear tbh. Can you explain what we're trying to protect against here? How can unauthorized Teleport users leverage the AWS integration? Reusing existing database RBAC rules for the AWS integration seems strange tbh.
Picture the state where we have an Integration with AWS.
We'll have an HTTP endpoint in Proxy that calls the AWS API Describe DB Instance.
We'll have a new resource Integration, and subsequent rules (create, delete, list, update), but that only works for CRUD operations over the resource itself.
Should any user be able to list the AWS RDS DBs? Or do we want some kind of check before calling the AWS Endpoint?
I'm overloading the db.list rule with something else not entirely relevant.
However, I think we should add some check before calling the AWS Endpoint.
We'll have a new verb: `use`
If the user has `integration.use` then they can issue Integration API Calls
> We'll have a new verb: `use`. If the user has `integration.use` then they can issue Integration API Calls.
Apologies for a naive, after-the-merge question, but what is stopping us from using one of the existing verbs like "read" for this? Does it have to do with the part where you mention rules working only for CRUD operations over the resource itself?
We have managed to avoid adding a verb like "use" for resources where one could theoretically make use of it as well ("using" a db server or a node) so I'm wondering how this situation is different.
If it's related to the POST .../integration/:id/action/aws-oidc-list-databases endpoint mentioned in the RFD, then we could even argue that describing access to call an integration action could be made through checking the "create" verb for a new resource called "integration action". OTOH I've seen someone mention that it's hard to implement those checks for "virtual" resources that don't actually exist on the backend.
There's some discussion here which provides some context:
#24256 (comment)
> We have managed to avoid adding a verb like "use" for resources where one could theoretically make use of it as well ("using" a db server or a node) so I'm wondering how this situation is different.
They are protected by `<resource>_labels`, so we could also use that here.
We discussed that approach but ended up deciding on using the `VerbUse`.
As for creating a virtual resource integration action, I don't think it was considered.
I'm not aware of any issues of implementing resources without a backend. But I didn't try it.
I think it could be an alternative, but I don't actually see a big disadvantage in using `VerbUse`.
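Under the `use`-verb approach discussed above, a role granting access to Integration API calls might look like this; a sketch only, since the exact rule shape is not spelled out in the thread:

```yaml
kind: role
metadata:
  name: aws-discover
spec:
  allow:
    rules:
    - resources:
      - integration
      verbs:
      - use   # new verb: allows issuing Integration API calls
      - list  # CRUD verbs still govern the resource itself
```

This keeps the authorization check on the `integration` resource itself rather than overloading `db.list`.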
I'm really excited for an OIDC IdP starting to be added to Teleport. This will eventually unlock the ability for Teleport to issue ID tokens to processes running on hosts with Machine ID installed in order for them to be able to access cloud APIs.
One thing I would consider is issuing the token in the name of the Teleport proxy rather than the user for the purposes of AWS RDS DB discovery, especially if the Proxy itself is going to be making the call. This accounts for a future scenario where Teleport also issues tokens directly to users, but operators do not want to grant all their users the ability to list RDS DBs. For example:
`sub: system:proxy:discover`
This allows the configured trust relationship to be explicitly for the discover flow.
> - JWKS URI: the endpoint where the provider returns the public keys.
> - Claims Supported: a list of supported claims that the OpenID Provider (ie, Teleport) MAY be able to supply values for when issuing a token:
>   - `iss`: the issuer identity; in our case it will be the same as the `issuer` key.
>   - `sub`: identifies the subject of the token; in our case it will contain the user's name.
Do we want to prefix the subject with some string to identify that the token has been issued to a user? I can foresee use-cases in the future where we may want the Teleport system itself to be capable of talking to an API using OIDC for authentication without a user action triggering it.
Perhaps, for a user, `user:noah.stride@goteleport.com`, and for a system service something like `system:proxy`?
Sure, seems fair 👍
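The prefixed-subject idea can be illustrated with a minimal sketch of how the unsigned part of such a token would be assembled. This is illustrative only: the real implementation signs with the cluster's private key (RS256 is assumed from context), and the issuer URL is taken from the RFD's `proxy.example.com` examples.

```python
import base64
import json

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding (RFC 7515)
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def unsigned_jwt(issuer: str, subject: str, audience: str) -> str:
    # Header advertises RS256; the signature itself is omitted here,
    # since producing it requires the cluster's private key.
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {"iss": issuer, "sub": subject, "aud": audience}
    return ".".join(b64url(json.dumps(p).encode()) for p in (header, claims))

# A user-triggered token vs. a system-triggered one, per the prefix idea:
user_token = unsigned_jwt(
    "https://proxy.example.com", "user:noah.stride@goteleport.com", "sts.amazonaws.com"
)
system_token = unsigned_jwt(
    "https://proxy.example.com", "system:proxy", "sts.amazonaws.com"
)
```

AWS (or any relying party) then distinguishes user-issued from system-issued tokens purely by matching on the `sub` prefix in its trust conditions.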
Force-pushed from f94fc24 to bb1e569
Proxy is making the call but the user was the one who triggered it 🤔
Perhaps an "on-behalf-of" claim or similar? I'm unsure, but I do think that identifying the proxy discover subsystem lets us craft a much more specific trust rule in AWS.
> After this step, when the user tries to list RDS DBs, Teleport generates a token and issues the API call.
>
> To generate a token, Teleport creates a JWT with the claims described above and the configured role, and signs it with the private key (the public key is provided via the public JWKS endpoint).
> JWT with the claims described above
What are these claims? I don't see any claims being described above.
They are below, not above, sorry.
Replaced "above" with "below" and added a markdown link to the section.
> The first interaction between AWS and Teleport happens when the user clicks on `Get Thumbprint` (step 4 of the User's Point of View flow).
> This will trigger a request started by AWS to Teleport hitting the OpenID Configuration endpoint and subsequently the JSON Web Key Set endpoint.
>
> After this step, when the user tries to list RDS DBs, Teleport generates a token and issues the API call.
Can you have multiple AWS integrations e.g. for different accounts? How do we pick which one to use?
Yes, multiple AWS Integrations must be supported.
I'll add that.
I'll also add that the user must be able to select which AWS Integration to use.
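For reference, the OpenID Configuration document that AWS fetches in this first interaction (before following `jwks_uri`) would look roughly like this. The host `proxy.example.com` comes from the RFD's examples; the exact path and field set are assumptions based on the OIDC discovery spec:

```json
{
  "issuer": "https://proxy.example.com",
  "jwks_uri": "https://proxy.example.com/.well-known/jwks.json",
  "claims_supported": ["iss", "sub", "aud", "exp", "iat"],
  "id_token_signing_alg_values_supported": ["RS256"]
}
```

With multiple AWS integrations per cluster, each integration would point at the same issuer but carry its own IAM role on the AWS side.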
> ```
> $ tctl create aws-integration.yaml
> ```
>
> Resource representation as `yaml`:
Can you add a section covering IaC flows to the RFD as well? Are there any changes needed to Terraform provider, Kube operator and Helm charts?
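If the Kube operator were one of the IaC flows, the resource could map onto a CRD along these lines. This is purely hypothetical: a `TeleportIntegration` kind does not exist today, and every field name here is an assumption for illustration only:

```yaml
# Hypothetical CRD manifest; the kind and all fields are illustrative only.
apiVersion: resources.teleport.dev/v1
kind: TeleportIntegration
metadata:
  name: my-aws-account
spec:
  subKind: aws-oidc
```

A Terraform resource and Helm chart values would presumably mirror the same fields.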
> We'll be able to call `DescribeDBInstances` AWS endpoint, however we must ensure that this is protected by RBAC.
>
> In order to do so, we'll re-use the `db` resource with the `list` verb to allow listing RDS Databases.
>
> ```yaml
> kind: role
> metadata:
>   name: editor
> spec:
>   allow:
>     rules:
>     - resources:
>       - db
>       verbs:
>       - list
> ```
I still don't get this tbh. In your comment you said:
> We'll have an HTTP endpoint in Proxy that calls the AWS API Describe DB Instance.
What endpoint is this and why would it be unauthenticated?
There's still authentication, however my concern is authorization.
Let's say we have a UserA that has editor role and is able to List AWS RDS DBs.
However, there's also UserB that has access role.
Should UserB be allowed to call the List AWS RDS DBs?
I added some more details on Listing AWS RDS DBs
83c735b
Let me know if that helps in understanding my concern
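To make the UserA/UserB scenario concrete: under the `db`/`list` proposal, whether the List AWS RDS DBs endpoint answers would hinge on whether the caller's role carries that rule. A sketch, with the `access` role's rules simplified for illustration:

```yaml
# UserA's role: carries db/list, so the List AWS RDS DBs endpoint responds
kind: role
metadata:
  name: editor
spec:
  allow:
    rules:
    - resources: [db]
      verbs: [list]
---
# UserB's role: no db/list rule, so the endpoint should refuse the call
kind: role
metadata:
  name: access
spec:
  allow:
    rules: []
```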
> ```json
> "Action": "sts:AssumeRoleWithWebIdentity",
> "Condition": {
>   "StringEquals": {
>     "proxy.example.com:aud": "sts.amazonaws.com"
> ```
If we used an ID token subject that was specific to the Teleport Discovery process then we could specify this here to ensure that this role can only be used by that process and not by other entities which have been issued tokens.
Something like this?
```json
"Condition": {
  "StringEquals": {
    "proxy.example.com:aud": "sts.amazonaws.com",
    "proxy.example.com:sub": "system:proxy:discover",
```

Assuming we issue tokens whose sub is `system:proxy:discover`.
What would be the pros of this approach?
Allow for multiple roles to be associated with the same provider but having different policies.
If we are strict here, then we would need users to create extra roles in order to have multiple Teleport "systems" interacting with it.
Honestly, I think we should not add that constraint, so as not to increase scope.
We can always add that condition later on if needed.
Yeah, something like that.
My main "pro" for it is that when we start issuing tokens from the Teleport OIDC provider to users or machines, it prevents them from assuming this role. I know we can change this recommendation in future, but it'll be up to operators to actually go back and update this IAM role configuration.
This isn't a hard requirement from me, I'm happy to proceed as is but I just think we need to be aware that we'll be issuing these ID tokens to things other than the Teleport system in the near future.
I'd rather avoid this as it requires further setup of the trust relationship (which is created automatically in the IdP setup flow), with not much added benefit.
Mhm, I can definitely see the concern from the UX point of view.
But I really think that, as part of the RFD, we need to acknowledge that we are opening up access to list all RDS DBs to any machine or user in the future, if an operator misses an update advisory and does not modify existing trust relationships created before we issued the change in advice. While this is certainly a low-privilege action, one concern I have is that the design from this RFD gets used with another feature that requires more privileged permissions, and because this wasn't openly acknowledged in this RFD, it will be missed.
We can add:

```json
"StringEquals": {
  "proxy.example.com:aud": "sts.amazonaws.com",
  "proxy.example.com:sub": "system:proxy",
```
And if we ever need to evolve that, we can just add another suffix to the sub.
What do you think?
@strideynet I might be misinterpreting your concern, let's chat on Slack if I'm off -
> But, I really think as part of the RFD we need to acknowledge that we are opening up access to list all RDS DBs to any machine or user in future if an operator misses an update advisory and does not modify their existing trust relationships created before we issued the change in advice
Hmm, I don't see this happening - the AWS provider is a one-time setup with an IAM policy, that grants the Teleport auth service access to AWS, which can only be accessed through the AWS integration. This mechanism would be the only thing that sets the sub when signing the OIDC JWT, so I don't see the advantage in adding a condition that's always going to evaluate to true
> Whilst this is certainly a low privilege action, one concern I have is that the design from this RfD is used with another feature that requires more privileged permissions and because this wasn't openly acknowledged in this RfD, that this will be missed.
Like above, there's only one feature that's going to use this
@marcoandredinis - @strideynet and I had a chat last night about this section and came to the compromise of setting the audience to something other than `sts.amazonaws.com`, maybe `discover.teleport` or `discover.proxy-url`.
As per the AWS docs the audience is meant to be the client ID of the application, so setting it to something Discover related is more suitable anyway.
This means we can avoid setting a subject as we're not going to conflict with any future developments of the IdP provider (they'll be requesting a different audience).
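With that compromise, the trust relationship's condition would look something like this; a sketch in which only the `aud` value differs from the earlier excerpt:

```json
{
  "Action": "sts:AssumeRoleWithWebIdentity",
  "Condition": {
    "StringEquals": {
      "proxy.example.com:aud": "discover.teleport"
    }
  }
}
```

Future IdP uses would request a different audience, so no `sub` condition is needed to keep this role scoped to Discover.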
mdwn left a comment:
We should specifically call out which auth flows we support for this OIDC. One thing I ran into while experimenting with this is back channel vs. front channel auth. Back channel means there has to be direct communication between the IdP and the receiving service, which may or may not be the case depending on a user's setup.
I mention this because Okta in particular is opinionated about the use of the id_token flow, which could be a problem when generalizing this.
Handy reference: https://www.scottbrady91.com/openid-connect/openid-connect-flows
> 3. Configures the Teleport instance as an OIDC IdP:
>    1. Clicks `Add provider` of type `OpenID Connect`.
>    2. Fills in the Provider URL as provided by the Discover Wizard.
>    3. Fills in the Audience as provided by the Discover Wizard: `sts.amazonaws.com`.
Does the audience have to be sts.amazonaws.com? We could skip this step if we had a fixed audience as part of the OIDC IdP in Teleport.
Whilst AWS doesn't enforce it, and I'd argue they should, sts.amazonaws.com definitely makes the most semantic sense for the value here as that's the host of the AWS service that allows the OIDC ID token to be swapped for AWS credentials. I imagine they don't enforce it universally as not too long ago some CI/CD providers did not allow you to customise the audience of the ID tokens issued to workloads.
I've changed the audience to reflect the client.
Per the AWS docs:
> The client ID (also known as audience) is a unique identifier for your app that is issued to you when you register your app with the IdP
So the audience is going to be `discover.teleport`.
Force-pushed from bb1e569 to 07b1cd3
Force-pushed from 323adba to 79bf009
Got this question after reviewing #23989. What's the use-case for having different statuses? I.e. why not just create the integration in the "running" state? And what's the scenario in which it will be placed in "error" state and how to recover from it?
Just wondering if it's really needed.
The integration may become unhealthy (OIDC IdP was removed from AWS, Role doesn't allow for required api calls, thumbprint changed).
In that case, there will be a Cluster Alert and the Integration will switch to Error Status.
The Paused status is to allow the user to disable the Integration without removing it.
Maybe that's something we can work later on, and is not really needed for this first iteration.
If the proxy rotates its SSL certificate the integration will also become unhealthy as AWS saves and compares the fingerprint of the SSL certificate on setup
It's not quite that drastic, it actually uses the fingerprint of the TLS certificate's CA. So you can rotate certificates as long as it's issued by the same CA.
@marcoandredinis I understand the general intent but tbh it just raises more questions: how would Teleport detect that it's unhealthy and why; is the unhealthiness permanent or temporary; would we detect that it's healthy again after being unhealthy; and what does the user need to do in that case? And so on.
We did have some ideas (or rather, "aspirations") for upstream resource health checks (e.g. #20544) but it feels like it's a large enough problem that would require a separate RFD and consistent approach we can use for all resources.
I would cut this from the scope of this RFD and implementation for now.
Sounds fair 👍
I'll remove the status from the RFD and implementation (backport to V12 is not merged yet, so it will wait to include this removal)
@r0mant I'd like to emphasize that I think this flow will work great for smaller teams that are just starting out with their AWS account. We must have a parallel IaC Terraform flow for more mature teams that would like to configure everything declaratively, without magic.
Force-pushed from 79bf009 to 036c2b7
Rendered
Part of #22129