Skip to content

RFD 0119 - Integrate with AWS using OIDC#22556

Merged
marcoandredinis merged 14 commits intomasterfrom
marco/rfd_discover_oidc_aws
Apr 5, 2023
Merged

RFD 0119 - Integrate with AWS using OIDC#22556
marcoandredinis merged 14 commits intomasterfrom
marco/rfd_discover_oidc_aws

Conversation

@marcoandredinis
Copy link
Copy Markdown
Contributor

@marcoandredinis marcoandredinis commented Mar 2, 2023

Rendered

Part of #22129

@marcoandredinis marcoandredinis changed the title first draft RFD9999 - OIDC IdP for AWS resource discovery Mar 2, 2023
@marcoandredinis marcoandredinis force-pushed the marco/rfd_discover_oidc_aws branch 3 times, most recently from b2bcaa9 to f94fc24 Compare March 3, 2023 17:22
@marcoandredinis marcoandredinis changed the title RFD9999 - OIDC IdP for AWS resource discovery RFD9999 - OIDC IdP for AWS RDS DBs discovery Mar 3, 2023
Copy link
Copy Markdown
Collaborator

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass. @mdwn Could you take a look as well when you get a chance, in particular at OIDC-related bits?

Comment on lines +14 to +20
## What

Discover AWS RDS Databases without manual configuration.

#### Goals

Easily discover AWS RDS Databases of an account.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the goal to just have a generic way to create an AWS integration in Teleport? RDS discovery/configuration would be just one of potential users of this integration.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was targeting something more specific to avoid the "where are we going to use this"

Let me rephrase the goal and RFD title to "Integrate with AWS using OIDC"

Comment on lines +40 to +43
Our current discovery process for AWS resources requires multiple steps before the user gets to the "aha moment".
Some of those steps are a little complicated and require the user to know some Teleport specific internals.

We can reduce the number of steps and remove most of the Teleport specific configuration by creating an OIDC IdP in Teleport.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be specific: instead of saying "some steps", "a little complicated" and "some specific internals", say that IAM setup/configuration is the challenge and that using AWS OIDC integration will allow Teleport proxy/auth services to gain some limited privileges in users' AWS accounts so simplify these steps. And that it will work with Cloud.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Mentioning cloud here seems out of place 🤔

I'll add it to the goals instead, if that's ok


```yaml
kind: integration
subkind: aws-oidc
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Do we need -oidc suffix?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to leave space/pattern to have other types of AWS integrations (using secrets or using an ec2 instance).

Should I remove the -oidc anyway?

Comment thread rfd/9999-oidc-idp-for-aws-rds-dbs-discovery.md Outdated
Comment thread rfd/9999-oidc-idp-for-aws-rds-dbs-discovery.md Outdated
Comment thread rfd/9999-oidc-idp-for-aws-rds-dbs-discovery.md Outdated
Comment thread rfd/9999-oidc-idp-for-aws-rds-dbs-discovery.md Outdated
Comment thread rfd/9999-oidc-idp-for-aws-rds-dbs-discovery.md Outdated
Comment thread rfd/9999-oidc-idp-for-aws-rds-dbs-discovery.md Outdated
#### AWS Calls from unauthorized Teleport Users
We'll be able to call `DescribeDBInstances` AWS endpoint, however we must ensure that this is protected by RBAC.

In order to do so, we'll re-use the `db` resource with the `list` verb to allow listing RDS Databases.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is not very clear tbh. Can you explain what we're trying to protect against here? How can unauthorized Teleport users can leverage AWS integration? Reusing existing database RBAC rules for AWS integration seems strange tbh.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Picture the state where we have an Integration with AWS.
We'll have an HTTP endpoint in Proxy that calls the AWS API Describe DB Instance.
We'll have a new resource Integration, and subsequent rules (create, delete, list, update), but that only works for CRUD operations over the resource itself.

Should any user be able to list the AWS RDS DBs? Or do we want some kind of check before calling the AWS Endpoint?
I'm overloading the db.list rule with something else not entirely relevant.
However, I think we should add some check before calling the AWS Endpoint.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have a new verb: use
If the user has integration.use then they can issue Integration API Calls

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have a new verb: use If the user has integration.use then they can issue Integration API Calls

Apologies for a naive, after-the-merge question, but what is stopping us from using one of the existing verbs like "read" for this? Does it have to do with the part where you mention rules working only for CRUD operations over the resource itself?

We have managed to avoid adding a verb like "use" for resources where one could theoretically make use of it as well ("using" a db server or a node) so I'm wondering how this situation is different.

If it's related to the POST .../integration/:id/action/aws-oidc-list-databases endpoint mentioned in the RFD, then we could even argue that describing access to call an integration action could be made through checking the "create" verb for a new resource called "integration action". OTOH I've seen someone mention that it's hard to implement those checks for "virtual" resources that don't actually exist on the backend.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some discussion here, which provide some context
#24256 (comment)

We have managed to avoid adding a verb like "use" for resources where one could theoretically make use of it as well ("using" a db server or a node) so I'm wondering how this situation is different.

They are protected by <resource>_labels, so we could also use that here.
We discussed that approach but ended deciding on using the VerbUse

As for creating a virtual resource integration action, I don't think it was considered.
I'm not aware of any issues of implementing resources without a backend. But I didn't try it.

I think it could be an alternative, but don't actually see a big disadvantage in using the VerbUse

@r0mant r0mant requested a review from mdwn March 8, 2023 03:22
@marcoandredinis marcoandredinis changed the title RFD9999 - OIDC IdP for AWS RDS DBs discovery RFD 999 - Integrate with AWS using OIDC Mar 8, 2023
@strideynet strideynet self-requested a review March 8, 2023 12:36
Copy link
Copy Markdown
Contributor

@strideynet strideynet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really excited for an OIDC IdP starting to be added to Teleport. This will eventually unlock the ability for Teleport to issue ID tokens to processes running on hosts with Machine ID installed in order for them to be able to access cloud APIs.

One thing I would consider would be actually issuing the token in the name of the Teleport proxy rather than the user for the purposes of AWS RDS DB discovery - especially if the Proxy itself is going to actually be making the call. This accounts for a scenario in future where Teleport is also issuing tokens directly to users, but they do not want to grant the ability to all their users the ability to list RDS DBs. For example:

sub: system:proxy:discover

This allows the configured trusted relationship to be explicitly for the discover flow.

- JWKS URI: the endpoint where the provider returns the public keys.
- Claims Supported: a list of supported claims that the OpenID Provider (ie, Teleport) MAY be able to supply values for when issuing a token:
- `iss`: the issuer identity, on our case it will be the same as the `issuer` key.
- `sub`: identifies the subject of the token, on our case it will contain the user's name.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to prefix with some string to identify that the token has been issued to a user ? I can foresee use-cases in future where we may want the teleport system itself to be capable of talking to an API using OIDC for authentication without a user action triggering it.

Perhaps, for a user: user:noah.stride@goteleport.com and for a system service something like system:proxy ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, seems fair 👍

@marcoandredinis marcoandredinis force-pushed the marco/rfd_discover_oidc_aws branch from f94fc24 to bb1e569 Compare March 8, 2023 16:59
@marcoandredinis
Copy link
Copy Markdown
Contributor Author

One thing I would consider would be actually issuing the token in the name of the Teleport proxy rather than the user for the purposes of AWS RDS DB discovery - especially if the Proxy itself is going to actually be making the call. This accounts for a scenario in future where Teleport is also issuing tokens directly to users, but they do not want to grant the ability to all their users the ability to list RDS DBs. For example:

Proxy is making the call but the user was the one who triggered it 🤔
We can set the sub to be the system (proxy/discover) but I think we should also somehow identify the user.
We can always add another claim 🤔

@strideynet
Copy link
Copy Markdown
Contributor

One thing I would consider would be actually issuing the token in the name of the Teleport proxy rather than the user for the purposes of AWS RDS DB discovery - especially if the Proxy itself is going to actually be making the call. This accounts for a scenario in future where Teleport is also issuing tokens directly to users, but they do not want to grant the ability to all their users the ability to list RDS DBs. For example:

Proxy is making the call but the user was the one who triggered it 🤔 We can set the sub to be the system (proxy/discover) but I think we should also somehow identify the user. We can always add another claim 🤔

Perhaps a "on-behalf-of" claim or similar ? I'm unsure, but I do think that identifying the proxy discover subsystem lets us craft a much more specific trust rule in AWS.

@marcoandredinis marcoandredinis marked this pull request as ready for review March 10, 2023 10:50
@github-actions github-actions Bot requested review from rosstimothy and rudream March 10, 2023 10:51
@github-actions github-actions Bot added rfd Request for Discussion size/md labels Mar 10, 2023
Comment thread rfd/9999-aws-api-integration-using-oidc.md Outdated

After this step, when the user tries to list RDS DBs, Teleport generates a token and issues the API call.

To generate a token, Teleport creates a JWT with the claims described above and the configured role, and signs it with the private key (the public is provided in the public JWKS endpoint).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JWT with the claims described above

What are these claims? I don't see any claims being described above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are below and not above, sorry.
Replaced with below with a markdown link to it.

The first interaction between AWS and Teleport happens when the user clicks on `Get Thumbprint` (step 4 of the User's Point of View flow).
This will trigger a request started by AWS to Teleport hitting the OpenID Configuration endpoint and subsequently the JSON Web Key Set endpoint.

After this step, when the user tries to list RDS DBs, Teleport generates a token and issues the API call.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you have multiple AWS integrations e.g. for different accounts? How do we pick which one to use?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, multiple AWS Integrations must be supported.
I'll add that.

I'll also add that the user must be able to select which AWS Integration to use.

$ tctl create aws-integration.yaml
```

Resource representation as `yaml`:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a section covering IaC flows to the RFD as well? Are there any changes needed to Terraform provider, Kube operator and Helm charts?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 👍

Comment on lines +335 to +482
We'll be able to call `DescribeDBInstances` AWS endpoint, however we must ensure that this is protected by RBAC.

In order to do so, we'll re-use the `db` resource with the `list` verb to allow listing RDS Databases.

```yaml
kind: role
metadata:
name: editor
spec:
allow:
rules:
- resources:
- db
verbs:
- list
```
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't get this tbh. In your comment you said:

We'll have an HTTP endpoint in Proxy that calls the AWS API Describe DB Instance.

What endpoint is this and why would it be unauthenticated?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still authentication, however my concern is authorization.

Let's say we have a UserA that has editor role and is able to List AWS RDS DBs.

However, there's also UserB that has access role.
Should UserB be allowed to call the List AWS RDS DBs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more details on Listing AWS RDS DBs
83c735b

Let me know if that helps in understanding my concern

Comment thread rfd/9999-aws-api-integration-using-oidc.md
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"proxy.example.com:aud": "sts.amazonaws.com"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we used an ID token subject that was specific to the Teleport Discovery process then we could specify this here to ensure that this role can only be used by that process and not by other entities which have been issued tokens.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this?

            "Condition": {
                "StringEquals": {
                    "proxy.example.com:aud": "sts.amazonaws.com",
                    "proxy.example.com:sub": "system:proxy:discover",

Assuming we issue tokens whose sub is system:proxy:discover.

What would be the pros of this approach?
Allow for multiple roles to be associated with the same provider but having different policies.

If we are strict here, then we would need users to create extra roles in order to have multiple Teleport "systems" interacting with it.

Honestly, I think we should not add that constraint to not increase scope.
We can always add that condition later on if needed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, something like that.

My main "pro" for it is that when we start issuing tokens from the Teleport OIDC provider to users or machines, it prevents them from assuming this role. I know we can change this recommendation in future, but it'll be up to operators to actually go back and update this IAM role configuration.

This isn't a hard requirement from me, I'm happy to proceed as is but I just think we need to be aware that we'll be issuing these ID tokens to things other than the Teleport system in the near future.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather avoid this as it requires further setup to the trust relationship (which is created automatically in the IdP setup flow) with not much added benefit

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm, I can definitely see the concern from the UX point of view.

But, I really think as part of the RFD we need to acknowledge that we are opening up access to list all RDS DBs to any machine or user in future if an operator misses an update advisory and does not modify their existing trust relationships created before we issued the change in advice. Whilst this is certainly a low privilege action, one concern I have is that the design from this RfD is used with another feature that requires more privileged permissions and because this wasn't openly acknowledged in this RfD, that this will be missed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add

 "StringEquals": {
                    "proxy.example.com:aud": "sts.amazonaws.com",
                    "proxy.example.com:sub": "system:proxy",

And if we ever need to evolve that, we can just add another suffix to the sub.
What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@strideynet I might be misinterpreting your concern, let's chat on Slack if I'm off -

But, I really think as part of the RFD we need to acknowledge that we are opening up access to list all RDS DBs to any machine or user in future if an operator misses an update advisory and does not modify their existing trust relationships created before we issued the change in advice

Hmm, I don't see this happening - the AWS provider is a one-time setup with an IAM policy, that grants the Teleport auth service access to AWS, which can only be accessed through the AWS integration. This mechanism would be the only thing that sets the sub when signing the OIDC JWT, so I don't see the advantage in adding a condition that's always going to evaluate to true

Whilst this is certainly a low privilege action, one concern I have is that the design from this RfD is used with another feature that requires more privileged permissions and because this wasn't openly acknowledged in this RfD, that this will be missed.

Like above, there's only one feature that's going to use this

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcoandredinis - @strideynet and I had a chat last night about this section and came to the compromise of setting the audience to something other than sts.amazonaws.com - maybe discover.teleport or discover.proxy-url

As per the AWS docs the audience is meant to be the client ID of the application, so setting it to something Discover related is more suitable anyway.

This means we can avoid setting a subject as we're not going to conflict with any future developments of the IdP provider (they'll be requesting a different audience).

Copy link
Copy Markdown
Contributor

@mdwn mdwn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should specifically call out which auth flows we support for this OIDC. One thing I ran into while experimenting with this is back channel vs. front channel auth. Back channel meaning that there has to be direct communication between the IdP and the receiving service, which may or may not be the case depending on a user setup.

I mention this because Okta in particular is opinionated about the use of the id_token flow, which could be a problem when generalizing this.

Handy reference; https://www.scottbrady91.com/openid-connect/openid-connect-flows

Comment thread rfd/9999-aws-api-integration-using-oidc.md
3. Configures the Teleport instance as an OIDC IdP:
1. Clicks `Add provider` of type `OpenID Connect`.
2. Fills in the Provider URL as provided by the Discover Wizard.
3. Fills in the Audience as provided by the Discover Wizard `sts.amazonaws.com`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the audience have to be sts.amazonaws.com? We could skip this step if we had a fixed audience as part of the OIDC IdP in Teleport.

Copy link
Copy Markdown
Contributor

@strideynet strideynet Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whilst AWS doesn't enforce it, and I'd argue they should, sts.amazonaws.com definitely makes the most semantic sense for the value here as that's the host of the AWS service that allows the OIDC ID token to be swapped for AWS credentials. I imagine they don't enforce it universally as not too long ago some CI/CD providers did not allow you to customise the audience of the ID tokens issued to workloads.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the audience to reflect the client
Per aws docs

The client ID (also known as audience) is a unique identifier for your app that is issued to you when you register your app with the IdP

So, the audience is going to be discover.teleport

@marcoandredinis marcoandredinis force-pushed the marco/rfd_discover_oidc_aws branch from bb1e569 to 07b1cd3 Compare March 20, 2023 15:00
Comment thread rfd/9999-aws-api-integration-using-oidc.md Outdated
@marcoandredinis marcoandredinis changed the title RFD 999 - Integrate with AWS using OIDC RFD 0119 - Integrate with AWS using OIDC Apr 4, 2023
@marcoandredinis marcoandredinis force-pushed the marco/rfd_discover_oidc_aws branch from 323adba to 79bf009 Compare April 4, 2023 17:16
Comment on lines 85 to 96
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got this question after reviewing #23989. What's the use-case for having different statuses? I.e. why not just create the integration in the "running" state? And what's the scenario in which it will be placed in "error" state and how to recover from it?

Just wondering if it's really needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration may become unhealthy (OIDC IdP was removed from AWS, Role doesn't allow for required api calls, thumbprint changed).
In that case, there will be a Cluster Alert and the Integration will switch to Error Status.

The Paused status is to allow the user to disable the Integration but not removing it.
Maybe that's something we can work later on, and is not really needed for this first iteration.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the proxy rotates its SSL certificate the integration will also become unhealthy as AWS saves and compares the fingerprint of the SSL certificate on setup

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thumbprint*

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not quite that drastic, it actually uses the fingerprint of the TLS certificate's CA. So you can rotate certificates as long as it's issued by the same CA.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I misunderstood. Cheers

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcoandredinis I understand the general intent but tbh it just raises more questions: how would Teleport detect that it's unhealthy and why, and whether it's a permanent or temporary unhealthiness, and would we detect it's healthy after being unhealthy, and what does the user need to do in that case? And so on.

We did have some ideas (or rather, "aspirations") for upstream resource health checks (e.g. #20544) but it feels like it's a large enough problem that would require a separate RFD and consistent approach we can use for all resources.

I would cut this from the scope of this RFD and implementation for now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fair 👍
I'll remove the status from the RFD and implementation (backport to V12 is not merged yet, so it will wait to include this removal)

@klizhentas
Copy link
Copy Markdown
Contributor

@r0mant I'd like to emphasize that I think this flow will work great for smaller teams that just start with their AWS account.

We must have a parallel IAC terraform flow for more mature teams that would like to configure everything without magic declaratively.

@marcoandredinis marcoandredinis force-pushed the marco/rfd_discover_oidc_aws branch from 79bf009 to 036c2b7 Compare April 5, 2023 15:27
@marcoandredinis marcoandredinis added this pull request to the merge queue Apr 5, 2023
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 5, 2023
@marcoandredinis marcoandredinis added this pull request to the merge queue Apr 5, 2023
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 5, 2023
@marcoandredinis marcoandredinis added this pull request to the merge queue Apr 5, 2023
Merged via the queue into master with commit 7426e59 Apr 5, 2023
@marcoandredinis marcoandredinis deleted the marco/rfd_discover_oidc_aws branch April 5, 2023 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

discover Issues related to Teleport Discover rfd Request for Discussion size/md

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants