Skip to content

Conversation

@fmorg-git
Copy link
Contributor

Please describe your PR in detail:
Documents the design for STS in Ozone.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13857

How was this patch tested?

No tests needed for documentation

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up @fmorg-git. Overall looks good, but we should add more details about how the Ranger and Native authorizers fit in with this.

Additionally, we need to choose how much development can/should happen on master vs. a feature branch. For Ranger integration within the OM, as long as it is not connected to a client facing API, that should be fine to go into the master branch if it helps iterate with Ranger devs quicker.

Once the endpoint to consume and process tokens is added, that should probably be in a feature branch. We try to keep master releasable and have had issues with security related features being partially developed on master in the past. See CVE-2024-45106.

@errose28 errose28 added design sts Changes for Ozone's S3 Security Token Service labels Nov 3, 2025
@fmorg-git
Copy link
Contributor Author

fmorg-git commented Nov 4, 2025

Thanks for writing this up @fmorg-git. Overall looks good, but we should add more details about how the Ranger and Native authorizers fit in with this.

Additionally, we need to choose how much development can/should happen on master vs. a feature branch. For Ranger integration within the OM, as long as it is not connected to a client facing API, that should be fine to go into the master branch if it helps iterate with Ranger devs quicker.

Once the endpoint to consume and process tokens is added, that should probably be in a feature branch. We try to keep master releasable and have had issues with security related features being partially developed on master in the past. See CVE-2024-45106.

hi @errose28, thanks for your comments. From the call today, it was decided:

  1. All code (even Ranger interface updates per @errose28) will go into a feature branch - @errose28 already created it called HDDS-13323-sts. The reason is to prevent (potential) security issues from getting out and causing issues before full end-to-test tests are run. @fmorg-git will need to remember to periodically ask a committer to rebase master so the feature branch doesn't get too far out of date. The design doc is fine to go to master.

  2. OzoneNativeAuthorizer is not a part of phase 1 and we can take a look at supporting it in a future phase. RangerOzoneAuthorizer has more alignment with IAM policy. Also OzoneNativeAuthorizer doesn't have roles.

  3. @errose28 is ok with having a table of revoked tokens as he doesn't see another way to support it currently and customers might not like if we have no way to revoke. Also he mentioned MinIO has revoke as well. Per discussion with @sodonnel, he is not against having table of revoked tokens.

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc updates @fmorg-git. Just a few more minor comments.

Comment on lines +165 to +166
The format of this String is entirely up to the Ranger team. What is required from the Ozone side is to supply this String to Ranger when any
subsequent S3 API calls are made that use STS tokens. In order to achieve this, the sessionPolicy String from Ranger will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a Ranger PMC member to sign off on this doc as well to indicate we will be working together on this integration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sent Slack messages and an email asking for Ranger input.

@peterxcli peterxcli self-requested a review November 7, 2025 16:24
Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my point of view we just need to sort out the port/server part of this here but otherwise looks good to me. Let's leave this open for now to make sure others get a chance to review and work through any questions/concerns.

The AWS IAM policy specification is vast and wide-ranging. The initial Ozone STS supports a limited
subset of its capabilities. The restrictions are outlined below:

- The only supported prefix in ResourceArn is `arn:aws:s3:::` - all others will be rejected. **Note**: a ResourceArn
Copy link
Contributor

@ChenSammi ChenSammi Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are both reject and silently ignored. Shall we unify the behavior to reject if operations and fields are not supported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This design behavior came after (much) discussion with external teams. One external team will send S3 actions that Ozone doesn't support, and they don't have flexibility to change what they are sending, so that's one reason for the silent ignore. This external team also mentioned that some research indicates AWS also does not fail the AssumeRole request just because the inline session policy references unknown or unsupported actions, but rather it will fail when the temporary credentials are used, so this design is accordance with it. Another external team agreed that behavior is fine for actions, but does not work for Conditions, because one can have a Condition to restrict calls by sourceIp, and if we silently ignore this, the client may incorrectly think the temporary credentials are restricted for use by that IP address, so consensus was to reject the request for that scenario.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Can you update the doc about the background reason why some are rejected and some are silently ignored?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


# 3. How Ozone STS Works

The initial implementation of Ozone STS supports only the [AssumeRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html)
Copy link
Contributor

@ChenSammi ChenSammi Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fmorg-git for preparing the doc.

The AssumeRole has a few input parameters, some are optional. Are they all supported? If not, could you clarify what parameters are supported, and what are not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated - I added the supported required and optional parameters

- The roleArn - the role used in the original AssumeRole call
- The encrypted secretAccessKey - this will be used to validate the AWS signature when the temporary credentials are used
to make S3 API calls
- (Optional) sessionPolicy - when using the RangerOzoneAuthorizer, if Ranger successfully authorizes the AssumeRole call,
Copy link
Contributor

@ChenSammi ChenSammi Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When both roleArn and sessionPolicy exist, the final decision made by Ranger is the intersection of roleArn and sessionPolicy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's correct - I added additional verbiage to hopefully make that more clear


## 3.5 STS Token Revocation

In the rare event temporary credentials need to be revoked (ex. for security reasons), a table in RocksDB will be created
Copy link
Contributor

@ChenSammi ChenSammi Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will OM be involved in the assumeRole API call and following S3 API calls using the sessionToken authorization check? Is it OM RocksDB here used, or other RocksDB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, OM is involved in the assumeRole API and S3 API calls (OM is currently involved in S3 calls with permanent credentials). Yes, OM RocksDB is used - I added some verbiage around that.

storing temporary credentials server-side in Ozone, the sessionToken will comprise various components needed to validate
subsequent S3 calls that use the token. The sessionToken will have the following information encoded:

- The originalAccessKeyId - this is the Kerberos identity of the user that created the sessionToken via the AssumeRole call.
Copy link
Contributor

@ChenSammi ChenSammi Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why originalAccessKeyId is needed in sessionToken, In which step it will be used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to add verbiage about why originalAccessKeyId is needed. I also updated the flow section to indicate where originalAccessKeyId is used.

This identity is included in the sessionToken because S3 API calls (such as PutObject) require a Kerberos identity, but the temporary credentials don't have a Kerberos identity associated to them, therefore the Kerberos identity of the user that created the token will be used in these cases.

- The expiration time of the token (via `ShortLivedTokenIdentifier#getExpiry()`)
- The UUID of the secret key used to sign the sessionToken and encrypt the secretAccessKey (via `ShortLivedTokenIdentifier#getSecretKeyId()`)
- expiration time of the token (via `ShortLivedTokenIdentifier#getExpiry()`)
- UUID of the OzoneManager private key used to sign the sessionToken and encrypt the secretAccessKey (via `ShortLivedTokenIdentifier#getSecretKeyId()`)
Copy link
Contributor

@ChenSammi ChenSammi Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous "secret key" is accurate. It's not OzoneManager private key. It's a symmentic key generated by SCM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@ChenSammi
Copy link
Contributor

Thanks @fmorg-git for the update. The last design doc looks good to me.

@fmorg-git
Copy link
Contributor Author

Thanks @ChenSammi . Hi @mneethiraj just a friendly reminder to review from Ranger side when you have a chance. Thanks!

@github-actions
Copy link

github-actions bot commented Dec 9, 2025

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale label Dec 9, 2025
@errose28 errose28 removed the stale label Dec 9, 2025
@errose28
Copy link
Contributor

errose28 commented Dec 9, 2025

I think we are just waiting for a review from the Ranger side to merge this. @mneethiraj can you give this a look?

Similarly, an invalid S3 action will be silently ignored.
- Supported wildcard expansions in Actions are: `s3:*`, `s3:Get*`, `s3:Put*`, `s3:List*`,
`s3:Create*`, and `s3:Delete*`.
- If using OzoneNativeAuthorizer, bucket wildcards (ex. ResourceArn `arn:aws:s3:::*`, `arn:aws:s3:::bucket*` or `*`) will be rejected,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given STS is only available with Ranger authorizer (see #3 above), I suggest removing lines 93 to 97, to avoid potential misinterpretation that STS is supported with Ozone native authorizer as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

must be called to issue permanent S3 credentials. With these credentials, the AssumeRole API call can be made, but additional
steps below are needed for the call to be successfully authorized.
When using RangerOzoneAuthorizer, a role must be configured in Ranger UI for each role the AssumeRole API
can be used with. The user making the AssumeRole call must be in this role, and this role must have access

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user making the AssumeRole call must be in this role

This is not accurate.

I suggest replacing the sentence from 166 to 168 with the following:

Further, Apache Ranger policies should be in place to grant the user permission to assume the role.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Contributor

@Tejaskriya Tejaskriya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clear design doc. I just had one question as commented below. LGTM otherwise.

Comment on lines +88 to +89
- If a (currently) unsupported S3 action is requested, such as `s3:GetAccelerateConfiguration`, it will be silently ignored.
Similarly, an invalid S3 action will be silently ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from being mentioned in the design doc, for a user, how can we make sure that this "silently ignored" is a known behaviour? Would user docs be the only way? (As this is a design doc, a simple user of ozone may not go through this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I believe this behavior will need to be included in user docs. Also, if IIRC from previous discussions with an external team, the silently ignored behavior is what happens in AWS as well, at least in the AssumeRole call. When the token is used, it should fail though. Please see #9223 (comment)

@fmorg-git
Copy link
Contributor Author

thanks @Tejaskriya . Hi @ChenSammi - the Ranger team has approved. Can you please review and merge if it looks ok? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

design sts Changes for Ozone's S3 Security Token Service

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants