@jojochuang
Owner

What changes were proposed in this pull request?

HDDS-13148. [Docs] Update Transparent Data Encryption doc.

Please describe your PR in detail:

  • Generated-by: Google Gemini 2.5 Pro (Preview) with the following prompt:
I want to update the current Ozone's Transparent Data Encryption
page https://ozone.apache.org/docs/edge/security/securingtde.html with the following instructions:

The Ozone TDE doc is written with the assumption that the user is familiar with HDFS TDE, which may not be the case.

We should update the doc such that

(1) It does not require prior knowledge of HDFS TDE.

(2) Ozone can work with Hadoop KMS and Ranger KMS. We should mention Ranger KMS in the doc.

(3) For Ranger KMS, encryption key can also be managed by Ranger KMS management console or its REST API.

(4) hadoop key create enckey command has additional parameters: -size: specifies key bit length. Ozone supports 128 and 256 bits; -cipher: only AES/CTR/NoPadding (default) is supported as of now.

(5) Add reference to Transparent Encryption in HDFS: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html and Hadoop KMS doc: https://hadoop.apache.org/docs/r3.4.1/hadoop-kms/index.html

(6) For the section Using Transparent Data Encryption from S3G, we should mention Ozone does not support S3-SSE (Server-Side Encryption) or S3-CSE (Client-Side Encryption). That said, Ozone S3 buckets can be encrypted using Ranger/Hadoop KMS to provide the same guarantee as S3-SSE with client-supplied key (S3 SSE-C).

(7) For section KMS Authorization: provide examples.

Be succinct. Insert new text to the existing content, instead of rewriting everything.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13148

How was this patch tested?

User doc only update.

Change-Id: I6b8e7c50063aedc7862f2b4ab3ecdebb44bfc38e
Change-Id: Ia4c15cc0e04f31e63c8afd2b62c818fee66598d3

@gemini-code-assist gemini-code-assist bot left a comment

Hello @jojochuang, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, Gemini here with a summary of this pull request. This PR, associated with HDDS-13148, focuses on significantly updating the documentation for Ozone's Transparent Data Encryption (TDE). The primary goal is to make the documentation more accessible to users who may not be familiar with HDFS TDE, provide more comprehensive details on KMS integration (including Ranger KMS), clarify key creation parameters, explain S3 Gateway interaction with TDE, and enhance the KMS authorization section with examples. The changes are primarily concentrated in the SecuringTDE.md documentation file.

Highlights

  • Improved TDE Documentation Clarity: The documentation has been updated to be less reliant on prior knowledge of HDFS TDE, making it easier for new users to understand Ozone's TDE setup and usage.
  • Ranger KMS Support Details: Explicitly mentions and provides guidance on using Ranger KMS with Ozone TDE, including managing keys via the Ranger console or API.
  • Detailed Key Creation Command: The hadoop key create command section now includes details on the -size (128/256 bits supported) and -cipher (AES/CTR/NoPadding) parameters.
  • S3 Gateway (S3G) TDE Usage Clarification: Explains how to use TDE with S3G, clarifies that Ozone does not support S3-SSE or S3-CSE directly, but notes that KMS encryption provides a similar guarantee to S3 SSE-C. Also updates the S3G proxy user requirements for secure and non-secure modes.
  • Enhanced KMS Authorization Examples: The KMS Authorization section has been rewritten to better explain authorization concepts for both Hadoop KMS (ACLs) and Ranger KMS (policies), including concrete examples of Ranger policies for different user roles (Ozone Manager, S3G proxy, administrators).
  • Added External References: Links to the HDFS Transparent Encryption and Hadoop KMS documentation have been added for users seeking more background information.

Changelog

  • hadoop-hdds/docs/content/feature/Topology.md
    • Removed a trailing newline character at the end of the file (line 108).
  • hadoop-hdds/docs/content/security/SecuringTDE.md
    • Rewrote the introduction to TDE, explaining its purpose and mentioning support for both Hadoop KMS and Ranger KMS (lines 28-30).
    • Restructured the 'Setting up the Key Management Server' section into 'Configuring TDE', providing numbered steps for setting up KMS (Hadoop/Ranger) and configuring Ozone's core-site.xml (lines 32-46).
    • Rewrote the 'Using Transparent Data Encryption' section into 'Creating an Encryption Key' and 'Creating an Encrypted Bucket' (lines 48-81).
    • Updated the hadoop key create command description to include optional -size and -cipher parameters and provided an example (lines 50-65).
    • Updated the ozone sh bucket create command description to use --key and provided an example (lines 69-79).
    • Minor formatting fix for code blocks in the S3G section (lines 113, 118).
    • Rewrote the S3G access explanation, clarifying the lack of S3-SSE/CSE support and the similarity to SSE-C with KMS (lines 85-86).
    • Updated the S3G proxy user requirements for secure and non-secure modes (lines 131-136).
    • Rewrote the 'KMS Authorization' section to explain authorization mechanisms for Hadoop KMS (ACLs) and Ranger KMS (policies) and provided detailed Ranger policy examples (lines 160-176).
    • Added a new 'Additional References' section with links to HDFS TDE and Hadoop KMS documentation (lines 178-181).
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Docs updated with care,
KMS options now laid bare,
Encryption clear,
Dispelling fear,
For data resting there.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly enhances the Transparent Data Encryption (TDE) documentation for Ozone. The updated document is more self-contained, clearly explains integration with both Hadoop KMS and Ranger KMS, details command parameters, and provides much-needed clarification on S3 Gateway's TDE behavior and KMS authorization. The use of examples and links to further documentation is also very helpful. The changes largely align with the goals outlined in the JIRA ticket and the generation prompt.

While the update is a substantial improvement, there are a few points regarding configuration key accuracy and example clarity that should be addressed to ensure the documentation is precise and guides users correctly.

Summary of Findings

  • Accuracy of S3G Proxy User Configuration: The documentation refers to an ozone.s3g.proxy.user configuration key (line 134, SecuringTDE.md) which seems incorrect. This should likely refer to the S3G service principal.
  • Clarity of S3G Bucket Layout: The examples for creating S3G-accessible encrypted buckets (lines 92-101, SecuringTDE.md) removed the explicit --layout=OBJECT_STORE option. This could lead to confusion or misconfiguration, especially when linking buckets, as S3G works best with OBJECT_STORE layout.
  • Formatting/Typo in Configuration Key: A configuration key ozone.s3g.kerberos.principal is split across two lines (lines 138-139, SecuringTDE.md), which should be corrected.
  • Clarity of S3G Proxy User in Policy Examples: The Ranger policy example for the S3 Gateway proxy user (line 169, SecuringTDE.md) could be more specific about identifying this user.
  • Minor Formatting (Not Commented): A newline was added to the end of hadoop-hdds/docs/content/feature/Topology.md. This is a good practice but was not commented on due to low severity and review settings.
  • Minor Markdown Change (Not Commented): The code block specifier in hadoop-hdds/docs/content/security/SecuringTDE.md (Diff 2, lines 113 and 118 RIGHT) was changed slightly (indentation of backticks). This is a minor formatting adjustment and was not commented on due to low severity and review settings.

Merge Readiness

The pull request significantly improves the TDE documentation. However, there are a few high and medium severity issues identified in the review comments, primarily concerning the accuracy of a configuration key reference and clarity in S3G bucket creation examples. It is recommended that these points be addressed before merging to ensure the documentation is as accurate and helpful as possible. As the reviewer, I am not authorized to approve pull requests; please ensure further review and approval by authorized maintainers after addressing the feedback.

When accessing an S3G-enabled encrypted bucket:

* **Secure Mode (Kerberos enabled):**
The S3 Gateway proxy user (configured by `ozone.s3g.proxy.user`) must have permissions to decrypt the encryption key. This user also needs proxy user privileges for the end-user (e.g., be configured as a proxy user in `core-site.xml` for Hadoop’s proxy user mechanism).

high

The text mentions "The S3 Gateway proxy user (configured by ozone.s3g.proxy.user)". However, ozone.s3g.proxy.user does not appear to be a standard Ozone S3 Gateway configuration key.

Typically, in secure mode, the S3 Gateway service itself, identified by its Kerberos principal (e.g., configured via ozone.s3g.kerberos.principal), acts as the proxy user for KMS operations and for Hadoop's proxy user mechanism.

Could you please verify if ozone.s3g.proxy.user is an actual configuration key? If not, perhaps this should be rephrased to refer to the S3 Gateway's service principal (e.g., "The S3 Gateway service principal (configured by ozone.s3g.kerberos.principal)..." or similar)? This clarification would also apply to the Ranger policy example on line 169 for the "S3 Gateway proxy user".

Comment on lines 92 to 101
```shell
ozone sh bucket create --key <key_name> /s3v/<bucket_name>
```

```bash
ozone sh bucket create -k enckey --layout=OBJECT_STORE /vol/encryptedbucket
ozone sh bucket link /vol/encryptedbucket /s3v/linkencryptedbucket
```
2. **Alternatively, create an encrypted bucket elsewhere and link it:**

```shell
ozone sh bucket create --key <key_name> /<volume_name>/<bucket_name>
ozone sh bucket link /<volume_name>/<bucket_name> /s3v/<link_name>
```

medium

The examples for creating S3G-accessible encrypted buckets have removed the --layout=OBJECT_STORE option, which was present in the previous documentation.

While OBJECT_STORE is the default layout for S3 Gateway (ozone.s3g.default.bucket.layout), explicitly specifying it, or at least noting this default and its importance for S3 compatibility, could be beneficial.

Specifically for Option 2 (creating a bucket elsewhere and linking it):

ozone sh bucket create --key <key_name> /<volume_name>/<bucket_name>

If <volume_name> is not the S3 default volume (/s3v) and --layout is not specified, the bucket might default to FILE_SYSTEM_OPTIMIZED (FSO) if ozone.default.bucket.layout is FSO or empty. Using an FSO bucket with S3G might lead to compatibility issues, as FSO layout prioritizes HCFS compatibility over S3 compatibility (PrefixFSO.md notes: "Some irregular S3 key names may be rejected or normalized").

Could we consider re-adding --layout=OBJECT_STORE to these examples, particularly for the first step of Option 2, or adding a note clarifying the recommended layout for S3G usage to prevent potential issues?
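
To illustrate the suggestion, here is a minimal shell sketch of Option 2 with the layout pinned explicitly. The key, volume, bucket, and link names are hypothetical placeholders, and the `-k` and `--layout=OBJECT_STORE` flags are taken from the earlier version of the example quoted above. By default the script only prints the commands; it executes them only when `RUN=1` is set, which assumes a working Ozone client and an existing key.

```shell
# Sketch: create an encrypted OBJECT_STORE bucket outside /s3v, then link it into /s3v.
# All names below are hypothetical placeholders.
KEY_NAME="enckey"
VOLUME="vol"
BUCKET="encryptedbucket"
LINK="linkencryptedbucket"

# Print each command; execute it only when RUN=1.
run() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

create_and_link() {
  # Pin the layout so the bucket does not inherit a cluster-wide FSO default.
  run ozone sh bucket create -k "$KEY_NAME" --layout=OBJECT_STORE "/$VOLUME/$BUCKET"
  # Expose the bucket to S3 clients through a link in the S3 volume.
  run ozone sh bucket link "/$VOLUME/$BUCKET" "/s3v/$LINK"
}

create_and_link
```

Pinning the layout in the create step keeps the example independent of whatever `ozone.default.bucket.layout` happens to be on a given cluster.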

Comment on lines 138 to 139
The below two configurations must be added to the kms-site.xml to allow the S3Gateway principal to act as a proxy for other users. In this example, "ozone.
s3g.kerberos.principal" is assumed to be "s3g"

medium

There appears to be a formatting issue or typo where ozone.s3g.kerberos.principal is split across two lines: "ozone.\ns3g.kerberos.principal".

Should this be corrected to appear on a single line for clarity?

Suggested change
The below two configurations must be added to the kms-site.xml to allow the S3Gateway principal to act as a proxy for other users. In this example, "ozone.
s3g.kerberos.principal" is assumed to be "s3g"
The below two configurations must be added to the kms-site.xml to allow the S3Gateway principal to act as a proxy for other users. In this example, "ozone.s3g.kerberos.principal" is assumed to be "s3g"

* **Policy for `om` user (or the user running the Ozone Manager):**
* Resource: `keyname=enckey`
* Permissions: `DECRYPT_EEK` (Decrypt Encrypted Encryption Key)
* **Policy for S3 Gateway proxy user (e.g., `s3g_proxy`):**

medium

Following up on the comment for line 134, the example "Policy for S3 Gateway proxy user (e.g., s3g_proxy)" should clearly identify who this proxy user is.

If it's the S3 Gateway's service principal, it might be clearer to state that, for instance: "Policy for S3 Gateway service principal (e.g., the user specified in ozone.s3g.kerberos.principal)". This would provide more concrete guidance to users setting up these policies.

Change-Id: I59521eb995ac4d4323f9ddf02fd19d767a496e9b
Change-Id: Ifdc8ba98417966fea4d1f58fc01b8ccd5d8f8ce8
@jojochuang
Owner Author

/gemini

@gemini-code-assist

It looks like you're trying to invoke Gemini Code Assist, but a command is missing after /gemini.

You can use one of the following commands:

  • /gemini review: Performs a code review for the current pull request.
  • /gemini summary: Provides a summary of the current pull request.
  • /gemini help: Displays a list of available commands and usage instructions.

Please specify the command you'd like to use. For more details, you can refer to the help message by typing /gemini help.

@github-actions

github-actions bot commented Jun 4, 2025

No such command. gemini Available commands:

  • /close : Close pending pull request temporary
  • /help : Show all the available comment commands
  • /label : add new label to the issue: /label <label>
  • /pending : Add a REQUESTED_CHANGE type review to mark issue non-mergeable: /pending <reason>
  • /ready : Dismiss all the blocking reviews by github-actions bot
  • /retest : provide help on how to trigger new CI build

@jojochuang
Owner Author

/gemini review

@github-actions

github-actions bot commented Jun 4, 2025

No such command. gemini Available commands:

  • /close : Close pending pull request temporary
  • /help : Show all the available comment commands
  • /label : add new label to the issue: /label <label>
  • /pending : Add a REQUESTED_CHANGE type review to mark issue non-mergeable: /pending <reason>
  • /ready : Dismiss all the blocking reviews by github-actions bot
  • /retest : provide help on how to trigger new CI build

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly enhances the Transparent Data Encryption (TDE) documentation for Ozone, making it more comprehensive, self-contained, and user-friendly. The changes successfully address the requirements outlined in HDDS-13148, such as removing the dependency on prior HDFS TDE knowledge, incorporating Ranger KMS, detailing hadoop key create parameters, adding S3G clarifications, and providing KMS authorization examples.

The updated document is well-structured and much clearer. The use of examples and explicit steps for configuration and usage is commendable. I have a few suggestions aimed at further improving clarity and accuracy in specific sections.

Summary of Findings

  • Hadoop KMS Key Size Information: The documentation regarding supported key sizes for Hadoop KMS could be more specific or provide clearer guidance for users to enhance understanding.
  • S3G Proxy User Context: The rationale behind S3G proxy user configurations in kms-site.xml could be made clearer by reintroducing a brief explanation of the S3G proxy mechanism, which was present in the previous version.
  • KMS Proxy User Configuration Descriptions: The descriptions for hadoop.kms.proxyuser.s3g.users and hadoop.kms.proxyuser.s3g.groups in kms-site.xml examples could be more precise, particularly regarding the meaning of the wildcard *.
  • File Ending Newline (Topology.md): A newline was added at the end of hadoop-hdds/docs/content/feature/Topology.md. This is a good practice for file consistency. (Not commented due to review settings for severity).
  • Code Block Formatting (SecuringTDE.md): A minor formatting fix was made to a bash code block specifier in hadoop-hdds/docs/content/security/SecuringTDE.md, improving markdown rendering. (Not commented due to review settings for severity).

Merge Readiness

The pull request represents a substantial improvement to the TDE documentation and is in good overall shape. The changes make the documentation much more accessible and informative.

I have identified a few medium-severity areas where clarity and accuracy could be further enhanced. I recommend addressing these suggestions to ensure the documentation is as robust and helpful as possible.

As a reviewer, I am not authorized to approve pull requests. However, after these suggested revisions are considered, I believe the changes will be in an excellent state for merging.

For this purpose on security enabled cluster, during S3Gateway server startup
logins using configured
**ozone.s3g.kerberos.keytab.file** and **ozone.s3g.kerberos.principal**.
When accessing an S3G-enabled encrypted bucket:

medium

The new introductory line for S3G configuration is concise. However, the previous version of the document (lines 115-120 on the LEFT side of the diff) contained a useful explanation of why S3G requires proxy user settings in kms-site.xml (detailing S3G's role as a proxy, its Kerberos principal, and how it accesses keys on behalf of users).

While the goal is to be succinct, would it be beneficial to re-integrate a brief explanation of this S3G proxy mechanism? This could be placed after the current line 132 and before the kms-site.xml configuration examples (line 134), providing valuable context for users on the necessity of these settings.

<description>
Here the value can be all the S3G accesskey ids accessing Ozone S3
Here the value can be all the S3G accesskey ids accessing Ozone S3
or set to '*' to allow all the accesskey ids.

medium

In the description for hadoop.kms.proxyuser.s3g.users, the phrase allow all the accesskey ids when * is used might be slightly ambiguous. Since this property pertains to users, would allow all S3G users or simply allow all users be clearer?

Suggested change
or set to '*' to allow all the accesskey ids.
or set to '*' to allow all S3G users.

<value>group1,group2,group3</value>
<description>
Here the value can be all the S3G accesskey groups accessing Ozone S3
or set to '*' to allow all the accesskey ids.

medium

Similarly, for hadoop.kms.proxyuser.s3g.groups, using allow all the accesskey ids when * is specified could be improved. Given this property is for groups, would allow all S3G user groups or allow all groups provide better clarity?

Suggested change
or set to '*' to allow all the accesskey ids.
or set to '*' to allow all S3G user groups.
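
As a sketch of how the two properties fit together, the shell snippet below prints a `kms-site.xml` fragment for a given proxy principal short name. The short name `s3g` and the group list `group1,group2,group3` come from the example in the doc; the helper name `kms_proxyuser_fragment` is hypothetical, and the `*` value for users is used purely for illustration.

```shell
# Sketch: print kms-site.xml proxy-user properties for the S3 Gateway principal.
# $1 = short name of the S3G Kerberos principal
# $2 = users the principal may impersonate ('*' for all)
# $3 = groups whose members the principal may impersonate ('*' for all)
kms_proxyuser_fragment() {
  cat <<EOF
<property>
  <name>hadoop.kms.proxyuser.$1.users</name>
  <value>$2</value>
</property>
<property>
  <name>hadoop.kms.proxyuser.$1.groups</name>
  <value>$3</value>
</property>
EOF
}

kms_proxyuser_fragment s3g '*' 'group1,group2,group3'
```

The printed properties belong inside the `<configuration>` element of `kms-site.xml`; in production, an explicit user or group list is usually preferable to the wildcard.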

jojochuang and others added 2 commits June 4, 2025 09:54
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Change-Id: I49b3ff2f24eb0544945f9fc8839c62da29170590
@jojochuang
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly enhances the Transparent Data Encryption (TDE) documentation for Ozone. The changes successfully address the objectives outlined in the PR description, such as making the document independent of prior HDFS TDE knowledge, incorporating Ranger KMS details, clarifying hadoop key create parameters, explaining S3G TDE behavior, and providing KMS authorization examples.

The restructuring and new content greatly improve clarity and completeness. The new introduction, separation of key/bucket creation steps, and the S3G SSE/CSE clarification are excellent. The KMS authorization examples for Ranger are also very helpful.

I have a couple of minor suggestions to further improve consistency and clarity.

Summary of Findings

  • Documentation Consistency: In SecuringTDE.md, terms like 'RangerKMS' and 'HadoopKMS' should be consistently written as 'Ranger KMS' and 'Hadoop KMS' (with a space) to match usage elsewhere in the document and improve readability.
  • Clarity on Ozone's Supported Key Sizes: In SecuringTDE.md, there's a slight ambiguity regarding Ozone's supported key sizes for TDE. The document mentions general AES key lengths (128, 192, 256 bits) while the original requirement stated Ozone supports 128 and 256 bits. Explicitly stating Ozone's specific limitations would prevent user confusion.
  • Minor Formatting (Not commented due to review settings): The newline addition at the end of hadoop-hdds/docs/content/feature/Topology.md is a good practice. Some minor wording/formatting suggestions for SecuringTDE.md (e.g., 'below' vs 'following', backticks for s3g principal) were noted but not commented on as they are low severity.

Merge Readiness

The pull request is in good shape and the documentation is much improved. Addressing the medium-severity comments regarding consistency and clarity on key sizes would make it even better. I am not authorized to approve pull requests, but once these points are considered, the changes would be a valuable addition. I recommend addressing these suggestions before merging.

Comment on lines 57 to 59
* **`-size <key_bit_length>` (Optional):** Specifies the key bit length. The default is 128 bits (defined by `hadoop.security.key.default.bitlength`).
RangerKMS supports both 128 and 256 bits. HadoopKMS is also commonly used with 128 and 256 bit keys; for specific version capabilities, consult the Hadoop KMS documentation. Valid AES key lengths are 128, 192, and 256 bits.
* **`-cipher <cipher_suite>` (Optional):** Specifies the cipher suite. Currently, only **`AES/CTR/NoPadding`** (the default) is supported.

medium

The PR description (point 4 from the Gemini prompt) mentioned that for the -size parameter, 'Ozone supports 128 and 256 bits'. The current documentation text states, 'Valid AES key lengths are 128, 192, and 256 bits,' after discussing KMS support. This might create slight ambiguity, potentially implying Ozone could use 192-bit keys if the underlying KMS supports it.

Could we explicitly clarify Ozone's specific supported key lengths here to align with the initial requirement and avoid potential user confusion? For instance, if Ozone is strictly limited to 128 and 256 bits, it might be beneficial to state that directly.
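
One way to make the restriction concrete is a small guard around key creation. The sketch below assumes Ozone is indeed limited to 128 and 256 bit keys, as the PR description states; `enckey` is the example key name used in this PR, and the helper names `is_supported_size` and `key_create_cmd` are hypothetical. It only prints the `hadoop key create` command, so it can be tried without a KMS.

```shell
# Sketch: refuse key lengths other than 128 or 256 bits before calling the KMS.
CIPHER="AES/CTR/NoPadding"

# Succeeds only for the key lengths the PR description lists as supported by Ozone.
is_supported_size() {
  case "$1" in
    128|256) return 0 ;;
    *) return 1 ;;
  esac
}

# $1 = key name, $2 = key bit length. Prints the command instead of running it.
key_create_cmd() {
  if ! is_supported_size "$2"; then
    echo "unsupported key size: $2 (expected 128 or 256)" >&2
    return 1
  fi
  echo "hadoop key create $1 -size $2 -cipher $CIPHER"
}

key_create_cmd enckey 256
```

With this guard, `key_create_cmd enckey 192` writes an error to stderr and returns a non-zero status, even though 192 is a valid AES key length.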

A possible phrasing, if Ozone indeed only supports 128/256, could be:
"Ozone supports 128 and 256 bit key lengths. Ranger KMS and Hadoop KMS are commonly used with these lengths. While AES itself supports 128, 192, and 256 bit keys, ensure your chosen length is supported by both Ozone and your KMS."

@github-actions

github-actions bot commented Jun 4, 2025

No such command. gemini Available commands:

  • /close : Close pending pull request temporary
  • /help : Show all the available comment commands
  • /label : add new label to the issue: /label <label>
  • /pending : Add a REQUESTED_CHANGE type review to mark issue non-mergeable: /pending <reason>
  • /ready : Dismiss all the blocking reviews by github-actions bot
  • /retest : provide help on how to trigger new CI build

jojochuang and others added 2 commits June 4, 2025 10:18
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Change-Id: Iee100e57085e7873d85d91254d7bc05c9c91cb1c