-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HADOOP-18562]: S3A: support custom S3 and STS headers. #6550
base: trunk
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a reasonable start.
- move the spark jira to a hadoop one; this is a hadoop module change; component is fs/s3
- use
StringUtilsgetTrimmedStringCollectionSplitByEqual()
. If your code has stricter validation (which it seems to), add the validation to that method + a unit test.
read the "testing s3a" connector docs. Afraid you need to be set up test against a public or private S3 store.
Regarding tests for this, some unit tests should be possible.
How do the s3 and STS stores react to unknown headers? What do they support today? I'm curious
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
Outdated
Show resolved
Hide resolved
Can use HADOOP-18562 for the JIRA ID here; hadoop codebase see. thanks |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good.
however, as per testing.md, I do need to know which s3 region/endpoint you tested against, even if it is some minio instance on a raspberry pi. Its our way of ensuring you've done the integration tests. once merged the rest of us get to run those same tests in our configs, we quite often find the odd quirk; i'll add a test header to include coverage.
import java.time.Duration; | ||
import java.util.Arrays; | ||
|
||
import org.apache.hadoop.util.Lists; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move to the org.apache.* section (we aren't as strict as spark and leave existing files mostly alone to avoid cherrypicking, but we try...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
however, as per testing.md, I do need to know which s3 region/endpoint you tested against, even if it is some minio instance on a raspberry pi. Its our way of ensuring you've done the integration tests. once merged the rest of us get to run those same tests in our configs, we quite often find the odd quirk; i'll add a test header to include coverage.
I have added us-west-2 as bucket region where I ran integ test in my PR description.
Below is my auth-keys.xml
<configuration>
<property>
<name>test.fs.s3a.name</name>
<value>**redacted**</value>
</property>
<property>
<name>fs.contract.test.fs.s3a</name>
<value>${test.fs.s3a.name}</value>
</property>
<property>
<name>fs.s3a.access.key</name>
<description>AWS access key ID. Omit for IAM role-based authentication.</description>
<value>**redacted**</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<description>AWS secret key. Omit for IAM role-based authentication.</description>
<value>**redacted**</value>
</property>
<property>
<name>test.sts.endpoint</name>
<description>Specific endpoint to use for STS requests.</description>
<value>sts.amazonaws.com</value>
</property>
<property>
<name>fs.s3a.session.token</name>
<value>**redacted**</value>
</property>
<property>
<name>test.fs.s3a.sts.enabled</name>
<value>false</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>s3.us-west-2.amazonaws.com</value>
</property>
</configuration>
* Examples | ||
* CustomHeader {@literal ->} 'Header1:Value1' | ||
* CustomHeaders {@literal ->} 'Header1=Value1:Value2,Header2=Value1' | ||
*/ | ||
public static final String CUSTOM_HEADERS_STS = | ||
"fs.s3a." + Constants.AWS_SERVICE_IDENTIFIER_STS.toLowerCase() | ||
"fs.s3a.client." + Constants.AWS_SERVICE_IDENTIFIER_STS.toLowerCase() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd pull this out as something like CUSTOM_HEADER_PREFIX
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean something like:
public static final String FS_S3A_CUSTOM_HEADER_PREFIX = "fs.s3a.client.";
public static final String FS_S3A_CUSTOM_HEADER_POSTFIX = ".custom.headers";
public static final String CUSTOM_HEADERS_STS = FS_S3A_CUSTOM_HEADER_PREFIX + Constants.AWS_SERVICE_IDENTIFIER_STS.toLowerCase() +
FS_S3A_CUSTOM_HEADER_POSTFIX;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but without the double ".." that would produce
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would not right? Constants.AWS_SERVICE_IDENTIFIER_STS.toLowerCase()
is just STS.
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/TestAwsClientConfig.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java
Outdated
Show resolved
Hide resolved
thanks. we didn't kneed to know all the secrets, just aws "us-west-2". something like tested |
got it, will keep in mind |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
@PradhanPrerak - Please fix the checkstyle issues as mentioned here (https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6550/6/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt) |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final premerge review of production code.
only serious issue is to use Locale.ROOT whenever doing case conversions. In some locales, e.g. turkey "I".toLowerCase() != "i".
@@ -775,6 +775,29 @@ private Constants() { | |||
"fs.s3a." + Constants.AWS_SERVICE_IDENTIFIER_STS.toLowerCase() | |||
+ ".signing-algorithm"; | |||
|
|||
/** Prefix for S3A client-specific properties. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add an {@value} tag for this and the strings below, for better IDE experience
* CustomHeaders {@literal ->} 'Header1=Value1:Value2,Header2=Value1' | ||
*/ | ||
public static final String CUSTOM_HEADERS_STS = | ||
FS_S3A_CLIENT_PREFIX + Constants.AWS_SERVICE_IDENTIFIER_STS.toLowerCase() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- no
Constants
reference needed here or below - Use
toLowerCase(Locale. ROOT)
unless you want to field support calls from unusual locales
@@ -407,6 +414,36 @@ private static void initSigner(Configuration conf, | |||
} | |||
} | |||
|
|||
/** | |||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need a title here
configKey = CUSTOM_HEADERS_STS; | ||
break; | ||
default: | ||
// Nothing to do. The original signer override is already setup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment is wrong. just say "no known service"
|
||
/** | ||
* List of custom headers to be set on the service client. | ||
* Multiple parameters can be used to specify custom headers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value: {@value}
wouldn't render with toLowerCase(), added for other statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I'm going to propose here doing all the next lines
in a single <pre> as there's enough formatting it wouldn't render right. you can then cut the {@literal} bits.
The goal here is generally "looks good on an IDE rendered view"
💔 -1 overall
This message was automatically generated. |
Map<String, String> awsClientCustomHeadersMap = | ||
S3AUtils.getTrimmedStringCollectionSplitByEquals(conf, configKey); | ||
awsClientCustomHeadersMap.forEach((header, valueString) -> { | ||
List<String> headerValues = Arrays.asList(valueString.split(";")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the largest change in this revision. Having ":" as value separator is not a good idea. Because,
1/ Some header names in aws start with aws:<header_name>
2/ Some values such as arns allow ":" in it, https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html#arns-paths
Hence changed delimiter to ";" which is not so common - https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html#object-key-guidelines
let me know if you have better ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good.
yetus failure is in |
🎊 +1 overall
This message was automatically generated. |
@steveloughran Hoping to get your attention on this. We have a feature dependency on this change and was hoping to get a closure. I would really appreciate if you can help with the review closure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code all looks good. That leaves on little detail: documentation. Can you add a paragraph + example into the hadoop-aws docs?
thanks
|
||
/** | ||
* List of custom headers to be set on the service client. | ||
* Multiple parameters can be used to specify custom headers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I'm going to propose here doing all the next lines
in a single <pre> as there's enough formatting it wouldn't render right. you can then cut the {@literal} bits.
The goal here is generally "looks good on an IDE rendered view"
Map<String, String> awsClientCustomHeadersMap = | ||
S3AUtils.getTrimmedStringCollectionSplitByEquals(conf, configKey); | ||
awsClientCustomHeadersMap.forEach((header, valueString) -> { | ||
List<String> headerValues = Arrays.asList(valueString.split(";")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good.
Description of PR
Introducing new custom.headers config for each client type, Upon setting this header each s3/sts request will set these custom headers by setting overrideClientConfig on clients
How was this patch tested?
Ran integ tests using
mvn -Dparallel-tests -DtestsThreadCount=8 clean verify
for a bucket in us-west-2UTs added which succeeded as well
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?