Skip to content

Conversation

@holdenk
Copy link

@holdenk holdenk commented Aug 19, 2021

Adds support for user specified content encoding for S3A in the option fs.s3a.content.encoding

…cts they are putting. This is useful for people loading the data into other tools in the AWS ecosystem which don't use file extensions to infer compression type (e.g. serving compressed files from S3 or importing into RDS)
Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks OK at a quick glance, tests nice too.

Hadoop 3.3.1+ serves up the headers in the getXAttr call ... you can use that to test without asking for the AWS S3 client.

regarding generic setting of arbitrary headers; something which I could imagine the createFile() builder letting you set, with a set of .opt(s3a.header.x-something, "value") builder options which would get picked up and passed in.

not done any implementation for that createFile call yet tho'

@holdenk
Copy link
Author

holdenk commented Aug 20, 2021

I need to re-run the tests after the changes but figured I'd push to see if I'm doing the right thing around getXattr since I haven't used it before.

@holdenk holdenk force-pushed the add-content-encoding-support-for-s3a branch from 32d4895 to bb6cdfe Compare August 20, 2021 20:20
@apache apache deleted a comment from hadoop-yetus Aug 21, 2021
@apache apache deleted a comment from hadoop-yetus Aug 26, 2021
@steveloughran
Copy link
Contributor

not compiling I'm afraid. other than that though, looks good

we need a test to verify that directory markers still are tagged as this.
Rather than add a new test, just patch ITestXAttrCost to set the content encoding before the dir.

Actually, looking at that existing test suite, I think you could tune those tests to check the content option too. The current tests look for the default content encoding...if in the test setup you set it to gzip, all the probes in that test could be changed to look for it and they would then be validating your code.

And then, your new suite only needs to cover new behaviours, like rename.

(Yes, I know, mixing things is a key forbidden aspect of JUnit tests, but I'm looking at saving $ as well as execution time.)

Copy link
Contributor

@mehakmeet mehakmeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting patch and looks useful as well, I would have to check the AWS SDK code to better understand how encoding and decoding are actually happening based on the property being set or is it happening on the AWS side? 🤔
Seems like one of the imports was a static one, so it wasn't building, fix that, and yetus might be happy.


StoreContext storeContext = fs.createStoreContext();
String key = storeContext.pathToKey(path);
String encoding = fs.getXAttrs(XA_CONTENT_ENCODING);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about how this test would work. File attributes would relate to a path, which would return Map<String, byte[]>. Then we would need to get XA_CONTENT_ENCODING's value from the map and decode the byte[] to get the actual value. I recently stumbled upon its use too, a little confusing to say the least, you can check ITestS3AClientSideEncryptionKms#assertEncrypted(path) where I used file attributes as well in a test for reference.

initCannedAcls(getConf());

// Any encoding type
String contentEncoding = getConf().get(CONTENT_ENCODING, DEFAULT_CONTENT_ENCODING);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use getTrimmed()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if I pass any random value via the property? Should there be a check of any kind that verifies if the passed value is valid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any content type is valid, even application/x-shockwave-flash. there's probably some rules but it's probably not trying to validate as it'll only create maintenance. This option should go into storediag for a bit of extra details...I'll add it once this patch is in

public class ITestS3AContentEncoding extends AbstractS3ATestBase {

private static final Logger LOG =
LoggerFactory.getLogger(ITestS3ACannedACLs.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logger never used, and not the right class.

String key = storeContext.pathToKey(path);
String encoding = fs.getXAttrs(XA_CONTENT_ENCODING);
Assertions.assertThat(encoding)
.describedAs("Encoding of object %s is gzip", path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the description here would represent what the error message would say in case of an assert failure. So, maybe something like "Mismatch in the encoding of object %s"?

assertObjectHasEncoding(path);
Path path2 = new Path(dir, "2");
fs.rename(path, path2);
assertObjectHasEncoding(path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to assert here for path2?

if (StringUtils.isBlank(kmsKey) || !c.get(SERVER_SIDE_ENCRYPTION_ALGORITHM)
.equals(S3AEncryptionMethods.CSE_KMS.name())) {
String encryptionAlgorithm = c.get(SERVER_SIDE_ENCRYPTION_ALGORITHM);
if (kmsKey == null || StringUtils.isBlank(kmsKey) || encryptionAlgorithm == null ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as stated above.

private boolean requesterPays = false;

/** Content Encoding. */
private String contentEncoding = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to set null.

@steveloughran
Copy link
Contributor

@holdenk -we're looking at a 3.3.2 release very soon...can we get this in? There's only some minor nits left

@holdenk
Copy link
Author

holdenk commented Sep 10, 2021

Sure, sorry about that I've been moving and just sort of dropped all of my tasks on the floor in the same way the movers did with my computers :p But I've got some cycles this weekend I can hack on this now that my desktop boots again.

@steveloughran
Copy link
Contributor

Sure, sorry about that I've been moving and just sort of dropped all of my tasks on the floor in the same way the movers did with my computers

ouch

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 49s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 30m 46s trunk passed
+1 💚 compile 0m 46s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 0m 41s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 0m 32s trunk passed
+1 💚 mvnsite 0m 48s trunk passed
+1 💚 javadoc 0m 27s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 35s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 1m 10s trunk passed
+1 💚 shadedclient 14m 28s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 36s the patch passed
+1 💚 compile 0m 38s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 0m 38s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 0m 30s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 21s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 2 new + 5 unchanged - 0 fixed = 7 total (was 5)
+1 💚 mvnsite 0m 37s the patch passed
+1 💚 javadoc 0m 17s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 25s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 1m 10s the patch passed
+1 💚 shadedclient 14m 16s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 21s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
73m 47s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3312/4/artifact/out/Dockerfile
GITHUB PR #3312
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux d7cf0f727d8d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 939059b
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3312/4/testReport/
Max. process+thread count 542 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3312/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@holdenk
Copy link
Author

holdenk commented Sep 10, 2021

re-ran the content encoding tests since those changed during code review and they pass :)
For the refactoring the test into ITestXAttrCost I'm not super sure about that, it seems like I've got to make the file to test the rename anyways so I'm not sure we'd be saving many cycles by moving that check.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 44s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 37m 27s trunk passed
+1 💚 compile 0m 47s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 0m 38s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 0m 30s trunk passed
+1 💚 mvnsite 0m 47s trunk passed
+1 💚 javadoc 0m 24s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 35s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 1m 11s trunk passed
+1 💚 shadedclient 14m 48s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 35s the patch passed
+1 💚 compile 0m 38s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 0m 38s the patch passed
+1 💚 compile 0m 31s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 0m 31s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 20s /results-checkstyle-hadoop-tools_hadoop-aws.txt hadoop-tools/hadoop-aws: The patch generated 3 new + 5 unchanged - 0 fixed = 8 total (was 5)
+1 💚 mvnsite 0m 36s the patch passed
+1 💚 javadoc 0m 16s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 25s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 1m 13s the patch passed
+1 💚 shadedclient 14m 20s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 21s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
81m 19s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3312/6/artifact/out/Dockerfile
GITHUB PR #3312
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 684500896bc2 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 3d45dc7
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3312/6/testReport/
Max. process+thread count 675 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3312/6/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Sep 13, 2021
@apache apache deleted a comment from hadoop-yetus Sep 13, 2021
@steveloughran
Copy link
Contributor

@holdenk afraid other changes have broken the merge. Can you resync?

@dannycjones
Copy link
Contributor

With #3498 merged, should this be closed?

@steveloughran
Copy link
Contributor

you are right. will close this one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants