-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-6926. Add support for shaded protobufs used by hadoop-client/spark. #3915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(which is used by spark)
|
@jojochuang Here is my attempt at fixing the shaded protobuf problem. I'd like to try and get it into the 1.3 release. Would you mind taking a look? |
| <version>1.3.0-SNAPSHOT</version> | ||
| </parent> | ||
| <artifactId>ozone-filesystem-hadoop-client</artifactId> | ||
| <name>Apache Ozone FS Hadoop shaded 3.x compatibility</name> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this still be a hadoop3 only client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we include 3 in artifactId then?
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @GeorgeJahad for the patch. Tried with Spark 3.2.1 per your steps, it works fine.
| <version>1.3.0-SNAPSHOT</version> | ||
| </parent> | ||
| <artifactId>ozone-filesystem-hadoop-client</artifactId> | ||
| <name>Apache Ozone FS Hadoop shaded 3.x compatibility</name> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we include 3 in artifactId then?
@adoroszlai: |
| <artifactId>ozone</artifactId> | ||
| <version>1.3.0-SNAPSHOT</version> | ||
| </parent> | ||
| <artifactId>ozone-filesystem-hadoop3-client</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the documentation for why the name was selected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kerneltime Done. Please cherry-pick into the 1.3 branch when you can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious~ how to know which jar is shaded and which isn't?
are they documented somewhere~?
| <resource>META-INF/BC1024KE.DSA</resource> | ||
| <resource>META-INF/BC2048KE.DSA</resource> | ||
| <resource>META-INF/BC1024KE.SF</resource> | ||
| <resource>META-INF/BC2048KE.SF</resource> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious~ what do BC1024KE.DSA and BC2048KE.SF mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that is used when shading jars to prevent the changes from causing security exceptions. We use it here as well:
ozone/hadoop-ozone/ozonefs-shaded/pom.xml
Lines 94 to 103 in 48e3b49
| <transformers> | |
| <transformer | |
| implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer"> | |
| <resources> | |
| <resource>META-INF/BC1024KE.DSA</resource> | |
| <resource>META-INF/BC2048KE.DSA</resource> | |
| <resource>META-INF/BC1024KE.SF</resource> | |
| <resource>META-INF/BC2048KE.SF</resource> | |
| </resources> | |
| </transformer> |
That is where I got it from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh!! thanks!!
|
Thanks for the merge/cherry-pick! @adoroszlai |
|
Hi @GeorgeJahad @adoroszlai @kerneltime, I made a comparison and found that the tarball of 1.2.1 before is less than 300M, while the current tarball of ozone-1.3.0 is 408M. I found that this PR introduces a 59MB ozone-filesystem-hadoop3-client-1.3.0.jar. I was wondering if we could move the shaded change of protobufs in this PR directly to ozone-filesystem-hadoop3-1.3.0.jar? Instead of introducing a new client jar. |
|
@captainzmc I don't think it's possible to have them in the same jar, as the same Ozone code cannot use shaded/unshaded version of protobuf. However, I think we can try to split up these two jars to avoid duplicating the common content. Slightly more inconvenient (having to add two jars to the classpath), but it may be worth it, if possible. |
|
Thanks to @adoroszlai for your explanation. However, I opened these two fat jars and found that they did not contain the above two packages. Did I get something wrong here? |
The difference is that usage of the protobuf library is shaded vs. not shaded. Protobuf classes are not part of either fat jar, they are provided by users (or the applications they use Ozone with). |
|
As @adoroszlai mentioned, I think the best solution is to try to split the jar files so they don't have so much duplication. I'll take a look when I get a chance. |
|
@GeorgeJahad @adoroszlai. Splitting the fat jar may be inconvenient for users. Let me contact the INFRA team first, to see if they can solve the upload size upper limit problem for Ozone. I see hadoop's tarball has reached 600+ MB, so it makes sense to increase Ozone's upper limit. |
|
The INFRA team has helped us resolve the Ozone tarball ceiling issue. See: https://issues.apache.org/jira/browse/INFRA-23892 |
|
That is good news @captainzmc! Thank you! |
Why not make the protobuf be part of the shaded client jar? |
That wouldn't help, as they are already a part of the hadoop jars that these ozone jars are designed to complement: hadoop-client-api-3.3.1.jar and hadoop-common-3.3.4.jar |
…ient/spark. (apache#3915)" This reverts commit 8dded6c. Conflicts: hadoop-ozone/dist/src/main/license/jar-report.txt hadoop-ozone/ozonefs-hadoop3-client/pom.xml Change-Id: Ibdab722cfd0d34aeb6f4f61b8b9d410353fdac9b
…adoop-client/spark. (apache#3915)"" This reverts commit ec588ee.


What changes were proposed in this pull request?
Hadoop distributes two versions of their hdfs client:
hadoop-client-api-3.3.1.jar and hadoop-common-3.3.4.jar
The first uses shaded protobufs, eg: org.apache.hadoop.shaded.com.google.protobuf.Message
The second unshaded protobufs, eg: com.google.protobuf.Message
Currently, Ozone only supports unshaded protobufs, (with ozone-filesystem-hadoop3-1.3.0-SNAPSHOT.jar)
But projects like spark use shaded protobufs, (through hadoop-client-api-3.3.1.jar).
This PR adds the ozone-filesystem-hadoop3-client-1.3.0-SNAPSHOT.jar which is identical to the ozone-filesystem-hadoop3-1.3.0-SNAPSHOT.jar, except that it uses the shaded protobufs, (so as to work with spark and other systems distributed with the hadoop-client jars.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-6926
How was this patch tested?
I installed spark and confirmed reading ozone keys with the new jar, (using the instructions below). Note that the same instructions with the unshaded jar file, ozone-filesystem-hadoop3-1.3.0-SNAPSHOT.jar, will cause the cast exception reported in the jira ticket.