@GeorgeJahad GeorgeJahad commented Oct 31, 2022

What changes were proposed in this pull request?

Hadoop distributes two versions of its HDFS client:

hadoop-client-api-3.3.1.jar and hadoop-common-3.3.4.jar

The first uses shaded protobufs, e.g. org.apache.hadoop.shaded.com.google.protobuf.Message

The second uses unshaded protobufs, e.g. com.google.protobuf.Message

Currently, Ozone only supports unshaded protobufs (with ozone-filesystem-hadoop3-1.3.0-SNAPSHOT.jar).

But projects like Spark use shaded protobufs (through hadoop-client-api-3.3.1.jar).

This PR adds ozone-filesystem-hadoop3-client-1.3.0-SNAPSHOT.jar, which is identical to ozone-filesystem-hadoop3-1.3.0-SNAPSHOT.jar except that it uses the shaded protobufs, so that it works with Spark and other systems distributed with the hadoop-client jars.
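
To make the shaded/unshaded distinction concrete, here is a minimal Python sketch of what relocation does to a class name. The prefix `org.apache.hadoop.shaded` comes from the class names quoted above; the `relocate` helper and the `RELOCATED_PACKAGES` list are hypothetical and only model the renaming, they are not how the Maven Shade Plugin is implemented.

```python
# Simplified model of package relocation ("shading").
# Assumption: only the protobuf package is relocated in this sketch.
SHADE_PREFIX = "org.apache.hadoop.shaded."
RELOCATED_PACKAGES = ("com.google.protobuf",)


def relocate(class_name: str) -> str:
    """Return the shaded name if class_name lives in a relocated package."""
    for pkg in RELOCATED_PACKAGES:
        if class_name == pkg or class_name.startswith(pkg + "."):
            return SHADE_PREFIX + class_name
    return class_name


unshaded = "com.google.protobuf.Message"
shaded = relocate(unshaded)
print(shaded)
# The two names differ, so code compiled against one cannot accept
# objects of the other without a cast failure.
print(shaded == unshaded)
```

Because the fully qualified names differ, the two `Message` types are unrelated as far as the JVM is concerned, which is consistent with the cast exception mentioned in the Jira ticket.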

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6926

How was this patch tested?

I installed Spark and confirmed that it can read Ozone keys with the new jar, using the instructions below. Note that the same instructions with the unshaded jar file, ozone-filesystem-hadoop3-1.3.0-SNAPSHOT.jar, cause the cast exception reported in the Jira ticket.

# install spark
curl -L -o ~/Downloads/spark-3.2.1-bin-hadoop3.2.tgz https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
cd $OZONE_ROOT/hadoop-ozone/dist/target/ozone-1.3.0-SNAPSHOT
mkdir spark
cd spark
tar -xzf ~/Downloads/spark-3.2.1-bin-hadoop3.2.tgz

# copy over the shaded jar file
cp $OZONE_ROOT/hadoop-ozone/ozonefs-hadoop3-client/target/ozone-filesystem-hadoop3-client-1.3.0-SNAPSHOT.jar $OZONE_ROOT/hadoop-ozone/dist/target/ozone-1.3.0-SNAPSHOT/spark/spark-3.2.1-bin-hadoop3.2/jars


# start up docker cluster
cd $OZONE_ROOT/hadoop-ozone/dist/target/ozone-1.3.0-SNAPSHOT/compose/ozone
docker-compose up --no-recreate --scale datanode=3 -d

# init docker cluster (the remaining commands run inside the om container)
docker exec -it ozone_om_1 bash
cd /opt/hadoop/spark/spark-3.2.1-bin-hadoop3.2/conf
cp /etc/hadoop/ozone-site.xml .
cd /opt/hadoop/spark/spark-3.2.1-bin-hadoop3.2/bin


# init vol/bucket/key
ozone sh volume create testgbj2
ozone sh bucket create testgbj2/bucket1
echo k1 > k1.orig
ozone sh key put testgbj2/bucket1/k1 k1.orig

# read ozone from spark (the last two lines are typed at the spark-shell prompt)
./spark-shell
sc.setLogLevel("DEBUG")
spark.read.text("ofs://om/testgbj2/bucket1/k1").show()

@GeorgeJahad
Contributor Author

@jojochuang Here is my attempt at fixing the shaded protobuf problem. I'd like to try and get it into the 1.3 release. Would you mind taking a look?

<version>1.3.0-SNAPSHOT</version>
</parent>
<artifactId>ozone-filesystem-hadoop-client</artifactId>
<name>Apache Ozone FS Hadoop shaded 3.x compatibility</name>
Contributor

Would this still be a hadoop3 only client?

Contributor Author

yes

Contributor

Shouldn't we include 3 in artifactId then?

@kerneltime
Contributor

cc @tanvipenumudy

@adoroszlai
Contributor

adoroszlai left a comment

Thanks @GeorgeJahad for the patch. Tried with Spark 3.2.1 per your steps, it works fine.

@GeorgeJahad
Contributor Author

> Shouldn't we include 3 in artifactId then?

@adoroszlai:
Done. I'd like to get this in the 1.3 release. Would you mind merging/cherry-picking it for me?

<artifactId>ozone</artifactId>
<version>1.3.0-SNAPSHOT</version>
</parent>
<artifactId>ozone-filesystem-hadoop3-client</artifactId>
Contributor

Please add the documentation for why the name was selected.

Contributor Author

@kerneltime Done. Please cherry-pick into the 1.3 branch when you can.

Contributor

Just curious: how can one tell which jar is shaded and which isn't?
Is that documented somewhere?

<resource>META-INF/BC1024KE.DSA</resource>
<resource>META-INF/BC2048KE.DSA</resource>
<resource>META-INF/BC1024KE.SF</resource>
<resource>META-INF/BC2048KE.SF</resource>
Contributor

just curious~ what do BC1024KE.DSA and BC2048KE.SF mean?

Contributor Author

My understanding is that this is used when shading jars, to prevent the modified jar from causing security exceptions. We use it here as well:

<transformers>
  <transformer
      implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
    <resources>
      <resource>META-INF/BC1024KE.DSA</resource>
      <resource>META-INF/BC2048KE.DSA</resource>
      <resource>META-INF/BC1024KE.SF</resource>
      <resource>META-INF/BC2048KE.SF</resource>
    </resources>
  </transformer>

That is where I got it from.

Contributor

ohh!! thanks!!

@adoroszlai adoroszlai merged commit 8dded6c into apache:master Nov 1, 2022
adoroszlai pushed a commit that referenced this pull request Nov 1, 2022
@GeorgeJahad
Contributor Author

Thanks for the merge/cherry-pick! @adoroszlai

@captainzmc
Member

captainzmc commented Nov 10, 2022

Hi @GeorgeJahad @adoroszlai @kerneltime,
Recently I was preparing the ozone-1.3 rc0 package. When I tried to upload the tarball to the Apache svn server, the upload failed. The Apache servers currently have a limit of 350 MB for release artifacts.

I made a comparison and found that the 1.2.1 tarball was less than 300 MB, while the current ozone-1.3.0 tarball is 408 MB. I found that this PR introduces a 59 MB ozone-filesystem-hadoop3-client-1.3.0.jar. Could we move the shaded protobuf change in this PR directly into ozone-filesystem-hadoop3-1.3.0.jar, instead of introducing a new client jar?

@adoroszlai
Contributor

@captainzmc I don't think it's possible to have them in the same jar, as the same Ozone code cannot use both the shaded and the unshaded version of protobuf. However, I think we can try to split up these two jars to avoid duplicating the common content. That would be slightly more inconvenient (users would have to add two jars to the classpath), but it may be worth it, if possible.

@captainzmc
Member

Thanks @adoroszlai for the explanation.
Hi @GeorgeJahad, I have another question. My understanding is that the new client jar should be the same as the original except for the shaded protobufs. So:
ozone-filesystem-hadoop3-1.3.0-SNAPSHOT.jar should contain the package "com.google.protobuf"
ozone-filesystem-hadoop3-client-1.3.0-SNAPSHOT.jar should contain the package "org.apache.hadoop.shaded.com.google.protobuf"

However, I opened these two fat jars and found that they do not contain these two packages. Did I get something wrong here?

@adoroszlai
Contributor

> My understanding is that the new client jar should be the same as the original except for the shaded protobufs
> However, I opened these two fat jars and found that they do not contain these two packages.

The difference is that usage of the protobuf library is shaded vs. not shaded. Protobuf classes are not part of either fat jar; they are provided by users (or the applications they use Ozone with).

@GeorgeJahad
Contributor Author

As @adoroszlai mentioned, I think the best solution is to try to split the jar files so they don't have so much duplication. I'll take a look when I get a chance.

@captainzmc
Member

captainzmc commented Nov 11, 2022

@GeorgeJahad @adoroszlai, splitting the fat jar may be inconvenient for users. Let me contact the INFRA team first to see if they can raise the upload size limit for Ozone. Hadoop's tarball has reached 600+ MB, so it makes sense to increase Ozone's limit.
I opened a Jira ticket at https://issues.apache.org/jira/browse/INFRA-23892.

@captainzmc
Member

The INFRA team has helped us resolve the Ozone tarball ceiling issue. See: https://issues.apache.org/jira/browse/INFRA-23892

@GeorgeJahad
Contributor Author

That is good news @captainzmc! Thank you!

@kerneltime
Contributor

> > My understanding is that the new client jar should be the same as the original except for the shaded protobufs
> > However, I opened these two fat jars and found that they do not contain these two packages.
>
> The difference is that usage of the protobuf library is shaded vs. not shaded. Protobuf classes are not part of either fat jar; they are provided by users (or the applications they use Ozone with).

Why not make the protobuf be part of the shaded client jar?

@GeorgeJahad
Contributor Author

> Why not make the protobuf be part of the shaded client jar?

That wouldn't help, as they are already a part of the hadoop jars that these ozone jars are designed to complement: hadoop-client-api-3.3.1.jar and hadoop-common-3.3.4.jar

jojochuang added a commit to jojochuang/ozone that referenced this pull request Nov 8, 2024
…ient/spark. (apache#3915)"

This reverts commit 8dded6c.

 Conflicts:
	hadoop-ozone/dist/src/main/license/jar-report.txt
	hadoop-ozone/ozonefs-hadoop3-client/pom.xml

Change-Id: Ibdab722cfd0d34aeb6f4f61b8b9d410353fdac9b
adoroszlai added a commit to jojochuang/ozone that referenced this pull request Nov 8, 2024