Skip to content

Conversation

@pan3793
Copy link
Member

@pan3793 pan3793 commented May 12, 2023

What changes were proposed in this pull request?

Spark does not use protobuf 2.5.0 directly, instead, it comes from other dependencies, with the following changes, now, Spark does not require protobuf 2.5.0 (please let me know if I miss something),

  • SPARK-40323 upgraded ORC 1.8.0, which moved from protobuf 2.5.0 to a shaded protobuf 3
  • SPARK-33212 switched from Hadoop vanilla client to Hadoop shaded client, also removed the protobuf 2 dependency. SPARK-42452 removed the support for Hadoop 2.
  • SPARK-14421 shaded and relocated protobuf 2.6.1, which is required by the kinesis client, into the kinesis assembly jar
  • Spark itself's core/connect/protobuf modules use protobuf 3, also shaded and relocated all protobuf 3 deps.

Why are the changes needed?

Remove the obsolete dependency, which is EOL long ago, and has CVEs CVE-2022-3510 CVE-2022-3509 CVE-2022-3171 CVE-2021-22569

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass GA.

@pan3793 pan3793 changed the title [DRAFT] Remove protobuf 2.5.0 [SPARK-43489][BUILD] Remove protobuf 2.5.0 May 12, 2023
@pan3793 pan3793 marked this pull request as ready for review May 12, 2023 17:08
@pan3793
Copy link
Member Author

pan3793 commented May 12, 2023

<useSubDirectoryPerType>true</useSubDirectoryPerType>
<includeArtifactIds>
guava,jetty-io,jetty-servlet,jetty-servlets,jetty-continuation,jetty-http,jetty-plus,jetty-util,jetty-server,jetty-security,jetty-proxy,jetty-client
guava,protobuf-java,jetty-io,jetty-servlet,jetty-servlets,jetty-continuation,jetty-http,jetty-plus,jetty-util,jetty-server,jetty-security,jetty-proxy,jetty-client
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protobuf is in the same position w/ guava and jetty: spark uses it internally, shades and relocates it into spark-core.jar instead of including it into binary tgz directly, to avoid leaking downstream

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable if it works. The less we expose protobuf the better

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing this with nice analysis. Ya, definitely, this was the goal. BTW, could you send an email to dev@spark because this is an important removal of dependency, @pan3793 ?

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM from my side.

cc @mridulm , too

@LuciferYang
Copy link
Contributor

LuciferYang commented May 12, 2023

<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<scope>provided</scope>
</dependency>

This one should be 2.5.0 before, I think we should manually specify its version to keep the behavior unchanged ?

@LuciferYang
Copy link
Contributor

Thank you for doing this with nice analysis. Ya, definitely, this was the goal. BTW, could you send an email to dev@spark because this is an important removal of dependency, @pan3793 ?

+1, Agree

@pan3793
Copy link
Member Author

pan3793 commented May 13, 2023

@dongjoon-hyun @LuciferYang FYI, I have sent a mail to the mailing list
https://lists.apache.org/thread/xzrov348c3dq0d8jwcxf4j7fk7lb3r92

@pan3793
Copy link
Member Author

pan3793 commented May 13, 2023

<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<scope>provided</scope>
</dependency>

This one should be 2.5.0 before, I think we should manually specify its version to keep the behavior unchanged ?

@LuciferYang The version here does not matter. What's matter is the <scope>provided</scope>, which is used to exclude deps from the assembly jar.

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM from my side.

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@srowen
Copy link
Member

srowen commented May 14, 2023

Merged to master

@srowen srowen closed this in b231850 May 14, 2023
dongjoon-hyun added a commit that referenced this pull request Feb 14, 2024
…rom `kinesis-asl-assembly`

### What changes were proposed in this pull request?

This PR aims to remove the shaded `protobuf-java` 2.6.1 dependency from `kinesis-asl-assembly`.

### Why are the changes needed?

Technically, this is a revert of SPARK-14421.
- At that time, Apache Spark 2.0.0 distribution contains `protobuf-java-2.5.0.jar`.
  - #13054
- SPARK-43489 removed `protobuf-java-2.5.0.jar` from Apache Spark 3.5+ distribution.
  - #41153

### Does this PR introduce _any_ user-facing change?

No, but the user can inject the required protobuf-jar from application layers.

### How was this patch tested?

Pass the CIs and manual review.

**BEFORE**
```
$ build/mvn dependency:tree -pl connector/kinesis-asl-assembly -Pkinesis-asl | grep protobuf-java
...
[INFO] +- com.google.protobuf:protobuf-java:jar:2.6.1:provided
```

**AFTER**
```
$ build/mvn dependency:tree -pl connector/kinesis-asl-assembly -Pkinesis-asl | grep protobuf-java
...
[INFO] |  |  +- com.google.protobuf:protobuf-java:jar:3.25.1:provided
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45096 from dongjoon-hyun/SPARK-47038.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants