@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Jan 5, 2023

What changes were proposed in this pull request?

This PR aims to publish SBOM artifacts.

Why are the changes needed?

Here is an article to give some context.

A Software Bill of Materials (SBOM) is an additional artifact containing the aggregate of all direct and transitive dependencies of a project. The US Government (based on NIST recommendations) currently accepts only the three most popular SBOM standards as valid, namely CycloneDX, Software Identification (SWID) tags, and Software Package Data Exchange® (SPDX).

This PR uses the CycloneDX Maven plugin. CycloneDX is a lightweight software bill of materials (SBOM) standard designed for use in application security contexts and supply chain component analysis.

For example, spark-tags_2.12-3.4.0-SNAPSHOT-cyclonedx.xml and spark-tags_2.12-3.4.0-SNAPSHOT-cyclonedx.json files are attached to spark-tags_2.12-3.4.0-SNAPSHOT.jar.

$ ls -al ~/.m2/repository/org/apache/spark/spark-tags_2.12/3.4.0-SNAPSHOT
total 2488
drwxr-xr-x  12 dongjoon  staff      384 Jan  4 23:36 .
drwxr-xr-x   4 dongjoon  staff      128 Jan  4 23:36 ..
-rw-r--r--   1 dongjoon  staff      492 Jan  4 23:36 _remote.repositories
-rw-r--r--   1 dongjoon  staff     1955 Jan  4 23:36 maven-metadata-local.xml
-rw-r--r--   1 dongjoon  staff    16310 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT-cyclonedx.json
-rw-r--r--   1 dongjoon  staff    14045 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
-rw-r--r--   1 dongjoon  staff  1162027 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT-javadoc.jar
-rw-r--r--   1 dongjoon  staff    16272 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT-sources.jar
-rw-r--r--   1 dongjoon  staff    12453 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT-test-sources.jar
-rw-r--r--   1 dongjoon  staff    10387 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT-tests.jar
-rw-r--r--   1 dongjoon  staff    15181 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT.jar
-rw-r--r--   1 dongjoon  staff     5822 Jan  4 23:36 spark-tags_2.12-3.4.0-SNAPSHOT.pom

Does this PR introduce any user-facing change?

Yes, but the changes are dev-only.

How was this patch tested?

Manually tested.

$ mvn install -DskipTests
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.4.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 10.501 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 12.900 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 24.315 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 25.406 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 36.217 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 31.532 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 33.338 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 19.204 s]
[INFO] Spark Project Core ................................. SUCCESS [05:24 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [01:20 min]
[INFO] Spark Project GraphX ............................... SUCCESS [01:41 min]
[INFO] Spark Project Streaming ............................ SUCCESS [02:36 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [06:44 min]
[INFO] Spark Project SQL .................................. SUCCESS [07:10 min]
[INFO] Spark Project ML Library ........................... SUCCESS [05:48 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 17.132 s]
[INFO] Spark Project Hive ................................. SUCCESS [02:49 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 50.149 s]
[INFO] Spark Project Assembly ............................. SUCCESS [  6.706 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 44.131 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:08 min]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [01:45 min]
[INFO] Spark Project Examples ............................. SUCCESS [02:19 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 11.574 s]
[INFO] Spark Avro ......................................... SUCCESS [01:33 min]
[INFO] Spark Project Connect Common ....................... SUCCESS [ 48.653 s]
[INFO] Spark Project Connect Server ....................... SUCCESS [01:28 min]
[INFO] Spark Project Connect Client ....................... SUCCESS [ 19.989 s]
[INFO] Spark Protobuf ..................................... SUCCESS [01:24 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  49:49 min
[INFO] Finished at: 2023-01-05T02:06:51-08:00
[INFO] ------------------------------------------------------------------------

$ tree ~/.m2/repository/org/apache/spark | grep cyclonedx.xml
│   │   ├── spark-avro_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-catalyst_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-connect-client-jvm_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-connect-common_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-connect_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-core_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-graphx_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-hive_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-kvstore_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-launcher_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-mllib-local_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-mllib_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-network-common_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-network-shuffle_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-parent_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-protobuf_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-repl_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-sketch_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-sql-kafka-0-10_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-sql_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-streaming-kafka-0-10-assembly_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-streaming-kafka-0-10_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-streaming_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-tags_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
│   │   ├── spark-token-provider-kafka-0-10_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
    │   ├── spark-unsafe_2.12-3.4.0-SNAPSHOT-cyclonedx.xml
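
The `tree | grep` check above can also be scripted. The following is a minimal sketch, assuming the local repository layout shown in the listings (`<artifactId>/<version>/<artifactId>-<version>.pom`); the function name `modules_missing_sbom` is hypothetical and not part of this PR.

```python
from pathlib import Path


def modules_missing_sbom(repo_root: Path, suffix: str = "-cyclonedx.xml") -> list[str]:
    """Return '<artifactId>/<version>' directories that have a .pom but no SBOM file.

    Assumes the layout <repo_root>/<artifactId>/<version>/<artifactId>-<version>.pom,
    as in the local Maven repository listings above.
    """
    missing = []
    for pom in sorted(repo_root.glob("*/*/*.pom")):
        version_dir = pom.parent
        # The SBOM is expected next to the POM, with the same base name plus the suffix.
        expected = version_dir / (pom.stem + suffix)
        if not expected.exists():
            missing.append(version_dir.relative_to(repo_root).as_posix())
    return missing
```

Calling `modules_missing_sbom(Path.home() / ".m2/repository/org/apache/spark")` after `mvn install` would then return an empty list when every installed module has an XML SBOM; passing `suffix="-cyclonedx.json"` checks the JSON variant instead.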

@github-actions github-actions bot added the BUILD label Jan 5, 2023
@dongjoon-hyun
Member Author

cc @srowen and @HyukjinKwon

@dongjoon-hyun dongjoon-hyun marked this pull request as draft January 5, 2023 07:44
@dongjoon-hyun
Member Author

Ah, it seems that I missed some failures. I have converted this to a draft. Let me dig into this.

[WARNING] An unexpected issue occurred attempting to resolve the effective pom for  org.xerial.snappy:snappy-java:1.1.8.4
org.apache.maven.project.ProjectBuildingException: Some problems were encountered while processing the POMs:
[ERROR] Unknown packaging: bundle @ line 6, column 16

@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review January 5, 2023 09:51
@dongjoon-hyun
Member Author

The PR is ready for review now. Could you review when you have some time?
@srowen , @HyukjinKwon , @cloud-fan , @viirya , @sunchao , @huaxingao

@srowen
Member

srowen commented Jan 5, 2023

Seems fine to me. I'm not sure if the maven release plugin will also push this to Maven Central, but maybe that's not essential. Do the files look plausible, like they appear to contain the transitive dependencies and more or less match what's in the "deps" files in the repo?

@dongjoon-hyun
Member Author

dongjoon-hyun commented Jan 5, 2023

Thank you, @srowen .

  1. It goes to Maven Central. I did this in the Apache ORC project first via ORC-1342. Here are the published SBOMs for the snapshot versions after merging ORC-1342. Apache Spark snapshots will also have them, so we can verify this before releasing.
**1.9.0**
- https://repository.apache.org/content/repositories/snapshots/org/apache/orc/orc-core/1.9.0-SNAPSHOT/orc-core-1.9.0-20230105.074036-185-cyclonedx.xml
- https://repository.apache.org/content/repositories/snapshots/org/apache/orc/orc-core/1.9.0-SNAPSHOT/orc-core-1.9.0-20230105.074036-185-cyclonedx.json

**1.8.2**
- https://repository.apache.org/content/repositories/snapshots/org/apache/orc/orc-core/1.8.2-SNAPSHOT/orc-core-1.8.2-20230105.074040-16-cyclonedx.xml
- https://repository.apache.org/content/repositories/snapshots/org/apache/orc/orc-core/1.8.2-SNAPSHOT/orc-core-1.8.2-20230105.074040-16-cyclonedx.json

**1.7.8**
- https://repository.apache.org/content/repositories/snapshots/org/apache/orc/orc-core/1.7.8-SNAPSHOT/orc-core-1.7.8-20230105.074050-2-cyclonedx.xml
- https://repository.apache.org/content/repositories/snapshots/org/apache/orc/orc-core/1.7.8-SNAPSHOT/orc-core-1.7.8-20230105.074050-2-cyclonedx.json
  2. While the deps files are only for Spark binary distributions, this is a jar-level manifest. For example, spark-core_2.12-3.4.0-SNAPSHOT-cyclonedx.xml lists only its own dependencies, without the Kubernetes dependencies.
$ cat ~/.m2/repository/org/apache/spark/spark-core_2.12/3.4.0-SNAPSHOT/spark-core_2.12-3.4.0-SNAPSHOT-cyclonedx.json  | jq .components | grep \"name\" | head -n5
    "name": "avro",
    "name": "jackson-core",
    "name": "commons-compress",
    "name": "avro-mapred",
    "name": "avro-ipc",
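
For a check that does not depend on `grep` matching text, the same listing can be done by parsing the JSON. This is a minimal sketch, assuming only that the CycloneDX JSON document has a top-level `components` array whose entries carry a `name` field; the function `component_names` and the hand-written `SAMPLE_SBOM` are illustrative and are not taken from the real Spark SBOM.

```python
import json
from typing import Optional

# Hand-written, heavily trimmed stand-in for a CycloneDX JSON document.
# It is NOT the content of the real spark-core SBOM.
SAMPLE_SBOM = """
{
  "bomFormat": "CycloneDX",
  "components": [
    {"type": "library", "name": "avro"},
    {"type": "library", "name": "jackson-core"},
    {"type": "library", "name": "commons-compress"}
  ]
}
"""


def component_names(sbom_text: str, limit: Optional[int] = None) -> list[str]:
    """Return the names of the top-level components, in document order."""
    bom = json.loads(sbom_text)
    names = [component["name"] for component in bom.get("components", [])]
    return names if limit is None else names[:limit]


if __name__ == "__main__":
    for name in component_names(SAMPLE_SBOM, limit=5):
        print(name)
```

Unlike `grep \"name\"`, this reads only each component's own `name` field, so a nested `name` key elsewhere in a component entry would not show up in the result.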

Member

@viirya viirya left a comment

So it basically generates aggregate of dependencies as xml and json files and attaches into jar files, right?

Member

@viirya viirya left a comment

Based on the context, looks good to me.

@viirya
Member

viirya commented Jan 5, 2023

Looks good but maybe wait for a while for others to chime in if they have some opinions.

@dongjoon-hyun
Member Author

Yes, right. Thank you, @viirya .

> So it basically generates aggregate of dependencies as xml and json files and attaches into jar files, right?

Member

@sunchao sunchao left a comment

Looks fine to me too.

@dongjoon-hyun
Member Author

Thank you, @sunchao

@dongjoon-hyun
Member Author

Thank you all. Let me merge this.

@dongjoon-hyun
Member Author

We can see the published SBOMs after tomorrow's snapshot publishing.

@viirya
Member

viirya commented Jan 7, 2023

Thanks @dongjoon-hyun !

dongjoon-hyun added a commit to apache/spark-kubernetes-operator that referenced this pull request Sep 23, 2025
### What changes were proposed in this pull request?

Since Apache Spark 3.4.0, the Apache Spark main repository has been providing `SBOM` artifacts. Like the main repository, this PR aims to publish `SBOM` artifacts for the `Apache Spark K8s Operator` artifacts.

- apache/spark#39401
  - https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.13/4.0.1/spark-core_2.13-4.0.1-cyclonedx.xml

### Why are the changes needed?

Here is an article to give some context.

- https://www.activestate.com/blog/why-the-us-government-is-mandating-software-bill-of-materials-sbom/

A Software Bill of Materials (SBOM) is an additional artifact containing the aggregate of all direct and transitive dependencies of a project. The US Government (based on NIST recommendations) currently accepts only the three most popular SBOM standards as valid, namely [CycloneDX](https://cyclonedx.org/), [Software Identification (SWID) tags](https://csrc.nist.gov/projects/Software-Identification-SWID), and [Software Package Data Exchange® (SPDX)](https://spdx.dev/).

### Does this PR introduce _any_ user-facing change?

No behavior change.

### How was this patch tested?

Manually run the following command and check the local Maven directory.

**COMMAND**

```
$ gradle publishApachePublicationToMavenLocal -Prelease
```

**BEFORE**

```
$ ls -al ~/.m2/repository/org/apache/spark/spark-operator-api/0.5.0-SNAPSHOT
total 976
drwxr-xr-x 15 dongjoon  staff     480 Sep 22 16:26 .
drwxr-xr-x  4 dongjoon  staff     128 Sep 22 16:26 ..
-rw-r--r--  1 dongjoon  staff    2632 Sep 22 16:26 maven-metadata-local.xml
-rw-r--r--  1 dongjoon  staff  233151 Sep 22 16:26 spark-operator-api-0.5.0-SNAPSHOT-javadoc.jar
-rw-r--r--  1 dongjoon  staff     833 Sep 22 16:26 spark-operator-api-0.5.0-SNAPSHOT-javadoc.jar.asc
-rw-r--r--  1 dongjoon  staff   52522 Sep 22 16:26 spark-operator-api-0.5.0-SNAPSHOT-sources.jar
-rw-r--r--  1 dongjoon  staff     833 Sep 22 16:26 spark-operator-api-0.5.0-SNAPSHOT-sources.jar.asc
-rw-r--r--  1 dongjoon  staff   17387 Sep 22 16:26 spark-operator-api-0.5.0-SNAPSHOT-tests.jar
-rw-r--r--  1 dongjoon  staff     833 Sep 22 16:26 spark-operator-api-0.5.0-SNAPSHOT-tests.jar.asc
-rw-r--r--  1 dongjoon  staff  154249 Sep 22 16:26 spark-operator-api-0.5.0-SNAPSHOT.jar
-rw-r--r--  1 dongjoon  staff     833 Sep 22 16:26 spark-operator-api-0.5.0-SNAPSHOT.jar.asc
-rw-r--r--  1 dongjoon  staff    2683 Sep 22 16:26 spark-operator-api-0.5.0-SNAPSHOT.module
-rw-r--r--  1 dongjoon  staff     833 Sep 22 16:26 spark-operator-api-0.5.0-SNAPSHOT.module.asc
-rw-r--r--  1 dongjoon  staff    2289 Sep 22 16:26 spark-operator-api-0.5.0-SNAPSHOT.pom
-rw-r--r--  1 dongjoon  staff     833 Sep 22 16:26 spark-operator-api-0.5.0-SNAPSHOT.pom.asc
```

**AFTER**

```
$ ls -al ~/.m2/repository/org/apache/spark/spark-operator-api/0.5.0-SNAPSHOT
total 5880
drwxr-xr-x 17 dongjoon  staff      544 Sep 22 16:27 .
drwxr-xr-x  4 dongjoon  staff      128 Sep 22 16:27 ..
-rw-r--r--  1 dongjoon  staff     3050 Sep 22 16:27 maven-metadata-local.xml
-rw-r--r--  1 dongjoon  staff  2505028 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT-cyclonedx.xml
-rw-r--r--  1 dongjoon  staff      833 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT-cyclonedx.xml.asc
-rw-r--r--  1 dongjoon  staff   233151 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT-javadoc.jar
-rw-r--r--  1 dongjoon  staff      833 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT-javadoc.jar.asc
-rw-r--r--  1 dongjoon  staff    52522 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT-sources.jar
-rw-r--r--  1 dongjoon  staff      833 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT-sources.jar.asc
-rw-r--r--  1 dongjoon  staff    17387 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT-tests.jar
-rw-r--r--  1 dongjoon  staff      833 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT-tests.jar.asc
-rw-r--r--  1 dongjoon  staff   154249 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT.jar
-rw-r--r--  1 dongjoon  staff      833 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT.jar.asc
-rw-r--r--  1 dongjoon  staff     2683 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT.module
-rw-r--r--  1 dongjoon  staff      833 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT.module.asc
-rw-r--r--  1 dongjoon  staff     2289 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT.pom
-rw-r--r--  1 dongjoon  staff      833 Sep 22 16:27 spark-operator-api-0.5.0-SNAPSHOT.pom.asc
```
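
The AFTER listing shows the new `-cyclonedx.xml` file together with its own `.asc` signature file. A minimal sketch of that presence check follows, assuming the directory layout shown above; the helper name `unsigned_artifacts` is hypothetical, and it only checks that a `.asc` file exists, not that the signature is valid.

```python
from pathlib import Path


def unsigned_artifacts(version_dir: Path) -> list[str]:
    """Return file names in version_dir that have no sibling '<name>.asc' file.

    Signature files themselves and maven-metadata files are skipped, since the
    listings above show no signature for maven-metadata-local.xml.
    """
    names = {p.name for p in version_dir.iterdir() if p.is_file()}
    unsigned = []
    for name in sorted(names):
        if name.endswith(".asc") or name.startswith("maven-metadata"):
            continue
        if name + ".asc" not in names:
            unsigned.append(name)
    return unsigned
```

Run against the directory from the AFTER listing, this would be expected to return an empty list, since every artifact there, including the SBOM, has a matching `.asc` entry.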

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #332 from dongjoon-hyun/SPARK-53669.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun