@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Jan 15, 2023

What changes were proposed in this pull request?

This PR aims to enable KryoSerializer in TPCDSQueryBenchmark to enforce built-in SQL class registration.

Why are the changes needed?

GitHub Actions CI will ensure that all new SQL-related classes are registered.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs. I also tested manually as follows.

$ build/sbt "sql/Test/runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location /tmp/tpcds-sf-1"
...
[success] Total time: 2050 s (34:10), completed Jan 15, 2023 4:06:12 PM

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-42074][SQL] Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration [SPARK-42074][SQL] Enable KryoSerializer in TPCDSQueryBenchmark to enforce SQL class registration Jan 15, 2023
@dongjoon-hyun
Member Author

Could you review this please, @viirya ?

.set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
.set("spark.sql.crossJoin.enabled", "true")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.kryo.registrationRequired", "true")
Member Author

This will cause the TPCDS SF=1 GitHub Actions job to fail if a PR misses a class registration.
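
For readers unfamiliar with these settings, here is a minimal configuration sketch of the same idea outside the benchmark. It is not taken from this PR: `NewSqlStat` and `KryoStrictConf` are hypothetical names, and the user-facing `registerKryoClasses` call is used only for illustration, since Spark's built-in SQL classes are registered inside Spark itself rather than through this method.

```scala
import org.apache.spark.SparkConf

// Hypothetical class standing in for a newly added SQL-related class.
case class NewSqlStat(rows: Long, bytes: Long)

// Hypothetical helper object; not part of Spark or of this PR.
object KryoStrictConf {
  def build(): SparkConf =
    new SparkConf()
      .setMaster("local[1]")
      .setAppName("kryo-registration-sketch")
      // Use Kryo instead of the default Java serializer.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Ask Kryo to reject classes that were never registered,
      // instead of silently writing their full class names.
      .set("spark.kryo.registrationRequired", "true")
      // Explicit registration of the illustrative class.
      .registerKryoClasses(Array(classOf[NewSqlStat]))
}
```

With `spark.kryo.registrationRequired` set to `true`, serializing an instance of a class that was left out of the registration is expected to raise an error rather than succeed, which is what lets a CI job catch a missed registration.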

@dongjoon-hyun
Member Author

Thank you, @viirya !

@dongjoon-hyun
Member Author

Merged to master for Apache Spark 3.4.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-42074 branch January 16, 2023 00:15
@LuciferYang
Contributor

Late LGTM
