@dongjoon-hyun dongjoon-hyun commented Jan 10, 2024

What changes were proposed in this pull request?

This PR removes the hard-coded compression codec name from the benchmark case name in `TPCDSQueryBenchmark`.

Why are the changes needed?

`GenTPCDSData` can generate datasets with compression codecs other than `snappy`, so the hard-coded `Snappy` in the case name is misleading and should be removed. For example, the following generates a `zstd`-compressed dataset:

$ JDK_JAVA_OPTIONS='-Dspark.sql.parquet.compression.codec=zstd' \
build/sbt "sql/Test/runMain org.apache.spark.sql.GenTPCDSData --dsdgenDir /Users/dongjoon/DATA/tpcds-kit/tools --location /Users/dongjoon/DATA/tpcds-sf-1-zstd --scaleFactor 1 --numPartitions 1"
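
To illustrate why a codec name baked into the case name is a problem, here is a minimal sketch. It is not the actual `TPCDSQueryBenchmark` code: the object `CaseNameSketch`, the helper `isMisleading`, the title strings, and the codec list are all hypothetical and exist only for this example.

```scala
// Hypothetical illustration only; none of these names come from Spark.
object CaseNameSketch {
  // Codec names used only for this illustration.
  private val knownCodecs = Seq("snappy", "zstd", "gzip", "lz4")

  /** True if the title names a codec other than the one the data was written with. */
  def isMisleading(title: String, datasetCodec: String): Boolean = {
    val lower = title.toLowerCase
    knownCodecs.exists(c => lower.contains(c) && c != datasetCodec.toLowerCase)
  }

  def main(args: Array[String]): Unit = {
    // Assume the dataset was generated with zstd, as in the command above.
    val datasetCodec = "zstd"
    println(isMisleading("TPCDS Snappy", datasetCodec)) // true: label disagrees with the data
    println(isMisleading("TPCDS", datasetCodec))        // false: codec-neutral label
  }
}
```

A codec-neutral title stays accurate whichever codec the dataset was generated with, which is the reasoning behind dropping the name rather than making it configurable.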

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual test.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jan 10, 2024
@dongjoon-hyun
Member Author

cc @yaooqinn

Member

@yaooqinn yaooqinn left a comment

LGTM. Taking into account the gold files, I'm okay with leaving it as is for now.

@dongjoon-hyun
Member Author

Thank you all! Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-46652 branch January 10, 2024 10:29
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
…benchmark case name

Closes apache#44657 from dongjoon-hyun/SPARK-46652.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>