[SPARK-33160][SQL][FOLLOWUP] Update benchmarks of INT96 type rebasing #30118
@HyukjinKwon @cloud-fan @tomvanbussel @ala @mswit-databricks @bart-samwel Please review this PR.
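| Case | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| after 1900, rebase LEGACY | 27305 | 27305 | 0 | 3.7 | 273.0 | 0.1X |
| after 1900, rebase CORRECTED | 27715 | 27715 | 0 | 3.6 | 277.2 | 0.1X |
| before 1900, rebase LEGACY | 30911 | 30911 | 0 | 3.2 | 309.1 | 0.1X |
| before 1900, rebase CORRECTED | 27944 | 27944 | 0 | 3.6 | 279.4 | 0.1X |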
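Parquet writer without rebasing is ~10% faster for timestamps before 1900 (27944 ms vs. 30911 ms best time); after 1900 the two modes perform about the same.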
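| Case | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| before 1900, vec off, rebase LEGACY | 20371 | 20458 | 81 | 4.9 | 203.7 | 0.8X |
| before 1900, vec off, rebase CORRECTED | 17484 | 17541 | 54 | 5.7 | 174.8 | 1.0X |
| before 1900, vec on, rebase LEGACY | 10284 | 10327 | 45 | 9.7 | 102.8 | 1.6X |
| before 1900, vec on, rebase CORRECTED | 7044 | 7073 | 37 | 14.2 | 70.4 | 2.4X |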
Vectorized reader speedup without rebasing: ~30% for timestamps before 1900 (7044 ms vs. 10284 ms best time).
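| Case | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| after 1900, vec on, rebase LEGACY | 7183 | 7255 | 94 | 13.9 | 71.8 | 2.3X |
| after 1900, vec on, rebase CORRECTED | 7047 | 7137 | 86 | 14.2 | 70.5 | 2.4X |
| before 1900, vec off, rebase LEGACY | 20371 | 20458 | 81 | 4.9 | 203.7 | 0.8X |
| before 1900, vec off, rebase CORRECTED | 17484 | 17541 | 54 | 5.7 | 174.8 | 1.0X |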
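Parquet-MR (non-vectorized reader) speedup without rebasing: ~15% for timestamps before 1900 (17484 ms vs. 20371 ms best time).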
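The rough percentages quoted in the review comments above can be checked directly from the "Best Time(ms)" column. The following is a minimal sketch, not part of the PR: the helper names `time_saved_pct` and `summarize` are hypothetical, the "write" label is an assumption based on the comment about the Parquet writer, and the numbers are copied from the "before 1900" benchmark rows quoted in this conversation.

```python
# Best Time (ms) for the "before 1900" cases, copied from the rows quoted above.
# Keys are (case, rebase mode); the case labels are illustrative, not Spark names.
best_ms = {
    ("write", "LEGACY"): 30911,
    ("write", "CORRECTED"): 27944,
    ("read, vec on", "LEGACY"): 10284,
    ("read, vec on", "CORRECTED"): 7044,
    ("read, vec off", "LEGACY"): 20371,
    ("read, vec off", "CORRECTED"): 17484,
}


def time_saved_pct(legacy_ms, corrected_ms):
    """Percentage of run time saved by CORRECTED (no rebase) relative to LEGACY."""
    return 100.0 * (legacy_ms - corrected_ms) / legacy_ms


def summarize(times):
    """Map each case to the time saved by CORRECTED, rounded to one decimal."""
    result = {}
    for (case, mode), legacy_ms in times.items():
        if mode == "LEGACY":
            corrected_ms = times[(case, "CORRECTED")]
            result[case] = round(time_saved_pct(legacy_ms, corrected_ms), 1)
    return result


if __name__ == "__main__":
    for case, pct in summarize(best_ms).items():
        print(f"{case}: CORRECTED takes {pct}% less time than LEGACY")
```

The sketch measures savings as a fraction of the LEGACY time; expressing the same difference as a throughput ratio (LEGACY time divided by CORRECTED time) would give larger-looking numbers, so the two conventions should not be mixed when comparing results.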
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #130090 has finished for PR 30118 at commit
Merged to master.
What changes were proposed in this pull request?
1. Use `spark.sql.legacy.parquet.int96RebaseModeInWrite`, which was added by [SPARK-33160][SQL] Allow saving/loading INT96 in parquet w/o rebasing #30056, in `DateTimeRebaseBenchmark`. The parquet readers should infer the correct rebasing mode automatically from metadata.
2. Regenerate the results of `DateTimeRebaseBenchmark` in the environment: `sudo add-apt-repository ppa:openjdk-r/ppa` & `sudo apt install openjdk-11-jdk`
Why are the changes needed?
To have up-to-date information about the performance of INT96, which is the default Parquet type for Catalyst's timestamp type.
Does this PR introduce any user-facing change?
No
How was this patch tested?
By updating benchmark results: