[SPARK-33160][SQL][FOLLOWUP] Update benchmarks of INT96 type rebasing #30118

MaxGekk · 2020-10-21T10:30:24Z

What changes were proposed in this pull request?

Turn off/on the SQL config spark.sql.legacy.parquet.int96RebaseModeInWrite in DateTimeRebaseBenchmark. The config was added by [SPARK-33160][SQL] Allow saving/loading INT96 in parquet w/o rebasing #30056. The parquet readers should infer the correct rebasing mode automatically from metadata.
Regenerate the benchmark results of DateTimeRebaseBenchmark in the following environment:

Item	Description
Region	us-west-2 (Oregon)
Instance	r3.xlarge (spot instance)
AMI	ami-06f2f779464715dc5 (ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1)
Java	OpenJDK8/11 installed by `sudo add-apt-repository ppa:openjdk-r/ppa` & `sudo apt install openjdk-11-jdk`

Why are the changes needed?

To have up-to-date information about the performance of INT96, which is the default Parquet type for Catalyst's timestamp type.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By updating benchmark results:

$ SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.DateTimeRebaseBenchmark"

MaxGekk · 2020-10-21T10:31:58Z

@HyukjinKwon @cloud-fan @tomvanbussel @ala @mswit-databricks @bart-samwel Please review this PR.

MaxGekk · 2020-10-21T10:34:39Z

sql/core/benchmarks/DateTimeRebaseBenchmark-jdk11-results.txt

+after 1900, rebase LEGACY                         27305          27305           0          3.7         273.0       0.1X
+after 1900, rebase CORRECTED                      27715          27715           0          3.6         277.2       0.1X
+before 1900, rebase LEGACY                        30911          30911           0          3.2         309.1       0.1X
+before 1900, rebase CORRECTED                     27944          27944           0          3.6         279.4       0.1X


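For timestamps before 1900, the Parquet writer without rebasing (CORRECTED) is ~10% faster than with rebasing (LEGACY): 27944 vs. 30911. After 1900 the two modes are roughly on par.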

MaxGekk · 2020-10-21T10:35:29Z

sql/core/benchmarks/DateTimeRebaseBenchmark-jdk11-results.txt

+before 1900, vec off, rebase LEGACY               20371          20458          81          4.9         203.7       0.8X
+before 1900, vec off, rebase CORRECTED            17484          17541          54          5.7         174.8       1.0X
+before 1900, vec on, rebase LEGACY                10284          10327          45          9.7         102.8       1.6X
+before 1900, vec on, rebase CORRECTED              7044           7073          37         14.2          70.4       2.4X


Vectorized reader speed-up without rebasing, for timestamps before 1900: ~30% (7044 with CORRECTED vs. 10284 with LEGACY).

MaxGekk · 2020-10-21T10:36:45Z

sql/core/benchmarks/DateTimeRebaseBenchmark-jdk11-results.txt

+after 1900, vec on, rebase LEGACY                  7183           7255          94         13.9          71.8       2.3X
+after 1900, vec on, rebase CORRECTED               7047           7137          86         14.2          70.5       2.4X
+before 1900, vec off, rebase LEGACY               20371          20458          81          4.9         203.7       0.8X
+before 1900, vec off, rebase CORRECTED            17484          17541          54          5.7         174.8       1.0X


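Parquet-MR (vec off) speed-up without rebasing, for timestamps before 1900: ~15% (17484 with CORRECTED vs. 20371 with LEGACY).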
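
The percentages quoted in these comments can be re-derived from the result rows. The following is a minimal sketch, not part of the patch: it assumes the first numeric column of each row is the best time in milliseconds, and the helper names `parse_best_times` and `time_saved_pct` are hypothetical.

```python
# Rows copied from the benchmark results quoted in the comments above.
# Assumption: the first numeric column is the best time in milliseconds.
ROWS = """
before 1900, rebase LEGACY                        30911          30911           0          3.2         309.1       0.1X
before 1900, rebase CORRECTED                     27944          27944           0          3.6         279.4       0.1X
before 1900, vec off, rebase LEGACY               20371          20458          81          4.9         203.7       0.8X
before 1900, vec off, rebase CORRECTED            17484          17541          54          5.7         174.8       1.0X
before 1900, vec on, rebase LEGACY                10284          10327          45          9.7         102.8       1.6X
before 1900, vec on, rebase CORRECTED              7044           7073          37         14.2          70.4       2.4X
"""


def parse_best_times(text):
    """Map each case name to the value in its first numeric column."""
    times = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # The last six fields are the numeric columns; the rest is the case name.
        name = " ".join(fields[:-6])
        times[name] = int(fields[-6])
    return times


def time_saved_pct(legacy, corrected):
    """Percentage of time saved when switching from LEGACY to CORRECTED."""
    return 100.0 * (legacy - corrected) / legacy


times = parse_best_times(ROWS)
for case in ("before 1900", "before 1900, vec off", "before 1900, vec on"):
    pct = time_saved_pct(
        times[f"{case}, rebase LEGACY"],
        times[f"{case}, rebase CORRECTED"],
    )
    print(f"{case}: {pct:.1f}% less time without rebasing")
```

Under that assumption the script reports 9.6% for the writer, 14.2% for Parquet-MR (vec off) and 31.5% for the vectorized reader, which matches the rounded ~10%, ~15% and ~30% figures.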

SparkQA · 2020-10-21T11:18:10Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34699/

SparkQA · 2020-10-21T11:42:41Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34699/

SparkQA · 2020-10-21T14:45:44Z

Test build #130090 has finished for PR 30118 at commit c6d5b5c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2020-10-22T01:04:14Z

Merged to master.

MaxGekk added 3 commits October 21, 2020 10:48

Set mode for int96

4d7b24b

Update DateTimeRebaseBenchmark-jdk11-results.txt

b871ae8

Update DateTimeRebaseBenchmark-results.txt

c6d5b5c

HyukjinKwon approved these changes Oct 21, 2020

HyukjinKwon closed this in bbf2d6f Oct 22, 2020

MaxGekk deleted the int96-rebase-benchmark branch December 11, 2020 20:28

MaxGekk mentioned this pull request Mar 22, 2021

[SPARK-34815][SQL] Update CSVBenchmark #31917

Closed
