[SPARK-36516][SQL] Support File Metadata Cache for ORC #33748
Closed
26 commits
45f4827 support orc file meta cache (LuciferYang)
a74c793 remove ForTailCacheReader (LuciferYang)
1a9bd3b rename test case (LuciferYang)
30df269 remove private[sql] and add comments (LuciferYang)
42d2bfd Reduce method encapsulation (LuciferYang)
0e6c52b use PrivateMethodTester (LuciferYang)
95bae3c add a configable maximumSize (LuciferYang)
ebb7e0b rename config (LuciferYang)
c36a569 move test (LuciferYang)
b02de85 update benchmark result (LuciferYang)
406f91c revert config name (LuciferYang)
3b15a82 add compile same type (LuciferYang)
2b99983 update mirco bench (LuciferYang)
82ddf4f change the default value of ttlSinceLastAccess (LuciferYang)
1dd174e change the default value of ttlSinceLastAccess (LuciferYang)
a339b1b update conf doc to add Warning (LuciferYang)
7153d2a use a list config (LuciferYang)
c3838e6 Add checkValue to spark.sql.fileMetaCache.enabledSourceList and test … (LuciferYang)
59d5bb9 change to use guava cache and update benchmark (LuciferYang)
4adeb62 rename test case (LuciferYang)
e5f9497 add SEC to ttl (LuciferYang)
ec8fa1c Revert "change to use guava cache and update benchmark" (LuciferYang)
db90daf Revert "Revert "change to use guava cache and update benchmark"" (LuciferYang)
7327fdb Merge branch 'upmaster' into SPARK-36516 (LuciferYang)
2907b2c Merge branch 'master' of github.com:apache/spark into SPARK-36516 (LuciferYang)
a7eff43 Merge branch 'upmaster' into SPARK-36516 (LuciferYang)
95 changes: 95 additions & 0 deletions
sql/core/benchmarks/FileMetaCacheReadBenchmark-jdk11-results.txt
@@ -0,0 +1,95 @@

count(*) From 100 files

OpenJDK 64-Bit Server VM 11.0.12+7-LTS on Linux 4.14.0_1-0-0-42
Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz

| count(*) from 10 columns with 100 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 217 | 225 | 5 | 24.1 | 41.5 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 153 | 156 | 2 | 34.3 | 29.1 | 1.4X |
| count(*) with Filter: fileMetaCacheEnabled = false | 436 | 444 | 7 | 12.0 | 83.1 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 377 | 379 | 2 | 13.9 | 72.0 | 0.6X |

| count(*) from 50 columns with 100 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 221 | 239 | 16 | 23.7 | 42.2 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 173 | 183 | 8 | 30.2 | 33.1 | 1.3X |
| count(*) with Filter: fileMetaCacheEnabled = false | 494 | 496 | 2 | 10.6 | 94.3 | 0.4X |
| count(*) with Filter: fileMetaCacheEnabled = true | 431 | 433 | 2 | 12.2 | 82.2 | 0.5X |

| count(*) from 100 columns with 100 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 287 | 289 | 2 | 18.3 | 54.8 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 221 | 228 | 6 | 23.8 | 42.1 | 1.3X |
| count(*) with Filter: fileMetaCacheEnabled = false | 553 | 555 | 2 | 9.5 | 105.4 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 504 | 506 | 2 | 10.4 | 96.1 | 0.6X |

count(*) From 500 files

OpenJDK 64-Bit Server VM 11.0.12+7-LTS on Linux 4.14.0_1-0-0-42
Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz

| count(*) from 10 columns with 500 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 772 | 814 | 72 | 6.8 | 147.3 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 534 | 537 | 2 | 9.8 | 101.9 | 1.4X |
| count(*) with Filter: fileMetaCacheEnabled = false | 1341 | 1343 | 3 | 3.9 | 255.8 | 0.6X |
| count(*) with Filter: fileMetaCacheEnabled = true | 1115 | 1116 | 1 | 4.7 | 212.7 | 0.7X |

| count(*) from 50 columns with 500 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 793 | 881 | 117 | 6.6 | 151.3 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 564 | 569 | 4 | 9.3 | 107.6 | 1.4X |
| count(*) with Filter: fileMetaCacheEnabled = false | 1473 | 1475 | 3 | 3.6 | 281.0 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 1253 | 1254 | 1 | 4.2 | 238.9 | 0.6X |

| count(*) from 100 columns with 500 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 862 | 902 | 45 | 6.1 | 164.4 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 623 | 631 | 9 | 8.4 | 118.9 | 1.4X |
| count(*) with Filter: fileMetaCacheEnabled = false | 1695 | 1698 | 4 | 3.1 | 323.3 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 1437 | 1445 | 11 | 3.6 | 274.1 | 0.6X |

count(*) From 1000 files

OpenJDK 64-Bit Server VM 11.0.12+7-LTS on Linux 4.14.0_1-0-0-42
Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz

| count(*) from 10 columns with 1000 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 1459 | 1501 | 59 | 3.6 | 278.3 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 1091 | 1092 | 1 | 4.8 | 208.0 | 1.3X |
| count(*) with Filter: fileMetaCacheEnabled = false | 2518 | 2520 | 3 | 2.1 | 480.2 | 0.6X |
| count(*) with Filter: fileMetaCacheEnabled = true | 2122 | 2130 | 11 | 2.5 | 404.7 | 0.7X |

| count(*) from 50 columns with 1000 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 1505 | 1506 | 1 | 3.5 | 287.0 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 1138 | 1138 | 1 | 4.6 | 217.1 | 1.3X |
| count(*) with Filter: fileMetaCacheEnabled = false | 2787 | 2798 | 16 | 1.9 | 531.5 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 2405 | 2405 | 1 | 2.2 | 458.7 | 0.6X |

| count(*) from 100 columns with 1000 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 1610 | 1610 | 1 | 3.3 | 307.0 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 1299 | 1308 | 13 | 4.0 | 247.7 | 1.2X |
| count(*) with Filter: fileMetaCacheEnabled = false | 3121 | 3123 | 3 | 1.7 | 595.4 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 2828 | 2828 | 1 | 1.9 | 539.3 | 0.6X |

95 changes: 95 additions & 0 deletions
sql/core/benchmarks/FileMetaCacheReadBenchmark-results.txt
@@ -0,0 +1,95 @@

count(*) From 100 files

Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Linux 4.14.0_1-0-0-42
Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz

| count(*) from 10 columns with 100 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 190 | 196 | 8 | 27.6 | 36.2 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 134 | 138 | 5 | 39.2 | 25.5 | 1.4X |
| count(*) with Filter: fileMetaCacheEnabled = false | 377 | 384 | 8 | 13.9 | 72.0 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 328 | 333 | 6 | 16.0 | 62.6 | 0.6X |

| count(*) from 50 columns with 100 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 187 | 192 | 8 | 28.0 | 35.7 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 146 | 150 | 6 | 35.9 | 27.9 | 1.3X |
| count(*) with Filter: fileMetaCacheEnabled = false | 396 | 400 | 7 | 13.2 | 75.5 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 351 | 355 | 5 | 14.9 | 67.0 | 0.5X |

| count(*) from 100 columns with 100 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 237 | 241 | 6 | 22.1 | 45.2 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 192 | 197 | 6 | 27.3 | 36.6 | 1.2X |
| count(*) with Filter: fileMetaCacheEnabled = false | 465 | 471 | 8 | 11.3 | 88.8 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 422 | 426 | 7 | 12.4 | 80.5 | 0.6X |

count(*) From 500 files

Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Linux 4.14.0_1-0-0-42
Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz

| count(*) from 10 columns with 500 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 647 | 656 | 6 | 8.1 | 123.4 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 431 | 437 | 7 | 12.2 | 82.3 | 1.5X |
| count(*) with Filter: fileMetaCacheEnabled = false | 1157 | 1160 | 5 | 4.5 | 220.7 | 0.6X |
| count(*) with Filter: fileMetaCacheEnabled = true | 934 | 947 | 11 | 5.6 | 178.2 | 0.7X |

| count(*) from 50 columns with 500 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 673 | 684 | 9 | 7.8 | 128.5 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 461 | 468 | 9 | 11.4 | 87.9 | 1.5X |
| count(*) with Filter: fileMetaCacheEnabled = false | 1277 | 1280 | 5 | 4.1 | 243.5 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 1052 | 1066 | 20 | 5.0 | 200.6 | 0.6X |

| count(*) from 100 columns with 500 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 720 | 726 | 11 | 7.3 | 137.3 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 503 | 509 | 10 | 10.4 | 96.0 | 1.4X |
| count(*) with Filter: fileMetaCacheEnabled = false | 1468 | 1469 | 1 | 3.6 | 280.0 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 1232 | 1238 | 9 | 4.3 | 234.9 | 0.6X |

count(*) From 1000 files

Java HotSpot(TM) 64-Bit Server VM 1.8.0_152-b16 on Linux 4.14.0_1-0-0-42
Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz

| count(*) from 10 columns with 1000 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 1239 | 1245 | 9 | 4.2 | 236.3 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 995 | 996 | 2 | 5.3 | 189.7 | 1.2X |
| count(*) with Filter: fileMetaCacheEnabled = false | 2161 | 2169 | 12 | 2.4 | 412.1 | 0.6X |
| count(*) with Filter: fileMetaCacheEnabled = true | 1864 | 1865 | 1 | 2.8 | 355.5 | 0.7X |

| count(*) from 50 columns with 1000 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 1292 | 1294 | 3 | 4.1 | 246.5 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 1086 | 1097 | 16 | 4.8 | 207.2 | 1.2X |
| count(*) with Filter: fileMetaCacheEnabled = false | 2388 | 2396 | 12 | 2.2 | 455.4 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 2176 | 2177 | 0 | 2.4 | 415.1 | 0.6X |

| count(*) from 100 columns with 1000 files | Best Time(ms) | Avg Time(ms) | Stdev(ms) | Rate(M/s) | Per Row(ns) | Relative |
|---|---|---|---|---|---|---|
| count(*): fileMetaCacheEnabled = false | 1371 | 1372 | 2 | 3.8 | 261.5 | 1.0X |
| count(*): fileMetaCacheEnabled = true | 1084 | 1096 | 17 | 4.8 | 206.7 | 1.3X |
| count(*) with Filter: fileMetaCacheEnabled = false | 2698 | 2708 | 13 | 1.9 | 514.7 | 0.5X |
| count(*) with Filter: fileMetaCacheEnabled = true | 2408 | 2408 | 0 | 2.2 | 459.2 | 0.6X |

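As a reading aid for the benchmark tables above, the Relative column can be reproduced from the Best Time column. The sketch below is in Python and assumes that Relative is the first row's best time divided by each row's best time, formatted to one decimal place; that formula is an assumption that matches the printed values, not something stated in this PR. The names `BEST_TIME_MS` and `relative` are hypothetical. The numbers are copied from the JDK 11 table for 10 columns and 100 files.

```python
# Best Time (ms) copied from the JDK 11 table
# "count(*) from 10 columns with 100 files".
BEST_TIME_MS = {
    "count(*): fileMetaCacheEnabled = false": 217,
    "count(*): fileMetaCacheEnabled = true": 153,
    "count(*) with Filter: fileMetaCacheEnabled = false": 436,
    "count(*) with Filter: fileMetaCacheEnabled = true": 377,
}


def relative(baseline_ms: float, case_ms: float) -> str:
    """Format baseline/case the way the tables print it, e.g. '1.4X'.

    Assumed formula: the ratio of the baseline best time to this case's best time.
    """
    return f"{baseline_ms / case_ms:.1f}X"


# Assumption: the first row of each table is the baseline for its Relative column.
baseline_ms = BEST_TIME_MS["count(*): fileMetaCacheEnabled = false"]
relative_column = {
    name: relative(baseline_ms, ms) for name, ms in BEST_TIME_MS.items()
}

for name, rel in relative_column.items():
    print(f"{name:<52} {rel}")
```

Because every row is compared against the unfiltered, cache-disabled case, the two filter rows are not compared with each other in the table. Under the same assumption, `relative(436, 377)` compares them directly and evaluates to `1.2X`. The printed millisecond values are rounded, so a recomputed ratio for other tables may differ from the printed one in the last digit.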
Could you add `.checkValue`? The only valid value in this PR is `orc`. After this PR is merged, you can extend it to `parquet` in the Parquet PR.
c3838e6 adds `.checkValue` and a test case.
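The validation discussed in this thread can be sketched as follows. This is a minimal Python illustration of the idea, not the Scala code in the PR. Only the config key `spark.sql.fileMetaCache.enabledSourceList` (from the commit message) and the fact that `orc` is the only valid value come from this page. The function name `check_enabled_source_list`, the `SUPPORTED_SOURCES` constant, and the trimming, lower-casing and empty-entry handling are all assumptions.

```python
from typing import List

# Config key taken from the commit message for c3838e6.
CONFIG_KEY = "spark.sql.fileMetaCache.enabledSourceList"

# Per the review comment, only "orc" is valid in this PR.
# Modelling this as a fixed allow-list is an assumption of this sketch.
SUPPORTED_SOURCES = frozenset({"orc"})


def check_enabled_source_list(value: str) -> List[str]:
    """Return the normalized entries of a comma-separated source list.

    Raises ValueError if any entry is not in SUPPORTED_SOURCES.
    Trimming, lower-casing and ignoring empty entries are assumptions
    made for this sketch, not behaviour confirmed by the PR.
    """
    entries = [item.strip().lower() for item in value.split(",")]
    entries = [item for item in entries if item]
    unsupported = sorted(set(entries) - SUPPORTED_SOURCES)
    if unsupported:
        raise ValueError(
            f"{CONFIG_KEY} only supports {sorted(SUPPORTED_SOURCES)}, "
            f"but got unsupported value(s): {unsupported}"
        )
    return entries
```

With this sketch, `check_enabled_source_list("orc")` returns `["orc"]`, while `check_enabled_source_list("orc,parquet")` raises `ValueError`. Extending support to `parquet`, as the reviewer suggests for the follow-up PR, would then only require adding it to the allow-list.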