Test Iceberg cost-based plans with small files on TPC-DS #16645

Merged
findepi merged 1 commit into trinodb:master from krvikash:iceberg-small-files-cost-based-plan
Mar 22, 2023

@krvikash
Contributor

@krvikash krvikash commented Mar 21, 2023

Test Iceberg cost-based plans with small files on TPC-DS. Test against unpartitioned small Parquet files. The total metadata file size added here is 2.4 MB.

The TPC-DS tables were generated with the session property iceberg.target_max_file_size set to '50MB'.

Data Location for tables and their respective data file count and size:

============== NDV: [iceberg-50MB-files-tpcds-sf1000-PARQUET] ============== iceberg.target_max_file_size = '50MB'; ==============
01. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/call_center/data], fileCount: [1], totalContentSize: [10.2 kB], averageFileSize: [10.2 kB]
02. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/catalog_page/data], fileCount: [1], totalContentSize: [932.3 kB], averageFileSize: [932.3 kB]
03. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/catalog_returns/data], fileCount: [183], totalContentSize: [8.4 GB], averageFileSize: [45.9 MB]
04. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/catalog_sales/data], fileCount: [1715], totalContentSize: [83.8 GB], averageFileSize: [48.9 MB]
05. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/customer/data], fileCount: [12], totalContentSize: [447.1 MB], averageFileSize: [37.3 MB]
06. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/customer_address/data], fileCount: [3], totalContentSize: [71.4 MB], averageFileSize: [23.8 MB]
07. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/customer_demographics/data], fileCount: [1], totalContentSize: [2.1 MB], averageFileSize: [2.1 MB]
08. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/date_dim/data], fileCount: [1], totalContentSize: [965.4 kB], averageFileSize: [965.4 kB]
09. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/household_demographics/data], fileCount: [1], totalContentSize: [9.4 kB], averageFileSize: [9.4 kB]
10. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/income_band/data], fileCount: [1], totalContentSize: [743 B], averageFileSize: [743 B]
11. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/inventory/data], fileCount: [46], totalContentSize: [1.9 GB], averageFileSize: [40.6 MB]
12. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/item/data], fileCount: [1], totalContentSize: [17.5 MB], averageFileSize: [17.5 MB]
13. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/promotion/data], fileCount: [1], totalContentSize: [46.4 kB], averageFileSize: [46.4 kB]
14. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/reason/data], fileCount: [1], totalContentSize: [1.3 kB], averageFileSize: [1.3 kB]
15. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/ship_mode/data], fileCount: [1], totalContentSize: [1.7 kB], averageFileSize: [1.7 kB]
16. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/store/data], fileCount: [1], totalContentSize: [63.7 kB], averageFileSize: [63.7 kB]
17. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/store_returns/data], fileCount: [261], totalContentSize: [12.6 GB], averageFileSize: [48.2 MB]
18. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/store_sales/data], fileCount: [2086], totalContentSize: [104.6 GB], averageFileSize: [50.1 MB]
19. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/time_dim/data], fileCount: [1], totalContentSize: [405.4 kB], averageFileSize: [405.4 kB]
20. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/warehouse/data], fileCount: [1], totalContentSize: [3.8 kB], averageFileSize: [3.8 kB]
21. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/web_page/data], fileCount: [1], totalContentSize: [36.7 kB], averageFileSize: [36.7 kB]
22. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/web_returns/data], fileCount: [94], totalContentSize: [4.1 GB], averageFileSize: [43.9 MB]
23. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/web_sales/data], fileCount: [875], totalContentSize: [38.9 GB], averageFileSize: [44.5 MB]
24. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/web_site/data], fileCount: [1], totalContentSize: [10.9 kB], averageFileSize: [10.9 kB]
============== Total Size: [254.8 GB] ==============
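The per-table figures above can be sanity-checked by recomputing each average as totalContentSize / fileCount. The following is a minimal sketch, not part of the PR: the helper names (`parse_line`, `average_matches`) are hypothetical, decimal units (1 kB = 1000 B) are an assumption about how the listing was formatted, and the 5% tolerance is an arbitrary allowance for the rounding in the printed values.

```python
import re

# Assumption: the listing uses decimal (SI) units; it does not say so explicitly.
UNITS = {"B": 1, "kB": 1e3, "MB": 1e6, "GB": 1e9}

# Matches the tail of one listing line: table name, file count, total and average size.
LINE_RE = re.compile(
    r"/(?P<table>\w+)/data\], fileCount: \[(?P<count>\d+)\], "
    r"totalContentSize: \[(?P<total>[\d.]+) (?P<total_unit>\w+)\], "
    r"averageFileSize: \[(?P<avg>[\d.]+) (?P<avg_unit>\w+)\]"
)


def parse_line(line):
    """Return a dict for one listing line, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "table": m["table"],
        "file_count": int(m["count"]),
        "total_bytes": float(m["total"]) * UNITS[m["total_unit"]],
        "average_bytes": float(m["avg"]) * UNITS[m["avg_unit"]],
    }


def average_matches(entry, tolerance=0.05):
    """Check the reported average against total / count, within a relative tolerance."""
    computed = entry["total_bytes"] / entry["file_count"]
    return abs(computed - entry["average_bytes"]) <= tolerance * entry["average_bytes"]


# Three lines copied from the listing above.
PREFIX = "s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET"
sample = [
    f"03. Location: [{PREFIX}/catalog_returns/data], fileCount: [183], "
    "totalContentSize: [8.4 GB], averageFileSize: [45.9 MB]",
    f"10. Location: [{PREFIX}/income_band/data], fileCount: [1], "
    "totalContentSize: [743 B], averageFileSize: [743 B]",
    f"18. Location: [{PREFIX}/store_sales/data], fileCount: [2086], "
    "totalContentSize: [104.6 GB], averageFileSize: [50.1 MB]",
]

entries = [e for e in map(parse_line, sample) if e is not None]
for e in entries:
    print(e["table"], e["file_count"], average_matches(e))
```

Run against the full listing, the same check would flag any table whose reported average is inconsistent with its file count and total size.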

Test against unpartitioned small Parquet files
@cla-bot cla-bot bot added the cla-signed label Mar 21, 2023
@krvikash krvikash self-assigned this Mar 21, 2023
@krvikash krvikash added the no-release-notes label Mar 21, 2023
@findepi
Member

findepi commented Mar 21, 2023

Please add in PR description what is the total file size being added here.

@krvikash
Contributor Author

> Please add in PR description what is the total file size being added here.

Updated.

@findepi findepi merged commit 8eda7a5 into trinodb:master Mar 22, 2023
@github-actions github-actions bot added this to the 411 milestone Mar 23, 2023
@krvikash krvikash deleted the iceberg-small-files-cost-based-plan branch March 23, 2023 05:53