Add default value of Glue GetPartitions MaxResults #3024
Discussed in: #2996
presto-hive/src/main/java/io/prestosql/plugin/hive/metastore/glue/GlueHiveMetastore.java
Thanks a lot @l1x. We were hit hard by this issue: we want to move from Athena to Presto, but planning performance was abysmal for tables with a significant number of partitions (>10k). Just wanted to let you know I appreciate this. PS: I came here while digging into this issue myself, and I came to the same conclusion that you did.
AWS doesn't recommend any value, but in practice, from a small script I made to test this, I can see no downsides to setting it to the max of 1000. A smaller value like 128 is appreciably slower than the max value (3x slower for fetching almost 8k partitions). You can simulate this in a very crude way using the AWS CLI:
aws glue get-partitions --database-name db --table-name table --expression "dt>='2020-02-01'" --page-size 128
aws glue get-partitions --database-name db --table-name table --expression "dt>='2020-02-01'" --page-size 1000
The aws cli internally implements the
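To illustrate why the page size matters, here is a minimal, self-contained sketch that counts the round trips a token-based pagination loop needs for a given page size. It does not call AWS Glue or use the AWS SDK: `Page`, `fetchPage`, and `countRequests` are hypothetical names, the "service" is an in-memory list, and the figure of 8000 partitions is only a round stand-in for the "almost 8k" mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

public class PagedFetchSketch {
    // Hypothetical stand-in for one page of a paginated response.
    static final class Page {
        final List<String> partitions;
        final Integer nextToken; // null when there are no more pages

        Page(List<String> partitions, Integer nextToken) {
            this.partitions = partitions;
            this.nextToken = nextToken;
        }
    }

    // Hypothetical in-memory "service": returns at most maxResults items,
    // starting at the offset encoded in the token (null means "from the start").
    static Page fetchPage(List<String> all, Integer token, int maxResults) {
        int start = (token == null) ? 0 : token;
        int end = Math.min(start + maxResults, all.size());
        Integer next = (end < all.size()) ? Integer.valueOf(end) : null;
        return new Page(new ArrayList<>(all.subList(start, end)), next);
    }

    // Fetches every partition page by page and returns the number of requests made.
    static int countRequests(int totalPartitions, int maxResults) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < totalPartitions; i++) {
            all.add("dt=" + i);
        }

        List<String> collected = new ArrayList<>();
        int requests = 0;
        Integer token = null;
        do {
            Page page = fetchPage(all, token, maxResults);
            requests++;
            collected.addAll(page.partitions);
            token = page.nextToken;
        } while (token != null);

        if (collected.size() != totalPartitions) {
            throw new IllegalStateException("pagination lost or duplicated partitions");
        }
        return requests;
    }

    public static void main(String[] args) {
        System.out.println(countRequests(8000, 128));  // 63 requests
        System.out.println(countRequests(8000, 1000)); // 8 requests
    }
}
```

In this sketch a page size of 128 needs 63 sequential requests for 8000 partitions, while 1000 needs 8. The sketch only counts requests; it does not model latency, so it is consistent with, but does not by itself explain, the 3x difference reported above.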
@hashhar I am glad I could help! Let me know how it goes. We could bump the number even higher or make it configurable.
@l1x We are now running PrestoSQL in production and haven't seen any practical issues with the smaller page size for queries scanning up to 50k partitions. Will keep an eye out.
Cherry pick of trinodb/trino#3024 and trinodb/trino#4938 Co-authored-by: Istvan <istvan@lambdainsight.com> Co-authored-by: Ashhar Hasan <hashhar_dev@outlook.com>
Fixes #2996
As per our discussion with @findepi and @ebyhr here is the fix for using batched requests with AWS Glue.