Allow multiple or missing Hive bucket files #822
@electrum do you happen to know whether this functionality is similar to prestodb/presto#6282?
@findepi It is similar but more flexible, as this allows any number of files per bucket, whereas that one seems to require the file count to be a multiple of the bucket count. This also changes Presto to write files using the Hive naming convention.
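To illustrate "any number of files per bucket", here is a minimal sketch of grouping files by the bucket number encoded in their names. It is not the code from this PR: the helper `group_files_by_bucket` and the regular expression are hypothetical, and the pattern assumes Hive-style names such as `000003_0` or `000003_0_copy_1`, where the leading digits are the bucket number.

```python
import re

# Assumption: Hive-style bucket file names start with the bucket number,
# followed by an underscore and a numeric suffix, e.g. "000003_0" or
# "000003_0_copy_1". The pattern Presto actually uses may differ.
HIVE_BUCKET_FILE = re.compile(r"^(\d+)_\d+")


def group_files_by_bucket(file_names, bucket_count):
    """Map every bucket number to the (possibly empty) list of its files."""
    # Start with an empty list per bucket so missing buckets are allowed.
    buckets = {bucket: [] for bucket in range(bucket_count)}
    for name in file_names:
        match = HIVE_BUCKET_FILE.match(name)
        if match is None:
            raise ValueError(f"not a bucket file name: {name}")
        bucket = int(match.group(1))
        if bucket >= bucket_count:
            raise ValueError(f"bucket {bucket} out of range for file {name}")
        # Several files may map to the same bucket.
        buckets[bucket].append(name)
    return buckets


files = ["000000_0", "000000_0_copy_1", "000002_0"]
grouped = group_files_by_bucket(files, bucket_count=4)
```

In this sketch bucket 0 ends up with two files and buckets 1 and 3 with none, so the number of files no longer has to equal, or be a multiple of, the declared bucket count.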
With EMR 5.21 (Presto 0.215), I still get this issue:

Query 20191107_222140_00006_rf89j failed: Hive table 'dev.wifi_logs' is corrupt. The number of files in the directory (256) does not match the declared bucket count (64) for partition: date_key=2019-11-05

Are there any configurations to ignore this check?
@jiegzhan If you can't upgrade just yet for some reason, please ask for more advice on
@jiegzhan You are referring to the PrestoDB distribution (release 0.215) that ships with EMR, whereas this feature is part of PrestoSQL release 312.
Cherry-pick of trinodb/trino#822, trinodb/trino#848 and trinodb/trino#1375
Co-authored-by: David Phillips <david@acz.org>
Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Cherry-pick of trinodb/trino#822. The following commits are included:
Move table name to end of error message - trinodb/trino@a78f930
Add getSchemaTableName method - trinodb/trino@c761b47
Make HiveWritableTableHandle field final - trinodb/trino@6a06017
Remove unnecessary schemaTableName utility method - trinodb/trino@6323b9a
Cleanup code in HiveMetadata - trinodb/trino@f0d5e52
Remove explicit file prefix for Hive writer handles - trinodb/trino@3d2f977
Allow query ID to be a file name prefix or suffix - trinodb/trino@7b6d37e
Use Hive naming convention for bucket file names - trinodb/trino@b56b285
Simplify code in getBucketedSplits - trinodb/trino@f814cd6
Allow multiple or missing Hive bucket files - trinodb/trino@ebcbf22
Allow disabling the creation of empty bucket files - trinodb/trino@dfaa70c
Co-authored-by: David Phillips <david@acz.org>