Skip to content

Log missing AWS Glue partition#10501

Closed
findinpath wants to merge 1 commit intotrinodb:masterfrom
findinpath:log-missing-aws-glue-partition
Closed

Log missing AWS Glue partition#10501
findinpath wants to merge 1 commit intotrinodb:masterfrom
findinpath:log-missing-aws-glue-partition

Conversation

@findinpath
Copy link
Copy Markdown
Contributor

No description provided.

ImmutableMap.Builder<String, Optional<Partition>> resultBuilder = ImmutableMap.builder();
for (Entry<String, List<String>> entry : partitionNameToPartitionValuesMap.entrySet()) {
Partition partition = partitionValuesToPartitionMap.get(entry.getValue());
if (partition == null) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logging strategy may be rather naive.
If lots of partitions are deleted between the time their names gets retrieved and the time their metadata gets retrieved, we could end up with lots of useless warnings in the logs.

ImmutableMap.Builder<String, Optional<Partition>> resultBuilder = ImmutableMap.builder();
for (Entry<String, List<String>> entry : partitionNameToPartitionValuesMap.entrySet()) {
Partition partition = partitionValuesToPartitionMap.get(entry.getValue());
if (partition == null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the partition is gone during query planning, isn't that an error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. This is an error.
Would it make maybe more sense to log in the exception ALL the partitions which are missing ? Currently we just throw an exception for the first partition found missing.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the throwing code?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why no throw here then?

BTW the code here seems to come from

resultBuilder.put(entry.getKey(), Optional.ofNullable(partition));
, which seemingly was approved in prestodb/presto#4878, so cc @dain for more context

Copy link
Copy Markdown
Contributor Author

@findinpath findinpath Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature of the method

Map<String, Optional<Partition>> getPartitionsByNames(String databaseName, String tableName, List<String> partitionNames);

exists there from the first commit recorded on the HiveMetastore.
In this context I think it is considered valid to deal with NULL partitions.

@findepi findepi requested a review from electrum January 10, 2022 08:49
@findinpath
Copy link
Copy Markdown
Contributor Author

The partitions were missing due to a particularity in the AWS Glue batch_get_partition call which returns in case that the response payload exceeds a certain threshold (~ 5 MB) the partition keys which were not processed as part of the request in the UnprocessedKeys field of the response.

See #10696

@findinpath findinpath closed this Jan 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants