feat(connector): Add support for AWS Glue Table and Column Statistics #26297

imjalpreet wants to merge 1 commit into prestodb:master
Conversation
Reviewer's Guide

This PR adds comprehensive AWS Glue table and column statistics support by introducing a pluggable GlueColumnStatisticsProvider abstraction (with enabled and disabled implementations), wiring in new executors and configuration flags, extending GlueHiveMetastore to delegate statistics operations (read and write), improving batch partition fetch and update logic, extending the updatePartitionStatistics API across implementations, and providing converter utilities and the necessary test updates.

Sequence diagram for updating partition statistics with column statistics in GlueHiveMetastore:

```mermaid
sequenceDiagram
    participant "Caller"
    participant "GlueHiveMetastore"
    participant "DefaultGlueColumnStatisticsProvider"
    participant "AWSGlueAsync"
    "Caller"->>"GlueHiveMetastore": updatePartitionStatistics(...)
    "GlueHiveMetastore"->>"DefaultGlueColumnStatisticsProvider": getPartitionColumnStatistics(partitions)
    "DefaultGlueColumnStatisticsProvider"->>"AWSGlueAsync": GetColumnStatisticsForPartition
    "AWSGlueAsync"-->>"DefaultGlueColumnStatisticsProvider": Partition column stats
    "GlueHiveMetastore"->>"AWSGlueAsync": batchUpdatePartitionAsync
    "GlueHiveMetastore"->>"DefaultGlueColumnStatisticsProvider": updatePartitionStatistics(updates)
    "DefaultGlueColumnStatisticsProvider"->>"AWSGlueAsync": UpdateColumnStatisticsForPartition
    "DefaultGlueColumnStatisticsProvider"->>"AWSGlueAsync": DeleteColumnStatisticsForPartition (if needed)
    "AWSGlueAsync"-->>"DefaultGlueColumnStatisticsProvider": Update/Delete result
    "GlueHiveMetastore"-->>"Caller": Done
```
ER diagram for GlueHiveMetastoreConfig statistics-related properties:

```mermaid
erDiagram
    GLUE_HIVE_METASTORE_CONFIG {
        bool columnStatisticsEnabled
        int readStatisticsThreads
        int writeStatisticsThreads
    }
    GLUE_HIVE_METASTORE_CONFIG ||--o| GLUE_HIVE_METASTORE : "configures"
```
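These three knobs would typically surface as catalog configuration properties. The property names below are illustrative assumptions based on the existing Glue property naming convention; the exact names are defined by the `@Config` annotations in GlueHiveMetastoreConfig:

```properties
# Illustrative property names -- verify against GlueHiveMetastoreConfig
hive.metastore.glue.column-statistics-enabled=true
hive.metastore.glue.read-statistics-threads=5
hive.metastore.glue.write-statistics-threads=5
```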
Class diagram for new and updated Glue column statistics support:

```mermaid
classDiagram
    class GlueHiveMetastore {
        - GlueColumnStatisticsProvider columnStatisticsProvider
        - boolean enableColumnStatistics
        + getSupportedColumnStatistics(type)
        + getTableStatistics(...)
        + getPartitionStatistics(...)
        + updateTableStatistics(...)
        + updatePartitionStatistics(...)
    }
    class GlueColumnStatisticsProvider {
        <<interface>>
        + getSupportedColumnStatistics(type)
        + getTableColumnStatistics(table)
        + getPartitionColumnStatistics(partitions)
        + updateTableColumnStatistics(table, columnStatistics)
        + updatePartitionStatistics(partitionStatisticsUpdates)
    }
    class DefaultGlueColumnStatisticsProvider {
        + getSupportedColumnStatistics(type)
        + getTableColumnStatistics(table)
        + getPartitionColumnStatistics(partitions)
        + updateTableColumnStatistics(table, columnStatistics)
        + updatePartitionStatistics(partitionStatisticsUpdates)
    }
    class DisabledGlueColumnStatisticsProvider {
        + getSupportedColumnStatistics(type)
        + getTableColumnStatistics(table)
        + getPartitionColumnStatistics(partitions)
        + updateTableColumnStatistics(table, columnStatistics)
        + updatePartitionStatistics(partitionStatisticsUpdates)
    }
    GlueHiveMetastore --> GlueColumnStatisticsProvider
    GlueColumnStatisticsProvider <|.. DefaultGlueColumnStatisticsProvider
    GlueColumnStatisticsProvider <|.. DisabledGlueColumnStatisticsProvider
    class GlueHiveMetastoreConfig {
        + boolean columnStatisticsEnabled
        + int readStatisticsThreads
        + int writeStatisticsThreads
        + setColumnStatisticsEnabled(...)
        + setReadStatisticsThreads(...)
        + setWriteStatisticsThreads(...)
    }
    class GlueMetastoreModule {
        + createStatisticsReadExecutor(...)
        + createStatisticsWriteExecutor(...)
    }
    class ForGlueColumnStatisticsRead {
        <<annotation>>
    }
    class ForGlueColumnStatisticsWrite {
        <<annotation>>
    }
    GlueMetastoreModule --> ForGlueColumnStatisticsRead
    GlueMetastoreModule --> ForGlueColumnStatisticsWrite
    DefaultGlueColumnStatisticsProvider --> GlueMetastoreStats
    class GlueMetastoreStats {
        + getGetColumnStatisticsForTable()
        + getGetColumnStatisticsForPartition()
        + getUpdateColumnStatisticsForTable()
        + getDeleteColumnStatisticsForTable()
        + getUpdateColumnStatisticsForPartition()
        + getDeleteColumnStatisticsForPartition()
    }
```
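The pluggable provider abstraction in the class diagram can be sketched roughly as follows. This is a simplified illustration, not the actual Presto code: the real interface operates on Presto's Table, Partition, and HiveColumnStatistics types, which are replaced here with plain stand-ins.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Simplified sketch of the provider abstraction (real signatures use
// Presto's Table/Partition/HiveColumnStatistics types).
interface GlueColumnStatisticsProvider {
    Set<String> getSupportedColumnStatistics(String type);

    Map<String, Long> getTableColumnStatistics(String table);

    void updateTableColumnStatistics(String table, Map<String, Long> columnStatistics);
}

// When column statistics are disabled, reads return nothing and writes of
// non-empty statistics are rejected, so callers fail fast instead of
// silently dropping stats.
class DisabledGlueColumnStatisticsProvider implements GlueColumnStatisticsProvider {
    @Override
    public Set<String> getSupportedColumnStatistics(String type) {
        return Collections.emptySet();
    }

    @Override
    public Map<String, Long> getTableColumnStatistics(String table) {
        return Collections.emptyMap();
    }

    @Override
    public void updateTableColumnStatistics(String table, Map<String, Long> columnStatistics) {
        if (!columnStatistics.isEmpty()) {
            throw new UnsupportedOperationException("Glue metastore column statistics are disabled");
        }
    }
}
```

GlueHiveMetastore then depends only on the interface, and the module binds the default or disabled implementation based on the `columnStatisticsEnabled` flag.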
Co-authored-by: Deepak Majeti <majeti.deepak@gmail.com>
Co-authored-by: George Wang <fgwang7w@gmail.com>

Force-pushed from b04463a to c44ca2d.
@imjalpreet Thank you for the PR! I'll review it tomorrow. @agrawalreetika Will you be able to review it first?
```java
public static List<ColumnStatistics> toGlueColumnStatistics(
        Partition partition,
        Map<String, HiveColumnStatistics> trinoColumnStats,
```
Remove "trino". Same to other places and files too.
Is this file ported over from Trino? If yes, please add co-authored-by section in the PR and commit message.
Thank you, I missed this. Yes, this class and a subset of these changes are part of a Trino PR. Additionally, we have modified the implementation to have a more optimized version. I will also add the PR details.
Hey there - I've reviewed your changes - here's some feedback:
- In batchGetPartition the while‐loop can spin indefinitely if Glue keeps returning unprocessed keys but no partitions; consider adding a max retry or bail-out condition to avoid infinite loops.
- The hard-coded batch sizes (BATCH_GET_PARTITION_MAX_PAGE_SIZE, BATCH_UPDATE_PARTITION_MAX_PAGE_SIZE) may not match Glue’s actual limits or customer workloads—consider making them configurable or aligning them precisely with AWS Glue API docs.
- Verify that the new default implementation of updatePartitionStatistics(single-partition) correctly delegates to the multi-partition overload so existing code paths and third-party metastore implementations remain fully compatible.
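The first point above can be addressed with a bounded retry budget around the unprocessed-keys loop. The sketch below is illustrative only: the Glue client call is stubbed with a function, whereas the real code drives `AWSGlueAsync.batchGetPartition` with `BatchGetPartitionRequest` objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch of a batchGetPartition loop that bails out if Glue keeps
// returning unprocessed keys without yielding any partitions.
final class BatchGetPartitions {
    static final int MAX_RETRIES_WITHOUT_PROGRESS = 10;

    record BatchResult(List<String> partitions, List<String> unprocessedKeys) {}

    static List<String> batchGetPartition(List<String> keys, Function<List<String>, BatchResult> fetcher) {
        List<String> partitions = new ArrayList<>();
        List<String> pending = new ArrayList<>(keys);
        int retriesWithoutProgress = 0;
        while (!pending.isEmpty()) {
            BatchResult result = fetcher.apply(pending);
            partitions.addAll(result.partitions());
            if (result.partitions().isEmpty()) {
                // no progress this round; charge it against the retry budget
                if (++retriesWithoutProgress >= MAX_RETRIES_WITHOUT_PROGRESS) {
                    throw new IllegalStateException("Glue keeps returning unprocessed partition keys without progress");
                }
            }
            else {
                retriesWithoutProgress = 0;
            }
            pending = new ArrayList<>(result.unprocessedKeys());
        }
        return partitions;
    }
}
```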
## Individual Comments
### Comment 1
<location> `presto-hive-metastore/src/main/java/com/facebook/presto/hive/metastore/glue/GlueHiveMetastore.java:437-438` </location>
<code_context>
- if (!updatedStatistics.getColumnStatistics().isEmpty()) {
- throw new PrestoException(NOT_SUPPORTED, "Glue metastore does not support column level statistics");
- }
+ Iterables.partition(updates.entrySet(), BATCH_CREATE_PARTITION_MAX_PAGE_SIZE).forEach(partitionUpdates ->
+ updatePartitionStatisticsBatch(metastoreContext, databaseName, tableName, partitionUpdates.stream().collect(toImmutableMap(Entry::getKey, Entry::getValue))));
+ }
</code_context>
<issue_to_address>
**suggestion:** Partition statistics update batching uses BATCH_CREATE_PARTITION_MAX_PAGE_SIZE for both create and update.
Use BATCH_UPDATE_PARTITION_MAX_PAGE_SIZE for update batching to match the update API's limits.
```suggestion
Iterables.partition(updates.entrySet(), BATCH_UPDATE_PARTITION_MAX_PAGE_SIZE).forEach(partitionUpdates ->
updatePartitionStatisticsBatch(metastoreContext, databaseName, tableName, partitionUpdates.stream().collect(toImmutableMap(Entry::getKey, Entry::getValue))));
```
</issue_to_address>
### Comment 2
<location> `presto-hive/src/test/java/com/facebook/presto/hive/metastore/glue/TestHiveClientGlueMetastore.java:178-181` </location>
<code_context>
HdfsConfiguration hdfsConfiguration = new HiveHdfsConfiguration(new HdfsConfigurationInitializer(hiveClientConfig, metastoreClientConfig), ImmutableSet.of(), hiveClientConfig);
HdfsEnvironment hdfsEnvironment = new HdfsEnvironment(hdfsConfiguration, metastoreClientConfig, new NoHdfsAuthentication());
- GlueHiveMetastoreConfig glueConfig = new GlueHiveMetastoreConfig();
+ GlueHiveMetastoreConfig glueConfig = new GlueHiveMetastoreConfig().setColumnStatisticsEnabled(true);
glueConfig.setDefaultWarehouseDir(tempDir.toURI().toString());
- return new GlueHiveMetastore(hdfsEnvironment, glueConfig, executor);
+ return new GlueHiveMetastore(hdfsEnvironment, glueConfig, executor, executor, executor);
}
</code_context>
<issue_to_address>
**suggestion (testing):** Test setup enables column statistics, but lacks direct tests for Glue column statistics behavior.
Please add or update tests to directly verify Glue column statistics functionality, including reading, writing, error handling, and propagation through the metastore.
Suggested implementation:
```java
@Test
public void testGlueColumnStatisticsWriteAndRead() throws Exception {
// Setup: create table and column statistics
String databaseName = "test_db";
String tableName = "test_table";
String columnName = "test_column";
HiveColumnStatistics stats = HiveColumnStatistics.builder()
.setNullsCount(1L)
.setDistinctValuesCount(2L)
.setMaxValue(Optional.of("z"))
.setMinValue(Optional.of("a"))
.build();
// Create database and table
glueMetastore.createDatabase(new HiveDatabase(databaseName, "owner", "location", ImmutableMap.of()));
glueMetastore.createTable(new HiveTable(databaseName, tableName, "owner", "location", ImmutableList.of(), ImmutableMap.of()));
// Write column statistics
glueMetastore.updateTableColumnStatistics(databaseName, tableName, columnName, stats);
// Read column statistics
Optional<HiveColumnStatistics> readStats = glueMetastore.getTableColumnStatistics(databaseName, tableName, ImmutableList.of(columnName)).get(columnName);
assertTrue(readStats.isPresent(), "Column statistics should be present");
assertEquals(readStats.get().getNullsCount(), stats.getNullsCount(), "Nulls count should match");
assertEquals(readStats.get().getDistinctValuesCount(), stats.getDistinctValuesCount(), "Distinct values count should match");
assertEquals(readStats.get().getMaxValue(), stats.getMaxValue(), "Max value should match");
assertEquals(readStats.get().getMinValue(), stats.getMinValue(), "Min value should match");
}
@Test
public void testGlueColumnStatisticsErrorHandling() {
String databaseName = "nonexistent_db";
String tableName = "nonexistent_table";
String columnName = "nonexistent_column";
// Attempt to read statistics for non-existent table/column
try {
glueMetastore.getTableColumnStatistics(databaseName, tableName, ImmutableList.of(columnName));
fail("Expected exception for non-existent table/column");
} catch (Exception e) {
// Expected: verify error propagation
assertTrue(e.getMessage().contains("not found") || e.getMessage().contains("does not exist"));
}
}
@Test
public void testGlueColumnStatisticsPropagation() throws Exception {
String databaseName = "propagation_db";
String tableName = "propagation_table";
String columnName = "propagation_column";
HiveColumnStatistics stats = HiveColumnStatistics.builder()
.setNullsCount(5L)
.setDistinctValuesCount(10L)
.build();
glueMetastore.createDatabase(new HiveDatabase(databaseName, "owner", "location", ImmutableMap.of()));
glueMetastore.createTable(new HiveTable(databaseName, tableName, "owner", "location", ImmutableList.of(), ImmutableMap.of()));
glueMetastore.updateTableColumnStatistics(databaseName, tableName, columnName, stats);
// Simulate propagation: update stats and verify new value
HiveColumnStatistics updatedStats = HiveColumnStatistics.builder()
.setNullsCount(7L)
.setDistinctValuesCount(12L)
.build();
glueMetastore.updateTableColumnStatistics(databaseName, tableName, columnName, updatedStats);
Optional<HiveColumnStatistics> readStats = glueMetastore.getTableColumnStatistics(databaseName, tableName, ImmutableList.of(columnName)).get(columnName);
assertTrue(readStats.isPresent(), "Column statistics should be present after update");
assertEquals(readStats.get().getNullsCount(), updatedStats.getNullsCount(), "Updated nulls count should match");
assertEquals(readStats.get().getDistinctValuesCount(), updatedStats.getDistinctValuesCount(), "Updated distinct values count should match");
}
```
- Ensure that the `glueMetastore` instance is properly initialized and available in the test class.
- If the test database/table creation or statistics update methods differ in your codebase, adjust the method calls accordingly.
- You may need to import relevant classes such as `HiveColumnStatistics`, `ImmutableList`, and assertion methods.
- If you use a different assertion library, replace `assertTrue`, `assertEquals`, and `fail` with your project's equivalents.
</issue_to_address>
### Comment 3
<location> `presto-hive/src/test/java/com/facebook/presto/hive/metastore/glue/TestHiveClientGlueMetastore.java:201-204` </location>
<code_context>
-
- @Override
- public void testUpdatePartitionColumnStatistics()
+ public void testUpdateTableColumnStatisticsEmptyOptionalFields() throws Exception
{
- // column statistics are not supported by Glue
+ // this test expects consistency between written and read stats but this is not provided by glue at the moment
+ // when writing empty min/max statistics glue will return 0 to the readers
+ // in order to avoid incorrect data we skip writes for statistics with min/max = null
}
</code_context>
<issue_to_address>
**suggestion (testing):** Edge case for empty min/max statistics is acknowledged but not tested.
Consider adding a test that verifies Glue's handling of empty min/max statistics, ensuring the system responds correctly and helping to catch future regressions if Glue's behavior changes.
Suggested implementation:
```java
@Test
public void testGlueReturnsZeroForEmptyMinMaxStatistics() throws Exception
{
// Setup: create a table and column statistics with null min/max
String databaseName = "test_db";
String tableName = "test_table";
String columnName = "test_column";
// Create table and column if necessary (assume helper methods exist)
createTestTable(databaseName, tableName, columnName);
// Write column statistics with null min/max
HiveColumnStatistics statsWithNullMinMax = HiveColumnStatistics.builder()
.setMin(null)
.setMax(null)
.setNullsCount(0)
.setDistinctValuesCount(0)
.build();
glueMetastore.updateTableColumnStatistics(databaseName, tableName, columnName, statsWithNullMinMax);
// Read back statistics
Optional<HiveColumnStatistics> readStats = glueMetastore.getTableColumnStatistics(databaseName, tableName, columnName);
// Assert that Glue returns 0 for min/max
assertTrue(readStats.isPresent(), "Statistics should be present");
assertEquals(readStats.get().getMin(), 0, "Glue should return 0 for min when written as null");
assertEquals(readStats.get().getMax(), 0, "Glue should return 0 for max when written as null");
}
```
- You may need to implement or adjust helper methods like `createTestTable` and ensure `glueMetastore` is properly initialized for the test.
- Adjust the builder and assertion logic to match your actual `HiveColumnStatistics` API and types.
- If your statistics type is not integer, update the expected value and type accordingly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
@imjalpreet could you please add Trino cherry-pick commits as well, whichever is relevant?
This PR needs some implementation changes to work with AWS SDK v2, as we are working on the upgrade: #26670. I will re-raise this feature as a separate PR with the updated implementation once the upgrade PR is merged.
Description
Add support for AWS Glue Table and Column Statistics
Motivation and Context
Based on trinodb/trino@f1bcfa7
Impact
Users will be able to utilize statistics and enable CBO when using AWS Glue as a metastore.