feat(): Export partition registration times as runtime metrics #27437

Merged
spershin merged 1 commit into prestodb:master from spershin:export-D98159827
Mar 25, 2026

@spershin
Contributor

@spershin spershin commented Mar 25, 2026

Summary:
Export the time spent on various partition registration activities as
query runtime metrics, to expose potential metastore regressions and slowness.

Differential Revision: D98159827

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

Summary by Sourcery

Track and export metastore partition registration and statistics update timings as query runtime metrics.

New Features:

  • Record wall-clock time for add partitions, alter partition, and table/partition statistics update operations in the Hive metastore and expose them via query runtime stats.
  • Propagate the coordinator session's RuntimeStats into the semi-transactional Hive metastore to ensure partition registration metrics are attributed to the correct query.

Enhancements:

  • Wire query RuntimeStats from HiveMetadata into SemiTransactionalHiveMetastore commit paths to centralize metastore timing measurement.
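
The wall-clock timing described in this summary can be pictured with a minimal sketch. `WallTimeRecorder` and its methods are hypothetical stand-ins written for illustration, not Presto's `RuntimeStats` class, and the metric name used with it is an assumed string.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical stand-in for a per-query metrics container; not Presto's RuntimeStats.
final class WallTimeRecorder {
    private final Map<String, LongAdder> nanosByMetric = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> countByMetric = new ConcurrentHashMap<>();

    // Runs the operation and adds its elapsed wall-clock time to the named metric,
    // even when the operation throws.
    <T> T recordWallTime(String metricName, Supplier<T> operation) {
        long start = System.nanoTime();
        try {
            return operation.get();
        }
        finally {
            nanosByMetric.computeIfAbsent(metricName, key -> new LongAdder()).add(System.nanoTime() - start);
            countByMetric.computeIfAbsent(metricName, key -> new LongAdder()).increment();
        }
    }

    // Total nanoseconds recorded under the metric, or 0 if it was never recorded.
    long getTotalNanos(String metricName) {
        LongAdder adder = nanosByMetric.get(metricName);
        return adder == null ? 0L : adder.sum();
    }

    // Number of timed calls recorded under the metric.
    long getCount(String metricName) {
        LongAdder adder = countByMetric.get(metricName);
        return adder == null ? 0L : adder.sum();
    }
}
```

Recording in a `finally` block means a failed metastore call still contributes its elapsed time, which seems useful when the goal is to surface slowness.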

@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Mar 25, 2026
@linux-foundation-easycla

CLA Missing ID CLA Not Signed

@sourcery-ai
Contributor

sourcery-ai bot commented Mar 25, 2026

Reviewer's Guide

Adds runtime metrics collection for Hive metastore partition registration and statistics update operations, wiring coordinator session RuntimeStats into SemiTransactionalHiveMetastore and timing key metastore calls using new RuntimeMetricName constants.

Sequence diagram for wiring RuntimeStats into metastore partition registration

sequenceDiagram
    actor User
    participant Coordinator
    participant HiveMetadata
    participant SemiTransactionalHiveMetastore
    participant MetastoreContext
    participant ExtendedHiveMetastore
    participant RuntimeStats

    User->>Coordinator: submit_query
    Coordinator->>HiveMetadata: beginCreateTable(session)
    HiveMetadata->>RuntimeStats: session.getRuntimeStats()
    HiveMetadata->>SemiTransactionalHiveMetastore: setQueryRuntimeStats(runtimeStats)
    HiveMetadata->>SemiTransactionalHiveMetastore: beginCreateTable_operations

    loop commit_phase
        HiveMetadata->>SemiTransactionalHiveMetastore: commitShared()
        SemiTransactionalHiveMetastore->>MetastoreContext: getRuntimeStats()
        MetastoreContext-->>SemiTransactionalHiveMetastore: runtimeStats
        SemiTransactionalHiveMetastore->>RuntimeStats: recordWallTime(METASTORE_ADD_PARTITIONS_TIME_NANOS, addPartitions)
        RuntimeStats->>ExtendedHiveMetastore: addPartitions(metastoreContext, schema, table, partitions)
        ExtendedHiveMetastore-->>RuntimeStats: addPartitions_result
        RuntimeStats-->>SemiTransactionalHiveMetastore: timing_recorded
    end
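
The commit loop in the diagram times each `addPartitions` call separately. A rough sketch of that per-call timing follows, assuming partitions are sent in fixed-size batches. How the PR forms its batches is not shown here, so `BatchedTimingSketch`, `runInBatches`, and the batch size are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper; the batching scheme is an assumption, not taken from the PR.
final class BatchedTimingSketch {
    // Splits items into batches, times each batch call on its own,
    // and returns the elapsed nanoseconds per batch in call order.
    static <T> List<Long> runInBatches(List<T> items, int batchSize, Consumer<List<T>> call) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Long> elapsedNanos = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            List<T> batch = items.subList(i, Math.min(i + batchSize, items.size()));
            long start = System.nanoTime();
            call.accept(batch); // stands in for one metastore addPartitions call
            elapsedNanos.add(System.nanoTime() - start);
        }
        return elapsedNanos;
    }
}
```

One timing sample per batch keeps a single slow metastore call visible instead of averaging it into the whole commit.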

Class diagram for updated metastore runtime metrics integration

classDiagram

    class RuntimeStats {
        +recordWallTime(metricName String, operation Callable) Object
    }

    class RuntimeMetricName {
        <<final>>
        +METASTORE_ADD_PARTITIONS_TIME_NANOS String
        +METASTORE_ALTER_PARTITION_TIME_NANOS String
        +METASTORE_ALTER_PARTITIONS_TIME_NANOS String
        +METASTORE_UPDATE_PARTITION_STATISTICS_TIME_NANOS String
        +METASTORE_UPDATE_TABLE_STATISTICS_TIME_NANOS String
    }

    class MetastoreContext {
        +getRuntimeStats() RuntimeStats
    }

    class ExtendedHiveMetastore {
        +addPartitions(metastoreContext MetastoreContext, schemaName String, tableName String, partitions List) Object
        +alterPartition(metastoreContext MetastoreContext, databaseName String, tableName String, newPartition Object) Object
        +updatePartitionStatistics(metastoreContext MetastoreContext, schemaName String, tableName String, partitionName String, updateFunction Function) void
        +updateTableStatistics(metastoreContext MetastoreContext, schemaName String, tableName String, updateFunction Function) void
    }

    class SemiTransactionalHiveMetastore {
        -state State
        -throwOnCleanupFailure boolean
        -queryRuntimeStats RuntimeStats
        +setQueryRuntimeStats(runtimeStats RuntimeStats) void
        +commitShared() ConnectorCommitHandle
        +AddPartitionsOperation
        +AlterPartitionOperation
        +UpdateStatisticsOperation
    }

    class HiveMetadata {
        -metastore SemiTransactionalHiveMetastore
        +beginCreateTable(session ConnectorSession, tableMetadata ConnectorTableMetadata, layout Optional) HiveOutputTableHandle
        +beginInsert(session ConnectorSession, tableHandle ConnectorTableHandle) HiveInsertTableHandle
        -beginInsertInternal(session ConnectorSession, tableHandle ConnectorTableHandle) HiveInsertTableHandle
    }

    class ConnectorSession {
        +getRuntimeStats() RuntimeStats
    }

    class AddPartitionsOperation {
        +execute() void
    }

    class AlterPartitionOperation {
        +run(metastore ExtendedHiveMetastore) void
        +undo(metastore ExtendedHiveMetastore) void
    }

    class UpdateStatisticsOperation {
        +run(metastore ExtendedHiveMetastore) void
    }

    HiveMetadata --> SemiTransactionalHiveMetastore : uses
    HiveMetadata --> ConnectorSession : uses
    ConnectorSession --> RuntimeStats : provides

    SemiTransactionalHiveMetastore --> RuntimeStats : holds_queryRuntimeStats
    SemiTransactionalHiveMetastore --> MetastoreContext : uses
    SemiTransactionalHiveMetastore --> ExtendedHiveMetastore : delegates_to

    SemiTransactionalHiveMetastore *-- AddPartitionsOperation
    SemiTransactionalHiveMetastore *-- AlterPartitionOperation
    SemiTransactionalHiveMetastore *-- UpdateStatisticsOperation

    AddPartitionsOperation --> MetastoreContext : uses
    AddPartitionsOperation --> ExtendedHiveMetastore : calls_addPartitions
    AddPartitionsOperation --> RuntimeMetricName : uses_constants

    AlterPartitionOperation --> MetastoreContext : uses
    AlterPartitionOperation --> ExtendedHiveMetastore : calls_alterPartition
    AlterPartitionOperation --> RuntimeMetricName : uses_constants

    UpdateStatisticsOperation --> MetastoreContext : uses
    UpdateStatisticsOperation --> ExtendedHiveMetastore : calls_updateStatistics
    UpdateStatisticsOperation --> RuntimeMetricName : uses_constants

    RuntimeStats --> RuntimeMetricName : uses_metricName

File-Level Changes

Change: Propagate the coordinator session RuntimeStats into SemiTransactionalHiveMetastore commit paths so partition registration metrics are recorded against the correct query.
Details:
  • Introduce a volatile RuntimeStats field on SemiTransactionalHiveMetastore to hold the query-level RuntimeStats instance.
  • Add a setter to inject the query RuntimeStats from the coordinator session during beginCreateTable/beginInsert.
  • Use the stored queryRuntimeStats when constructing Hive committers instead of pulling RuntimeStats from HdfsContext sessions that may be reconstructed copies.
Files:
  • presto-hive-metastore/src/main/java/com/facebook/presto/hive/metastore/SemiTransactionalHiveMetastore.java
  • presto-hive/src/main/java/com/facebook/presto/hive/HiveMetadata.java

Change: Add runtime timing metrics around key Hive metastore partition and statistics operations during commit.
Details:
  • Wrap metastore.alterPartition calls in RuntimeStats.recordWallTime using a METASTORE_ALTER_PARTITION_TIME_NANOS metric.
  • Wrap updatePartitionStatistics and updateTableStatistics calls in RuntimeStats.recordWallTime with dedicated partition/table statistics metrics.
  • Wrap batched metastore.addPartitions calls in RuntimeStats.recordWallTime using a METASTORE_ADD_PARTITIONS_TIME_NANOS metric.
Files:
  • presto-hive-metastore/src/main/java/com/facebook/presto/hive/metastore/SemiTransactionalHiveMetastore.java

Change: Define new runtime metric names for metastore partition registration and statistics operations.
Details:
  • Add constants for metastore add partitions, alter partition(s), and update partition/table statistics time metrics to RuntimeMetricName.
Files:
  • presto-common/src/main/java/com/facebook/presto/common/RuntimeMetricName.java
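
The field-plus-setter wiring in the first change can be sketched roughly as follows. `Stats` and `Metastore` are hypothetical stand-ins for `RuntimeStats` and `SemiTransactionalHiveMetastore`, and defaulting the field to a throwaway instance is an assumption made for this sketch, not something the PR confirms.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;

final class QueryStatsFieldSketch {
    // Hypothetical metrics container standing in for a query-level RuntimeStats.
    static final class Stats {
        private final AtomicLong recordedCalls = new AtomicLong();

        void recordCall() {
            recordedCalls.incrementAndGet();
        }

        long getRecordedCalls() {
            return recordedCalls.get();
        }
    }

    // Hypothetical stand-in for the transactional metastore wrapper.
    static final class Metastore {
        // Assumption: default to a throwaway instance so commit never sees null.
        private volatile Stats queryStats = new Stats();

        // Called at the start of a write, with the stats object owned by the query.
        void setQueryStats(Stats stats) {
            this.queryStats = Objects.requireNonNull(stats, "stats is null");
        }

        void commit() {
            Stats stats = queryStats; // read the volatile field once per commit
            stats.recordCall();       // a real commit would time the metastore call here
        }
    }
}
```

Because the field is set at the start of a write and read at commit, anything recorded before the setter runs lands in the throwaway instance rather than in the query's metrics.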

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • The mutable, shared queryRuntimeStats field in SemiTransactionalHiveMetastore relies on setQueryRuntimeStats being called on the correct instance before commit; if this class is ever reused across queries or threads, consider enforcing per-query scoping or passing RuntimeStats explicitly through commit APIs instead of a volatile field to avoid mis-attribution of metrics.
  • The new runtime metrics are wired only via beginCreateTable and beginInsert; if there are other write/commit paths (e.g., deletes, stats-only operations, or other SemiTransactionalHiveMetastore usages) that should contribute to the same query-level metrics, consider setting queryRuntimeStats for those entry points as well to keep attribution consistent.
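
The alternative raised in the first point, passing the stats through the commit call instead of holding them in a field, might look like the sketch below. `Stats` and `Committer` are hypothetical names, and the PR itself did not take this route.

```java
import java.util.concurrent.atomic.LongAdder;

final class ExplicitStatsSketch {
    // Hypothetical per-query metrics container.
    static final class Stats {
        private final LongAdder commits = new LongAdder();
        private final LongAdder nanos = new LongAdder();

        void add(long elapsedNanos) {
            commits.increment();
            nanos.add(elapsedNanos);
        }

        long getCommits() {
            return commits.sum();
        }

        long getNanos() {
            return nanos.sum();
        }
    }

    // Hypothetical committer with no per-query state of its own.
    static final class Committer {
        // The caller supplies the stats, so each timing is attributed to the query that asked for the commit.
        void commit(Stats stats, Runnable metastoreCall) {
            long start = System.nanoTime();
            try {
                metastoreCall.run();
            }
            finally {
                stats.add(System.nanoTime() - start);
            }
        }
    }
}
```

With the stats as a parameter, one `Committer` can serve several queries without any risk of one query's timings landing in another's metrics, at the cost of a wider method signature.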

Member

@arhimondr arhimondr left a comment

LGTM % comment

  columnConverterProvider,
  hdfsContext.getSession().map(ConnectorSession::getWarningCollector).orElse(NOOP),
- hdfsContext.getSession().map(ConnectorSession::getRuntimeStats).orElseGet(RuntimeStats::new));
+ queryRuntimeStats);
Member

Do you need to store the runtime stats as a field? Is it possible to get them from the session directly (e.g., hdfsContext.getSession().getRuntimeStats())?

Contributor Author

@arhimondr
According to AI, the session of the hdfsContext is a different instance, and any metrics dumped there won't be in the query's metrics. :(

Contributor Author

My early test runs showed no new metric added to the query if we use hdfsContext.
That is a very different context. Not sure we even need that.
Something could probably be improved here, but it is out of scope for what we want to achieve.
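
If the session behind hdfsContext is indeed a separately constructed instance, as these test runs suggest, the effect can be reproduced in a few lines. `Session`, `Stats`, and `reconstruct` are hypothetical names, and the assumption that a rebuilt session gets its own stats object is what this sketch illustrates, not a confirmed detail of Presto.

```java
final class SessionCopySketch {
    // Hypothetical metrics container.
    static final class Stats {
        private long recorded;

        void record() {
            recorded++;
        }

        long getRecorded() {
            return recorded;
        }
    }

    // Hypothetical session that owns one Stats object.
    static final class Session {
        private final Stats stats;

        Session(Stats stats) {
            this.stats = stats;
        }

        Stats getStats() {
            return stats;
        }

        // Assumption: a rebuilt session gets a fresh Stats object instead of sharing the original's.
        Session reconstruct() {
            return new Session(new Stats());
        }
    }
}
```

Under that assumption, anything recorded through the rebuilt session stays in its own `Stats` object and never reaches the one the query reports from.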

@spershin spershin merged commit 9fb538c into prestodb:master Mar 25, 2026
83 of 90 checks passed