
Retry updating the table statistics in the context of concurrent modifications #19463

Merged
findepi merged 1 commit into trinodb:master from findinpath:findinpath/update-table-stats-retry on Oct 26, 2023


@findinpath (Contributor) commented Oct 20, 2023

Description

Retry updating the statistics of an AWS Glue table when another query is trying to perform the same action concurrently.

For example: two or more concurrent INSERT operations on the same Hive table.
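To illustrate why a retry can succeed, here is a minimal in-memory sketch. It assumes the table update behaves like an optimistic-concurrency write: the later of two conflicting writers is rejected with a concurrent-modification error instead of silently overwriting. Under that assumption, each retry must re-read the current table parameters and re-apply the statistics change, rather than resend a stale request. FakeTable, addRows, Snapshot, the interleavedWriter hook and the local ConcurrentModificationException are all hypothetical stand-ins, not Trino or AWS SDK classes.

```java
import java.util.HashMap;
import java.util.Map;

public class OptimisticStatsUpdate {
    // Hypothetical stand-in for Glue's ConcurrentModificationException.
    static class ConcurrentModificationException extends RuntimeException {
        ConcurrentModificationException(String message) {
            super(message);
        }
    }

    // Copy of the table parameters plus the version they were read at.
    record Snapshot(Map<String, String> parameters, long version) {}

    // Hypothetical in-memory table using optimistic concurrency control.
    static final class FakeTable {
        private final Map<String, String> parameters = new HashMap<>();
        private long version;
        // Test hook: runs once right before the next write, simulating a concurrent INSERT.
        Runnable interleavedWriter;

        synchronized Snapshot read() {
            return new Snapshot(new HashMap<>(parameters), version);
        }

        synchronized String get(String key) {
            return parameters.get(key);
        }

        // Rejects the write if another writer committed after expectedVersion was read.
        synchronized void replaceParameters(Map<String, String> newParameters, long expectedVersion) {
            Runnable other = interleavedWriter;
            interleavedWriter = null;
            if (other != null) {
                other.run();
            }
            if (expectedVersion != version) {
                throw new ConcurrentModificationException("Update table failed due to concurrent modifications.");
            }
            parameters.clear();
            parameters.putAll(newParameters);
            version++;
        }
    }

    // Adds rowsAdded to the numRows parameter; returns the attempt number that succeeded.
    static int addRows(FakeTable table, long rowsAdded, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Re-read on every attempt so the merge starts from the latest committed statistics.
            Snapshot snapshot = table.read();
            Map<String, String> updated = new HashMap<>(snapshot.parameters());
            long current = Long.parseLong(updated.getOrDefault("numRows", "0"));
            updated.put("numRows", Long.toString(current + rowsAdded));
            try {
                table.replaceParameters(updated, snapshot.version());
                return attempt;
            }
            catch (ConcurrentModificationException e) {
                if (attempt == maxAttempts) {
                    throw e;
                }
            }
        }
        throw new IllegalArgumentException("maxAttempts must be positive");
    }
}
```

In this simulation, a second INSERT commits 5 rows between our read and our write. The first attempt is rejected, and the second attempt merges on top of the other writer's result, so neither update is lost.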

Context

io.trino.spi.TrinoException: All operations other than the following update operations were completed: replace table parameters myschema.mytable
	at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore$Committer.executeUpdateStatisticsOperations(SemiTransactionalHiveMetastore.java:2163)
	at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.commitShared(SemiTransactionalHiveMetastore.java:1567)
	at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.commit(SemiTransactionalHiveMetastore.java:1264)
...
 Caused by: com.amazonaws.services.glue.model.ConcurrentModificationException: Update table failed due to concurrent modifications. (Service: AWSGlue; Status Code: 400; Error Code: ConcurrentModificationException; Request ID: 653dcadf-aaa1-4c1a-aaea-49accf729ca0; Proxy: null)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
        at com.amazonaws.services.glue.AWSGlueClient.doInvoke(AWSGlueClient.java:13784)
        at com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:13751)
        at com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:13740)
        at com.amazonaws.services.glue.AWSGlueClient.executeUpdateTable(AWSGlueClient.java:13509)
        at com.amazonaws.services.glue.AWSGlueClient.updateTable(AWSGlueClient.java:13478)
        at io.trino.plugin.hive.metastore.glue.GlueHiveMetastore.lambda$updateTableStatistics$8(GlueHiveMetastore.java:342)

Additional context and related issues

As already done in #16092, we retry the operation so that updating the table statistics eventually succeeds.
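The retry policy can be sketched generically as follows. This assumes a bounded number of attempts with exponential backoff, and retries only a single exception type. RetryOnConflict.retry and its parameters are hypothetical and are not Trino's actual retry utility. The exact policy used in #16092 and in this change is not shown here.

```java
import java.time.Duration;
import java.util.function.Supplier;

public final class RetryOnConflict {
    private RetryOnConflict() {}

    // Hypothetical helper: retries action only when it throws retryOn, sleeping
    // initialDelay, then 2x, 4x, ... (capped at maxDelay) between attempts.
    public static <T> T retry(
            Supplier<T> action,
            Class<? extends RuntimeException> retryOn,
            int maxAttempts,
            Duration initialDelay,
            Duration maxDelay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Duration delay = initialDelay;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            }
            catch (RuntimeException e) {
                // Non-conflict failures and the final failed attempt propagate unchanged.
                if (!retryOn.isInstance(e) || attempt >= maxAttempts) {
                    throw e;
                }
                sleep(delay, e);
                delay = delay.multipliedBy(2);
                if (delay.compareTo(maxDelay) > 0) {
                    delay = maxDelay;
                }
            }
        }
    }

    private static void sleep(Duration delay, RuntimeException lastFailure) {
        try {
            Thread.sleep(delay.toMillis());
        }
        catch (InterruptedException ie) {
            // Preserve the interrupt and surface the last conflict to the caller.
            Thread.currentThread().interrupt();
            lastFailure.addSuppressed(ie);
            throw lastFailure;
        }
    }
}
```

Restricting retries to the conflict exception keeps genuine failures, such as a missing table or an access error, from being retried. The attempt cap keeps a heavily contended table from stalling a query indefinitely.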

Another area that could potentially be improved in the same way is io.trino.plugin.hive.metastore.glue.GlueHiveMetastore#updatePartitionStatistics.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Hive
* Retry updating table statistics in Glue when concurrent modifications occur. ({issue}`issuenumber`)

@cla-bot bot added the cla-signed label Oct 20, 2023
@findinpath added the release-notes and hive (Hive connector) labels and removed the cla-signed label Oct 20, 2023
@findinpath self-assigned this Oct 20, 2023
@findinpath force-pushed the findinpath/update-table-stats-retry branch from 2aa537a to a98c945 on October 20, 2023 10:46
@findinpath force-pushed the findinpath/update-table-stats-retry branch from a98c945 to d832b32 on October 23, 2023 09:36
@findinpath requested a review from findepi October 23, 2023 09:37
Contributor


Is it used somewhere?

Contributor Author


Yes, please check the proxiedGlueClient.

Contributor

@ssheikin Oct 23, 2023


Thanks, I checked again and see that it's asserted in the @Test method. I didn't notice it before.
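The thread above refers to the proxiedGlueClient used by the test. The snippet below is a minimal sketch of the general technique, not the actual test code. It uses a java.lang.reflect.Proxy that wraps a client, counts updateTable calls so a test can assert on them, and makes the first N calls fail with a concurrent-modification error. TableClient, failFirst and the local ConcurrentModificationException are hypothetical stand-ins for the real Glue client types.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public final class FailingClientProxy {
    private FailingClientProxy() {}

    // Hypothetical minimal client interface standing in for the Glue client.
    public interface TableClient {
        void updateTable(String tableName, Map<String, String> parameters);
    }

    // Hypothetical stand-in for Glue's ConcurrentModificationException.
    public static class ConcurrentModificationException extends RuntimeException {
        public ConcurrentModificationException(String message) {
            super(message);
        }
    }

    // Wraps delegate so the first `failures` updateTable calls throw;
    // every updateTable call is counted in updateCalls.
    public static TableClient failFirst(TableClient delegate, int failures, AtomicInteger updateCalls) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("updateTable")) {
                int call = updateCalls.incrementAndGet();
                if (call <= failures) {
                    throw new ConcurrentModificationException("Update table failed due to concurrent modifications.");
                }
            }
            try {
                return method.invoke(delegate, args);
            }
            catch (InvocationTargetException e) {
                // Rethrow the delegate's own exception unwrapped.
                throw e.getCause();
            }
        };
        return (TableClient) Proxy.newProxyInstance(
                TableClient.class.getClassLoader(),
                new Class<?>[] {TableClient.class},
                handler);
    }
}
```

A test can combine such a proxy with the code under test and assert both that the operation eventually succeeded and that the expected number of calls reached the client.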

…fications

Retry updating the statistics of the AWS Glue table
when another query is trying to perform the same
action concurrently.
@findinpath force-pushed the findinpath/update-table-stats-retry branch from d832b32 to 78f2727 on October 23, 2023 19:21
@findepi merged commit c73a140 into trinodb:master Oct 26, 2023
@github-actions bot added this to the 431 milestone Oct 26, 2023

Labels

cla-signed, hive (Hive connector)


3 participants