Mitigate Glue ThrottlingException via backoff retry strategy #28815

Merged
chenjian2664 merged 1 commit into trinodb:master from chenjian2664:jack/retry-glue-request
Mar 25, 2026

@chenjian2664
Contributor

Retry on the AWS SDK's ThrottlingException to mitigate intermittent (flaky) test failures such as:

Error:  io.trino.plugin.iceberg.catalog.glue.TestTrinoGlueCatalog.testTableWithVariantColumn -- Time elapsed: 52.78 s <<< ERROR!
java.lang.RuntimeException: io.trino.spi.TrinoException: Rate exceeded (Service: Glue, Status Code: 400, Request ID: 242257bb-5fdc-49e5-834d-2befa425645a) (SDK Attempt Count: 4)
	at io.trino.plugin.iceberg.catalog.glue.TrinoGlueCatalog.listTables(TrinoGlueCatalog.java:399)
	at io.trino.plugin.iceberg.catalog.glue.TrinoGlueCatalog.listTables(TrinoGlueCatalog.java:371)
	at io.trino.plugin.iceberg.catalog.BaseTrinoCatalogTest.testTableWithVariantColumn(BaseTrinoCatalogTest.java:624)
	at java.base/java.lang.reflect.Method.invoke(Method.java:565)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:511)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.tryRemoveAndExec(ForkJoinPool.java:1490)
	at java.base/java.util.concurrent.ForkJoinPool.helpJoin(ForkJoinPool.java:2248)
	at java.base/java.util.concurrent.ForkJoinTask.awaitDone(ForkJoinTask.java:499)
	at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:666)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:511)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.tryRemoveAndExec(ForkJoinPool.java:1490)
	at java.base/java.util.concurrent.ForkJoinPool.helpJoin(ForkJoinPool.java:2248)
	at java.base/java.util.concurrent.ForkJoinTask.awaitDone(ForkJoinTask.java:499)
	at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:666)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:511)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1450)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:2019)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:187)
Caused by: io.trino.spi.TrinoException: Rate exceeded (Service: Glue, Status Code: 400, Request ID: 242257bb-5fdc-49e5-834d-2befa425645a) (SDK Attempt Count: 4)
	at io.trino.plugin.iceberg.catalog.glue.TrinoGlueCatalog$1.computeNext(TrinoGlueCatalog.java:1698)
	at io.trino.plugin.iceberg.catalog.glue.TrinoGlueCatalog$1.computeNext(TrinoGlueCatalog.java:1668)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:132)
	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:570)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:560)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:723)
	at io.trino.plugin.iceberg.catalog.glue.TrinoGlueCatalog.lambda$listTables$2(TrinoGlueCatalog.java:391)
	at io.trino.plugin.base.util.ExecutorUtil.lambda$processWithAdditionalThreads$0(ExecutorUtil.java:70)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:328)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:545)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:328)
	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
	at java.base/java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:187)
	at io.trino.plugin.base.util.ExecutorUtil.processWithAdditionalThreads(ExecutorUtil.java:65)
	at io.trino.plugin.iceberg.catalog.glue.TrinoGlueCatalog.listTables(TrinoGlueCatalog.java:394)
	... 17 more
Caused by: software.amazon.awssdk.services.glue.model.ThrottlingException: Rate exceeded (Service: Glue, Status Code: 400, Request ID: 242257bb-5fdc-49e5-834d-2befa425645a) (SDK Attempt Count: 4)
	at software.amazon.awssdk.services.glue.model.ThrottlingException$BuilderImpl.build(ThrottlingException.java:150)
	at software.amazon.awssdk.services.glue.model.ThrottlingException$BuilderImpl.build(ThrottlingException.java:98)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.retryPolicyDisallowedRetryException(RetryableStageHelper.java:168)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:73)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
	at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
	at software.amazon.awssdk.services.glue.DefaultGlueClient.getTables(DefaultGlueClient.java:32587)
	at software.amazon.awssdk.services.glue.paginators.GetTablesIterable$GetTablesResponseFetcher.nextPage(GetTablesIterable.java:111)
	at software.amazon.awssdk.services.glue.paginators.GetTablesIterable$GetTablesResponseFetcher.nextPage(GetTablesIterable.java:102)
	at software.amazon.awssdk.core.pagination.sync.PaginatedResponsesIterator.next(PaginatedResponsesIterator.java:58)
	at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1950)
	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:297)
	at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
	at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:303)
	at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:669)
	at io.trino.plugin.iceberg.catalog.glue.TrinoGlueCatalog$1.computeNext(TrinoGlueCatalog.java:1681)
	... 36 more
	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Rate exceeded (Service: Glue, Status Code: 400, Request ID: 9bc91de6-b35a-4de6-8335-8fecf7d8aede)
	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: Rate exceeded (Service: Glue, Status Code: 400, Request ID: b73a78ab-0fc0-4f6b-8465-66c728a5bd43)
	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: Rate exceeded (Service: Glue, Status Code: 400, Request ID: b7c722e6-d623-45b2-b834-5d7460ad27c3)

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Mar 24, 2026
@github-actions github-actions bot added the hive Hive connector label Mar 24, 2026
@ebyhr
Member

ebyhr commented Mar 24, 2026

Caused by: software.amazon.awssdk.services.glue.model.ThrottlingException: Rate exceeded (Service: Glue, Status Code: 400, Request ID: 242257bb-5fdc-49e5-834d-2befa425645a) (SDK Attempt Count: 4)

Does this change really improve anything? The stack trace indicates that several attempts were already made.

@chenjian2664
Contributor Author

chenjian2664 commented Mar 24, 2026

@ebyhr That retry behavior comes from the SDK’s default policy. My understanding is that we should delegate the retry logic, including backoff and jitter:

                        .backoffStrategy(BackoffStrategy.exponentialDelay(
                                java.time.Duration.ofMillis(20),
                                java.time.Duration.ofMillis(1500)))
                        .maxAttempts(config.getMaxGlueErrorRetries())));
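
To make the effect of the 20 ms base delay and 1500 ms maximum delay concrete, here is a minimal standalone sketch of capped exponential backoff with jitter. It uses only the JDK and is not the SDK's implementation: the class name `BackoffSketch`, both method names, and the "full jitter" choice (a uniform random delay between 0 and the cap) are assumptions for illustration, and the SDK's actual formula may differ.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration only; not part of Trino or the AWS SDK.
public class BackoffSketch {
    // Upper bound on the delay before the given retry (1-based):
    // base * 2^(retry - 1), capped at max.
    static long cappedDelayMillis(int retry, Duration base, Duration max) {
        int shift = Math.min(retry - 1, 30); // clamp the exponent to avoid overflow
        long exponential = base.toMillis() << shift;
        return Math.min(exponential, max.toMillis());
    }

    // Assumed "full jitter": a uniform random delay in [0, cap].
    static long jitteredDelayMillis(int retry, Duration base, Duration max) {
        long cap = cappedDelayMillis(retry, base, max);
        return ThreadLocalRandom.current().nextLong(cap + 1);
    }

    public static void main(String[] args) {
        Duration base = Duration.ofMillis(20);
        Duration max = Duration.ofMillis(1500);
        for (int retry = 1; retry <= 8; retry++) {
            System.out.printf("retry %d: cap=%d ms, jittered=%d ms%n",
                    retry,
                    cappedDelayMillis(retry, base, max),
                    jitteredDelayMillis(retry, base, max));
        }
    }
}
```

Under these assumptions the caps for retries 1 through 8 are 20, 40, 80, 160, 320, 640, 1280, and 1500 ms, so the ceiling only reaches the configured maximum from the eighth retry onward. The jitter spreads concurrent callers apart so that they do not all hit Glue again at the same instant.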

Member

@ebyhr ebyhr left a comment

Retry on Glue ThrottlingException

The commit title is a bit misleading. It implies that retries on ThrottlingException were not happening before this change.

@chenjian2664 chenjian2664 force-pushed the jack/retry-glue-request branch from 8bbec50 to 1bfb6de Compare March 24, 2026 08:26
@findinpath findinpath requested a review from krvikash March 24, 2026 08:49
@chenjian2664 chenjian2664 changed the title Retry on Glue ThrottlingException Mitigate Glue ThrottlingException via backoff retry strategy Mar 24, 2026
@github-actions github-actions bot added the iceberg Iceberg connector label Mar 24, 2026
@chenjian2664 chenjian2664 requested a review from findinpath March 24, 2026 13:26
@chenjian2664 chenjian2664 force-pushed the jack/retry-glue-request branch 2 times, most recently from 77d52e0 to c91f775 Compare March 25, 2026 01:10
@chenjian2664 chenjian2664 force-pushed the jack/retry-glue-request branch from c91f775 to 1887482 Compare March 25, 2026 02:03
@chenjian2664 chenjian2664 merged commit 8851758 into trinodb:master Mar 25, 2026
65 checks passed
@chenjian2664 chenjian2664 deleted the jack/retry-glue-request branch March 25, 2026 12:43
@github-actions github-actions bot added this to the 481 milestone Mar 25, 2026
@ebyhr ebyhr mentioned this pull request Mar 27, 2026

Labels

cla-signed hive Hive connector iceberg Iceberg connector

3 participants