Add timeout and retry logic to Azure token fetch #3113
Conversation
Previously, the `getAccessToken` method used an unbounded blocking call which could hang indefinitely if Azure's token endpoint was slow or unresponsive. This change adds defensive timeout and retry mechanisms:

- 15-second timeout per individual token request attempt
- Exponential backoff retry (3 attempts: 2s, 4s, 8s) with 50% jitter to prevent a thundering herd during mass failures
- 90-second overall timeout as a safety net
- Specific retry logic for known transient Azure AD errors (AADSTS50058, AADSTS50078, AADSTS700084, 503, 429)

This makes the system more resilient to transient Azure service issues and prevents indefinite blocking that could cascade into request timeouts or service degradation.
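The shape of the change, as a minimal sketch (not the exact PR code) using the values from the description above; the class, method, and parameter names are illustrative, and the retriability filter is omitted here:

```java
// Hedged sketch of the described timeout + exponential-backoff flow.
import com.azure.core.credential.AccessToken;
import com.azure.core.credential.TokenCredential;
import com.azure.core.credential.TokenRequestContext;
import java.time.Duration;
import reactor.util.retry.Retry;

class AzureTokenFetchSketch {
  AccessToken fetchToken(TokenCredential credential, String scope, String tenantId) {
    return credential
        .getToken(new TokenRequestContext().addScopes(scope).setTenantId(tenantId))
        .timeout(Duration.ofSeconds(15))            // per-attempt timeout
        .retryWhen(
            Retry.backoff(3, Duration.ofSeconds(2)) // retries after ~2s, 4s, 8s
                .jitter(0.5))                       // ±50% jitter on each delay
        .blockOptional(Duration.ofSeconds(90))      // overall safety net
        .orElseThrow(() -> new IllegalStateException("No Azure access token returned"));
  }
}
```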
dimas-b left a comment:
Nice improvement! Thanks for your contribution, @fivetran-rahulprakash !
```
defaultAzureCredential
    .getToken(new TokenRequestContext().addScopes(scope).setTenantId(tenantId))
    .blockOptional()
    .timeout(Duration.ofSeconds(15)) // Per-attempt timeout
```
We have RealmConfig here; could you add a general setting in FeatureConfiguration for this timeout? I suppose it could be applicable to other integrations too (but, of course, in this PR we can concentrate on Azure only).
I've added four new configuration constants in FeatureConfiguration:
CLOUD_API_TIMEOUT_SECONDS (default: 15) - Per-attempt timeout
CLOUD_API_RETRY_COUNT (default: 3) - Number of retry attempts
CLOUD_API_RETRY_DELAY_SECONDS (default: 2) - Initial delay for exponential backoff
CLOUD_API_RETRY_JITTER_MILLIS (default: 500) - Maximum jitter to prevent thundering herd
These use generic naming (CLOUD_API_*) rather than Azure-specific names, making them reusable for future implementations.
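As a rough illustration, one of these constants would presumably be declared following the same builder pattern visible in the diff hunks below; the builder entry point and description text here are assumptions, not the exact PR code:

```java
// Hedged sketch mirroring the defaultValue(...)/buildFeatureConfiguration() pattern
// shown in the quoted diffs; key, description, and builder entry point are illustrative.
public static final FeatureConfiguration<Integer> CLOUD_API_TIMEOUT_SECONDS =
    PolarisConfiguration.<Integer>builder()
        .key("CLOUD_API_TIMEOUT_SECONDS")
        .description("Per-attempt timeout (in seconds) for cloud provider API calls")
        .defaultValue(15)
        .buildFeatureConfiguration();
```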
```
    tenantId,
    error.getMessage()))
.retryWhen(
    Retry.backoff(3, Duration.ofSeconds(2)) // 3 retries: 2s, 4s, 8s
```
Having backoff settings configurable could also be helpful.
Addressed
```
.jitter(0.5) // ±50% jitter to prevent thundering herd
.filter(
    throwable ->
        throwable instanceof java.util.concurrent.TimeoutException
```
TimeoutException is already handled by isRetriableAzureException, right?
Yes, you're absolutely right! I missed that. Removed the duplicate check now. Thank you for catching that!
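For context, here is a hedged sketch of what a retriability predicate along the lines of the PR's isRetriableAzureException helper might look like, based only on the transient errors listed in the PR description; the real helper's exact signature and matching logic are not shown in this thread:

```java
// Illustrative only: treats the per-attempt TimeoutException and the transient
// Azure AD errors named in the PR description (AADSTS50058, AADSTS50078,
// AADSTS700084, 503, 429) as retriable.
import java.util.concurrent.TimeoutException;

final class RetriableAzureErrorsSketch {
  private static final String[] TRANSIENT_MARKERS = {
    "AADSTS50058", "AADSTS50078", "AADSTS700084", "503", "429"
  };

  static boolean isRetriable(Throwable throwable) {
    if (throwable instanceof TimeoutException) {
      return true; // per-attempt timeout elapsed; worth another attempt
    }
    String message = String.valueOf(throwable.getMessage());
    for (String marker : TRANSIENT_MARKERS) {
      if (message.contains(marker)) {
        return true;
      }
    }
    return false;
  }
}
```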
| "Azure token fetch exhausted after %d attempts for tenant %s", | ||
| retrySignal.totalRetries(), tenantId), | ||
| retrySignal.failure()))) | ||
| .blockOptional(Duration.ofSeconds(90)) // Maximum total wait time |
Why do we need this on top of .timeout() (line 337)?
Good point! I initially added the overall timeout as a safety net to ensure we never block indefinitely, but you're right, it's unnecessary. The combination of the per-attempt timeout and .retryWhen() with exponential backoff already provides sufficient protection. Removed it now. Thanks for the feedback!
- Add 4 generic cloud provider API configuration constants:
  - CLOUD_API_TIMEOUT_SECONDS (default: 15)
  - CLOUD_API_RETRY_COUNT (default: 3)
  - CLOUD_API_RETRY_DELAY_SECONDS (default: 2)
  - CLOUD_API_RETRY_JITTER_MILLIS (default: 500)
- Update AzureCredentialsStorageIntegration to use configurable values
- Remove hardcoded 90s overall timeout (per-attempt timeout + retries sufficient)
- Improve error logging and retry logic documentation
- Generic naming allows future reuse by AWS/GCP storage integrations

Addresses review comments from dimas-b on PR 3113
Thank you @dimas-b for the thorough review and excellent suggestions! I've addressed all your comments.
```
    .defaultValue(2)
    .buildFeatureConfiguration();

public static final FeatureConfiguration<Integer> CLOUD_API_RETRY_JITTER_MILLIS =
```
I chose to use milliseconds instead of a 0-1 jitter factor for a few reasons:

- User clarity - it's more intuitive for operators to specify "500 milliseconds of jitter" than to understand what a "0.5 jitter factor" means (50% of the retry delay)
- Concrete vs. relative - millis give direct control over the maximum random delay added, while a factor requires understanding how it interacts with the exponential backoff delays
- Consistency - all other time-based configs use concrete units (seconds/millis) rather than abstract factors
- Predictability - with millis, the max jitter is always clear regardless of retry delay values

The small conversion cost (jitterMillis / 1000.0) is negligible compared to the benefit of making the config more operator-friendly. Happy to change to a 0-1 factor if you prefer that approach, though!
dimas-b left a comment:
LGTM overall, just some minor comments about the new config.
```
private AccessToken getAccessToken(RealmConfig realmConfig, String tenantId) {
  int timeoutSeconds = realmConfig.getConfig(CLOUD_API_TIMEOUT_SECONDS);
  int retryCount = realmConfig.getConfig(CLOUD_API_RETRY_COUNT);
  int initialDelaySeconds = realmConfig.getConfig(CLOUD_API_RETRY_DELAY_SECONDS);
```
Would you mind using millis for initialDelaySeconds... in some cases even 1 sec may be too long. Let's delegate what the min delay should be to the admin user who configures it.
Same for timeoutSeconds... I hope Azure SDK supports millis.
Thanks for the suggestion!
Changed both to milliseconds:
CLOUD_API_TIMEOUT_MILLIS (default: 15000ms)
CLOUD_API_RETRY_DELAY_MILLIS (default: 2000ms)
- Rename CLOUD_API_TIMEOUT_SECONDS to CLOUD_API_TIMEOUT_MILLIS (default: 15000ms)
- Rename CLOUD_API_RETRY_DELAY_SECONDS to CLOUD_API_RETRY_DELAY_MILLIS (default: 2000ms)
- Update AzureCredentialsStorageIntegration to use Duration.ofMillis()
- Allows admins to configure sub-second timeouts for finer control

Addresses review feedback from dimas-b
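A small sketch of how the millisecond-based settings would presumably be turned into Durations; the getConfig(...) calls mirror the diff hunk below, RealmConfig and the CLOUD_API_* constants are the Polaris types referenced in this PR, and the helper class and method names are illustrative:

```java
// Hedged sketch; only illustrates the Duration.ofMillis() wiring for the renamed configs.
// RealmConfig and the CLOUD_API_* constants (declared in FeatureConfiguration) are the
// Polaris types referenced in this PR; this helper class itself is illustrative.
import java.time.Duration;

class TokenFetchSettingsSketch {
  static Duration perAttemptTimeout(RealmConfig realmConfig) {
    return Duration.ofMillis(realmConfig.getConfig(CLOUD_API_TIMEOUT_MILLIS));
  }

  static Duration initialRetryDelay(RealmConfig realmConfig) {
    return Duration.ofMillis(realmConfig.getConfig(CLOUD_API_RETRY_DELAY_MILLIS));
  }
}
```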
dimas-b left a comment:
Sorry, one more minor comment from my side... Otherwise LGTM 👍
```
int retryCount = realmConfig.getConfig(CLOUD_API_RETRY_COUNT);
int initialDelayMillis = realmConfig.getConfig(CLOUD_API_RETRY_DELAY_MILLIS);
int jitterMillis = realmConfig.getConfig(CLOUD_API_RETRY_JITTER_MILLIS);
double jitter = jitterMillis / 1000.0; // Convert millis to fraction for jitter factor
```
I'm not sure I fully understand this logic... per the javadoc of reactor.util.retry.RetryBackoffSpec.jitter(), the factor applies to the "computed delay", which may not be 1000 ms 🤔 How can the user reason about what a CLOUD_API_RETRY_JITTER_MILLIS value of 750 (for example) means?
Would it not be simpler to use the 0.0-1.0 factor value in the config?
Ah, I went with millis initially to keep it consistent with the other timeout/delay configs (which are all in milliseconds), thinking it'd be more straightforward to work with absolute values.
But you're right - that doesn't really make sense here since the jitter factor gets applied to the computed delay, which changes with each retry (2s → 4s → 8s). So 750 would mean something different on each attempt, which is pretty confusing.
Changed it to CLOUD_API_RETRY_JITTER_FACTOR using the 0.0-1.0 range
The jitter factor applies to the computed exponential backoff delay, not a fixed millisecond value. Using CLOUD_API_RETRY_JITTER_FACTOR (0.0-1.0 range) is clearer and conceptually correct.
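To make the factor semantics concrete, here is a short sketch (variable values illustrative) of how a 0.0-1.0 jitter factor interacts with Retry.backoff; per the Reactor javadoc cited above, the factor randomizes each computed delay rather than adding a fixed duration:

```java
// Hedged sketch of the jitter-factor semantics discussed above.
import java.time.Duration;
import reactor.util.retry.Retry;
import reactor.util.retry.RetryBackoffSpec;

class JitterFactorSketch {
  static RetryBackoffSpec backoff(int retryCount, long initialDelayMillis, double jitterFactor) {
    // With retryCount = 3 and initialDelayMillis = 2000, the nominal delays are ~2s, 4s, 8s;
    // a jitterFactor of 0.5 then randomizes each of those computed delays by up to ±50%.
    return Retry.backoff(retryCount, Duration.ofMillis(initialDelayMillis)).jitter(jitterFactor);
  }
}
```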
dimas-b left a comment:
LGTM 👍 Thanks again @fivetran-rahulprakash !
I'll leave it open for some more time for other reviewers to comment (if there's interest)
Thanks! Just following up - are we good to merge 🤔
flyrain left a comment:
Thanks a lot for working on it, @fivetran-rahulprakash ! LGTM overall. Left some comments!
```
LOGGER.warn(
    "Error fetching Azure access token for tenant {}: {}",
    tenantId,
    error.getMessage()))
```
Can we log the full stack trace instead of error.getMessage() only for better debuggability?
Done! Changed to `LOGGER.warn("Error fetching Azure access token for tenant {}", tenantId, error)`, which will log the full exception with stack trace. 😄
```
retrySignal.totalRetries() + 1,
retryCount))
```
Should we add 1 to retryCount, so it matches the word "attempt" here? The total attempt number would be 1 + retryCount:
```
int maxAttempts = retryCount + 1;
...
LOGGER.info("Retrying Azure token fetch for tenant {} (attempt {}/{})",
    tenantId,
    retrySignal.totalRetries() + 1,
    maxAttempts);
```
Modified
```
    .defaultValue(true)
    .buildFeatureConfiguration();

public static final FeatureConfiguration<Integer> CLOUD_API_TIMEOUT_MILLIS =
```
CLOUD_API_TIMEOUT_MILLIS may over-communicate its scope. Would a more precise name like STORAGE_API_TIMEOUT_MILLIS, or even AZURE_STORAGE_API_TIMEOUT_MILLIS, be clearer?
If S3/GCS/MinIO don't have a similar configuration or can't use the same timeout, I'd suggest adding an AZURE_ prefix to make the scope clear.
Good call! I've changed it to STORAGE_API_* to be more specific.
While only Azure uses these configs right now, the timeout/retry pattern applies to other cloud storage providers (AWS S3, GCP) too - their SDKs have similar async patterns that would benefit from this. So keeping the naming generic makes it easier to reuse when we add resilience to those integrations later.
Updated the descriptions to clarify it's currently Azure only but designed for future reuse. Let me know if you'd prefer something different!
@fivetran-rahulprakash , thanks for the change! One small concern, different storage backends may eventually need different timeout/retry configs. We can always introduce a broader-scope config like STORAGE_API_TIMEOUT_MILLIS later if we find something truly common across all storage types, but evolving in the opposite direction is harder. Once a wide-scope config is adopted and used in different places, shrinking it down to a more specific scope becomes much more painful. WDYT?
@flyrain I've renamed the configs to use AZURE_* prefix. Thanks for the feedback!
- Rename CLOUD_API_* to STORAGE_API_* for clearer scope
- Log full stack trace for better debuggability
- Fix retry attempt logging to show correct total (retryCount + 1)
- Clarify configs are currently Azure-only but designed for future cloud provider reuse
95b1817
Thanks for the thorough review, @flyrain! I've addressed all your comments.
Different storage backends may need different timeout/retry configs. Using AZURE_ prefix now makes it clearer and easier to add storage-specific configs later rather than trying to narrow a broad scope.
+1. Thanks a lot for working on it, @fivetran-rahulprakash! Thanks for the review, @dimas-b @jbonofre!
Merge branch 'main' into the PR branch (last merged commit 62d774f).
Problem

The `getAccessToken()` method in `AzureCredentialsStorageIntegration` used an unbounded blocking call which could hang indefinitely if Azure's token endpoint was slow or unresponsive. This could lead to request timeouts and cascading service degradation.

Solution

This PR adds defensive timeout and retry mechanisms using Project Reactor's built-in capabilities. Known transient Azure AD errors are retried:

- AADSTS50058 - Token endpoint timeout
- AADSTS50078 - Service temporarily unavailable
- AADSTS700084 - Token refresh required
- 503 - Service unavailable
- 429 - Too many requests

Testing

Benefits

Checklist

- CHANGELOG.md (awaiting maintainer guidance on format)