Skip to content

[Preview] SemanticReranking: Adds Configurable Request Timeout#5804

Merged
NaluTripician merged 1 commit intomasterfrom
users/ntripician/inference-request-timeout
Apr 22, 2026
Merged

[Preview] SemanticReranking: Adds Configurable Request Timeout#5804
NaluTripician merged 1 commit intomasterfrom
users/ntripician/inference-request-timeout

Conversation

@NaluTripician
Copy link
Copy Markdown
Contributor

Summary

Adds a configurable per-request timeout for the Semantic Reranking inference service via a new CosmosClientOptions.InferenceRequestTimeout property (default 5s, based on backend P999 from the reranking team) and a matching CosmosClientBuilder.WithInferenceRequestTimeout fluent setter. On timeout, the request throws a CosmosException with status code 408 (RequestTimeout). User-cancelled tokens continue to propagate as OperationCanceledException unchanged.

This supersedes #5565. Closes #5564.

Scope (intentionally minimal)

Per discussion on #5565, since we are only adding a non-retriable timeout, we are not landing the larger refactor in this PR:

  • HttpTimeoutInferencePolicy
  • HttpTimeoutPolicyHelper
  • ❌ Routing Semantic Reranking through CosmosHttpClientCore

Those can be reopened from #5565 if/when we decide to add retries for the inference endpoint.

Addresses Kiran's review comment (120s → 35s)

In #5565, httpClient.Timeout was being reduced from 120s to 35s inside CreateClientHelper, which was flagged as an unintentional behavioral change. In this PR:

  • httpClient.Timeout in InferenceService.CreateClientHelper stays at 120s (unchanged vs. master).
  • The per-request budget is enforced exclusively via a linked CancellationTokenSource with CancelAfter(InferenceRequestTimeout) around HttpClient.SendAsync. The HttpClient-level timeout continues to act only as an outer ceiling.

Implementation

  • CosmosClientOptions.InferenceRequestTimeoutTimeSpan, default InferenceService.DefaultInferenceRequestTimeout (5s). Public under PREVIEW, otherwise internal.
  • CosmosClientBuilder.WithInferenceRequestTimeout(TimeSpan) — fluent setter, PREVIEW-gated.
  • InferenceService.SemanticRerankAsync — wraps httpClient.SendAsync in a CancellationTokenSource.CreateLinkedTokenSource(userToken) + linkedCts.CancelAfter(inferenceRequestTimeout). A catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested) converts the inference-side cancellation into CosmosExceptionFactory.CreateRequestTimeoutException (HTTP 408). User-cancelled tokens bypass the catch filter and propagate as OperationCanceledException/TaskCanceledException.

Tests

  • SemanticRerankAsync_RequestExceedsInferenceTimeout_Throws408CosmosException — 10s delayed handler + short timeout → expects CosmosException with StatusCode == RequestTimeout.
  • SemanticRerankAsync_UserCancellation_PropagatesOperationCanceledException — caller cancels their own token → expects OperationCanceledException (not wrapped as a 408).
  • All existing InferenceServiceTests still pass.

Contracts

  • Regenerated DotNetPreviewSDKAPI.net6.json via UpdateContracts.ps1. Only adds entries for InferenceRequestTimeout (get/set + property) on CosmosClientOptions and WithInferenceRequestTimeout on CosmosClientBuilder. No non-preview contract changes.

Supersedes #5565 with a minimal scope: only the non-retriable per-request timeout is added. The HttpTimeoutInferencePolicy / HttpTimeoutPolicyHelper / CosmosHttpClientCore refactor from the original PR is deferred until retries are actually needed.

Addresses Kiran's review comment (120s -> 35s): httpClient.Timeout in CreateClientHelper is left unchanged at 120s. The per-request timeout is enforced via a linked CancellationTokenSource with CancelAfter, so the HttpClient-level timeout is not reduced.

Changes:

- CosmosClientOptions.InferenceRequestTimeout (PREVIEW public, else internal; default 5s)

- CosmosClientBuilder.WithInferenceRequestTimeout fluent setter (PREVIEW)

- InferenceService wraps SendAsync with a linked CTS and converts the inference-timeout OperationCanceledException into a 408 CosmosException via CosmosExceptionFactory.CreateRequestTimeoutException. User-cancelled tokens propagate unchanged.

- Unit tests for both the timeout path and user-cancellation path.

- Regenerated DotNetPreviewSDKAPI.net6.json contract.

Closes #5564.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines:
Successfully started running 1 pipeline(s).

@NaluTripician NaluTripician self-assigned this Apr 22, 2026
@NaluTripician NaluTripician added the auto-merge Enables automation to merge PRs label Apr 22, 2026
Copy link
Copy Markdown
Contributor

@ananth7592 ananth7592 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@NaluTripician NaluTripician merged commit e824e60 into master Apr 22, 2026
33 checks passed
@NaluTripician NaluTripician deleted the users/ntripician/inference-request-timeout branch April 22, 2026 22:27
@NaluTripician NaluTripician mentioned this pull request Apr 24, 2026
4 tasks
microsoft-github-policy-service Bot pushed a commit that referenced this pull request Apr 25, 2026
## Release 3.59.0

### Version Changes
- ClientOfficialVersion: 3.58.0 → 3.59.0
- ClientPreviewVersion: 3.59.0 → 3.60.0
- ClientPreviewSuffixVersion: preview.0 → preview.0

### Changelog (3.59.0 GA)

#### Added
- [5579](#5579)
Change Feed Processor: Adds Lease container export support
- [5709](#5709)
Performance: Adds caching for URL-encoded AAD authorization signature
- [5731](#5731) DNS
dot-suffix: Adds TCP DNS dot-suffix for Direct mode to avoid Kubernetes
ndots latency
- [5755](#5755)
Exceptionless: Adds enabling exception less 400 status code
- [5756](#5756)
Exceptionless: Adds enabling exception less 404/1002 status code
- [5757](#5757)
Exceptionless: Adds enabling exception less 403
- [5779](#5779)
Direct: Adds Direct package version bump to 3.42.4
- [5786](#5786)
Region Availability: Adds missing regions from Direct 3.42.4
- [5788](#5788)
Socket Handler: Adds HTTP/2 PING keep-alive to detect broken connections
in pool

#### Fixed
- [5553](#5553)
NativeDLLs: Fixes Conditionally include win-x64 native DLLs based on
RuntimeIdentifier
- [5588](#5588)
LINQ: Fixes memory leak from Expression.Compile() in all call sites
- [5617](#5617)
ChangeFeedProcessor: Fixes first-change skip during initial startup by
anchoring StartTime
- [5636](#5636)
CosmosClientBuilder: Fixes self-referencing loop in
GetSerializedConfiguration with STJ TypeInfoResolver
- [5748](#5748)
Routing: Fixes GetOverlappingRanges CPU overhead from repeated JSON
deserialization
- [5807](#5807)
ChangeFeedProcessor: Fixes lease de-duplication for
/partitionKey-partitioned lease containers

### Changelog (3.60.0-preview.0)

#### Added
- [5804](#5804)
SemanticReranking: Adds Configurable Request Timeout

#### Fixed
- [5783](#5783)
Container: Fixes SemanticRerankAsync TypeLoadException in derived
classes

### API Contract Diff (GA)
```diff
diff --git "a/Microsoft.Azure.Cosmos\\contracts\\API_3.58.0.txt" "b/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0.txt"
index 1b74a69..6fa9352 100644
--- "a/Microsoft.Azure.Cosmos\\contracts\\API_3.58.0.txt"
+++ "b/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0.txt"
@@ -60,6 +60,7 @@ namespace Microsoft.Azure.Cosmos
         public ChangeFeedProcessor Build();
         public ChangeFeedProcessorBuilder WithErrorNotification(Container.ChangeFeedMonitorErrorDelegate errorDelegate);
         public virtual ChangeFeedProcessorBuilder WithInMemoryLeaseContainer();
+        public virtual ChangeFeedProcessorBuilder WithInMemoryLeaseContainer(MemoryStream leaseState);
         public ChangeFeedProcessorBuilder WithInstanceName(string instanceName);
         public ChangeFeedProcessorBuilder WithLeaseAcquireNotification(Container.ChangeFeedMonitorLeaseAcquireDelegate acquireDelegate);
         public ChangeFeedProcessorBuilder WithLeaseConfiguration(Nullable<TimeSpan> acquireInterval=default(Nullable<TimeSpan>), Nullable<TimeSpan> expirationInterval=default(Nullable<TimeSpan>), Nullable<TimeSpan> renewInterval=default(Nullable<TimeSpan>));
@@ -956,6 +957,7 @@ namespace Microsoft.Azure.Cosmos
         public const string NorwayWest = "Norway West";
         public const string PolandCentral = "Poland Central";
         public const string QatarCentral = "Qatar Central";
+        public const string SaudiArabiaEast = "Saudi Arabia East";
         public const string SingaporeCentral = "Singapore Central";
         public const string SingaporeNorth = "Singapore North";
         public const string SouthAfricaNorth = "South Africa North";
@@ -963,6 +965,7 @@ namespace Microsoft.Azure.Cosmos
         public const string SouthCentralUS = "South Central US";
         public const string SouthCentralUS2 = "South Central US 2";
         public const string SoutheastAsia = "Southeast Asia";
+        public const string SoutheastAsia3 = "Southeast Asia 3";
         public const string SoutheastUS = "Southeast US";
         public const string SoutheastUS3 = "Southeast US 3";
         public const string SoutheastUS5 = "Southeast US 5";
@@ -990,6 +993,7 @@ namespace Microsoft.Azure.Cosmos
         public const string USSecWest = "USSec West";
         public const string USSecWestCentral = "USSec West Central";
         public const string WestCentralUS = "West Central US";
+        public const string WestCentralUSFRE = "West Central US FRE";
         public const string WestEurope = "West Europe";
         public const string WestIndia = "West India";
         public const string WestUS = "West US";

```

### API Contract Diff (Preview)
```diff
diff --git "a/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0-preview.0.txt" "b/Microsoft.Azure.Cosmos\\contracts\\API_3.60.0-preview.0.txt"
index 1ae52c0..58df10f 100644
--- "a/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0-preview.0.txt"
+++ "b/Microsoft.Azure.Cosmos\\contracts\\API_3.60.0-preview.0.txt"
@@ -91,6 +91,7 @@ namespace Microsoft.Azure.Cosmos
         public ChangeFeedProcessor Build();
         public ChangeFeedProcessorBuilder WithErrorNotification(Container.ChangeFeedMonitorErrorDelegate errorDelegate);
         public virtual ChangeFeedProcessorBuilder WithInMemoryLeaseContainer();
+        public virtual ChangeFeedProcessorBuilder WithInMemoryLeaseContainer(MemoryStream leaseState);
         public ChangeFeedProcessorBuilder WithInstanceName(string instanceName);
         public ChangeFeedProcessorBuilder WithLeaseAcquireNotification(Container.ChangeFeedMonitorLeaseAcquireDelegate acquireDelegate);
         public ChangeFeedProcessorBuilder WithLeaseConfiguration(Nullable<TimeSpan> acquireInterval=default(Nullable<TimeSpan>), Nullable<TimeSpan> expirationInterval=default(Nullable<TimeSpan>), Nullable<TimeSpan> renewInterval=default(Nullable<TimeSpan>));
@@ -302,7 +303,7 @@ namespace Microsoft.Azure.Cosmos
         public abstract Task<ResponseMessage> ReplaceItemStreamAsync(Stream streamPayload, string id, PartitionKey partitionKey, ItemRequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken));
         public abstract Task<ThroughputResponse> ReplaceThroughputAsync(ThroughputProperties throughputProperties, RequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken));
         public abstract Task<ThroughputResponse> ReplaceThroughputAsync(int throughput, RequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken));
-        public abstract Task<SemanticRerankResult> SemanticRerankAsync(string rerankContext, IEnumerable<string> documents, IDictionary<string, object> options=null, CancellationToken cancellationToken=default(CancellationToken));
+        public virtual Task<SemanticRerankResult> SemanticRerankAsync(string rerankContext, IEnumerable<string> documents, IDictionary<string, object> options=null, CancellationToken cancellationToken=default(CancellationToken));
         public abstract Task<ItemResponse<T>> UpsertItemAsync<T>(T item, Nullable<PartitionKey> partitionKey=default(Nullable<PartitionKey>), ItemRequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken));
         public abstract Task<ResponseMessage> UpsertItemStreamAsync(Stream streamPayload, PartitionKey partitionKey, ItemRequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken));
         public delegate Task ChangeFeedHandlerWithManualCheckpoint<T>(ChangeFeedProcessorContext context, IReadOnlyCollection<T> changes, Func<Task> checkpointAsync, CancellationToken cancellationToken);
@@ -407,6 +408,7 @@ namespace Microsoft.Azure.Cosmos
         public int GatewayModeMaxConnectionLimit { get; set; }
         public Func<HttpClient> HttpClientFactory { get; set; }
         public Nullable<TimeSpan> IdleTcpConnectionTimeout { get; set; }
+        public TimeSpan InferenceRequestTimeout { get; set; }
         public bool LimitToEndpoint { get; set; }
         public Nullable<int> MaxRequestsPerTcpConnection { get; set; }
         public Nullable<int> MaxRetryAttemptsOnRateLimitedRequests { get; set; }
@@ -1092,6 +1094,7 @@ namespace Microsoft.Azure.Cosmos
         public const string NorwayWest = "Norway West";
         public const string PolandCentral = "Poland Central";
         public const string QatarCentral = "Qatar Central";
+        public const string SaudiArabiaEast = "Saudi Arabia East";
         public const string SingaporeCentral = "Singapore Central";
         public const string SingaporeNorth = "Singapore North";
         public const string SouthAfricaNorth = "South Africa North";
@@ -1099,6 +1102,7 @@ namespace Microsoft.Azure.Cosmos
         public const string SouthCentralUS = "South Central US";
         public const string SouthCentralUS2 = "South Central US 2";
         public const string SoutheastAsia = "Southeast Asia";
+        public const string SoutheastAsia3 = "Southeast Asia 3";
         public const string SoutheastUS = "Southeast US";
         public const string SoutheastUS3 = "Southeast US 3";
         public const string SoutheastUS5 = "Southeast US 5";
@@ -1126,6 +1130,7 @@ namespace Microsoft.Azure.Cosmos
         public const string USSecWest = "USSec West";
         public const string USSecWestCentral = "USSec West Central";
         public const string WestCentralUS = "West Central US";
+        public const string WestCentralUSFRE = "West Central US FRE";
         public const string WestEurope = "West Europe";
         public const string WestIndia = "West India";
         public const string WestUS = "West US";
@@ -1504,6 +1509,7 @@ namespace Microsoft.Azure.Cosmos.Fluent
         public CosmosClientBuilder WithEnableRemoteRegionPreferredForSessionRetry(bool enableRemoteRegionPreferredForSessionRetry);
         public CosmosClientBuilder WithFaultInjection(IFaultInjector faultInjector);
         public CosmosClientBuilder WithHttpClientFactory(Func<HttpClient> httpClientFactory);
+        public CosmosClientBuilder WithInferenceRequestTimeout(TimeSpan inferenceRequestTimeout);
         public CosmosClientBuilder WithLimitToEndpoint(bool limitToEndpoint);
         public CosmosClientBuilder WithPriorityLevel(PriorityLevel priorityLevel);
         public CosmosClientBuilder WithReadConsistencyStrategy(ReadConsistencyStrategy readConsistencyStrategy);

```

### Checklist
- [ ] Changelog entries reviewed by team
- [ ] API contract diff reviewed by Kiran and Kirill
- [ ] Preview APIs reviewed (email sent to
azurecosmossdkdotnet@microsoft.com)
- [ ] Kiran sign-off obtained

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge Enables automation to merge PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Semantic Reranking: Request Timeouts

3 participants