
DNS dot-suffix: Adds TCP DNS dot-suffix for Direct mode to avoid Kubernetes ndots latency #5731

Merged
kirankumarkolli merged 4 commits into master from users/kirankk/copilot-5730-feature-dns-dot-suffix
Apr 8, 2026

kirankumarkolli commented on Apr 3, 2026

Feature: Enable DNS dot-suffix for Direct mode to avoid Kubernetes ndots latency

Description

Fixes #5730

🤖 This PR was authored by GitHub Copilot as part of an automated issue triage and resolution workflow.

On Kubernetes with ndots:5, Cosmos DB endpoints such as myaccount.documents.azure.com (which has only 3 dots) trigger several failed DNS search-domain lookups before the absolute lookup succeeds. This can add roughly 50-200 ms to each DNS resolution in Direct mode (TCP/RNTBD).

This fix uses the existing dnsResolutionFunction injection point on StoreClientFactory to append a trailing dot (.) to hostnames at DNS resolution time, turning them into fully qualified (absolute) domain names. The trailing dot tells the DNS resolver to skip search-domain expansion entirely.


Issue Summary

| Property | Value |
|---|---|
| Issue | #5730 |
| Area | DirectMode / Transport |
| SDK Version (reported) | v3 (.NET) |
| Severity | P3 |

Root Cause Analysis

Code Path

DocumentClient.cs:6761 - InitializeDirectConnectivity()
  └─> new StoreClientFactory(...) — dnsResolutionFunction defaults to null
      └─> Connection.cs:540 - this.dnsResolutionFunction(this.serverUri.DnsSafeHost)
          └─> Connection.cs:1016 - Dns.GetHostAddressesAsync(hostName) — bare hostname, no dot

Root Cause

Connection.ResolveHostAsync() calls Dns.GetHostAddressesAsync(hostName) with bare hostnames (e.g., myaccount.documents.azure.com). On Kubernetes with ndots:5, a hostname with fewer than 5 dots triggers search-domain expansion: the resolver first tries myaccount.documents.azure.com.default.svc.cluster.local, then myaccount.documents.azure.com.svc.cluster.local, and so on through the search list. Every one of these lookups fails before the absolute name is finally queried. This is not a regression; resolution has always worked this way, but the extra latency only becomes noticeable in environments with a high ndots value and a long search list, such as Kubernetes.
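The ordering described above follows the `ndots` rule documented for the stub resolver in resolv.conf(5). The sketch below models only that ordering (it sends no DNS traffic) so the effect of the trailing dot is easy to see. The class and method names are hypothetical, the search list used in the example is an assumed Kubernetes default, and real resolvers have further behavior (retries, timeouts, separate A/AAAA queries) that this model ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the resolv.conf "ndots" ordering; performs no DNS I/O.
public static class NdotsSearchOrder
{
    public static IReadOnlyList<string> CandidateNames(
        string name, int ndots, IReadOnlyList<string> searchDomains)
    {
        // A trailing dot marks the name as absolute: it is queried exactly once.
        if (name.EndsWith(".", StringComparison.Ordinal))
        {
            return new[] { name };
        }

        int dots = name.Count(c => c == '.');
        IEnumerable<string> expanded = searchDomains.Select(domain => name + "." + domain);

        // Fewer dots than ndots: walk the search list first, absolute name last.
        // Otherwise: try the absolute name first, then the search list.
        return dots < ndots
            ? expanded.Append(name).ToList()
            : new[] { name }.Concat(expanded).ToList();
    }
}
```

With ndots 5 and an assumed search list of `default.svc.cluster.local`, `svc.cluster.local`, and `cluster.local`, the bare hostname myaccount.documents.azure.com produces three search-list candidates followed by the absolute name, whereas myaccount.documents.azure.com. produces a single candidate.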


Changes Made

Files Modified

| File | Change |
|---|---|
| ConfigurationManager.cs | Added the AZURE_COSMOS_DNS_DOT_SUFFIX_ENABLED environment-variable constant and the IsDnsDotSuffixEnabled() accessor |
| DnsDotSuffixHelper.cs (new) | ToFqdnHostName() appends a trailing dot (skipping IP addresses, null or empty input, and already-dotted names); CreateDnsResolutionFunction() is a factory for the StoreClientFactory lambda |
| DocumentClient.cs | Passes dnsResolutionFunction: to the StoreClientFactory constructor, enabled conditionally via the environment variable |
| DnsDotSuffixHelperTests.cs (new) | 10 unit tests covering the edge cases listed under New Tests Added |
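A minimal sketch of the two helper members, reconstructed only from the behavior stated in this PR (skip IP literals via `IPAddress.TryParse`, leave null, empty, and already-dotted input unchanged, wrap `Dns.GetHostAddressesAsync`). It is not the merged code: the class name `DnsDotSuffixSketch` is hypothetical, and it assumes the StoreClientFactory injection point is a `Func<string, Task<IPAddress>>` returning a single address. The sketch simply returns the first address; the SDK's own address selection may differ.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

// Hypothetical sketch; not the SDK's DnsDotSuffixHelper.
public static class DnsDotSuffixSketch
{
    // Appends a trailing dot so the resolver treats the name as absolute.
    public static string ToFqdnHostName(string hostName)
    {
        // null stays null and "" stays "".
        if (string.IsNullOrEmpty(hostName))
        {
            return hostName;
        }

        // Idempotent: an already-dotted name is returned unchanged.
        if (hostName.EndsWith(".", StringComparison.Ordinal))
        {
            return hostName;
        }

        // IPv4/IPv6 literals are not DNS names, so they pass through untouched.
        if (IPAddress.TryParse(hostName, out _))
        {
            return hostName;
        }

        return hostName + ".";
    }

    // Assumed delegate shape for the StoreClientFactory injection point.
    public static Func<string, Task<IPAddress>> CreateDnsResolutionFunction()
    {
        return async hostName =>
        {
            IPAddress[] addresses = await Dns
                .GetHostAddressesAsync(ToFqdnHostName(hostName))
                .ConfigureAwait(false);

            if (addresses.Length == 0)
            {
                throw new InvalidOperationException($"No addresses returned for '{hostName}'.");
            }

            // Simplification: take the first address returned by the resolver.
            return addresses[0];
        };
    }
}
```

For example, `ToFqdnHostName("myaccount.documents.azure.com")` returns `"myaccount.documents.azure.com."`, while `"10.0.0.1"` and `"::1"` come back unchanged.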

Code Changes

  • ConfigurationManager.cs: Follows the existing environment-variable pattern (an internal static readonly string constant plus a public accessor), added after the last existing constant.
  • DnsDotSuffixHelper.cs: New internal static utility. ToFqdnHostName() uses IPAddress.TryParse to skip IP addresses and checks EndsWith(".") so repeated calls are idempotent. CreateDnsResolutionFunction() wraps Dns.GetHostAddressesAsync with the dot-suffix logic.
  • DocumentClient.cs: Adds the named argument dnsResolutionFunction: to the existing StoreClientFactory constructor call, passing null when the feature is disabled so the default behavior is preserved.
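The opt-in switch and the conditional wiring can be sketched as follows. The sketch uses the final variable name from the rename made later in this PR (`AZURE_COSMOS_TCP_DNS_DOT_SUFFIX_ENABLED`); the parsing rule (`bool.TryParse`, defaulting to off) and the names `TcpDnsDotSuffixSwitch` and `SelectDnsResolutionFunction` are assumptions for illustration, not the SDK's code.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

// Hypothetical sketch of the environment-variable switch and conditional wiring.
public static class TcpDnsDotSuffixSwitch
{
    public const string TcpDnsDotSuffixEnabled = "AZURE_COSMOS_TCP_DNS_DOT_SUFFIX_ENABLED";

    // Assumed parsing: "true"/"false" (case-insensitive) via bool.TryParse;
    // an unset or unparseable value means the feature is off.
    public static bool IsTcpDnsDotSuffixEnabled()
    {
        string raw = Environment.GetEnvironmentVariable(TcpDnsDotSuffixEnabled);
        return bool.TryParse(raw, out bool enabled) && enabled;
    }

    // Returns null when disabled, so the factory keeps its default resolver.
    public static Func<string, Task<IPAddress>> SelectDnsResolutionFunction(
        Func<Func<string, Task<IPAddress>>> createDotSuffixResolver)
    {
        return IsTcpDnsDotSuffixEnabled() ? createDotSuffixResolver() : null;
    }
}
```

Returning `null` when the switch is off mirrors the description above: the StoreClientFactory call site receives no custom resolver and keeps its existing default resolution.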

Illustrative Output (Before/After)

Before (bare hostname — triggers ndots expansion):

DNS query: myaccount.documents.azure.com
  → Try: myaccount.documents.azure.com.default.svc.cluster.local (NXDOMAIN ~10ms)
  → Try: myaccount.documents.azure.com.svc.cluster.local (NXDOMAIN ~10ms)
  → Try: myaccount.documents.azure.com.cluster.local (NXDOMAIN ~10ms)
  → Try: myaccount.documents.azure.com.internal (NXDOMAIN ~10ms)
  → Try: myaccount.documents.azure.com (SUCCESS ~5ms)
Total: ~45ms

After (FQDN with trailing dot — single query):

DNS query: myaccount.documents.azure.com.
  → Try: myaccount.documents.azure.com. (SUCCESS ~5ms)
Total: ~5ms
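To check whether this before/after difference shows up in a particular pod, a rough timing harness like the one below can compare the bare and dotted names. `DnsTiming` is a hypothetical helper, not part of the SDK. Resolver and CoreDNS caching (including negative caching of NXDOMAIN answers) can make repeated runs much faster than the first, and lookup order matters too: the bare lookup ends with the same absolute query that the dotted lookup makes, which may warm the cache for it. Treat the timings above as illustrative rather than expected values.

```csharp
using System;
using System.Diagnostics;
using System.Net;
using System.Threading.Tasks;

// Hypothetical timing harness; not part of the Cosmos DB SDK.
public static class DnsTiming
{
    // Measures one resolution call made through the supplied resolver.
    public static async Task<TimeSpan> TimeAsync(
        string hostName, Func<string, Task<IPAddress[]>> resolve)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        await resolve(hostName).ConfigureAwait(false);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Resolves the bare and dotted forms back to back with the system resolver.
    public static async Task CompareAsync(string hostName)
    {
        TimeSpan bare = await TimeAsync(hostName, Dns.GetHostAddressesAsync).ConfigureAwait(false);
        TimeSpan dotted = await TimeAsync(hostName + ".", Dns.GetHostAddressesAsync).ConfigureAwait(false);
        Console.WriteLine($"{hostName}: {bare.TotalMilliseconds:F1} ms");
        Console.WriteLine($"{hostName}.: {dotted.TotalMilliseconds:F1} ms");
    }
}
```

Running `DnsTiming.CompareAsync("myaccount.documents.azure.com")` in separate processes, alternating which form goes first, gives a fairer comparison than a single back-to-back run.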

Testing

Test Results

| Test Suite | Total | Passed | Failed |
|---|---|---|---|
| DnsDotSuffixHelperTests | 10 | 10 | 0 |
| Build | - | - | - |

New Tests Added

  • DnsDotSuffixHelperTests.ToFqdnHostName_AppendsTrailingDot - Standard Cosmos DB endpoint
  • DnsDotSuffixHelperTests.ToFqdnHostName_IdempotentWhenAlreadyDotSuffixed - Already FQDN
  • DnsDotSuffixHelperTests.ToFqdnHostName_SkipsIPv4Address - IPv4 passthrough
  • DnsDotSuffixHelperTests.ToFqdnHostName_SkipsIPv6Address - IPv6 loopback
  • DnsDotSuffixHelperTests.ToFqdnHostName_SkipsIPv6FullAddress - Full IPv6
  • DnsDotSuffixHelperTests.ToFqdnHostName_ReturnsNullForNull - Null guard
  • DnsDotSuffixHelperTests.ToFqdnHostName_ReturnsEmptyForEmpty - Empty guard
  • DnsDotSuffixHelperTests.ToFqdnHostName_AppendsTrailingDotToLocalhost - Single-label host
  • DnsDotSuffixHelperTests.ToFqdnHostName_AppendsTrailingDotToSingleLabel - Short hostname
  • DnsDotSuffixHelperTests.CreateDnsResolutionFunction_ReturnsNonNullFunction - Factory sanity

Breaking Changes

None. The feature is opt-in via an environment variable, and the default behavior is unchanged.


Checklist

  • Code follows project conventions
  • Self-review completed
  • Comments added for complex logic
  • Documentation updated (XML doc comments)
  • New tests added for the fix
  • All existing tests pass
  • Remote CI gates pass (Section 7.4)

Generated by GitHub Copilot CLI Agent

…ency

Adds opt-in DNS dot-suffix (FQDN trailing dot) for Direct/TCP connections
to avoid Kubernetes ndots:5 search-domain expansion latency.

- Add AZURE_COSMOS_DNS_DOT_SUFFIX_ENABLED environment variable
- Add DnsDotSuffixHelper with ToFqdnHostName() and CreateDnsResolutionFunction()
- Wire dnsResolutionFunction into StoreClientFactory via DocumentClient
- Add unit tests for DnsDotSuffixHelper

Fixes #5730

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions bot left a comment

All good!

Comment thread Microsoft.Azure.Cosmos/src/Util/ConfigurationManager.cs Outdated
- Rename AZURE_COSMOS_DNS_DOT_SUFFIX_ENABLED → AZURE_COSMOS_TCP_DNS_DOT_SUFFIX_ENABLED
- Rename DnsDotSuffixEnabled → TcpDnsDotSuffixEnabled
- Rename IsDnsDotSuffixEnabled() → IsTcpDnsDotSuffixEnabled()
- Restore original line endings in IsLengthAwareRangeComparatorEnabled()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
kirankumarkolli changed the title from "Feature: Enable DNS dot-suffix for Direct mode to avoid Kubernetes ndots latency" to "DNS dot-suffix: Adds TCP DNS dot-suffix for Direct mode to avoid Kubernetes ndots latency" on Apr 3, 2026
kirankumarkolli marked this pull request as ready for review on April 3, 2026 18:22
Comment thread Microsoft.Azure.Cosmos/src/Util/DnsDotSuffixHelper.cs Outdated

kundadebdatta left a comment

LGTM.

kirankumarkolli merged commit d1393f1 into master on Apr 8, 2026
32 checks passed
kirankumarkolli deleted the users/kirankk/copilot-5730-feature-dns-dot-suffix branch on April 8, 2026 04:59
NaluTripician mentioned this pull request on Apr 24, 2026
microsoft-github-policy-service bot pushed a commit that referenced this pull request on Apr 25, 2026
## Release 3.59.0

### Version Changes
- ClientOfficialVersion: 3.58.0 → 3.59.0
- ClientPreviewVersion: 3.59.0 → 3.60.0
- ClientPreviewSuffixVersion: preview.0 → preview.0

### Changelog (3.59.0 GA)

#### Added
- [5579](#5579)
Change Feed Processor: Adds Lease container export support
- [5709](#5709)
Performance: Adds caching for URL-encoded AAD authorization signature
- [5731](#5731) DNS
dot-suffix: Adds TCP DNS dot-suffix for Direct mode to avoid Kubernetes
ndots latency
- [5755](#5755)
Exceptionless: Adds enabling exception less 400 status code
- [5756](#5756)
Exceptionless: Adds enabling exception less 404/1002 status code
- [5757](#5757)
Exceptionless: Adds enabling exception less 403
- [5779](#5779)
Direct: Adds Direct package version bump to 3.42.4
- [5786](#5786)
Region Availability: Adds missing regions from Direct 3.42.4
- [5788](#5788)
Socket Handler: Adds HTTP/2 PING keep-alive to detect broken connections
in pool

#### Fixed
- [5553](#5553)
NativeDLLs: Fixes Conditionally include win-x64 native DLLs based on
RuntimeIdentifier
- [5588](#5588)
LINQ: Fixes memory leak from Expression.Compile() in all call sites
- [5617](#5617)
ChangeFeedProcessor: Fixes first-change skip during initial startup by
anchoring StartTime
- [5636](#5636)
CosmosClientBuilder: Fixes self-referencing loop in
GetSerializedConfiguration with STJ TypeInfoResolver
- [5748](#5748)
Routing: Fixes GetOverlappingRanges CPU overhead from repeated JSON
deserialization
- [5807](#5807)
ChangeFeedProcessor: Fixes lease de-duplication for
/partitionKey-partitioned lease containers

### Changelog (3.60.0-preview.0)

#### Added
- [5804](#5804)
SemanticReranking: Adds Configurable Request Timeout

#### Fixed
- [5783](#5783)
Container: Fixes SemanticRerankAsync TypeLoadException in derived
classes

### API Contract Diff (GA)
```diff
diff --git "a/Microsoft.Azure.Cosmos\\contracts\\API_3.58.0.txt" "b/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0.txt"
index 1b74a69..6fa9352 100644
--- "a/Microsoft.Azure.Cosmos\\contracts\\API_3.58.0.txt"
+++ "b/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0.txt"
@@ -60,6 +60,7 @@ namespace Microsoft.Azure.Cosmos
         public ChangeFeedProcessor Build();
         public ChangeFeedProcessorBuilder WithErrorNotification(Container.ChangeFeedMonitorErrorDelegate errorDelegate);
         public virtual ChangeFeedProcessorBuilder WithInMemoryLeaseContainer();
+        public virtual ChangeFeedProcessorBuilder WithInMemoryLeaseContainer(MemoryStream leaseState);
         public ChangeFeedProcessorBuilder WithInstanceName(string instanceName);
         public ChangeFeedProcessorBuilder WithLeaseAcquireNotification(Container.ChangeFeedMonitorLeaseAcquireDelegate acquireDelegate);
         public ChangeFeedProcessorBuilder WithLeaseConfiguration(Nullable<TimeSpan> acquireInterval=default(Nullable<TimeSpan>), Nullable<TimeSpan> expirationInterval=default(Nullable<TimeSpan>), Nullable<TimeSpan> renewInterval=default(Nullable<TimeSpan>));
@@ -956,6 +957,7 @@ namespace Microsoft.Azure.Cosmos
         public const string NorwayWest = "Norway West";
         public const string PolandCentral = "Poland Central";
         public const string QatarCentral = "Qatar Central";
+        public const string SaudiArabiaEast = "Saudi Arabia East";
         public const string SingaporeCentral = "Singapore Central";
         public const string SingaporeNorth = "Singapore North";
         public const string SouthAfricaNorth = "South Africa North";
@@ -963,6 +965,7 @@ namespace Microsoft.Azure.Cosmos
         public const string SouthCentralUS = "South Central US";
         public const string SouthCentralUS2 = "South Central US 2";
         public const string SoutheastAsia = "Southeast Asia";
+        public const string SoutheastAsia3 = "Southeast Asia 3";
         public const string SoutheastUS = "Southeast US";
         public const string SoutheastUS3 = "Southeast US 3";
         public const string SoutheastUS5 = "Southeast US 5";
@@ -990,6 +993,7 @@ namespace Microsoft.Azure.Cosmos
         public const string USSecWest = "USSec West";
         public const string USSecWestCentral = "USSec West Central";
         public const string WestCentralUS = "West Central US";
+        public const string WestCentralUSFRE = "West Central US FRE";
         public const string WestEurope = "West Europe";
         public const string WestIndia = "West India";
         public const string WestUS = "West US";

```

### API Contract Diff (Preview)
```diff
diff --git "a/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0-preview.0.txt" "b/Microsoft.Azure.Cosmos\\contracts\\API_3.60.0-preview.0.txt"
index 1ae52c0..58df10f 100644
--- "a/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0-preview.0.txt"
+++ "b/Microsoft.Azure.Cosmos\\contracts\\API_3.60.0-preview.0.txt"
@@ -91,6 +91,7 @@ namespace Microsoft.Azure.Cosmos
         public ChangeFeedProcessor Build();
         public ChangeFeedProcessorBuilder WithErrorNotification(Container.ChangeFeedMonitorErrorDelegate errorDelegate);
         public virtual ChangeFeedProcessorBuilder WithInMemoryLeaseContainer();
+        public virtual ChangeFeedProcessorBuilder WithInMemoryLeaseContainer(MemoryStream leaseState);
         public ChangeFeedProcessorBuilder WithInstanceName(string instanceName);
         public ChangeFeedProcessorBuilder WithLeaseAcquireNotification(Container.ChangeFeedMonitorLeaseAcquireDelegate acquireDelegate);
         public ChangeFeedProcessorBuilder WithLeaseConfiguration(Nullable<TimeSpan> acquireInterval=default(Nullable<TimeSpan>), Nullable<TimeSpan> expirationInterval=default(Nullable<TimeSpan>), Nullable<TimeSpan> renewInterval=default(Nullable<TimeSpan>));
@@ -302,7 +303,7 @@ namespace Microsoft.Azure.Cosmos
         public abstract Task<ResponseMessage> ReplaceItemStreamAsync(Stream streamPayload, string id, PartitionKey partitionKey, ItemRequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken));
         public abstract Task<ThroughputResponse> ReplaceThroughputAsync(ThroughputProperties throughputProperties, RequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken));
         public abstract Task<ThroughputResponse> ReplaceThroughputAsync(int throughput, RequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken));
-        public abstract Task<SemanticRerankResult> SemanticRerankAsync(string rerankContext, IEnumerable<string> documents, IDictionary<string, object> options=null, CancellationToken cancellationToken=default(CancellationToken));
+        public virtual Task<SemanticRerankResult> SemanticRerankAsync(string rerankContext, IEnumerable<string> documents, IDictionary<string, object> options=null, CancellationToken cancellationToken=default(CancellationToken));
         public abstract Task<ItemResponse<T>> UpsertItemAsync<T>(T item, Nullable<PartitionKey> partitionKey=default(Nullable<PartitionKey>), ItemRequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken));
         public abstract Task<ResponseMessage> UpsertItemStreamAsync(Stream streamPayload, PartitionKey partitionKey, ItemRequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken));
         public delegate Task ChangeFeedHandlerWithManualCheckpoint<T>(ChangeFeedProcessorContext context, IReadOnlyCollection<T> changes, Func<Task> checkpointAsync, CancellationToken cancellationToken);
@@ -407,6 +408,7 @@ namespace Microsoft.Azure.Cosmos
         public int GatewayModeMaxConnectionLimit { get; set; }
         public Func<HttpClient> HttpClientFactory { get; set; }
         public Nullable<TimeSpan> IdleTcpConnectionTimeout { get; set; }
+        public TimeSpan InferenceRequestTimeout { get; set; }
         public bool LimitToEndpoint { get; set; }
         public Nullable<int> MaxRequestsPerTcpConnection { get; set; }
         public Nullable<int> MaxRetryAttemptsOnRateLimitedRequests { get; set; }
@@ -1092,6 +1094,7 @@ namespace Microsoft.Azure.Cosmos
         public const string NorwayWest = "Norway West";
         public const string PolandCentral = "Poland Central";
         public const string QatarCentral = "Qatar Central";
+        public const string SaudiArabiaEast = "Saudi Arabia East";
         public const string SingaporeCentral = "Singapore Central";
         public const string SingaporeNorth = "Singapore North";
         public const string SouthAfricaNorth = "South Africa North";
@@ -1099,6 +1102,7 @@ namespace Microsoft.Azure.Cosmos
         public const string SouthCentralUS = "South Central US";
         public const string SouthCentralUS2 = "South Central US 2";
         public const string SoutheastAsia = "Southeast Asia";
+        public const string SoutheastAsia3 = "Southeast Asia 3";
         public const string SoutheastUS = "Southeast US";
         public const string SoutheastUS3 = "Southeast US 3";
         public const string SoutheastUS5 = "Southeast US 5";
@@ -1126,6 +1130,7 @@ namespace Microsoft.Azure.Cosmos
         public const string USSecWest = "USSec West";
         public const string USSecWestCentral = "USSec West Central";
         public const string WestCentralUS = "West Central US";
+        public const string WestCentralUSFRE = "West Central US FRE";
         public const string WestEurope = "West Europe";
         public const string WestIndia = "West India";
         public const string WestUS = "West US";
@@ -1504,6 +1509,7 @@ namespace Microsoft.Azure.Cosmos.Fluent
         public CosmosClientBuilder WithEnableRemoteRegionPreferredForSessionRetry(bool enableRemoteRegionPreferredForSessionRetry);
         public CosmosClientBuilder WithFaultInjection(IFaultInjector faultInjector);
         public CosmosClientBuilder WithHttpClientFactory(Func<HttpClient> httpClientFactory);
+        public CosmosClientBuilder WithInferenceRequestTimeout(TimeSpan inferenceRequestTimeout);
         public CosmosClientBuilder WithLimitToEndpoint(bool limitToEndpoint);
         public CosmosClientBuilder WithPriorityLevel(PriorityLevel priorityLevel);
         public CosmosClientBuilder WithReadConsistencyStrategy(ReadConsistencyStrategy readConsistencyStrategy);

```

### Checklist
- [ ] Changelog entries reviewed by team
- [ ] API contract diff reviewed by Kiran and Kirill
- [ ] Preview APIs reviewed (email sent to
azurecosmossdkdotnet@microsoft.com)
- [ ] Kiran sign-off obtained

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

Cosmos SDK: Enable FQDN DNS Resolution to Avoid Kubernetes ndots Latency

2 participants