LINQ: Fixes memory leak from Expression.Compile() in all call sites#5588
Conversation
Uses preferInterpretation: true on .NET 6+ to avoid generating JIT-compiled DynamicMethods that persist in native memory and cause unbounded growth. Benchmark results show 25x performance improvement (101ms → 4ms for 1000 iterations) which validates that IL emission is being skipped. Changes: - SubtreeEvaluator.cs: Use Compile(preferInterpretation: true) on .NET 6+ - SubtreeEvaluatorMemoryBenchmark.cs: Add benchmark tests to validate fix Fixes #5487
1457e49 to
6c3f73a
Compare
Address review comment: All benchmark tests should be in Benchmark project. - Deleted: Microsoft.Azure.Cosmos.Tests/Linq/SubtreeEvaluatorMemoryBenchmark.cs - Added: Microsoft.Azure.Cosmos.Performance.Tests/Linq/SubtreeEvaluatorBenchmark.cs Converted from MSTest to BenchmarkDotNet format.
Rewrites SubtreeEvaluatorBenchmark to call actual SubtreeEvaluator.Evaluate() for the fixed code path, while the baseline duplicates the old Compile() behavior. This addresses the review comment to not duplicate fix code in benchmarks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
NaluTripician
left a comment
There was a problem hiding this comment.
LGTM. I also had copilot run an analysis on this PR and it said that this same pattern appears in 3 other places (Utilities.cs, DocumentQueryEvaluator.cs, and GeometrySqlExpressionFactory.cs). Should we also investigate those instances and see if the same leak pattern applies there?
kirankumarkolli
left a comment
There was a problem hiding this comment.
Great catch @NaluTripician! I investigated all Expression.Compile() call sites in the SDK:
| File | Line(s) | Leak Risk | Priority |
|---|---|---|---|
Linq/Utilities.cs (ExpressionSimplifier) |
104-106 | ✅ Same leak — called by ConstantFolding per query |
High |
Linq/DocumentQueryEvaluator.cs |
112, 122 | ✅ Same leak — raw SQL transform path | Moderate |
Linq/GeometrySqlExpressionFactory.cs |
45-47 | ✅ Same leak — spatial queries | Moderate |
HttpClient/HttpConnectionTracker.cs |
88 | ❌ One-time setup | N/A |
Created follow-up issue #5702 to track applying the same Compile(preferInterpretation: true) fix to all 3 remaining sites. Utilities.cs is the highest priority since ConstantFolding is in the hot path for every LINQ query.
…le() sites Extract shared ExpressionCompileHelper with runtime reflection to use Compile(preferInterpretation: true) on .NET 6+, avoiding DynamicMethod IL emission that causes native memory growth in long-running services. Applied to: - SubtreeEvaluator.cs (was already fixed in #5588, now uses shared helper) - Utilities.cs (ExpressionSimplifier - called by ConstantFolding per query) - DocumentQueryEvaluator.cs (raw SQL transform path) - GeometrySqlExpressionFactory.cs (spatial query evaluation) Fixes #5702 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AI Code Review SummaryPR: LINQ: Fixes memory leak from Expression.Compile() in SubtreeEvaluator Assessment: Good fix for a real production memory leak. The runtime reflection approach is the correct design choice given the 5 recommendations posted as inline comments — focused on eliminating per-call reflection overhead, improving benchmark coverage, adding a changelog entry, and making the pattern reusable for the 3 other |
|
🟡 Recommendation · Process: Missing CHANGELOG Entry Add a changelog entry for this customer-facing bug fix This PR fixes a production memory leak reported by users in issue #5487 (unbounded native memory growth in long-running services). Users experiencing this issue need to know which SDK version contains the fix so they can upgrade. The - [5588](https://github.com/Azure/azure-cosmos-dotnet-v3/pull/5588) LINQ: Fixes memory leak from Expression.Compile() in SubtreeEvaluator |
…le() sites Extract shared ExpressionCompileHelper with runtime reflection to use Compile(preferInterpretation: true) on .NET 6+, avoiding DynamicMethod IL emission that causes native memory growth in long-running services. Applied to: - SubtreeEvaluator.cs (was already fixed in #5588, now uses shared helper) - Utilities.cs (ExpressionSimplifier - called by ConstantFolding per query) - DocumentQueryEvaluator.cs (raw SQL transform path) - GeometrySqlExpressionFactory.cs (spatial query evaluation) Fixes #5702 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Merges the remaining Expression.Compile() call site fixes from PR #5703 (issue #5702) into the original fix PR #5588 (issue #5487). SubtreeEvaluator now uses the shared ExpressionCompileHelper instead of its own private implementation, reducing duplication. Fixes #5487 Fixes #5702 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Read AZURE_COSMOS_LINQ_EXPRESSION_INTERPRETATION_ENABLED once during class initialization into a static readonly bool. Eliminates repeated ConfigurationManager.GetEnvironmentVariable calls on every CompileLambda invocation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@sdkReviewAgent |
|
sdkReviewAgent | Status: ⏳ Queued Review requested by @xinlian12. I'll start shortly. |
|
sdkReviewAgent | Status: 🔍 Reviewing I'm reviewing this PR now. I'll post my findings as comments when done. |
Ofekw
left a comment
There was a problem hiding this comment.
Looks good to me!
Clever way to get around the limitation of not being able to use the conditional compilation!
- Add CompileLambda<TDelegate>(Expression<TDelegate>) generic overload that returns TDelegate directly without reflection - Update GeometrySqlExpressionFactory and Utilities to use generic overload, removing explicit casts - Add [MemoryDiagnoser] and memory growth benchmarks to demonstrate native memory leak (old path) vs stable memory (interpretation fix) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nchmarks - Add CompileLambda<TDelegate>(Expression<TDelegate>) generic overload that returns TDelegate directly without reflection - Update GeometrySqlExpressionFactory and Utilities to use generic overload, removing explicit casts - Rewrite benchmarks as realistic E2E LINQ-to-SQL query generation through SqlTranslator.TranslateExpression pipeline - Add [MemoryDiagnoser] and memory growth benchmarks comparing old Compile() path vs interpretation fix Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
d6bb1ae to
72b8a02
Compare
…retation: true) - Delete ExpressionCompileHelper.cs and env var feature flag - Inline Compile(preferInterpretation: true) directly at all 4 call sites - Simplify benchmarks to compare Compile() vs Compile(true) directly Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Scans all .cs files under src/Linq/ and fails if any bare .Compile() is found without preferInterpretation: true. Prevents reintroduction of the native memory leak from DynamicMethod IL emission (#5487). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
## Release 3.59.0 ### Version Changes - ClientOfficialVersion: 3.58.0 → 3.59.0 - ClientPreviewVersion: 3.59.0 → 3.60.0 - ClientPreviewSuffixVersion: preview.0 → preview.0 ### Changelog (3.59.0 GA) #### Added - [5579](#5579) Change Feed Processor: Adds Lease container export support - [5709](#5709) Performance: Adds caching for URL-encoded AAD authorization signature - [5731](#5731) DNS dot-suffix: Adds TCP DNS dot-suffix for Direct mode to avoid Kubernetes ndots latency - [5755](#5755) Exceptionless: Adds enabling exception less 400 status code - [5756](#5756) Exceptionless: Adds enabling exception less 404/1002 status code - [5757](#5757) Exceptionless: Adds enabling exception less 403 - [5779](#5779) Direct: Adds Direct package version bump to 3.42.4 - [5786](#5786) Region Availability: Adds missing regions from Direct 3.42.4 - [5788](#5788) Socket Handler: Adds HTTP/2 PING keep-alive to detect broken connections in pool #### Fixed - [5553](#5553) NativeDLLs: Fixes Conditionally include win-x64 native DLLs based on RuntimeIdentifier - [5588](#5588) LINQ: Fixes memory leak from Expression.Compile() in all call sites - [5617](#5617) ChangeFeedProcessor: Fixes first-change skip during initial startup by anchoring StartTime - [5636](#5636) CosmosClientBuilder: Fixes self-referencing loop in GetSerializedConfiguration with STJ TypeInfoResolver - [5748](#5748) Routing: Fixes GetOverlappingRanges CPU overhead from repeated JSON deserialization - [5807](#5807) ChangeFeedProcessor: Fixes lease de-duplication for /partitionKey-partitioned lease containers ### Changelog (3.60.0-preview.0) #### Added - [5804](#5804) SemanticReranking: Adds Configurable Request Timeout #### Fixed - [5783](#5783) Container: Fixes SemanticRerankAsync TypeLoadException in derived classes ### API Contract Diff (GA) ```diff diff --git "a/Microsoft.Azure.Cosmos\\contracts\\API_3.58.0.txt" "b/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0.txt" index 1b74a69..6fa9352 100644 --- "a/Microsoft.Azure.Cosmos\\contracts\\API_3.58.0.txt" +++ "b/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0.txt" @@ -60,6 +60,7 @@ namespace Microsoft.Azure.Cosmos public ChangeFeedProcessor Build(); public ChangeFeedProcessorBuilder WithErrorNotification(Container.ChangeFeedMonitorErrorDelegate errorDelegate); public virtual ChangeFeedProcessorBuilder WithInMemoryLeaseContainer(); + public virtual ChangeFeedProcessorBuilder WithInMemoryLeaseContainer(MemoryStream leaseState); public ChangeFeedProcessorBuilder WithInstanceName(string instanceName); public ChangeFeedProcessorBuilder WithLeaseAcquireNotification(Container.ChangeFeedMonitorLeaseAcquireDelegate acquireDelegate); public ChangeFeedProcessorBuilder WithLeaseConfiguration(Nullable<TimeSpan> acquireInterval=default(Nullable<TimeSpan>), Nullable<TimeSpan> expirationInterval=default(Nullable<TimeSpan>), Nullable<TimeSpan> renewInterval=default(Nullable<TimeSpan>)); @@ -956,6 +957,7 @@ namespace Microsoft.Azure.Cosmos public const string NorwayWest = "Norway West"; public const string PolandCentral = "Poland Central"; public const string QatarCentral = "Qatar Central"; + public const string SaudiArabiaEast = "Saudi Arabia East"; public const string SingaporeCentral = "Singapore Central"; public const string SingaporeNorth = "Singapore North"; public const string SouthAfricaNorth = "South Africa North"; @@ -963,6 +965,7 @@ namespace Microsoft.Azure.Cosmos public const string SouthCentralUS = "South Central US"; public const string SouthCentralUS2 = "South Central US 2"; public const string SoutheastAsia = "Southeast Asia"; + public const string SoutheastAsia3 = "Southeast Asia 3"; public const string SoutheastUS = "Southeast US"; public const string SoutheastUS3 = "Southeast US 3"; public const string SoutheastUS5 = "Southeast US 5"; @@ -990,6 +993,7 @@ namespace Microsoft.Azure.Cosmos public const string USSecWest = "USSec West"; public const string USSecWestCentral = "USSec West Central"; public const string WestCentralUS = "West Central US"; + public const string WestCentralUSFRE = "West Central US FRE"; public const string WestEurope = "West Europe"; public const string WestIndia = "West India"; public const string WestUS = "West US"; ``` ### API Contract Diff (Preview) ```diff diff --git "a/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0-preview.0.txt" "b/Microsoft.Azure.Cosmos\\contracts\\API_3.60.0-preview.0.txt" index 1ae52c0..58df10f 100644 --- "a/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0-preview.0.txt" +++ "b/Microsoft.Azure.Cosmos\\contracts\\API_3.60.0-preview.0.txt" @@ -91,6 +91,7 @@ namespace Microsoft.Azure.Cosmos public ChangeFeedProcessor Build(); public ChangeFeedProcessorBuilder WithErrorNotification(Container.ChangeFeedMonitorErrorDelegate errorDelegate); public virtual ChangeFeedProcessorBuilder WithInMemoryLeaseContainer(); + public virtual ChangeFeedProcessorBuilder WithInMemoryLeaseContainer(MemoryStream leaseState); public ChangeFeedProcessorBuilder WithInstanceName(string instanceName); public ChangeFeedProcessorBuilder WithLeaseAcquireNotification(Container.ChangeFeedMonitorLeaseAcquireDelegate acquireDelegate); public ChangeFeedProcessorBuilder WithLeaseConfiguration(Nullable<TimeSpan> acquireInterval=default(Nullable<TimeSpan>), Nullable<TimeSpan> expirationInterval=default(Nullable<TimeSpan>), Nullable<TimeSpan> renewInterval=default(Nullable<TimeSpan>)); @@ -302,7 +303,7 @@ namespace Microsoft.Azure.Cosmos public abstract Task<ResponseMessage> ReplaceItemStreamAsync(Stream streamPayload, string id, PartitionKey partitionKey, ItemRequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken)); public abstract Task<ThroughputResponse> ReplaceThroughputAsync(ThroughputProperties throughputProperties, RequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken)); public abstract Task<ThroughputResponse> ReplaceThroughputAsync(int throughput, RequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken)); - public abstract Task<SemanticRerankResult> SemanticRerankAsync(string rerankContext, IEnumerable<string> documents, IDictionary<string, object> options=null, CancellationToken cancellationToken=default(CancellationToken)); + public virtual Task<SemanticRerankResult> SemanticRerankAsync(string rerankContext, IEnumerable<string> documents, IDictionary<string, object> options=null, CancellationToken cancellationToken=default(CancellationToken)); public abstract Task<ItemResponse<T>> UpsertItemAsync<T>(T item, Nullable<PartitionKey> partitionKey=default(Nullable<PartitionKey>), ItemRequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken)); public abstract Task<ResponseMessage> UpsertItemStreamAsync(Stream streamPayload, PartitionKey partitionKey, ItemRequestOptions requestOptions=null, CancellationToken cancellationToken=default(CancellationToken)); public delegate Task ChangeFeedHandlerWithManualCheckpoint<T>(ChangeFeedProcessorContext context, IReadOnlyCollection<T> changes, Func<Task> checkpointAsync, CancellationToken cancellationToken); @@ -407,6 +408,7 @@ namespace Microsoft.Azure.Cosmos public int GatewayModeMaxConnectionLimit { get; set; } public Func<HttpClient> HttpClientFactory { get; set; } public Nullable<TimeSpan> IdleTcpConnectionTimeout { get; set; } + public TimeSpan InferenceRequestTimeout { get; set; } public bool LimitToEndpoint { get; set; } public Nullable<int> MaxRequestsPerTcpConnection { get; set; } public Nullable<int> MaxRetryAttemptsOnRateLimitedRequests { get; set; } @@ -1092,6 +1094,7 @@ namespace Microsoft.Azure.Cosmos public const string NorwayWest = "Norway West"; public const string PolandCentral = "Poland Central"; public const string QatarCentral = "Qatar Central"; + public const string SaudiArabiaEast = "Saudi Arabia East"; public const string SingaporeCentral = "Singapore Central"; public const string SingaporeNorth = "Singapore North"; public const string SouthAfricaNorth = "South Africa North"; @@ -1099,6 +1102,7 @@ namespace Microsoft.Azure.Cosmos public const string SouthCentralUS = "South Central US"; public const string SouthCentralUS2 = "South Central US 2"; public const string SoutheastAsia = "Southeast Asia"; + public const string SoutheastAsia3 = "Southeast Asia 3"; public const string SoutheastUS = "Southeast US"; public const string SoutheastUS3 = "Southeast US 3"; public const string SoutheastUS5 = "Southeast US 5"; @@ -1126,6 +1130,7 @@ namespace Microsoft.Azure.Cosmos public const string USSecWest = "USSec West"; public const string USSecWestCentral = "USSec West Central"; public const string WestCentralUS = "West Central US"; + public const string WestCentralUSFRE = "West Central US FRE"; public const string WestEurope = "West Europe"; public const string WestIndia = "West India"; public const string WestUS = "West US"; @@ -1504,6 +1509,7 @@ namespace Microsoft.Azure.Cosmos.Fluent public CosmosClientBuilder WithEnableRemoteRegionPreferredForSessionRetry(bool enableRemoteRegionPreferredForSessionRetry); public CosmosClientBuilder WithFaultInjection(IFaultInjector faultInjector); public CosmosClientBuilder WithHttpClientFactory(Func<HttpClient> httpClientFactory); + public CosmosClientBuilder WithInferenceRequestTimeout(TimeSpan inferenceRequestTimeout); public CosmosClientBuilder WithLimitToEndpoint(bool limitToEndpoint); public CosmosClientBuilder WithPriorityLevel(PriorityLevel priorityLevel); public CosmosClientBuilder WithReadConsistencyStrategy(ReadConsistencyStrategy readConsistencyStrategy); ``` ### Checklist - [ ] Changelog entries reviewed by team - [ ] API contract diff reviewed by Kiran and Kirill - [ ] Preview APIs reviewed (email sent to azurecosmossdkdotnet@microsoft.com) - [ ] Kiran sign-off obtained Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
This PR was authored by GitHub Copilot as part of an automated issue triage workflow.
Fixes #5487
Fixes #5702
This PR fixes a memory leak in the LINQ provider where
Expression.Compile()generates JIT-compiled DynamicMethods that persist in native memory, causing growth in long-running services.Root Cause
Location:
Microsoft.Azure.Cosmos/src/Linq/SubtreeEvaluator.cs:119-120Analysis:
Expression.Lambda().Compile()emits a new DynamicMethod with generated IL on every call. DynamicMethod IL is stored in native memory (not GC-tracked), causing memory growth in long-running services with repeated LINQ query evaluation.Evidence:
Changes Made
Call sites updated (4 files)
EvaluateConstant(): original leak site (LINQ provider, the SDK continuously generates new JIT-compiled DynamicMethods (resulting in unmanaged memory growth) #5487)ExpressionSimplifier<T>.Eval(): uses generic overload, no cast neededHandleAsSqlTransformExpression(): 2 Compile() callsConstruct(): uses generic overload, no cast neededSubtreeEvaluatorBenchmark.cs (new)
[MemoryDiagnoser]lambda.Compile()code pathSubtreeEvaluator.Evaluate()to measure the real fixCompile()demonstrates unbounded memory growthBenchmark Results
Environment: Windows, .NET 8.0, Release build, ARM64
Performance (per-call)
Why this validates the fix:
Compile()must generate IL and JIT-compile (~44μs overhead)Compile(preferInterpretation: true)interprets directly (~3μs overhead)Testing
Local Validation
Breaking Changes
None
External References
Checklist
Generated by GitHub Copilot CLI Agent