Add support for dedupe statuses in isolated#3316
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds support for honoring dedupe statuses in orchestration creation requests for isolated worker apps. The implementation combines request-level dedupe statuses with app-wide OverridableExistingInstanceStates settings, and adds functionality to terminate existing non-terminal orchestrations before creating new instances with the same ID.
Changes:
- Added logic to handle dedupe statuses from gRPC creation requests and merge them with app-wide settings
- Implemented automatic termination of non-terminal orchestrations when creating instances with reusable instance IDs
- Added new timeout configuration option
OrchestrationCreationRequestTimeoutInSeconds(default 180 seconds) - Enhanced E2E tests across all language SDKs to support instance ID reuse scenarios
Reviewed changes
Copilot reviewed 19 out of 21 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
src/WebJobs.Extensions.DurableTask/TaskHubGrpcServer.cs |
Implements dedupe status logic for gRPC requests, though contains unused helper method |
src/WebJobs.Extensions.DurableTask/DurabilityProvider.cs |
Adds termination and wait logic for non-terminal orchestrations before creation |
src/WebJobs.Extensions.DurableTask/Options/DurableTaskOptions.cs |
Adds new timeout configuration option with validation |
src/WebJobs.Extensions.DurableTask/HttpApiHandler.cs |
Adds error handling for new exception types |
src/WebJobs.Extensions.DurableTask/DurableTaskExtension.cs |
Initializes timeout setting in durability provider |
src/WebJobs.Extensions.DurableTask/OverridableStates.cs |
Whitespace-only changes |
test/e2e/Tests/Tests/DedupeStatusesTests.cs |
New E2E tests for dedupe status functionality |
test/e2e/Tests/Tests/PurgeInstancesTests.cs |
Fixed logic bug in test condition |
test/e2e/Tests/Tests/DistributedTracingEntitiesTests.cs |
Improved JSON deserialization robustness |
test/Common/DurableTaskEndToEndTests.cs |
Added comprehensive unit tests for overridable states |
test/e2e/Apps/*/ |
Updated test apps across all languages to support instance ID specification |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/WebJobs.Extensions.DurableTask/Options/DurableTaskOptions.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 11 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/WebJobs.Extensions.DurableTask/Options/DurableTaskOptions.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/WebJobs.Extensions.DurableTask/Options/DurableTaskOptions.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (6)
src/WebJobs.Extensions.DurableTask/DurabilityProvider.cs:416
- Typo in comment: "dedupping" → "deduping".
/// <param name="dedupeStatuses">An array of orchestration statuses used for "dedupping":
test/e2e/Apps/BasicDotNetIsolated/HelloCities.cs:139
ScheduleNewOrchestrationInstanceAsyncreturns the actual instance ID (and may generate one if empty/invalid). Here it’s awaited but the return value is ignored, and the response/log uses the incominginstanceIdparameter. Assign the returned value back toinstanceIdto ensure the check-status payload is correct in all cases.
// Function input comes from the request content.
try
{
await client.ScheduleNewOrchestrationInstanceAsync(orchestrationName, startOptions);
}
catch (OrchestrationAlreadyExistsException ex)
{
// Tests expect Conflict (409) for orchestration dedupe scenarios.
HttpResponseData response = req.CreateResponse(HttpStatusCode.Conflict);
await response.WriteStringAsync(ex.Message);
return response;
}
catch (ArgumentException ex)
{
// Tests expect BadRequest for invalid dedupe statuses
HttpResponseData response = req.CreateResponse(HttpStatusCode.BadRequest);
await response.WriteStringAsync(ex.Message);
return response;
}
logger.LogInformation("Started orchestration with ID = '{instanceId}'.", instanceId);
// Returns an HTTP 202 response with an instance management payload.
// See https://learn.microsoft.com/azure/azure-functions/durable/durable-functions-http-api#start-orchestration
return await client.CreateCheckStatusResponseAsync(req, instanceId);
src/WebJobs.Extensions.DurableTask/TaskHubGrpcServer.cs:539
- Typo in comment: "th instanceId" → "the instanceId".
// Thrown when th instanceId is not found.
throw new RpcException(new Status(StatusCode.NotFound, $"ArgumentException: {ex.Message}"));
test/e2e/Apps/BasicPowerShell/StartOrchestration/run.ps1:12
Start-DurableOrchestration -InstanceId $InstanceIdis always passed even when the query parameter is missing/null. If the cmdlet treats null/empty as invalid, this could break callers/tests that don’t supply instanceId (many tests call StartOrchestration with only orchestrationName). Consider only adding-InstanceIdwhen a non-empty instanceId was provided, otherwise omit it so the runtime generates one.
$orchestrationName = $Request.Query.orchestrationName
$InstanceId = $Request.Query.instanceId
$InstanceId = Start-DurableOrchestration -FunctionName $orchestrationName -InstanceId $InstanceId
test/e2e/Apps/BasicNode/src/functions/LargeOutputOrchestrator.ts:18
sizeInKB <= 0won’t catch missing input in JS/TS (e.g.,undefined/NaN), so the orchestrator may continue and produce an empty output instead of failing. This breaks the assumption in tests that starting this orchestrator without input fails. Add an explicit validation for non-numeric input (e.g., undefined/NaN) before comparing.
const sizeInKB = context.df.getInput<number>();
if (sizeInKB <= 0) {
throw new Error('sizeInKB must be a positive integer.');
}
test/Common/DurableTaskEndToEndTests.cs:6121
- This log assertion relies on an exact DurableTask.Core log string prefix. Since log formats can change across DurableTask.Core versions/providers, this is likely to be brittle/flaky. Consider asserting on a more stable signal (e.g., an event ID, a structured field if available, or a less specific substring) to reduce test fragility.
IReadOnlyCollection<LogMessage> durableTaskCoreLogs =
this.loggerProvider.CreatedLoggers.Single(l => l.Category == "DurableTask.Core").LogMessages;
Assert.Contains(durableTaskCoreLogs, log => log.ToString().StartsWith($"{instanceId}: Orchestration completed with a 'Terminated' status"));
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (5)
test/e2e/Apps/BasicNode/src/functions/LargeOutputOrchestrator.ts:18
- In JS/TS,
undefined <= 0isfalse, so when no input is providedsizeInKBwill likely beundefinedand this validation won’t throw. That means the orchestrator can accidentally succeed with an empty output, which breaks the intended behavior (other languages fail when input is missing/<=0). Consider explicitly validatingsizeInKB == null/Number.isFinite(sizeInKB)in addition to<= 0.
const sizeInKB = context.df.getInput<number>();
if (sizeInKB <= 0) {
throw new Error('sizeInKB must be a positive integer.');
}
test/e2e/Apps/BasicDotNetIsolated/HelloCities.cs:107
Enum.TryParsefailures throw anArgumentExceptionbefore thetry/catchblock (thetryonly wrapsScheduleNewOrchestrationInstanceAsync). That will surface as a 500 instead of the intended 400 BadRequest. Wrap the parsing/validation inside the sametry/catch(or add a separate catch around the parsing) so invalid status strings return the expected 400.
var parsedStatuses = new OrchestrationRuntimeStatus[dedupeStatuses.Length];
for (int i = 0; i < dedupeStatuses.Length; i++)
{
string statusStr = dedupeStatuses[i];
if (!Enum.TryParse<OrchestrationRuntimeStatus>(statusStr, ignoreCase: true, out var status))
{
throw new ArgumentException($"Invalid OrchestrationRuntimeStatus value: '{statusStr}'", nameof(dedupeStatuses));
}
parsedStatuses[i] = status;
}
test/e2e/Apps/BasicDotNetIsolated/HelloCities.cs:139
ScheduleNewOrchestrationInstanceAsyncreturns the actual instance ID (it may differ from the requested one if the input is empty/invalid or the client overrides it). This function ignores the return value and always logs/returns the originalinstanceIdparameter, which can produce an incorrect management payload. Capture the returned instance ID and use it for logging andCreateCheckStatusResponseAsync.
try
{
await client.ScheduleNewOrchestrationInstanceAsync(orchestrationName, startOptions);
}
catch (OrchestrationAlreadyExistsException ex)
{
// Tests expect Conflict (409) for orchestration dedupe scenarios.
HttpResponseData response = req.CreateResponse(HttpStatusCode.Conflict);
await response.WriteStringAsync(ex.Message);
return response;
}
catch (ArgumentException ex)
{
// Tests expect BadRequest for invalid dedupe statuses
HttpResponseData response = req.CreateResponse(HttpStatusCode.BadRequest);
await response.WriteStringAsync(ex.Message);
return response;
}
logger.LogInformation("Started orchestration with ID = '{instanceId}'.", instanceId);
// Returns an HTTP 202 response with an instance management payload.
// See https://learn.microsoft.com/azure/azure-functions/durable/durable-functions-http-api#start-orchestration
return await client.CreateCheckStatusResponseAsync(req, instanceId);
test/e2e/Apps/BasicPowerShell/StartOrchestration/run.ps1:12
- This always supplies
-InstanceId $InstanceIdeven when the query parameter is missing/empty. If$Request.Query.instanceIdis$nullor'', PowerShell parameter binding or the underlying Durable cmdlet may error instead of auto-generating an ID. Consider only passing-InstanceIdwhen a non-empty value was provided (otherwise callStart-DurableOrchestrationwithout it).
$InstanceId = $Request.Query.instanceId
$InstanceId = Start-DurableOrchestration -FunctionName $orchestrationName -InstanceId $InstanceId
src/WebJobs.Extensions.DurableTask/DurabilityProvider.cs:416
- Typo in the XML docs: "dedupping" should be "deduping" (or "deduplication").
/// <param name="dedupeStatuses">An array of orchestration statuses used for "dedupping":
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… reduce the flakiness
…snt support terminating suspended orchestrations
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (3)
test/e2e/Apps/BasicDotNetIsolated/HelloCities.cs:119
ScheduleNewOrchestrationInstanceAsyncreturns the actual instance ID that was used. This function ignores the return value and uses the request parameterinstanceIdwhen logging and creating the check-status response, which can be incorrect if the runtime generates a different ID (e.g., missing/empty instanceId). Assign the returned value and use it for logs/response.
try
{
await client.ScheduleNewOrchestrationInstanceAsync(orchestrationName, startOptions);
}
src/WebJobs.Extensions.DurableTask/TaskHubGrpcServer.cs:544
OrchestrationAlreadyExistsExceptionis mapped toStatusCode.FailedPreconditionhere, butStartInstancemaps the same exception toStatusCode.AlreadyExistsand the HTTP API maps it to HTTP 409. Consider usingStatusCode.AlreadyExistsfor consistency and so gRPC clients can distinguish true conflicts from other precondition failures.
catch (OrchestrationAlreadyExistsException ex)
{
throw new RpcException(new Status(StatusCode.FailedPrecondition, $"Non-terminal instance with this instance ID already exists: {ex.Message}"));
}
test/Common/DurableTaskEndToEndTests.cs:6121
- In the
anyStateOverridablepath, the test only asserts that a “Terminated” log exists, but it doesn’t assert that the restart/start operation actually succeeded (i.e., thatexceptionis null). As written, the test could pass even if the API throws after terminating the instance. Add an assertion that no exception was captured whenanyStateOverridableis true.
// If any state is reusable, confirm that there is evidence the existing orchestration was terminated before the new one was created
if (anyStateOverridable && this.useTestLogger)
{
IReadOnlyCollection<LogMessage> durableTaskCoreLogs =
this.loggerProvider.CreatedLoggers.Single(l => l.Category == "DurableTask.Core").LogMessages;
Assert.Contains(durableTaskCoreLogs, log => log.ToString().StartsWith($"{instanceId}: Orchestration completed with a 'Terminated' status"));
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
test/e2e/Apps/BasicNode/src/functions/LargeOutputOrchestrator.ts
Outdated
Show resolved
Hide resolved
…plete to remove flakiness
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (3)
src/WebJobs.Extensions.DurableTask/DurabilityProvider.cs:417
- Spelling: the XML doc says "dedupping"; this should be "deduping" (or "deduplication") to avoid propagating the typo into generated docs/IntelliSense.
/// <param name="dedupeStatuses">An array of orchestration statuses used for "dedupping":
/// If an orchestration with the same instance ID already exists, and its status is in this array, then a
src/WebJobs.Extensions.DurableTask/DurabilityProvider.cs:444
- Spelling: the XML doc says "dedupping"; this should be "deduping" (or "deduplication") to avoid propagating the typo into generated docs/IntelliSense.
/// <param name="dedupeStatuses">An array of orchestration statuses used for "dedupping":
/// If an orchestration with the same instance ID already exists, and its status is in this array, then a
test/e2e/Tests/Tests/RestartOrchestrationTests.cs:192
- This loop polls the status endpoint in a tight loop with no delay, which can hammer the local functions host and make the test flaky/slow under load. Add a small
Task.Delay(...)between iterations (and ideally pass the CTS token) to throttle polling.
var waitForRestartTimeout = TimeSpan.FromSeconds(30);
using CancellationTokenSource cts = new(waitForRestartTimeout);
while (!cts.IsCancellationRequested && createdTime2 == createdTime1)
{
restartOrchestrationDetails = await DurableHelpers.GetRunningOrchestrationDetailsAsync(restartStatusQueryGetUri);
createdTime2 = restartOrchestrationDetails.CreatedTime;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (6)
test/e2e/Tests/Tests/RestartOrchestrationTests.cs:179
- Avoid using Thread.Sleep in an async test. It blocks the test thread and can slow/flakify the suite; use an awaited delay instead (and ideally tie it to a CancellationToken/timeout helper).
// Restart the orchestrator with the same instance id
// This is necessary to ensure that the created times for the two orchestration instances are different.
// The created time returned by the orchestration status API has a resolution only up to seconds, not milliseconds.
Thread.Sleep(TimeSpan.FromSeconds(1));
using HttpResponseMessage restartResponse = await HttpHelpers.InvokeHttpTriggerWithBody(
"RestartOrchestration_HttpRestart", jsonBody, "application/json");
test/Common/DurableTaskEndToEndTests.cs:6087
- The Counter orchestration uses ContinueAsNew, so
GetStatusAsync()can legitimately returnContinuedAsNew(this file previously treated that as acceptable in similar checks). Asserting strictlyRunninghere risks flakiness; consider allowingRunningorContinuedAsNew(or usingWaitForStartupAsync/custom-status checks as the source of truth).
// Make sure it's still running and didn't complete early (or fail).
DurableOrchestrationStatus status = await client.GetStatusAsync();
Assert.Equal(OrchestrationRuntimeStatus.Running, status?.RuntimeStatus);
src/WebJobs.Extensions.DurableTask/DurabilityProvider.cs:445
- Typo in XML docs: "dedupping" should be "deduping" (or "deduplication").
/// <param name="dedupeStatuses">An array of orchestration statuses used for "dedupping":
/// If an orchestration with the same instance ID already exists, and its status is in this array, then a
/// <see cref="OrchestrationAlreadyExistsException"/> will be thrown.
test/e2e/Tests/Tests/RestartOrchestrationTests.cs:195
- The polling loop waiting for CreatedTime to change has no delay/backoff, which can busy-spin and hammer the status endpoint. Add a small delay between iterations (and pass the CTS token) or reuse an existing polling helper to avoid excessive CPU/network usage.
var waitForRestartTimeout = TimeSpan.FromSeconds(30);
using CancellationTokenSource cts = new(waitForRestartTimeout);
while (!cts.IsCancellationRequested && createdTime2 == createdTime1)
{
restartOrchestrationDetails = await DurableHelpers.GetRunningOrchestrationDetailsAsync(restartStatusQueryGetUri);
createdTime2 = restartOrchestrationDetails.CreatedTime;
}
test/e2e/Apps/BasicDotNetIsolated/HelloCities.cs:119
ScheduleNewOrchestrationInstanceAsyncreturns the actual instance ID that was scheduled. This method ignores the return value and then logs/creates the check-status response using the inputinstanceIdparameter, which could be wrong if the runtime generates/normalizes the ID (e.g., if the supplied ID is empty/invalid). Capture and use the returned instance ID for the log andCreateCheckStatusResponseAsync.
// Function input comes from the request content.
try
{
await client.ScheduleNewOrchestrationInstanceAsync(orchestrationName, startOptions);
}
src/WebJobs.Extensions.DurableTask/DurabilityProvider.cs:417
- Typo in XML docs: "dedupping" should be "deduping" (or "deduplication"). This appears twice in the new overload docs.
/// <param name="dedupeStatuses">An array of orchestration statuses used for "dedupping":
/// If an orchestration with the same instance ID already exists, and its status is in this array, then a
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (4)
test/e2e/Tests/Tests/DedupeStatusesTests.cs:257
- The
dedupeStatusesquery parameter is built from raw JSON (["Pending", ...]) without URL-encoding. SinceHttpHelpers.GetTestRequestconcatenates the query string directly, this can produce invalid/ambiguous URLs. URL-encode the serialized JSON (or switch to repeated query keys likededupeStatuses=Pending&dedupeStatuses=Failed) before appending it to the request URI.
string queryString = $"?orchestrationName={orchestrationName}&instanceId={instanceId}" +
$"&dedupeStatuses={JsonSerializer.Serialize(dedupeStatuses)}";
if (scheduledStartTime is not null)
{
queryString += $"&scheduledStartTime={scheduledStartTime:o}";
}
HttpResponseMessage response = await HttpHelpers.InvokeHttpTrigger("StartOrchestration_DedupeStatuses", queryString);
test/e2e/Tests/Tests/RestartOrchestrationTests.cs:179
Thread.Sleepblocks the test thread inside an async test method. Useawait Task.Delay(...)instead so the test doesn’t tie up a threadpool thread (and remains consistent with the rest of the async polling helpers in this suite).
// Restart the orchestrator with the same instance id
// This is necessary to ensure that the created times for the two orchestration instances are different.
// The created time returned by the orchestration status API has a resolution only up to seconds, not milliseconds.
Thread.Sleep(TimeSpan.FromSeconds(1));
using HttpResponseMessage restartResponse = await HttpHelpers.InvokeHttpTriggerWithBody(
"RestartOrchestration_HttpRestart", jsonBody, "application/json");
test/e2e/Tests/Tests/RestartOrchestrationTests.cs:196
- The
whileloop pollsGetRunningOrchestrationDetailsAsyncin a tight loop with no delay. This can spin hot CPU and spam the status endpoint. Add a smallawait Task.Delay(...)(e.g., 100–250ms) inside the loop, and consider also breaking early if the runtimeStatus reaches a terminal state.
var waitForRestartTimeout = TimeSpan.FromSeconds(30);
using CancellationTokenSource cts = new(waitForRestartTimeout);
while (!cts.IsCancellationRequested && createdTime2 == createdTime1)
{
restartOrchestrationDetails = await DurableHelpers.GetRunningOrchestrationDetailsAsync(restartStatusQueryGetUri);
createdTime2 = restartOrchestrationDetails.CreatedTime;
}
Assert.NotEqual(createdTime1, createdTime2);
src/WebJobs.Extensions.DurableTask/DurabilityProvider.cs:705
runningStatusesomitsOrchestrationStatus.ContinuedAsNew, which is treated as a non-terminal/running status elsewhere in this repo (e.g.,OverridableStatesExtensions.NonRunning). If an existing instance is inContinuedAsNewand the dedupe policy allows reuse of running instances, this helper won’t terminate it before creating the new instance. IncludeContinuedAsNewin the running-status set (and update the XML doc that lists running statuses) to keep behavior consistent and safe.
var runningStatuses = new List<OrchestrationStatus>()
{
OrchestrationStatus.Running,
OrchestrationStatus.Pending,
OrchestrationStatus.Suspended,
};
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
What changed?
This PR:
What is ultimately used to determine the dedupe statuses is a combination of those specified in the request and those obtains from the user's app-wide
OverridableExistingInstanceStatessetting. These two are "unioned" to obtain the strictest dedupe statuses.Why is this change needed?
OverridableExistingInstanceStateswas set toAnyState, we would allow just creating a new orchestration with the same instance ID without terminating any existing running orchestration, which is unsafe.Project checklist
pending_docs.mdrelease_notes.mdWebJobs.Extensions.DurableTaskpackage/src/Worker.Extensions.DurableTask/AssemblyInfo.csEventSourcelogsv2.xbranchWebJobs.Extensions.DurableTaskv3.x and will be retained only in thedevandmainbranchesAI-assisted code disclosure (required)
Was an AI tool used? (select one)
If AI was used:
DedupeStatusesTestAI verification (required if AI was used):
Notes for reviewers
OrchestrationIdReusePolicyof "null" to mean all statuses can be reused (equivalent to no dedupe statuses). Does this make sense? I do this because none of the other languages (that use gRPC, so Java and Python) set this field currently. In that case I just default to relying on theOverridableExistingInstanceStatessetting.