Description
Library name and version
Microsoft.Azure.WebJobs.Extensions.EventHubs 6.0.2 and 5.5.0
Describe the bug
EventHubListener.PartitionProcessor advances the checkpoint even when the application is shutting down (for example, because of a configuration change or scaling). This causes message loss.
Note that this issue occurs only occasionally.
My colleague and I believe that this issue was fixed by #36432 but reintroduced by #38067, based on the investigation results below.
Our investigation results
In this situation, _functionExecutionToken.IsCancellationRequested had already become true, but linkedCts.IsCancellationRequested had not yet become true. So the checkpoint was advanced even though the application execution had been cancelled.
From the LinkedCancellationTokenSource source code, we found the following facts:
- LinkedCancellationTokenSource (linkedCts) is cancelled via a callback registered on the "linked" cancellation token (the source of _functionExecutionToken).
- The callback chain is invoked in LIFO order.
- Callbacks are also used in many other places, such as the Task.Delay(CancellationToken) implementation.
By watching the call stack at the checkpointing code, we observed the following sequence:
- WebHost calls the listener's StopAsync().
- The listener calls CancellationTokenSource.Cancel() on the source of _functionExecutionToken.
- The application cancellation occurs as a continuation of async / await, so the awaits in the function runtime complete as part of the registered callback execution. Note that this callback runs before linkedCts.IsCancellationRequested is set to true, as described above. So the checkpoint is advanced because linkedCts.IsCancellationRequested is not yet true.
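The ordering described above can be illustrated outside the Functions host with a small standalone sketch. It is not the extension's code: the class and member names (LinkedTokenOrderDemo, Observe, OuterCancelled, LinkedCancelled) are made up for illustration, and it assumes a runtime that invokes cancellation callbacks in LIFO order, as found in the investigation. A callback registered after the linked source was created stands in for the function-runtime continuation.

```csharp
using System;
using System.Threading;

public static class LinkedTokenOrderDemo
{
    // Returns what a late-registered callback observes while Cancel() is still running.
    public static (bool OuterCancelled, bool LinkedCancelled) Observe()
    {
        // Plays the role of the source of _functionExecutionToken.
        using var outerCts = new CancellationTokenSource();

        // Creating the linked source registers its own callback on outerCts.Token first.
        using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(outerCts.Token);

        bool outerSeen = false;
        bool linkedSeen = false;

        // Registered later, so under LIFO ordering it runs before the linked source is cancelled.
        outerCts.Token.Register(() =>
        {
            outerSeen = outerCts.IsCancellationRequested;
            linkedSeen = linkedCts.IsCancellationRequested;
        });

        outerCts.Cancel();
        return (outerSeen, linkedSeen);
    }

    public static void Main()
    {
        var (outer, linked) = Observe();
        Console.WriteLine($"outer cancelled: {outer}, linked cancelled: {linked}");
    }
}
```

If callbacks do run in LIFO order, the callback should see the outer token as cancelled while the linked source is not yet cancelled, which is the window in which the checkpoint is advanced.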
Expected behavior
The checkpoint is never advanced while the application process is shutting down (that is, once _functionExecutionToken is cancelled).
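One way to express this expectation is a guard that consults the function execution token directly and does not rely on the linked source alone. The sketch below is only an illustration under assumptions: CheckpointGuard and ShouldCheckpoint are hypothetical names, not part of Microsoft.Azure.WebJobs.Extensions.EventHubs, and the actual fix in the extension may look different.

```csharp
using System.Threading;

public static class CheckpointGuard
{
    // Hypothetical helper: allow checkpointing only when neither token reports cancellation.
    // Checking the outer token directly closes the window in which the linked source
    // has not yet been cancelled by its callback.
    public static bool ShouldCheckpoint(
        CancellationToken functionExecutionToken,
        CancellationToken linkedToken)
    {
        return !functionExecutionToken.IsCancellationRequested
            && !linkedToken.IsCancellationRequested;
    }
}
```

With such a guard, the result does not depend on the order in which cancellation callbacks run, because the outer token already reports cancellation by the time any of them is invoked.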
Actual behavior
The checkpoint is occasionally advanced during shutdown.
Reproduction Steps
- Run an Event Hub trigger function such as the following in a local environment.
- After the Receive method has started, press Ctrl + C to shut down the process.
- Observe that the checkpoint is advanced. You can inspect the states of linkedCts and _functionExecutionToken with a breakpoint at the checkpointing code.
[FunctionName("Receive")]
[ExponentialBackoffRetry(5, "00:00:10", "00:10:00")]
public async Task Receive(
    [EventHubTrigger("%EventHubName%", Connection = "EventHubConnectionString")] EventData[] events,
    CancellationToken cancellationToken)
{
    await Task.Delay(TimeSpan.FromMinutes(3), cancellationToken);
}
Environment
- Platform: Windows (Functions runtime v3 and v4, Azure App Service)
- We reproduced this in a local environment, and message loss also occurred multiple times in our production Azure environment.