Prevent writes during EventSource.Disable #55862

Closed

Conversation

@josalem (Contributor) commented Jul 16, 2021

fixes #55441

There is a race in EventSource.Disable: any active writer that has passed its IsEnabled() check can end up accessing internal structures while they are being disposed. This patch adds a two-phase disable as part of dispose that prevents writes from happening while resources are being cleaned up.
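A rough sketch of the pattern, using a standalone illustrative type rather than the actual EventSource changes in this PR (type and field names here are invented for the example):

```csharp
using System;
using System.Threading;

// Illustrative only: a standalone type showing the two-phase disable idea,
// not the real EventSource internals touched by this PR.
sealed class DrainOnDisposeWriter : IDisposable
{
    private volatile bool _enabled = true; // stand-in for m_eventSourceEnabled
    private int _activeWrites;             // stand-in for m_activeWritesCount

    public void Write(string message)
    {
        // Register as an active writer before touching shared state.
        Interlocked.Increment(ref _activeWrites);
        try
        {
            // Re-check after registering: Dispose may have flipped the flag
            // between the caller's IsEnabled()-style check and this point.
            if (_enabled)
            {
                Console.WriteLine(message); // stand-in for the real write path
            }
        }
        finally
        {
            Interlocked.Decrement(ref _activeWrites);
        }
    }

    public void Dispose()
    {
        // Phase 1: stop new writers from getting past the enabled check.
        _enabled = false;

        // Phase 2: wait (bounded) for writers already past the check to drain;
        // only then is it safe to free whatever those writers might touch.
        SpinWait.SpinUntil(() => Volatile.Read(ref _activeWrites) == 0, 1000);
    }
}
```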

TODO:

  • performance analysis. Adding a spin-wait could be expensive, and even with the timeout it's possible for this to bite us. I want to run this through BenchmarkDotNet (bdn) to see whether there is a serious impact on EventSource performance, and I also want to check whether this impacts performance on shutdown (see the benchmark sketch below). I'll leave this as a draft until I'm either convinced I don't need the data, or I've collected it.
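For reference, a minimal BenchmarkDotNet harness of the kind that could be used for that measurement; the event source and benchmark class are placeholders, not code from this PR:

```csharp
using System.Diagnostics.Tracing;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Placeholder event source used only to exercise the write path.
[EventSource(Name = "Demo-WritePath")]
sealed class DemoEventSource : EventSource
{
    public static readonly DemoEventSource Log = new DemoEventSource();

    [Event(1)]
    public void Ping(int value) => WriteEvent(1, value);
}

public class EventSourceWriteBenchmark
{
    // Measures the cost of a single event write; running this before and
    // after the change shows whether the extra bookkeeping is visible.
    [Benchmark]
    public void WriteEvent() => DemoEventSource.Log.Ping(42);
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<EventSourceWriteBenchmark>();
}
```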

CC @brianrob @noahfalk @davmason

John Salem added 2 commits July 16, 2021 16:21
* prevent writers from accessing internal data while it is being disposed
@josalem josalem added this to the 6.0.0 milestone Jul 16, 2021
@josalem josalem self-assigned this Jul 16, 2021
@ghost commented Jul 16, 2021

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.

Labels: area-System.Diagnostics.Tracing
Milestone: 6.0.0

@@ -1457,6 +1462,10 @@ protected virtual void Dispose(bool disposing)
catch { } // If it fails, simply give up.
m_eventSourceEnabled = false;
}

// Wait till active writes have finished (stop waiting at 1 second)
SpinWait.SpinUntil(() => m_activeWritesCount == 0, 1000);
Member

How do we know 1 second is the right value to wait?

Contributor Author (josalem)

This should have a big old TODO over it saying "until I find evidence for a better value".

I'm not even sure the timeout is appropriate. My fear is that there is a code path after a call to one of the core Write methods that takes a lock, and having this spin indefinitely would be asking for a deadlock of some kind.

@AaronRobinsonMSFT (Member) commented Jul 28, 2021

Let me make a better suggestion here. When I was writing the ETW collector provided by VS, this problem was endemic to all ETW collector designs. A solution we came up with was to insert an event into the ETW stream during shutdown; when we observed that event, we knew we were "synchronized" with the request. This isn't entirely true for reasons I can explain offline, but it is close enough that VS's ETW collector has never missed events due to this logic. Timeouts like this in ETW should be avoided at all costs. One second actually is a very reasonable value, though; I can also explain that offline if interested.
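A rough in-process illustration of that sentinel-event idea, using an EventListener as a stand-in for a real out-of-process ETW collector (all names here are invented for the example):

```csharp
using System;
using System.Diagnostics.Tracing;
using System.Threading;

[EventSource(Name = "Demo-Sentinel")]
sealed class SentinelEventSource : EventSource
{
    public static readonly SentinelEventSource Log = new SentinelEventSource();

    [Event(1)]
    public void Work(int id) => WriteEvent(1, id);

    // The marker written at shutdown; seeing it means everything written
    // before it has already flowed through the stream.
    [Event(2)]
    public void FlushMarker() => WriteEvent(2);
}

sealed class SentinelListener : EventListener
{
    private readonly ManualResetEventSlim _markerSeen = new ManualResetEventSlim();

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "Demo-Sentinel")
            EnableEvents(source, EventLevel.Verbose);
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        // Ordinary events would be processed here; event 2 is the sentinel.
        if (e.EventId == 2)
            _markerSeen.Set();
    }

    // Called at shutdown: write the marker, then block (bounded) until the
    // collector side has observed it, i.e. we are "synchronized".
    public bool Synchronize(TimeSpan timeout)
    {
        SentinelEventSource.Log.FlushMarker();
        return _markerSeen.Wait(timeout);
    }
}
```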

@noahfalk (Member)

@josalem and I chatted earlier, but just to record it - most BCL types don't provide a thread-safety guarantee for Dispose() and I wonder if it is necessary here. The synchronization necessary to provide that guarantee comes with perf costs I'm hopeful we can avoid.
- If the runtime will automatically call Dispose() on a thread of its choosing at shutdown, then we do need some thread-safety guarantee, but does the runtime need to be calling Dispose() at all? I'm not sure what benefit it gives us, but research may reveal that it is necessary.
- Since the runtime automatically calls Write for EventCounters we would need to modify Dispose() to ensure that we stopped the EventCounter Write() calls prior to freeing the provider.

@brianrob (Member)

I agree. I was trying to figure out what I was concerned about with regard to this change and I couldn't quite verbalize it. I think you hit it on the head @noahfalk.

@noahfalk (Member)

I was looking at a related bug and wound up doing some research on why we call EventSource.Dispose() at shutdown. So far I see two reasons:

  • Avoiding native->managed callbacks after we've made it illegal to call back into managed code here
  • Logging ETW manifests at the end of the trace here

With a little refactoring we could create a new internal API like EventSource.HandleShutdown() that handles only these concerns while still leaving us in a state where concurrent calls to Write don't generate failures (whether any data is logged is optional). Then we would call this new API and not Dispose() in the shutdown path. Dispose() would continue to do what it does now including freeing memory, but unlike HandleShutdown() the onus could be on the caller to ensure they don't make any concurrent calls to other EventSource APIs during/after the call to Dispose().

In the case of EventPipe, HandleShutdown() could disable the callbacks without deleting the provider. In the case of ETW past precedent suggests concurrent calls to EventUnregister and EventWrite are already safe (we've presumably been doing it for many years with no reported issues).
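A speculative sketch of the proposed split, using an illustrative standalone type (HandleShutdown() is only a proposal here; none of this is actual runtime code):

```csharp
using System;

sealed class ShutdownAwareSource : IDisposable
{
    private volatile bool _callbacksEnabled = true;

    // Runtime-shutdown path: do only the work that must happen at shutdown,
    // leaving the object in a state where concurrent Write calls don't fail
    // (whether their data is still logged is optional).
    internal void HandleShutdown()
    {
        WriteManifest();            // get the manifest to the end of the trace
        _callbacksEnabled = false;  // stop native->managed callbacks without
                                    // deleting the provider or freeing buffers
    }

    // Regular Dispose still frees everything; the onus is on the caller not
    // to make concurrent calls during/after it, so no drain logic is needed.
    public void Dispose()
    {
        // ... unregister providers, free buffers ...
    }

    public void Write(string message)
    {
        if (_callbacksEnabled)
            Console.WriteLine(message); // stand-in for the real emit path
    }

    private void WriteManifest()
    {
        // Stand-in for logging the ETW manifest event.
    }
}
```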

@AaronRobinsonMSFT @vitek-karas @elinor-fung - The comment in the source references throwing a COMPLUS_BOOT_EXCEPTION and I find no reference to it any longer. I do recall that we used to block native->managed calls at some point during AppDomain shutdown, but do you know if that constraint is a relic of the desktop runtime that is no longer an issue on CoreCLR?

@josalem (Contributor, Author) commented Jul 28, 2021

The consensus is that we don't want to add protection of this nature due to the cost and that we should find an alternate way to prevent this. Rather than discuss this on a draft PR, I'll migrate our conversation to the issue describing the root problem (#55441).

@josalem josalem closed this Jul 28, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 27, 2021
Linked issue: Rare race condition in EventSource dispose/finalizer (#55441)