-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent writes during EventSource.Disable #55862
Prevent writes during EventSource.Disable #55862
Conversation
* prevent writers from accessing internal data while it is being disposed
Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti Issue Detailsfixes #55441 There is a race in TODO:
|
src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventSource.cs
Outdated
Show resolved
Hide resolved
@@ -1457,6 +1462,10 @@ protected virtual void Dispose(bool disposing) | |||
catch { } // If it fails, simply give up. | |||
m_eventSourceEnabled = false; | |||
} | |||
|
|||
// Wait till active writes have finished (stop waiting at 1 second) | |||
SpinWait.SpinUntil(() => m_activeWritesCount == 0, 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know 1 second is the right value to wait?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have a big old TODO
over it saying "until I find evidence for a better value".
I'm not even sure the timeout is appropriate. My fear is there is a code path after a call to one of the core Write
methods that would take a lock and having this infinitely loop would be asking for a deadlock of some kind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me make a better suggestion here. When I was writing the ETW collector provided by VS this problem was endemic to all ETW collector designs. A solution we came up with was during shutdown we inserted an event into the ETW stream. When we observed that event we knew we were "synchronized" with the request. This isn't entirely true for reasons I can explain offline but it is close enough that VS's ETW collector has never had missed events due to the logic. Timeouts like this in ETW should be avoided at all costs. One second actually is a very reasonable value though – I can also explain that offline if interested.
...ystem.Private.CoreLib/src/System/Diagnostics/Tracing/TraceLogging/TraceLoggingEventSource.cs
Outdated
Show resolved
Hide resolved
...ystem.Private.CoreLib/src/System/Diagnostics/Tracing/TraceLogging/TraceLoggingEventSource.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventSource.cs
Outdated
Show resolved
Hide resolved
* add comment
src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventSource.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventSource.cs
Outdated
Show resolved
Hide resolved
@josalem and I chatted earlier, but just to record it - most BCL types don't provide a thread-safety guarantee for Dispose() and I wonder if it is necessary here. The synchronization necessary to provide that guarantee comes with perf costs I'm hopeful we can avoid. |
I agree. I was trying to figure out what I was concerned about with regard to this change and I couldn't quite verbalize it. I think you hit it on the head @noahfalk. |
I was looking at a related bug and wound up doing some research on why we call EventSource.Dispose() at shutdown. So far I see two reasons:
With a little refactoring we could create a new internal API like EventSource.HandleShutdown() that handles only these concerns while still leaving us in a state where concurrent calls to Write don't generate failures (whether any data is logged is optional). Then we would call this new API and not Dispose() in the shutdown path. Dispose() would continue to do what it does now including freeing memory, but unlike HandleShutdown() the onus could be on the caller to ensure they don't make any concurrent calls to other EventSource APIs during/after the call to Dispose(). In the case of EventPipe, HandleShutdown() could disable the callbacks without deleting the provider. In the case of ETW past precedent suggests concurrent calls to EventUnregister and EventWrite are already safe (we've presumably been doing it for many years with no reported issues). @AaronRobinsonMSFT @vitek-karas @elinor-fung - The comment in the source references throwing a COMPLUS_BOOT_EXCEPTION and I find no reference to it any longer. I do recall that we used to block native->managed calls at some point during AppDomain shutdown, but do you know if that constraint is a relic of the desktop runtime that is no longer an issue on CoreCLR? |
The consensus is that we don't want to add protection of this nature due to the cost and that we should find an alternate way to prevent this. Rather than discuss this on a draft PR, I'll migrate our conversation to the issue describing the root problem (#55441). |
fixes #55441
There is a race in
EventSource.Disable
where any active writers that have passed theirIsEnabled()
checks could end up accessing internal structures as they are being disposed. This patch adds a two phase disable as part of dispose that will prevent writes from happening when resources are being cleaned up.TODO:
CC @brianrob @noahfalk @davmason