Slow thread suspension in busy loop #94767

Closed
szehetner opened this issue Nov 15, 2023 · 8 comments
Labels: area-VM-coreclr, tenet-performance (Performance related issue)

Comments

@szehetner (Contributor)

I have encountered a case where thread suspension before a GC takes multiple seconds (up to 70 seconds in the worst case). A minimal repro can be found here: https://gist.github.com/szehetner/47515ee0f28e2ca9d4990d60ac230a07

This starts a thread in a busy spin loop and triggers GCs. In many iterations the GC takes multiple seconds. PerfView confirms that the time is spent during thread suspension:

[image: PerfView screenshot]

Tested with .NET 7 and .NET 8 on x64 Windows - suspension times seem to be lower in .NET 8, but still significant.

My understanding is that it shouldn't be possible for user code to prevent thread suspension for such a long time. Is this an issue in the runtime? Or is there something I can do to speed up the suspension without giving up control of the thread (so without Thread.Yield() or Thread.Sleep())?

@szehetner szehetner added the tenet-performance Performance related issue label Nov 15, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 15, 2023
@ghost commented Nov 15, 2023

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: szehetner; Labels: area-System.Threading, tenet-performance

@jkotas (Member) commented Nov 15, 2023

cc @VSadov

@VSadov VSadov self-assigned this Nov 15, 2023
@VSadov (Member) commented Nov 15, 2023

Interesting. I will take a look.

@VSadov (Member) commented Nov 15, 2023

It looks a lot like a case of a tight loop with a very fast call in it.
The JIT does not insert a GC poll into the loop, since there is a call, but catching the thread in that call is difficult.

I can reproduce this. The behavior is sensitive to the OS (Windows 11 seems less affected than Windows 10) and also depends on the suspension implementation (NativeAOT seems more robust).

The app prints the milliseconds taken by a GC.Collect() call while another thread concurrently runs a tight loop.

All numbers were measured with set DOTNET_TieredCompilation=0, since debug code suspends easily.

===== Windows 10

=== CoreCLR
Arbitrary delays; sometimes the app hangs forever (it pauses for minutes, then I just stop it).
This happens with both 8.0 and 7.0, so it is not the result of a recent change.

=== NativeAOT (ms per GC.Collect())

713, 66, 1698, 2327, 22, 3804, 1593, 3343, 1766, 6077, 1515, 1218, 1082
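The measurement described above (timing GC.Collect() while another thread spins) can be approximated with the sketch below. It is a hypothetical reconstruction, not the code from the linked gist. GcSuspensionRepro, MeasureGcPauses, Spin, and DoNothing are illustrative names. The empty, non-inlined DoNothing() stands in for the repro's "very fast call", and the optional pollInLoop flag applies the Environment.TickCount workaround suggested later in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical repro sketch (not the gist's code): one thread spins around a
// trivial call while the caller times each GC.Collect().
public static class GcSuspensionRepro
{
    private static volatile bool s_stop;

    // Empty method the JIT must not inline, so the loop contains a real call.
    // Per the explanation in this thread, a loop containing a call gets no GC poll.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void DoNothing() { }

    private static void Spin(bool pollInLoop)
    {
        while (!s_stop)
        {
            DoNothing();
            if (pollInLoop)
            {
                // Workaround from the thread: a cheap call that performs a
                // suspension poll as a side effect.
                _ = Environment.TickCount;
            }
        }
    }

    // Returns the wall-clock milliseconds taken by each GC.Collect() call.
    public static List<long> MeasureGcPauses(int iterations, bool pollInLoop)
    {
        s_stop = false;
        var spinner = new Thread(() => Spin(pollInLoop)) { IsBackground = true };
        spinner.Start();

        var pauses = new List<long>(iterations);
        try
        {
            for (int i = 0; i < iterations; i++)
            {
                var sw = Stopwatch.StartNew();
                GC.Collect();
                pauses.Add(sw.ElapsedMilliseconds);
            }
        }
        finally
        {
            // Stop the spinning thread and wait for it to exit.
            s_stop = true;
            spinner.Join();
        }
        return pauses;
    }
}
```

On an affected runtime, MeasureGcPauses(20, pollInLoop: false) run with DOTNET_TieredCompilation=0 may show long pauses like the ones above. Results depend heavily on OS, hardware, and runtime version, so treat this as a probe, not a benchmark.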

@VSadov (Member) commented Nov 15, 2023

> My understanding is that it shouldn't be possible for user code to prevent thread suspension for such a long time. Is this an issue in the runtime?

This should not generally happen, and the situation is fairly uncommon. The scenario here is one of the hardest cases for suspension. Since there is a call in the loop, suspension is supposed to catch the thread when it returns from the call, but because the call does literally nothing, the opportunity is extremely short. Such loops typically do not run for long, since they clearly waste CPU, but this one does. Then, depending on OS API latencies and how suspension performs its retries, the problem can be amplified to take seconds or minutes to suspend.

Since NativeAOT performs better here, it will be worth looking into what is happening in CoreCLR.

> Or is there something I can do to speed up the suspension without giving up control of the thread (so without Thread.Yield() or Thread.Sleep())?

As a temporary workaround, making a polling call could help. There is no specific API to do a suspension poll, but many OS/interop services do one. Calling Thread.Yield() once every few iterations would be a logical choice, but if that is undesirable, something like Environment.TickCount could work too. It is a cheap call and will do a poll, so the following would be more suspension-friendly:

            int someCounter = 0;

            while (!_cts.IsCancellationRequested)
            {
                _idleStrategy.Idle(0);

                // call Thread.Yield() once in a few iterations.
                //
                // if (someCounter++ % 1024 == 0)
                //    Thread.Yield();

                // or call some cheap OS API for the side effect of a suspension poll
                _ = Environment.TickCount;
            }

@szehetner (Contributor, Author)

Thanks for the explanation. Using Environment.TickCount looks promising, I will test this further.

To provide some background on the busy spinning loop: Our actual system uses https://github.com/AdaptiveConsulting/Aeron.NET for receiving messages. At the core of that is a loop to poll for incoming messages. Keeping this thread spinning is a deliberate choice to achieve lower latency at the cost of wasted CPU cycles.

@acaly commented Nov 25, 2023

> Keeping this thread spinning is a deliberate choice to achieve lower latency at the cost of wasted CPU cycles.

In my understanding, if this is the case, you don't want the GC to suspend this thread, do you?

I think in such a latency-sensitive scenario, it might be better to split the application into a busy state and an idle state, disable GC completely in the busy state, and only do GC in the idle state.
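The busy/idle split suggested above can be sketched with .NET's no-GC-region API, GC.TryStartNoGCRegion and GC.EndNoGCRegion. This is a minimal sketch, not code proposed in the thread. BusyPhase, RunWithoutGc, and the budget used in the usage note are hypothetical. It only suppresses GCs if every allocation made during the busy phase fits within the requested budget. TryStartNoGCRegion may itself perform a GC to free that space, so call it while idle.

```csharp
using System;

// Hypothetical helper (not from the thread or any library) sketching the
// "no GC while busy, GC while idle" split.
public static class BusyPhase
{
    // Runs busyWork with garbage collection suppressed, provided the work
    // allocates less than allocationBudgetBytes. Returns true only if the
    // whole phase completed inside a no-GC region.
    public static bool RunWithoutGc(Action busyWork, long allocationBudgetBytes)
    {
        // May run a GC up front to make room for the budget; throws
        // ArgumentOutOfRangeException if the budget is too large for the runtime.
        if (!GC.TryStartNoGCRegion(allocationBudgetBytes))
        {
            // The budget could not be committed: run with normal GC behavior.
            busyWork();
            return false;
        }

        bool stayedGcFree = true;
        try
        {
            busyWork();
        }
        finally
        {
            try
            {
                // Leave the no-GC region once the busy phase is over.
                GC.EndNoGCRegion();
            }
            catch (InvalidOperationException)
            {
                // The region ended early, e.g. allocations exceeded the budget
                // or a GC was induced with GC.Collect().
                stayedGcFree = false;
            }
        }
        return stayedGcFree;
    }
}
```

For example, RunWithoutGc(ProcessBurst, 16L * 1024 * 1024), where ProcessBurst is a hypothetical message-processing method, would keep GCs out of that burst as long as it allocates less than 16 MB. This helps only when the hot path allocates little. An explicit GC.Collect(), as in the repro, ends the region early.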

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label May 2, 2024
@mangod9 mangod9 added this to the 9.0.0 milestone May 2, 2024
@VSadov (Member) commented May 11, 2024

With the #95565 and #94767 changes, the repro scenario sees suspension times in the sub-millisecond range, which is below the benchmark's sensitivity.
(validated with CoreCLR on Windows 10 AMD Ryzen 9 7950X)

See: #95565 (comment)

Theoretically it may still be possible to observe a difficult-to-suspend loop, but such a scenario would be hard to construct even intentionally.

I think we can close this issue now as addressed.

@VSadov VSadov closed this as completed May 11, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jun 11, 2024

5 participants