
Parallel.For blocks when no ThreadPool threads are available on dotnet core but works on framework #2078

ficzerepeti opened this issue Jan 23, 2020 · 6 comments


@ficzerepeti

Hi,

I've been migrating a .NET Framework application to .NET Core, and during the process I've found a possible issue.

On .NET Core, if all ThreadPool threads are blocked (for example, waiting for a computation to finish), Parallel.For doesn't return; on .NET Framework it does. In the app I'm working on, the maximum ThreadPool thread count is limited to the processor count (ThreadPool.SetMaxThreads(Environment.ProcessorCount, Environment.ProcessorCount);).

I've been able to reproduce the issue with ~60 lines of code, which you can find here.
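In essence it does something like this (a condensed approximation, not the exact code from the gist; the loop bounds and output are illustrative):

using System;
using System.Threading;
using System.Threading.Tasks;

class Repro
{
    static void Main()
    {
        // Cap the pool at one thread per processor, as in the real app.
        ThreadPool.SetMaxThreads(Environment.ProcessorCount, Environment.ProcessorCount);

        // Occupy every pool thread with work that blocks until released.
        var gate = new ManualResetEventSlim(false);
        for (int i = 0; i < Environment.ProcessorCount; i++)
            Task.Run(() => gate.Wait());

        Thread.Sleep(1000); // crude, but lets the blockers occupy the pool

        // Returns on .NET Framework; hangs on .NET Core.
        Parallel.For(0, 20, i => Console.WriteLine($"iteration {i}"));

        gate.Set();
        Console.WriteLine("done");
    }
}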

My question is: is this the expected behaviour? If so, how can I get it to work the way it does on Framework?

Thanks,
Peter

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Threading untriaged New issue has not been triaged by the area owner labels Jan 23, 2020
@stephentoub stephentoub added bug and removed untriaged New issue has not been triaged by the area owner labels Jan 23, 2020
@stephentoub
Member

stephentoub commented Jan 23, 2020

Thank you for the repro.

Parallel.For works to spread itself over however many threads are available / allowed in the pool by creating a single task. That task immediately creates and queues a copy of itself and then starts running work for the loop. If another thread becomes available, that thread picks up the copy, which in turn creates another copy and starts joining in on the processing of the loop. Etc.

The very first task Parallel.For creates doesn't actually get scheduled to the pool (assuming the TaskScheduler hasn't been changed to something that works differently); rather, it gets run on the current thread (after all, why block the current thread waiting for another thread to do work when it could just do that work itself). In the case where all of the thread pool threads are blocked, that main thread will end up doing all of the work. However, it will also have already queued a copy of itself, and it needs to ensure that such a copy won't actually run, such that everything associated with the Parallel.For invocation will have quiesced by the time the For call returns to its caller.

In .NET Framework, Parallel lives in the same assembly as Task and has access to its internals; it calls an internal method on the replica to cancel it. In .NET Core, Parallel is a standalone implementation, and that call (or an equivalent) is missing. That means the Parallel.For in your repro does all the work, but then blocks waiting for the single task it queued to complete; that task is just sitting in the thread pool's queue, waiting to be picked up by a thread from the pool, but you've blocked all threads in the pool until the Parallel.For completes and prohibited the thread pool from injecting more, thereby creating a deadlock.
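To make that shape concrete, here's a rough sketch of the scheme (illustrative names and structure only, not the actual Parallel.For source):

using System;
using System.Threading;
using System.Threading.Tasks;

static class SelfReplicatingSketch
{
    public static void For(int fromInclusive, int toExclusive, Action<int> body)
    {
        int next = fromInclusive - 1;

        void Work()
        {
            // (In the real scheme, each copy also queues a further copy of
            // itself when it starts running.)
            int i;
            while ((i = Interlocked.Increment(ref next)) < toExclusive)
                body(i);
        }

        // Queue a copy so another pool thread can join in if one frees up...
        Task replica = Task.Run(Work);

        // ...and run the loop on the calling thread in the meantime.
        Work();

        // For can't return until everything it spawned has quiesced. On .NET
        // Framework an unstarted copy gets cancelled via Task internals; in
        // .NET Core it is simply waited on, so if every pool thread is
        // blocked, this Wait never completes.
        replica.Wait();
    }
}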

The obvious workaround for you is: don't do that ;-) Parallel.For has been implemented like this since the earliest days of .NET Core (well before, actually), so the fact that we're only learning about this issue now highlights its rarity.

However, it's still worth looking at fixing the implementation of Parallel.For to be able to cancel the replica tasks it creates. I'm not sure how easy it will be to do, though. The "easy" fix would likely lead to an exception being thrown and caught on every Parallel.For invocation, which isn't ideal. We might need https://github.com/dotnet/corefx/issues/8142 to do this correctly.

@stephentoub
Member

P.S. As another possible workaround to unblock you, you could try adding a class like this to your project:

private sealed class AlwaysInlineScheduler : TaskScheduler
{
    // Queued tasks still go to the thread pool...
    protected override void QueueTask(Task task) => ThreadPool.QueueUserWorkItem(s => TryExecuteTask((Task)s), task);

    // ...but any thread that waits on a task scheduled here is always allowed
    // to run ("inline") that task itself instead of blocking.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) => TryExecuteTask(task);

    // Only used for debugger support; returning nothing is fine here.
    protected override IEnumerable<Task> GetScheduledTasks() => null;
}

and then changing your Parallel.For to pass in a ParallelOptions that uses it:

Parallel.For(0, 20, new ParallelOptions { TaskScheduler = new AlwaysInlineScheduler() }, ...);

I suspect that will work around the issue. When the main task finishes its work, it then waits on each created replica task using Task.Wait(). Task.Wait() tries to employ a similar "use this thread if possible" optimization as previously mentioned: if the task being waited on hasn't started yet, it will try to run that task on the current thread instead of on a pool thread. To do that, though, it asks the TaskScheduler whether that's ok (since the scheduler might have certain requirements for the threads its queued tasks run on).

The default scheduler will only allow such "inlining" if it can find the requested task in the current thread pool thread's local queue. In this case, though, it's not a thread pool thread that queued the initial task but rather the main thread of the app, so when we do the initial Task.Wait, the scheduler will say "nope" and we'll end up blocking on the task rather than inlining it. The class I pasted above overrides TryExecuteTaskInline to just always try to run the task, and as such, it should avoid the blocking that's leading to the deadlock.

Another workaround would be to run the Parallel.For from a thread pool thread (though in your repro you'd need to change the number of dummy tasks you queue to block pool threads, since you'd be using one of the thread pool threads already). Then the initial replica Parallel.For creates will be in that thread pool thread's local queue, and the Task.Wait on it should be able to inline it successfully without further workarounds.
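For example, something along these lines (a sketch; the bounds and body are placeholders):

// Starting the loop from a pool thread puts the first replica in that thread's
// local work-stealing queue, so the Task.Wait inside Parallel.For can inline it.
// (This consumes a pool thread, so the repro would need one fewer blocking task.)
Task.Run(() =>
{
    Parallel.For(0, 20, i =>
    {
        // loop body
    });
}).Wait();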

@ficzerepeti
Author

ficzerepeti commented Jan 24, 2020

What an insightful response, very much appreciated. It makes reporting bugs a very positive experience.
I like the idea of the replica tasks and now I finally understand what I saw during debugging :)

The obvious workaround for you is: don't do that ;-)

I know you're half joking, but I tried that. Sadly, it'd be a massive amount of work, which has now been put on our backlog. I guess you know what that means :)

The good news is that you were right: AlwaysInlineScheduler solves our problem. At this point that's good enough for me, as we're in full control of our codebase and Parallel.For isn't used directly but is hidden behind a lib that allows switching between various implementations.
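(For reference, that abstraction is roughly of this shape; the names here are hypothetical rather than our actual lib, and it assumes AlwaysInlineScheduler is made accessible, e.g. internal.)

public interface IParallelRunner
{
    void For(int fromInclusive, int toExclusive, Action<int> body);
}

// One implementation routes through the AlwaysInlineScheduler workaround above;
// another could call Parallel.For directly, or run the loop sequentially.
public sealed class InlineSchedulerRunner : IParallelRunner
{
    private static readonly ParallelOptions s_options =
        new ParallelOptions { TaskScheduler = new AlwaysInlineScheduler() };

    public void For(int fromInclusive, int toExclusive, Action<int> body) =>
        Parallel.For(fromInclusive, toExclusive, s_options, body);
}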

I'm happy to wait for dotnet/corefx#8142 to be resolved and get low-cost cancellation.


@stephentoub
Member

The "easy" fix would likely lead to an exception being thrown and caught on every Parallel.For invocation, which isn't ideal. We might need https://github.com/dotnet/corefx/issues/8142 to do this correctly.

Now that we have SuppressThrowing, I gave this a quick go, but it's unfortunately not as easy as I'd hoped. We can easily create a CancellationTokenSource, create the tasks with its CancellationToken, Cancel the CTS once the initial replica task completes, and ignore any cancellation while waiting via SuppressThrowing. But in order for the Task being waited on to actually transition to the canceled state after it's been started, the scheduler's TryDequeue needs to return true, and that can only happen for the default scheduler if the task being dequeued is queued to the work-stealing queue for the thread calling Cancel. And that's frequently not going to be the case here.
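To illustrate that last point in isolation (a standalone sketch, not the actual Parallel.For change, and leaving the SuppressThrowing wait out of it):

using System;
using System.Threading;
using System.Threading.Tasks;

var cts = new CancellationTokenSource();

// A stand-in for a queued-but-not-yet-running replica, created with the token.
var replica = new Task(() => Console.WriteLine("replica ran"), cts.Token);
replica.Start(); // queued to the default scheduler from a non-pool thread

// Cancel asks the scheduler to dequeue the task (TaskScheduler.TryDequeue).
// The default scheduler can only pop it from the current thread pool thread's
// local work-stealing queue, so from this (non-pool) thread the dequeue fails
// and the task stays WaitingToRun instead of transitioning to Canceled.
cts.Cancel();

// Assuming no pool thread has picked it up yet, this prints WaitingToRun; a
// wait on the task would therefore still block until some pool thread
// eventually dequeues it. If every pool thread is blocked, that's the deadlock.
Console.WriteLine(replica.Status);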

@stephentoub stephentoub removed their assignment Jan 24, 2024
@stephentoub stephentoub modified the milestones: 9.0.0, Future Jan 24, 2024