terminating dotnet run doesn't terminate child #8610

Closed
mickaelistria opened this issue Aug 12, 2017 · 26 comments

@mickaelistria

Steps to reproduce

[Terminal 1]
$ dotnet new razor
$ dotnet run
Hosting environment: Production
Content root path: /home/mistria/runtime-EclipseApplication/tst
Now listening on: http://localhost:5000
Application started. Press Ctrl+C to shut down.
[Terminal 2]
$ ps aux | grep dotnet
mistria  24040  2.2  0.7 3296988 88604 pts/0   SLl+ 12:07   0:01 dotnet run
mistria  24143  1.1  0.5 21300684 71156 pts/0  SLl+ 12:07   0:00 dotnet exec /home/mistria/runtime-EclipseApplication/tst/bin/Debug/netcoreapp2.0/tst.dll
$ kill -15 24040 #SIGTERM on host
$ ps aux | grep dotnet
mistria  24143  1.1  0.5 21300684 71156 pts/0  SLl+ 12:07   0:00 dotnet exec /home/mistria/runtime-EclipseApplication/tst/bin/Debug/netcoreapp2.0/tst.dll
# child still there like a zombie

Expected behavior

Child process terminated

Actual behavior

Child process is still there
This can be confusing for IDEs (like Eclipse IDE), which bind the "stop" button and other UI actions to the terminate signal. In such cases, a developer trying dotnet run from the IDE will run into issues because of the zombie-like processes.

Environment data

dotnet --info output:

.NET Command Line Tools (2.0.1-servicing-006924)

Product Information:
 Version:            2.0.1-servicing-006924
 Commit SHA-1 hash:  1ed6be56ca

Runtime Environment:
 OS Name:     fedora
 OS Version:  26
 OS Platform: Linux
 RID:         fedora.26-x64
 Base Path:   /home/mistria/apps/dotnet-2/sdk/2.0.1-servicing-006924/

Microsoft .NET Core Shared Framework Host

  Version  : 2.0.0
  Build    : e8b8861ac7faf042c87a5c2f9f2d04c98b69f28d
@jonaskello

Maybe using dumb-init could help with this?

@omajid
Member

omajid commented Sep 8, 2017

Maybe using dumb-init could help with this?

Maybe for containers. This is just using a plain dotnet command on a normal OS.

@psmolkin

psmolkin commented Sep 21, 2017

Same problem with dotnet test. When the main process is terminated (e.g. by a timeout), all its children stay alive, lock assemblies, take up processor time, etc.

@alexsandro-xpt

Me too, same problem.

@livarcocc
Contributor

In @mickaelistria's case, with dotnet new razor, I don't think it is dotnet's job to terminate those processes. The Ctrl+C message there comes from ASP.NET itself, and when you press Ctrl+C it is ASP.NET's responsibility to terminate any extra processes that it may have started.

Also, I couldn't repro this right now when I tried.

As for dotnet test, @psmolkin would you happen to have a repro for it?

@livarcocc livarcocc self-assigned this Nov 23, 2017
@psmolkin

psmolkin commented Nov 23, 2017

@livarcocc I tried to reproduce it, but I think the problem has been fixed. I now have CLI version 2.0.3 installed.

@psmolkin

@livarcocc

Steps to reproduce:

  • Create a project:
    • Ap.Ms.Test.csproj :
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>netcoreapp2.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.NET.Test.Sdk" Version="15.5.0" />
    <PackageReference Include="MSTest.TestAdapter" Version="1.2.0" />
    <PackageReference Include="MSTest.TestFramework" Version="1.2.0" />
  </ItemGroup>

</Project>
  • UnitTest1.cs :
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System;

namespace Ap.Ms.Test
{
    [TestClass]
    public class UnitTest1
    {
        [TestMethod]
        public void TestMethod1()
        {
            // Loops forever on purpose so that the test process never finishes on its own.
            while (true)
            {
                Assert.IsTrue(true);
            }
            Assert.IsTrue(true);
        }
    }
}
  • run dotnet test
  • run procexp.exe and find the parent dotnet.exe
  • then kill the parent process (9864)

Expected

  • All child processes are stopped

Actual

  • Child processes continue running and lock assemblies

@etinin

etinin commented Apr 17, 2018

getting this with 2.1.0-preview2-26406-04, both on Linux and on Windows

@gjaryczewski

I have observed similar behaviour, with Ctrl+C not working, but only when using hosted services. I used the sample code from https://github.com/aspnet/Docs/tree/master/aspnetcore/fundamentals/host/hosted-services/samples/, in the timed and scoped variants, and the behaviour is repeatable and consistent.

@gjaryczewski

gjaryczewski commented Jun 27, 2018

One more observation: when I press any key (e.g. "f") after Ctrl+C, I see an error message (translated from Polish: "Name 'f' is not recognized as an internal or external command, executable or batch script."), and the process then terminates without any other messages in the console. This applies to SDK 2.1.

@brthor
Contributor

brthor commented Jul 2, 2018

I hit this in @coherenceApi on Linux recently.

I managed to work around it by using the method here.

This is a Node.js-specific solution that kills the entire process group. I happen to be using Node.js to launch dotnet run, so this works for me, but it points to a need for a similar mechanism in .NET (no idea if one already exists).

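var spawn = require('child_process').spawn;

// On POSIX, detached: true makes the child the leader of a new process group.
var child = spawn('my-command', {detached: true});

// A negative PID signals the whole process group (SIGTERM by default).
process.kill(-child.pid);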

IMO the CLI should be terminating the entire process group, @livarcocc, but there should be an API for that in corefx (no idea if there is one).

FYI @etinin @gjaryczewski @mickaelistria

@tskarman

A few PowerShell (run as Administrator) one-liners to work around that:

# Terminate "dotnet run" process trees
Get-WmiObject Win32_Process -Filter "name = 'dotnet.exe'" | Where-Object -Property CommandLine -Like '*dotnet.exe" run*' | ForEach-Object { Get-Process -Id $_.ProcessId } | ForEach-Object { taskkill /F /T /PID $_.Id  }

# Terminate "dotnet watch run" process trees
Get-WmiObject Win32_Process -Filter "name = 'dotnet.exe'" | Where-Object -Property CommandLine -Like '*dotnet.exe" watch run*' | ForEach-Object { Get-Process -Id $_.ProcessId} | ForEach-Object { taskkill /F /T /PID $_.Id  }

Run at your own risk.
You can adjust the CommandLine filter to be more specific. This goes well with running dotnet run --project .\project1.csproj, which then allows you to match the particular .csproj file on the CommandLine.
The Get-Process pipeline step is unnecessary, but I left it in there for easier diagnosis.

@tskarman

I think it is reasonable to expect dotnet run/dotnet watch/etc. to reliably shut down its child processes.

Still:

  • I've found that one indirect cause of this is the inability of the actual running code to come to a halt.
    In my case a third-party library was holding up the shutdown of the application. It looks like it manages its own thread pool and deadlocked in its clean-up logic.
    Not sure why that could hold up the shutdown (the threads look to be background threads). It might be that something deadlocks the "exit" thread.
  • This happens for both console applications and Kestrel-type (i.e. WebHost) applications
  • Make sure to dispose of your ServiceProvider instances
  • And be careful around disposal of your injected instances. It looks like the ServiceProvider implementations are not guaranteed to run .Dispose() on objects implementing IDisposable (e.g. on types registered as singletons).

@martinbliss

Looks like this thread is still active but stale. It was originally opened almost a year ago: do we have consensus that this is real and should be addressed somewhere and slotted for a future release?

@cole21771

cole21771 commented Oct 29, 2018

I'd really like this to be fixed. I'm just getting into the dotnet platform and would really like to take advantage of the speed of Linux over Windows for our server, but in testing we keep running into this issue, which is putting us off a bit.

How I encountered the issue

System

Ubuntu 18.04.1 LTS, 1 CPU, 1 GB RAM (AWS EC2 free tier)

Install dotnet-sdk-2.1

wget -q https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb

sudo apt-get install apt-transport-https
sudo apt-get update
sudo apt-get install dotnet-sdk-2.1

Install Node.js

curl -sL https://deb.nodesource.com/setup_8.x | sudo -E bash -
sudo apt-get install -y nodejs

Create Angular / .net project

dotnet new angular -o angular-test-project

Start project

export ASPNETCORE_Environment=Development
dotnet build
dotnet run

Wait for the Angular build to finish. After this, kill the project with Ctrl + C. Sometimes it prints "Application is shutting down" and then shuts down. Other times, it prints "Application is shutting down" and then sits there forever, which most likely means something is blocking (my best guess). If you press Ctrl + C again after this, the memory remains in use by this zombie process. You can see in the screenshot below that I had to press Ctrl + C twice before it went back to bash.
image

If you look in htop, there seem to be a bunch of ng processes, which is strange. So maybe it is something as simple as ng serve being stuck open. You can see the htop output below.
image

@omajid @livarcocc Hopefully this helps move this along in some way.

@livarcocc
Contributor

If closing an ASP.NET app is not terminating the process, I believe this is an issue that should be investigated by the ASP.NET team. Please file a new issue on aspnet/home.

Another possibility is that dotnet has three long-running processes that stay around to help with inner-loop performance: roslyn, msbuild and razor. You can terminate these processes by running dotnet build-server shutdown.

@patricksuo

@livarcocc

Quoting from the ASP.NET issue tracker:

aspnet/Hosting#960 (comment)

dotnet doesn't forward Ctrl-C to child processes, so we won't be able to fix this in Hosting itself. Closing per the triage team's decision.

@livarcocc
Contributor

@jkotas do you know if this is a limitation of the host or the process APIs?

@jkotas
Member

jkotas commented Oct 29, 2018

@jkotas do you know if this is a limitation of the host or the process APIs?

@livarcocc Killing a parent process does not automatically kill its child processes. This is the standard behavior of process APIs, both in .NET and elsewhere. For example, if you launch a child process using the C/C++ system function, killing the parent process won't automatically kill the child process either.

You can make a best effort to kill the child process in dotnet run. That will fix the common cases, but the bad cases where things go wrong will still leave orphaned processes behind.

A more reliable solution is to kill the process group in the first place. E.g. replace "kill -15 24040 #SIGTERM on host" in the repro above with "kill -15 -- -24040 #SIGTERM on host process group".
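
The difference between signalling a single PID and signalling a process group can be tried without dotnet at all. What follows is a minimal sketch and not the CLI's behaviour: fake_parent and the marker files are hypothetical stand-ins for dotnet run and its child, and the script assumes a Linux system where setsid (from util-linux) is available. It uses kill -s TERM, which sends the same signal as kill -15.

```shell
#!/bin/sh
# Sketch only: `fake_parent` is a hypothetical stand-in for `dotnet run`.
# Its child creates the marker file after 2 seconds unless it is killed first.
tmp=$(mktemp -d)

fake_parent() {
    # $1 = marker file the child will create if it survives
    setsid sh -c '(sleep 2; : > "$1") & wait' sh "$1" &
    parent=$!      # also the process group ID, because setsid made it a leader
    sleep 1        # let the parent start its child
}

# Case 1: signal only the parent PID. The child is orphaned and keeps running.
fake_parent "$tmp/single"
kill -s TERM "$parent"

# Case 2: signal the whole group (negative ID). The child is terminated too.
fake_parent "$tmp/group"
kill -s TERM -- "-$parent"

sleep 3            # give any surviving child time to write its marker
[ -e "$tmp/single" ] && single=survived || single=terminated
[ -e "$tmp/group" ]  && group=survived  || group=terminated
echo "kill PID only:      child $single"
echo "kill process group: child $group"
rm -rf "$tmp"
```

If the sketch behaves as intended, the first case reports a surviving child and the second a terminated one, which mirrors the repro at the top of this issue. The group kill only reaches processes that stayed in the parent's process group.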

@peterhuene peterhuene self-assigned this Nov 20, 2018
@Tomamais

Well, I've found a temporary solution to the problem. Again, it's TEMPORARY. It's wonky, but it works.

https://github.com/Tomamais/StartAndStopDotNetCoreApp

@peterhuene
Contributor

peterhuene commented Jan 4, 2019

I am considering making a fix that addresses this issue.

On POSIX systems, dotnet run would handle the SIGTERM signal (via AppDomain.CurrentDomain.ProcessExit) and then first attempt to kill its own pid as a process group (i.e. -pid), masking SIGTERM for its process to prevent reentry. This should kill all the children of dotnet that did not break away from the process group. However, if dotnet is not running in its own process group (i.e. it was spawned with an inherited pgid) then at best we can try to kill the child pid because Process.Start spawns new processes with inherited pgid (see dotnet/corefx#17412).
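
A rough shell analogue of the POSIX approach described above, offered as a sketch under stated assumptions rather than the actual dotnet change: launcher.sh, the marker files, and the sleep-based child are all hypothetical. The handler ignores further SIGTERMs, first tries its own PID as a process group, and falls back to the direct child when it is not a group leader. It assumes setsid from util-linux is available.

```shell
#!/bin/sh
# Sketch only: launcher.sh is a hypothetical stand-in for `dotnet run`.
tmp=$(mktemp -d)

cat > "$tmp/launcher.sh" <<'EOF'
# $1 = marker file the child creates if it is still alive after 2 seconds
sh -c 'sleep 2; : > "$1"' sh "$1" &
child=$!

on_term() {
    trap '' TERM   # ignore SIGTERM from here on so the group signal cannot re-enter
    # First try our own PID as a process group; if we are not a group
    # leader that fails, so fall back to signalling the direct child.
    kill -s TERM -- "-$$" 2>/dev/null || kill -s TERM "$child" 2>/dev/null
    exit 143       # 128 + 15: conventional exit status after SIGTERM
}
trap on_term TERM
wait "$child"
EOF

# Case 1: the launcher leads its own process group (started via setsid).
setsid sh "$tmp/launcher.sh" "$tmp/leader" &
pid1=$!
# Case 2: the launcher inherits this script's process group.
sh "$tmp/launcher.sh" "$tmp/member" &
pid2=$!

sleep 1                          # let both launchers start their children
kill -s TERM "$pid1" "$pid2"     # signal only the launchers, as in the repro
status=0
wait "$pid1" || status=$?        # collect the first launcher's exit status
sleep 2                          # a surviving child would write its marker now

[ -e "$tmp/leader" ] && leader=survived || leader=terminated
[ -e "$tmp/member" ] && member=survived || member=terminated
echo "group leader: child $leader (launcher exit status $status)"
echo "group member: child $member"
rm -rf "$tmp"
```

In the second case only the direct child is signalled, so its own sleep process is left to run out on its own. That is the "at best" limitation of the fallback path: grandchildren are not reached unless the whole group can be signalled.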

On Windows, we could potentially create a job object, associate the new process to the job, and set the job object to terminate all processes in the job upon rude termination of dotnet. I'm loath to introduce job objects where we don't have explicit control over the process tree, however. I know from experience that spawning user programs that suddenly find themselves in a job may fail when spawning their own processes (i.e. break-away).

Still, considering the fix would effectively translate the user action of kill -15 $dotnet_pid to kill -15 -- -$dotnet_pid (or use taskkill /t /pid $dotnet_pid on Windows), it seems like the workaround is trivial compared to the effort of a fix. The question comes down to user expectation and experience with process tree management, really.

@herebebeasties

You can pick up CTRL+C on Windows (C# Console.CancelKeyPress, implemented behind the scenes via kernel32.dll's SetConsoleCtrlHandler on Windows).
Well-behaved consoles should send that, I'd have thought?

@peterhuene
Contributor

peterhuene commented Jan 4, 2019

Hi @herebebeasties. Consistent SIGINT/Ctrl-C handling from dotnet run is a different issue and one I am also producing a fix for. SIGINT is sent to the foreground process group in POSIX and, similarly, to all processes attached to the current console on Windows. Thus, the children of dotnet run see the signal and can respond to it appropriately; for example, this is why Ctrl-C of an ASP.NET Core application running from dotnet run prints the shutdown message. The problem there is that the default handling of SIGINT from dotnet run causes a different exit status to the parent than it would have if the parent had spawned the child program directly. It also causes an immediate return back to the waiting parent, and the child program may still be writing to stdout when handling the signal, resulting in inconsistent and undesirable output in an interactive shell session.

This issue is about sending SIGTERM or using task manager / taskkill to terminate the dotnet run process and having it reliably clean up the child processes. Unlike SIGINT, SIGTERM is usually sent with the kill command and that, unless you manually tell it to send to a process group, will only go to the individual process specified. On Windows, killing a process that has no message queues (e.g. a console .NET application like the dotnet CLI) from task manager / taskkill will rudely terminate the process, resulting in zombied children unless the application takes (complex) steps to prevent it.

@Code-DJ

Code-DJ commented Jan 18, 2019

We are seeing similar behavior with Task Scheduler in Windows.

We scheduled run.bat, which runs "dotnet foo.dll", in Task Scheduler with "Stop the task if it runs longer than: X".

Once the time limit has passed, we see "Task stopping due to timeout reached" in the History of the task, but "dotnet foo.dll" continues to run and can be seen in the Task Manager Details tab.

Is this the same problem as this issue? Thanks!

@cchamberlain

@Code-DJ Task Scheduler has been unreliable for ages. I've had better luck creating services that do this programmatically.

@msftgits msftgits transferred this issue from dotnet/cli Jan 31, 2020