Task stuck in running when replacing deployment #60
Comments
I would agree with your assumption that it is related to #41. This version of the driver is a little bit wonky in how it goes about performing these status changes. I have an unfinished v0.2.0 version of this driver in a different branch that should address this problem. I haven't touched it in a long while. I can try and allocate some time here in the coming days to help address this problem for ya.
Yeah, I suspected it was the same root cause. If the v0.2.0 driver is working, I can test it out on my workload as it is, if that would help.
I'm having a few issues with raw_exec on Windows VMs which might be related? hashicorp/nomad#11939
Hmmm, possibly, but I am inclined to believe it's more of a problem on this code base's end than in HashiCorp's Nomad code. I could be wrong, and judging from the response you got, this is a known issue that's hard to replicate. I'll try and keep an eye on that issue and its potential root cause as things progress.
The Windows desktop heap configuration that solved hashicorp/nomad#11939 (comment) appears to have mitigated this issue as well. The IIS processes no longer get "stuck" as they did previously. I'm cautiously optimistic after seeing no problems on the Windows VMs now for a few days :)
The issue with tasks getting stuck has resurfaced, although much more rarely than before. Is there any quick workaround to getting the tasks unstuck manually? It happens so rarely that doing it manually is not out of the question. I've tried various commands with the nomad cli |
Yea, this is what I have experienced on some occasions with test automation. I am not 100% sure whether the fault lies with Nomad or with the driver itself. This kinda goes back to me thinking it is a logic problem in how the driver tries to handle its state in this version vs the v0.2. To guarantee a clean slate for a single node, I ended up clearing all allocs manually in IIS and deleting the Nomad db/data files/dirs on the client. I have not done so in a clustered environment and I don't know how the Nomad servers themselves would treat the scenario (unchanged, dead, failed states for allocs). It may require a forced garbage collect of the system afterwards (https://www.nomadproject.io/docs/commands/system/gc).
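For anyone who wants to script the forced garbage collect mentioned above, here is a minimal sketch against Nomad's HTTP API. It is not something tested in this thread: the agent address is an assumption (the default local address), the helper names are made up for illustration, and there is no evidence here that stopping the allocation or running a GC actually clears a task the driver has left stuck. ACL tokens and TLS are not handled.

```python
import urllib.request

# Assumption: a Nomad agent reachable on the default local address.
NOMAD_ADDR = "http://127.0.0.1:4646"


def build_stop_alloc_request(alloc_id, addr=NOMAD_ADDR):
    """Build (but do not send) a request asking Nomad to stop one allocation.

    Hypothetical helper; it wraps POST /v1/allocation/<alloc_id>/stop.
    """
    url = f"{addr}/v1/allocation/{alloc_id}/stop"
    return urllib.request.Request(url, method="POST")


def build_force_gc_request(addr=NOMAD_ADDR):
    """Build (but do not send) a request that forces a system garbage collection.

    Hypothetical helper; it wraps PUT /v1/system/gc, the API counterpart of
    the `nomad system gc` command linked above.
    """
    return urllib.request.Request(f"{addr}/v1/system/gc", method="PUT")


def send(request, timeout=10.0):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The requests are built separately from sending so they can be inspected first. On a test node, something like `send(build_force_gc_request())` would trigger the GC once the IIS app pool and site have been cleaned up by hand.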
Issue
I have a problem where tasks occasionally get stuck when updating:
I'm thinking the issue occurs when cleaning up the app pool fails for any reason. It's possible to manually remove the app pool and site, but the Nomad status is still stuck in running. Might be related to #41?
IIS/VM state
Eventlog:
Contents of \\?\C:\inetpub\temp\apppools\b5f132be-a977-3106-ef71-b936ad90c24b\b5f132be-a977-3106-ef71-b936ad90c24b.config
<!-- ERROR: There's been an error reading or processing the applicationhost.config file. Line number: 0 Error message: Cannot read configuration file -->
IIS apppools:
IIS sites:
Logs
Output from nomad alloc status b5f132be-a977-3106-ef71-b936ad90c24b
Output from nomad alloc status 0a3dac7b-8377-d0f2-8eb6-9b85bd0c6d39