Workflows manager #324
Conversation
Force-pushed from 4666de9 to 64c91e1
@dwsutherland if you get the chance to take a look at this one that would be great.
Partial review (6/9 files viewed)
Force-pushed from 64c91e1 to bddf77a
I have been running the UI Server with this branch since yesterday, deleting stale workflows, and have seen no tracebacks. I have started and stopped multiple workflows, and the workflow state appears to be working correctly.
I have also read the code; only one query.
Thanks @oliver-sanders
else:
    # Stop the client, ignoring I/O errors (the connection may
    # already be gone).
    with suppress(IOError):
        client.stop(stop_loop=False)
    # Mark the workflow as disconnected in the data store.
    self.uiserver.data_store_mgr.disconnect_workflow(wid)
Does this need awaiting?
It's not an async method though?
Sorry, what I meant was: should we be marking disconnect_workflow as async and so await this? Not entirely sure why register_workflow is async but this one isn't, when neither looks IO-bound.
I've not fully got my head around using asyncio for multi-threading CPU-bound operations yet. Will have to use some training time on this.
Ah, not sure why register_workflow is defined async.
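For what it's worth, a minimal sketch of the distinction being discussed. The method names mirror the PR, but the bodies are illustrative, not the real implementations:

import asyncio

class DataStoreMgr:

    async def register_workflow(self, w_id):
        # An async def only pays off if the body awaits something
        # (network I/O, another coroutine, ...); callers must await it.
        await asyncio.sleep(0)  # placeholder for real awaited I/O

    def disconnect_workflow(self, w_id):
        # Purely synchronous bookkeeping needs no async marker and is
        # called directly, without await.
        pass

# Note that asyncio alone won't parallelise CPU-bound work; the usual
# pattern for that is to push it onto an executor thread or process:
async def offload(func, *args):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func, *args)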
Keep this one open for a mo, I'll slide in an extra commit to fix the issue Ronnie recently reported.
Hmm, I was still able to reproduce the mismatch between the workflow status and status_msg.
Don't think it's related to the UI; if you run the UIS in debug mode (…)
(An easy way to tell if it is UI related: refresh the browser and see if the issue persists.)
Couldn't replicate with 10 runs of the integration tests.
I'm having a go at replicating now. Will report back in a bit. Update: I have reproduced a running status message for a stopped workflow, …
Ok, what does the debug output say?
Debug output looked normal as far as I could see. It is a very transient problem, as I have only seen it once. I have closely inspected the status_msg code and can't spot anything obvious. The only thing I can think of is NFS maybe causing problems?
Can you send me the debug output? It contains a record of every method called on the data store, which should reveal the issue.
Brill, that focuses me on connection failure. There should be only one side-effect of connection failure, which is the workflow getting added to …
Codecov Report
@@ Coverage Diff @@
## master #324 +/- ##
==========================================
- Coverage 80.29% 77.96% -2.34%
==========================================
Files 10 10
Lines 1005 1053 +48
Branches 197 205 +8
==========================================
+ Hits 807 821 +14
- Misses 164 196 +32
- Partials 34 36 +2
Continue to review full report at Codecov.
I had replaced my usual $HOME expansion in the traceback with a ~. Sorry for not making that clear; cylc/cylc-flow#4734 can probably just be closed.
Ah, I thought we had failed to expand ~. In that case it's still a cylc-flow issue, but presumably caused by the script trying to remove the contact file only to find that either the contact file or the .service dir is no longer there? It is perfectly possible that the contact file could have been removed since the client was created (even if that's a short time window, another client started at the same time could have beaten this one to removing the file, or a clean could have been in progress at the time, or whatever), so we probably just need to quietly suppress this traceback.
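A minimal sketch of the kind of fix being suggested here; the function name and path handling are illustrative, not cylc-flow's actual implementation:

from contextlib import suppress
from pathlib import Path

def remove_contact_file(service_dir: Path) -> None:
    # Another client, or a clean in progress, may have removed the
    # contact file (or the whole .service directory) since this client
    # was created, so a missing file is not an error here.
    with suppress(FileNotFoundError):
        (service_dir / 'contact').unlink()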
PR up cylc/cylc-flow#4735
Force-pushed from cb234b3 to 4f84dde
Ok, the last commit should do it. It's really hard to test properly without running a workflow, so I came up with a kinda half-test which at least ensures that …
Looks good. Read the code, checked out and tested; running smoothly.
Minor typo, not a blocker though. Thanks @oliver-sanders
Note: there are some new warnings on this branch when running pytest:
Task was destroyed but it is pending!
task: <Task pending name='Task-29' coro=<WorkflowsManager.run() running at ~/cylc-uiserver/cylc/uiserver/workflows_mgr.py:412> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f43db947bb0>()]> cb=[IOLoop.add_future.<locals>.<lambda>() at ~/miniconda3/envs/cylc8/lib/python3.9/site-packages/tornado/ioloop.py:688]>
Task was destroyed but it is pending!
task: <Task pending name='Task-64' coro=<WorkflowsManager.run() running at ~/cylc-uiserver/cylc/uiserver/workflows_mgr.py:412> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f43db98d310>()]> cb=[IOLoop.add_future.<locals>.<lambda>() at ~/miniconda3/envs/cylc8/lib/python3.9/site-packages/tornado/ioloop.py:688]>
Co-authored-by: Melanie Hall <[email protected]>
Will take a look at those warnings...
The scan task was being sent the stop signal; however, the UIS was shutting down before the stop had completed. Strangely, even when the scan task has completed, Tornado needs an …
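A minimal sketch of the shutdown pattern being described, assuming a periodic scan task that must be signalled to stop and then awaited before the server exits; the class name and timings are illustrative, not the real cylc-uiserver code:

import asyncio
import contextlib
from typing import Optional

class ScanManager:

    def __init__(self) -> None:
        self._stop_event = asyncio.Event()
        self._task: Optional[asyncio.Task] = None

    async def _run(self) -> None:
        # Periodic scan loop; exits promptly once a stop is requested.
        while not self._stop_event.is_set():
            # ... perform one scan here ...
            with contextlib.suppress(asyncio.TimeoutError):
                await asyncio.wait_for(self._stop_event.wait(), timeout=5)

    def start(self) -> None:
        self._task = asyncio.ensure_future(self._run())

    async def stop(self) -> None:
        # Signal the loop to finish, then wait for the task to actually
        # complete; tearing down the event loop while the task is still
        # pending is what produces the "Task was destroyed but it is
        # pending!" warning above.
        self._stop_event.set()
        if self._task is not None:
            await self._task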
Very strange. This latest commit has not solved the problem for me; I have run the tests as a set and am still getting the warnings. I have also run cylc/uiserver/tests/test_auth.py and cylc/uiserver/tests/test_graphql.py individually - these seem to be the source of the error. However, on master they run without warning. Can you reproduce? Perhaps a race condition if not.
I get lots of other warnings, but not that specific one (Python 3.7).
Ahha, I expect my Python version is too high - Python 3.9.10.
I'll see if I can replicate with 3.9.
Managed to replicate; it's to do with the jupyter_server version. It looks like more recent jupyter_server is not calling the …
The warnings are due to a regression in jupyter_server: …
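Assuming the uncalled hook is ExtensionApp.stop_extension (jupyter_server's per-extension shutdown coroutine), a sketch of how an extension relies on it; the class here is illustrative, not the real CylcUIServer:

from jupyter_server.extension.application import ExtensionApp

class MyExtension(ExtensionApp):

    name = 'example'

    async def stop_extension(self):
        # jupyter_server is expected to await this hook during server
        # shutdown, giving the extension a chance to stop background
        # tasks cleanly; if the hook is never called (the suspected
        # regression), pending tasks get destroyed instead.
        pass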
Seems to shrug off my attempts to create horrible workflow/contact file states
Good, I think that's everything addressed?
Just a quick query about the commenting of the publish port.
Closes #223, #310, #311, #312 and most of #308
Starting and stopping a batch of workflows (before, some would occasionally get stuck in the stopped state):
Taking a single workflow through a full cycle:
Removing the workflow database now causes the workflow to be re-registered (data from the old run will disappear):
Requirements check-list:
I have read CONTRIBUTING.md and added my name as a Code Contributor.