`cylc set` bug fix for new flows #6186
Force-pushed from 4f4b551 to 121f92c.
Needs a rebase to pull in flake8 fixes.
I have created a crude integration/functional test at hjoliver#55. I think I can do better/faster, but was blocked by a simulation mode bug.
Force-pushed from 121f92c to 2e30d3f.
@wxtim - rebased and added an integration test. My integration test is presumably similar to what you are working on? Trouble is, it passes on master too! I've run out of time to investigate further, for now.
Force-pushed from 2e30d3f to 1333320.
Mine is a functional test using the integration test framework. Since you've written most of the test I'll leave the ball in your court I think. I have other 🐟 to 🥘, not least a sim mode bug which will scupper skip mode too.
(Update: a similar scenario works fine on master if the set and trigger happen for tasks that have never run before in any flow.)
Force-pushed from 2f8c30b to ef72370.
Test tweaked and validated. It passes on master as an integration test for "the wrong reasons", which are not valid on this branch (it's to do with submit numbers being zero in integration tests - I can explain in detail if necessary). The same scenario does not pass on master as a functional test.
cylc/flow/task_pool.py
Outdated
# TODO: Detecting removal after completion of some outputs probably
# TODO: requires recording removal in the DB (set :expired maybe?).
Beyond the scope of this PR. I'll post a note on Element, then raise an issue.
Could you stick up the issue and add a link to it in this comment so we can find our way back to the context when needed.
Note, the `cylc remove` work (will hopefully arrive in 8.4.0) may change this as removed tasks will be bumped into the `None` flow.
I've clarified the comments for now. This needs a bit of discussion before posting an issue - see Element chat.
Note this bit of code is to avoid respawning a previously-removed task (see #6066) - which is at odds with the proposed future of `cylc remove`. (If erasing flow history, you want to be able to respawn the removed tasks, not avoid it).
(All discussions resolved except for the TODO one)
tests/integration/test_task_pool.py
Outdated
'scheduling': {
    'cycling mode': 'integer',
    'graph': {
        'R1': 'a => b => c1 & c2 => z',
IMO test becomes clearer if
- 'R1': 'a => b => c1 & c2 => z',
+ 'R1': 'a => good & failonce => z',
And is the task `b` necessary?
> IMO test becomes clearer if

Well that makes it look as if success and failure are important for the test, which they're not. What matters is that only one of them gets respawned in the new flow, if the other one gets manually set to succeeded (in the new flow).
[UPDATE: not relevant to the updated test anyway]

> And is the task b necessary?

Actually neither `b` nor `z` are needed. They were just to make the original functional case easier to manage. I'll remove them.
Manually tested, and appears that the fix is working.
I'm not convinced that the test is demonstrating the bug on upstream/8.3.x or master.
Is there a reason why this bug is on 8.3.2 milestone but open against master?
Code changes make sense to me.
I can confirm that
@@ -73,7 +73,7 @@ def test_basic(checker):
     ['output', '10000101T0000Z', 'succeeded'],
     ['output', '10010101T0000Z', 'succeeded'],
     ['good', '10000101T0000Z', 'waiting', '(flows=2)'],
-]
+    ['good', '10010101T0000Z', 'waiting', '(flows=2)'], ]
Note, this task appears in the DB because the workflow has run on from the triggered task. This is an unintended side-effect of the `await sleep(1)` in the test fixture.
I've opened a PR to replace the sleep with a DB update call #6212, however, if we want to keep this behaviour to allow the workflow to run to completion again (to test this PR), then replace the `sleep` with `await mod_complete(schd)` (and close #6212).
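As a generic illustration of why a fixed `sleep` in a fixture has side effects, here is a minimal sketch that waits on an explicit condition instead. `wait_until` and `demo` are hypothetical names for this example only; they are not part of the Cylc test framework, and this is not the `mod_complete` fixture mentioned above.

```python
import asyncio


async def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll ``predicate`` until it returns True (hypothetical helper).

    Raises TimeoutError if the condition is not met within ``timeout``.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while not predicate():
        if loop.time() >= deadline:
            raise TimeoutError("condition not met in time")
        await asyncio.sleep(interval)


async def demo():
    """Wait for a background job to finish rather than sleeping blindly."""
    state = {"done": False}

    async def worker():
        await asyncio.sleep(0.05)
        state["done"] = True

    task = asyncio.create_task(worker())
    # Returns as soon as the condition holds, no earlier and not much later.
    await wait_until(lambda: state["done"])
    await task
    return state["done"]


result = asyncio.run(demo())
```

Waiting on a condition means the test proceeds as soon as the state it needs is reached, rather than letting the workflow run on for an arbitrary extra interval.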
(I've not had time to think about this today)
(Reviewers - this is a test framework issue, so not a reason to delay this PR)
This should be rebased onto 8.3.x (even if it doesn't make it into the 8.3.2 release)
I think I probably opened this PR right after 8.3.0 was released, before the 8.3.x branch was made, and forgot to come back and rebase it. But maybe it was just a mistake. Anyhow, rebased now.
Force-pushed from 09028c8 to 9eafa2e.
I commented on this above. The integration test essentially replicates the functional test that fails on master, but as an integration test it passes on master "for the wrong reasons" - to do with submit numbers being zero in integration tests. UPDATE: I've simplified the test and it does now fail on master. Note the last commit clarifies and documents spawning logic a bit (it was too hard to follow!).
Force-pushed from aa501cb to c0ac01e.
Force-pushed from c0ac01e to 8d6b02a.
# set b:succeeded in flow 2 and check downstream spawning
schd.pool.set_prereqs_and_outputs(['1/b'], prereqs=[], outputs=[], flow=[2])
assert schd.pool.get_task(IntegerPoint("1"), "c1") is None, '1/c1 (flow 2) should not be spawned after 1/b:succeeded'
assert schd.pool.get_task(IntegerPoint("1"), "c2") is not None, '1/c2 (flow 2) should be spawned after 1/b:succeeded'
Is this a sign we are ready to throw off the chains of oppression and embrace line lengths > 79 chars? 😁
It's a sign that writing it like this doesn't work (always True):

    assert (
        var is None,
        'string'
    )

However, I suppose I should have done this:

    assert var is None, (
        'string'
    )
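The pitfall described here can be checked directly. The sketch below uses hypothetical helper names chosen for illustration. It shows that asserting a `(condition, message)` pair can never fail, because a non-empty tuple is always truthy, while the `assert condition, (message)` form behaves as intended. It assumes Python is not run with `-O`, which strips assertions.

```python
def tuple_assert_passes(condition: bool) -> bool:
    """Return True if asserting a (condition, message) tuple passes."""
    # This is the object that `assert (cond, 'msg')` actually evaluates.
    pair = (condition, 'string')
    try:
        assert pair  # a non-empty tuple is truthy, whatever it contains
    except AssertionError:
        return False
    return True


def message_assert_passes(condition: bool) -> bool:
    """Return True if `assert condition, (message)` passes."""
    try:
        # Parentheses wrap only the message, so the condition is tested.
        assert condition, (
            'string'
        )
    except AssertionError:
        return False
    return True


always_true = tuple_assert_passes(False)
fails_as_expected = message_assert_passes(False)
```

Recent CPython versions also emit a `SyntaxWarning` when the tuple form is written literally in an `assert` statement, which is another way to catch it.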
It's a sign that we don't run the linter on the test code!
Limiting line length is one thing that linters can all agree on.
| linter/formatter/guide | length |
|---|---|
| pep8 | 79 |
| 80 | |
| flake8 | 80 |
| black | 88 |
| styleguide (ruby) | 80 |
| rubocop (ruby) | 80 |
| prettier (js/ts) | 80 |
| rustfmt (rust) | 80 |
Line length is arbitrary and each tool is free to choose its own limit, yet most pick 80. This is not a coincidence:
- Readability (reading vertically is faster than reading horizontally, code structure is clearer when better spaced)
- Reduced diff surface (diff highlights changes more clearly, reduced merge conflicts)
- Accessibility (works better with larger font sizes)
- Works better with tooling (e.g. side-by-side diffs)
Why Bother with 80 characters in a World of Modern Widescreen Displays?
A lot of people these days feel that a maximum line length of 80 characters is just a remnant of the past and makes little sense today. After all - modern displays can easily fit 200+ characters on a single line. Still, there are some important benefits to be gained from sticking to shorter lines of code.
First, and foremost - numerous studies have shown that humans read much faster vertically and very long lines of text impede the reading process. As noted earlier, one of the guiding principles of this style guide is to optimize the code we write for human consumption.
Additionally, limiting the required editor window width makes it possible to have several files open side-by-side, and works well when using code review tools that present the two versions in adjacent columns.
The default wrapping in most tools disrupts the visual structure of the code, making it more difficult to understand. The limits are chosen to avoid wrapping in editors with the window width set to 80, even if the tool places a marker glyph in the final column when wrapping lines. Some web based tools may not offer dynamic line wrapping at all.
I agree lines should be kept short if possible, but sometimes 79 is too short for the content and results in less readable code. I would support moving to black's 88 limit.
Merging with missing changelog entry added!
if (
    prev_status is not None
    and not itask.state.outputs.get_completed_outputs()
):
    # If itask has any history in this flow but no completed outputs
    # we can infer it was deliberately removed, so don't respawn it.
    # TODO (follow-up work):
    # - this logic fails if task removed after some outputs completed
    # - this does not conform to future "cylc remove" flow-erasure
    #   behaviour which would result in respawning of the removed task
    # See github.com/cylc/cylc-flow/pull/6186/#discussion_r1669727292
    LOG.debug(f"Not respawning {point}/{name} - task was removed")
    return None
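To make the inference in this block concrete, here is a minimal standalone sketch of the same decision. `should_respawn` is a hypothetical function written for illustration; it is not cylc-flow API, and the plain string and set arguments stand in for the real task state objects.

```python
from typing import Optional, Set


def should_respawn(
    prev_status: Optional[str], completed_outputs: Set[str]
) -> bool:
    """Hypothetical stand-in for the removal check in the snippet above.

    prev_status: the task's last known status in this flow, or None if
        it has no history in this flow.
    completed_outputs: outputs recorded as completed for the task.
    """
    if prev_status is not None and not completed_outputs:
        # History in this flow but nothing completed: infer that the task
        # was deliberately removed, so do not respawn it.
        return False
    return True


no_history = should_respawn(None, set())
removed_before_outputs = should_respawn("waiting", set())
removed_after_outputs = should_respawn("running", {"started"})
```

The last case illustrates the first TODO item: a task removed after completing some outputs is indistinguishable from one that simply ran, so this check alone would still respawn it.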
What is to be done about this RE: `cylc remove`? Should this block just be removed? @hjoliver CC @oliver-sanders
Hmm, I need to remind myself and think again about this. The linked discussion refers to Element chat. I'll try to find that, and we can continue the conversation there until it's clear what the problem is.
However there are other ways that `TaskPool.remove()` is called that are not a result of `cylc remove`, so this might still be needed to prevent a removed task from respawning immediately after being removed in one of those ways?
Thus I am thinking of leaving this in, just updating the comment (it no longer applies to tasks removed by `cylc remove`).
Yes I'm concerned about that too.
(without diving into the code in depth)
The proposal does not change the behaviour of suicide-triggered tasks so that will need to be preserved.
This logic should not apply to the remove command any more though.
Damn it, found a bug in 8.3.0 for `cylc set --flow=new` on future tasks: the task_outputs table won't be updated because of "WHERE flow is n" in the DB statement.

From: https://github.com/hjoliver/ecox-interventions?tab=readme-ov-file#scenario-2-rerun-some-tasks-from-upstream-of-a-failed-task

Example:

The scenario:
- `1/c_m2` fails, stalling the workflow due to a corrupted output from `1/b`
- … `1/b` and `1/c_m2`, but not `1/c_m1, m3`, then continue to `1/d` and finish

Method:
- … `1/b` after setting `1/c_m1, m3` to succeeded in the new flow

cylc set test //1/c_m1 //1/c_m3 --flow=2
cylc trigger test//1/b --flow=2

On master this reruns `1/c_m1` and `1/c_m3` because the `set` outputs don't end up in the DB.

Check List
- … `CONTRIBUTING.md` and added my name as a Code Contributor.
- … `setup.cfg` (and `conda-environment.yml` if present).
- `CHANGES.md` entry included if this is a change that can affect users
- … `?.?.x` branch.
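The DB symptom named in the description (an UPDATE whose WHERE clause on the flow matches no row, so nothing is written) can be reproduced in miniature with `sqlite3`. This is only a sketch: the table layout below is a simplified assumption for illustration, `update_outputs` is a hypothetical helper, and neither is taken from the cylc-flow source.

```python
import sqlite3


def update_outputs(conn, cycle, name, flow_nums, outputs):
    """Update outputs for one task in one flow; return rows changed.

    Hypothetical helper: the WHERE clause includes the flow, so the
    statement is a silent no-op if no row exists yet for that flow.
    """
    cur = conn.execute(
        "UPDATE task_outputs SET outputs = ? "
        "WHERE cycle = ? AND name = ? AND flow_nums = ?",
        (outputs, cycle, name, flow_nums),
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
# Assumed, simplified schema for illustration only.
conn.execute(
    "CREATE TABLE task_outputs "
    "(cycle TEXT, name TEXT, flow_nums TEXT, outputs TEXT)"
)
# The task has already run in flow 1, so only a flow-1 row exists.
conn.execute(
    "INSERT INTO task_outputs VALUES (?, ?, ?, ?)",
    ("1", "c_m1", "[1]", "[]"),
)

rows_flow_1 = update_outputs(conn, "1", "c_m1", "[1]", '["succeeded"]')
rows_flow_2 = update_outputs(conn, "1", "c_m1", "[2]", '["succeeded"]')
```

Here the flow-2 update changes zero rows and raises no error, which is the kind of silent miss that would leave outputs set in a new flow unrecorded.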