`cylc set` bug fix for new flows #6186

hjoliver · 2024-07-01T09:42:19Z

Damn it, found a bug in 8.3.0 for cylc set --flow=new on future tasks: the task_outputs table won't be updated because of "WHERE flow is n" in the DB statement.

From: https://github.com/hjoliver/ecox-interventions?tab=readme-ov-file#scenario-2-rerun-some-tasks-from-upstream-of-a-failed-task

Example:

[task parameters]
    m = 1..3
[scheduling]
    [[graph]]
        R1 = "a => b => c<m> => d"
[runtime]
    [[root]]
        pre-script = sleep 5
    [[a, b, d]]
    [[c<m>]]
    [[c<m=2>]]
        script = """
            if (( CYLC_TASK_SUBMIT_NUMBER == 1 )); then
                false
            fi
        """

The scenario:

1/c_m2 fails, stalling the workflow due to a corrupted output from 1/b
want to rerun 1/b and 1/c_m2, but not 1/c_m1, m3, then continue to 1/d and finish

Method:

run a new flow (2) from 1/b after setting 1/c_m1, m3 to succeeded in the new flow

cylc set test //1/c_m1 //1/c_m2 --flow=2
cylc trigger test//1/b --flow=2

On master this reruns 1/c_m1 and 1/c_m3 because the set outputs don't end up in the DB.

Check List

I have read CONTRIBUTING.md and added my name as a Code Contributor.
Contains logically grouped changes (else tidy your branch by rebase).
Does not contain off-topic changes (use other PRs for other changes).
Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
Tests are included (or explain why tests are not needed).
CHANGES.md entry included if this is a change that can affect users
Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

wxtim · 2024-07-02T11:35:04Z

Needs a rebase to pull in flake8 fixes.

wxtim · 2024-07-02T16:18:51Z

I have created a crude integration/fnctional test at hjoliver#55

I think I can do better/faster, but was blocked by a simulation mode bug.

hjoliver · 2024-07-05T07:56:26Z

@wxtim - rebased and added an integration test.

My integration test is presumably similar to what you are working on? Trouble is, it passes on master too! I've run out of time to investigate further, for now.

wxtim · 2024-07-05T08:22:01Z

@wxtim - rebased and added an integration test.

My integration test is presumably similar to what you are working on? Trouble is, it passes on master too! I've run out of time to investigate further, for now.

Mine is a functional test using the integration test framework. Since you've written most of the test I'll leave the ball in your court I think. I have other 🐟 to 🥘 , not least a sim mode bug which will scupper skip mode too.

hjoliver · 2024-07-08T00:39:16Z

(Update: a similar scenario works fine on master if the set and trigger happens for tasks that have never run before in any flow).

hjoliver · 2024-07-08T06:47:26Z

Test tweaked and validated. It passes on master as an integration test for "the wrong reasons", which are not valid on this branch (it's to do with submit numbers being zero in integration tests - I can explain in detail if necessary). The same scenario does not pass on master as a functional test.

hjoliver · 2024-07-08T06:49:49Z

~~Damn it, I seem to have broken two integration tests though. So not quite done.~~ DONE

hjoliver · 2024-07-08T09:25:12Z

cylc/flow/task_pool.py

+            # TODO: Detecting removal after completion of some outputs probably
+            # TODO: requires recording removal in the DB (set :expired maybe?).


Beyond the scope of this PR. I'll post a note on Element, then raise an issue.

Could you stick up the issue and add a link to it in this comment so we can find our way back to the context when needed.

Note, the cylc remove work (will hopefully arrive in 8.4.0) may change this as removed tasks will be bumped into the None flow.

I've clarified the comments for now. This needs a bit of discussion before posting an issue - see Element chat.

Note this bit of code is to avoid respawning a previously-removed task (see #6066) - which is at odds with the proposed future of cylc remove. (If erasing flow history, you want to be able to respawn the removed tasks, not avoid it).

cylc/flow/scripts/set.py

oliver-sanders · 2024-07-08T14:39:09Z

cylc/flow/task_pool.py

+            # TODO: Detecting removal after completion of some outputs probably
+            # TODO: requires recording removal in the DB (set :expired maybe?).


Could you stick up the issue and add a link to it in this comment so we can find our way back to the context when needed.

Note, the cylc remove work (will hopefully arrive in 8.4.0) may change this as removed tasks will be bumped into the None flow.

tests/integration/test_dbstatecheck.py

hjoliver · 2024-07-09T05:27:54Z

(All discussions resolved except for the TODO one)

wxtim · 2024-07-09T08:53:21Z

tests/integration/test_task_pool.py

+        'scheduling': {
+            'cycling mode': 'integer',
+            'graph': {
+                'R1': 'a => b => c1 & c2 => z',


IMO test becomes clearer if

Suggested change

'R1': 'a => b => c1 & c2 => z',

'R1': 'a => good & failonce => z',

And is the task b necessary?

IMO test becomes clearer if

Well that makes it look as if success and failure are important for the test, which they're not. What matters is that only one of them gets respawned in the new flow, if the other one gets manually set to succeeded (in the new flow).

[UPDATE: not relevant to the updated test anyway]

And is the task b necessary?

Actually neither b nor z are needed. They were just to make the original functional case easier to manage. I'll remove them.

tests/integration/test_task_pool.py

wxtim

Manually tested, and appears that the fix is working.

I'm not convinced that the test is demonstrating the bug on upstream/8.3.x or master.

Is there a reason why this bug is on 8.3.2 milestone but open against master?

oliver-sanders · 2024-07-09T11:44:40Z

Code changes make sense to me.

I'm not convinced that the test is demonstrating the bug on upstream/8.3.x or master.

I can confirm that test_set_future_flow passes on upstream/8.3.x so doesn't appear to be capturing this bug.

cylc/flow/task_pool.py

oliver-sanders · 2024-07-09T11:02:58Z

tests/integration/test_dbstatecheck.py

@@ -73,7 +73,7 @@ def test_basic(checker):
        ['output', '10000101T0000Z', 'succeeded'],
        ['output', '10010101T0000Z', 'succeeded'],
        ['good', '10000101T0000Z', 'waiting', '(flows=2)'],
-    ]
+        ['good', '10010101T0000Z', 'waiting', '(flows=2)'], ]


Note, this task appears in the DB because the workflow has run on from the triggered task. This is an unintended side-effect of the await sleep(1) in the test fixture.

I've opened a PR to replace the sleep with a DB update call #6212, however, if we want to keep this behaviour to allow the workflow to run to completion again (to test this PR), then replace the sleep with await mod_complete(schd) (and close #6212).

(I've not had time to think about this today)

(Reviewers - this is a test framework issue, so not a reason to delay this PR)

MetRonnie · 2024-07-09T16:22:08Z

This should be rebased onto 8.3.x (even if it doesn't make it into the 8.3.2 release)

hjoliver · 2024-07-09T23:40:51Z

Is there a reason why this bug is on 8.3.2 milestone but open against master?

This should be rebased onto 8.3.x (even if it doesn't make it into the 8.3.2 release)

I think I probably opened this PR right after 8.3.0 was released, before the 8.3.x branch was made, and forgot to come back and rebase it. But maybe it was just a mistake. Anyhow, rebased now.

hjoliver · 2024-07-09T23:58:19Z

I'm not convinced that the test is demonstrating the bug on upstream/8.3.x or master.

I can confirm that test_set_future_flow passes on upstream/8.3.x so doesn't appear to be capturing this bug.

I commented on this above

The integration test essentially replicates the functional test that fails on master, but as an integration test it passes on master "for the wrong reasons" - to do with submit numbers being zero in integration tests.

UPDATE: I've simplified the test and it does now fail on master. Note the last commit clarifies and documents spawning logic a bit (it was too hard to follow!).

MetRonnie · 2024-07-10T09:55:20Z

tests/integration/test_task_pool.py

+        # set b:succeeded in flow 2 and check downstream spawning
+        schd.pool.set_prereqs_and_outputs(['1/b'], prereqs=[], outputs=[], flow=[2])
+        assert schd.pool.get_task(IntegerPoint("1"), "c1") is None, '1/c1 (flow 2) should not be spawned after 1/b:succeeded'
+        assert schd.pool.get_task(IntegerPoint("1"), "c2") is not None, '1/c2 (flow 2) should be spawned after 1/b:succeeded' 


Is this a sign we are ready to throw off the chains of oppression and embrace line lengths > 79 chars? 😁

It's a sign that writing it like this doesn't work (always True):

assert ( var is None, 'string' )

However, I suppose I should have done this:

assert var is None, ( 'string' )

It's a sign that we don't run the linter on the test code!

Limiting line length is one thing that linters can all agree on.

linter/formatter/guide length

pep8 79

google 80

flake8 80

black 88

styleguide (ruby) 80

rubocop (ruby) 80

prettier (js/ts) 80

rustfmt (rust) 80

Line length is arbitrary and each each tool is free to choose its own limit, yet most pick 80. This is not a coincidence:

Readability (reading vertically is faster than reading horizontally, code structure is clearer when better spaced)

Reduced diff surface (diff highlights changes more clearly, reduced merge conflicts)

Accessibility (works better with larger font sizes)

Works better with tooling (e.g. side-by-side diffs)

Why Bother with 80 characters in a World of Modern Widescreen Displays?

A lot of people these days feel that a maximum line length of 80 characters is just a remnant of the past and makes little sense today. After all - modern displays can easily fit 200+ characters on a single line. Still, there are some important benefits to be gained from sticking to shorter lines of code.

First, and foremost - numerous studies have shown that humans read much faster vertically and very long lines of text impede the reading process. As noted earlier, one of the guiding principles of this style guide is to optimize the code we write for human consumption.

Additionally, limiting the required editor window width makes it possible to have several files open side-by-side, and works well when using code review tools that present the two versions in adjacent columns.

The default wrapping in most tools disrupts the visual structure of the code, making it more difficult to understand. The limits are chosen to avoid wrapping in editors with the window width set to 80, even if the tool places a marker glyph in the final column when wrapping lines. Some web based tools may not offer dynamic line wrapping at all.

-- https://rubystyle.guide/#max-line-length

I agree lines should be kept short if possible, but sometimes 79 is too short for the content and results in less readable code. I would support moving to black's 88 limit

MetRonnie · 2024-07-10T10:22:33Z

Merging with missing changelog entry added!

MetRonnie · 2024-09-02T17:33:07Z

cylc/flow/task_pool.py

+        if (
+            prev_status is not None
+            and not itask.state.outputs.get_completed_outputs()
+        ):
+            # If itask has any history in this flow but no completed outputs
+            # we can infer it was deliberately removed, so don't respawn it.
+            # TODO (follow-up work):
+            # - this logic fails if task removed after some outputs completed
+            # - this is does not conform to future "cylc remove" flow-erasure
+            #   behaviour which would result in respawning of the removed task
+            # See github.com/cylc/cylc-flow/pull/6186/#discussion_r1669727292
+            LOG.debug(f"Not respawning {point}/{name} - task was removed")
+            return None


What is to be done about this RE: cylc remove? Should this block just be removed? @hjoliver CC @oliver-sanders

Hmm, I need to remind myself and think again about this. The linked discussion refers to Element chat. I'll try to find that, and we can continue the conversation there until it's clear what the problem is.

However there are other ways that TaskPool.remove() are called that are not as a result of cylc remove, so this might still be needed to prevent a removed task from respawning immediately after being removed in one of those ways?

Thus I am thinking of leaving this in, just updating the comment (it no longer applies to tasks removed by cylc remove).

Yes I'm concerned about that too.

(without diving into the code in depth)

The proposal does not change the behaviour of suicide-triggered tasks so that will need to be preserved.

This logic should not apply to the remove command any more though.

hjoliver added the bug Something is wrong :( label Jul 1, 2024

hjoliver added this to the 8.3.1 milestone Jul 1, 2024

hjoliver self-assigned this Jul 1, 2024

hjoliver force-pushed the flow-set-bug-fix branch from 4f4b551 to 121f92c Compare July 2, 2024 00:02

oliver-sanders modified the milestones: 8.3.1, 8.3.2 Jul 3, 2024

hjoliver force-pushed the flow-set-bug-fix branch from 121f92c to 2e30d3f Compare July 5, 2024 07:53

hjoliver force-pushed the flow-set-bug-fix branch from 2e30d3f to 1333320 Compare July 5, 2024 07:58

hjoliver force-pushed the flow-set-bug-fix branch 2 times, most recently from 2f8c30b to ef72370 Compare July 8, 2024 06:43

hjoliver marked this pull request as ready for review July 8, 2024 06:44

hjoliver commented Jul 8, 2024

View reviewed changes

oliver-sanders reviewed Jul 8, 2024

View reviewed changes

wxtim self-requested a review July 8, 2024 15:02

wxtim reviewed Jul 9, 2024

View reviewed changes

tests/integration/test_task_pool.py Show resolved Hide resolved

wxtim requested changes Jul 9, 2024

View reviewed changes

oliver-sanders mentioned this pull request Jul 9, 2024

tests/i: remove sleep #6212

Merged

oliver-sanders reviewed Jul 9, 2024

View reviewed changes

MetRonnie self-requested a review July 9, 2024 15:48

hjoliver added 3 commits July 10, 2024 11:35

Fix for add new flow rows.

5ebf2e5

Fix integration tests.

46f789c

Address review comments.

9eafa2e

hjoliver force-pushed the flow-set-bug-fix branch from 09028c8 to 9eafa2e Compare July 9, 2024 23:41

Simplify new integration test.

52f389c

hjoliver changed the base branch from master to 8.3.x July 10, 2024 05:05

hjoliver force-pushed the flow-set-bug-fix branch from aa501cb to c0ac01e Compare July 10, 2024 05:06

clarify and document spawning logic

8d6b02a

hjoliver force-pushed the flow-set-bug-fix branch from c0ac01e to 8d6b02a Compare July 10, 2024 05:11

wxtim approved these changes Jul 10, 2024

View reviewed changes

MetRonnie approved these changes Jul 10, 2024

View reviewed changes

Changelog [skip ci]

5c28d99

MetRonnie merged commit 954f65e into cylc:8.3.x Jul 10, 2024

hjoliver deleted the flow-set-bug-fix branch July 18, 2024 02:17

wxtim mentioned this pull request Jul 22, 2024

Sim Mode: Not flow aware #6189

Closed

MetRonnie reviewed Sep 2, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`cylc set` bug fix for new flows #6186

`cylc set` bug fix for new flows #6186

hjoliver commented Jul 1, 2024 •

edited by MetRonnie

Loading

wxtim commented Jul 2, 2024

wxtim commented Jul 2, 2024 •

edited

Loading

hjoliver commented Jul 5, 2024

wxtim commented Jul 5, 2024

hjoliver commented Jul 8, 2024

hjoliver commented Jul 8, 2024

hjoliver commented Jul 8, 2024 •

edited

Loading

hjoliver Jul 8, 2024

oliver-sanders Jul 8, 2024

hjoliver Jul 9, 2024 •

edited

Loading

oliver-sanders Jul 8, 2024

hjoliver commented Jul 9, 2024

wxtim Jul 9, 2024

hjoliver Jul 10, 2024 •

edited

Loading

wxtim left a comment •

edited

Loading

oliver-sanders commented Jul 9, 2024 •

edited

Loading

oliver-sanders Jul 9, 2024

hjoliver Jul 10, 2024

hjoliver Jul 10, 2024 •

edited

Loading

MetRonnie commented Jul 9, 2024

hjoliver commented Jul 9, 2024

hjoliver commented Jul 9, 2024 •

edited

Loading

MetRonnie Jul 10, 2024 •

edited

Loading

hjoliver Jul 10, 2024

oliver-sanders Jul 11, 2024

MetRonnie Jul 11, 2024

MetRonnie commented Jul 10, 2024

MetRonnie Sep 2, 2024

hjoliver Sep 2, 2024

MetRonnie Sep 3, 2024

hjoliver Sep 3, 2024

oliver-sanders Sep 3, 2024

		# TODO: Detecting removal after completion of some outputs probably
		# TODO: requires recording removal in the DB (set :expired maybe?).

	'R1': 'a => b => c1 & c2 => z',
	'R1': 'a => good & failonce => z',

linter/formatter/guide	length
pep8	79
google	80
flake8	80
black	88
styleguide (ruby)	80
rubocop (ruby)	80
prettier (js/ts)	80
rustfmt (rust)	80

cylc set bug fix for new flows #6186

cylc set bug fix for new flows #6186

Conversation

hjoliver commented Jul 1, 2024 • edited by MetRonnie Loading

wxtim commented Jul 2, 2024

wxtim commented Jul 2, 2024 • edited Loading

hjoliver commented Jul 5, 2024

wxtim commented Jul 5, 2024

hjoliver commented Jul 8, 2024

hjoliver commented Jul 8, 2024

hjoliver commented Jul 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hjoliver Jul 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hjoliver commented Jul 9, 2024

Choose a reason for hiding this comment

hjoliver Jul 10, 2024 • edited Loading

Choose a reason for hiding this comment

wxtim left a comment • edited Loading

Choose a reason for hiding this comment

oliver-sanders commented Jul 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hjoliver Jul 10, 2024 • edited Loading

Choose a reason for hiding this comment

MetRonnie commented Jul 9, 2024

hjoliver commented Jul 9, 2024

hjoliver commented Jul 9, 2024 • edited Loading

MetRonnie Jul 10, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MetRonnie commented Jul 10, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

`cylc set` bug fix for new flows #6186

`cylc set` bug fix for new flows #6186

hjoliver commented Jul 1, 2024 •

edited by MetRonnie

Loading

wxtim commented Jul 2, 2024 •

edited

Loading

hjoliver commented Jul 8, 2024 •

edited

Loading

hjoliver Jul 9, 2024 •

edited

Loading

hjoliver Jul 10, 2024 •

edited

Loading

wxtim left a comment •

edited

Loading

oliver-sanders commented Jul 9, 2024 •

edited

Loading

hjoliver Jul 10, 2024 •

edited

Loading

hjoliver commented Jul 9, 2024 •

edited

Loading

MetRonnie Jul 10, 2024 •

edited

Loading