no preparing data-store jobs, decrement submit-num on restart #5011

Merged
6 commits merged into cylc:master on Jul 26, 2022

Conversation

dwsutherland
Member

@dwsutherland dwsutherland commented Jul 25, 2022

These changes close #4994

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Appropriate tests are included (unit and/or functional).
  • Appropriate change log entry included.
  • No documentation update required.

@dwsutherland dwsutherland self-assigned this Jul 25, 2022
@dwsutherland dwsutherland force-pushed the no-preparing-store-jobs branch 2 times, most recently from 1c2956a to 1cb2dbe Compare July 25, 2022 09:47
@MetRonnie MetRonnie added this to the cylc-8.0.0 milestone Jul 25, 2022
@MetRonnie MetRonnie added bug Something is wrong :( small labels Jul 25, 2022
@MetRonnie MetRonnie self-requested a review July 25, 2022 10:03
@dwsutherland dwsutherland force-pushed the no-preparing-store-jobs branch 2 times, most recently from 387c45c to 8e5c277 Compare July 25, 2022 11:54
@MetRonnie
Member

I have written an integration test at dwsutherland#7

It turned out to be tricky. I couldn't figure out how to reliably test that the job gets put in the data store on the second try.

Member

@hjoliver hjoliver left a comment


LGTM, tested as working 🎉 Thanks @dwsutherland for going above and beyond the call of duty to get this done after midnight 💐

@dwsutherland
Member Author

> I have written an integration test at dwsutherland#7
>
> Turned out to be tricky. I couldn't figure out how to successfully test that the job gets put in the data store on the second try

@MetRonnie - Merged in.

Looks like there's nothing in the store yet. I put this:

        print(schd.data_store_mgr.data[schd.data_store_mgr.workflow_id]['jobs'])
        print(schd.data_store_mgr.added['jobs'])
        print(schd.data_store_mgr.updated['jobs'])

above the test you're trying to run:

        assert await gql_query(client, '''
            jobs {
                cyclePoint, name, submitNum
            }
        ''') == {
            'jobs': [{
                'cyclePoint': '1',
                'name': 'one',
                'submitNum': 1
            }]
        }

and they were all empty:

E           AssertionError: assert {'jobs': []} == {'jobs': [{'c...bmitNum': 1}]}
E             Differing items:
E             {'jobs': []} != {'jobs': [{'cyclePoint': '1', 'name': 'one', 'submitNum': 1}]}
E             Full diff:
E             - {'jobs': [{'cyclePoint': '1', 'name': 'one', 'submitNum': 1}]}
E             + {'jobs': []}

tests/integration/test_data_store_mgr.py:372: AssertionError
------------------------------- Captured stdout call -------------------------------
{}
{}
{}

But according to the log the job was submitted, so perhaps the check ran either before the job was run or after the job had finished (and been deleted).

@dwsutherland dwsutherland force-pushed the no-preparing-store-jobs branch from b5b1eea to 4164d1b Compare July 26, 2022 05:33
@dwsutherland
Member Author

dwsutherland commented Jul 26, 2022

@MetRonnie - Test working now! 🎉

Just added a 2nd yield to allow the scheduler to do some more processing, and it worked:

-------------------------------- Captured stdout call --------------------------------
{'~sutherlander/cit-20220726T150453+0935/integration.test_data_store_mgr/test_ghost_job/fa0642be-0ca3-11ed-8690-cdc6e1fdd4bc//1/one/01': stamp: "~sutherlander/cit-20220726T150453+0935/integration.test_data_store_mgr/test_ghost_job/fa0642be-0ca3-11ed-8690-cdc6e1fdd4bc//1/one/[email protected]"
id: "~sutherlander/cit-20220726T150453+0935/integration.test_data_store_mgr/test_ghost_job/fa0642be-0ca3-11ed-8690-cdc6e1fdd4bc//1/one/01"
submit_num: 1
state: "submitted"
task_proxy: "~sutherlander/cit-20220726T150453+0935/integration.test_data_store_mgr/test_ghost_job/fa0642be-0ca3-11ed-8690-cdc6e1fdd4bc//1/one"
submitted_time: "2022-07-26T15:04:55+09:35"
job_id: "130831"
job_runner_name: "background"
platform: "localhost"
job_log_dir: "/home/sutherlander/cylc-run/cit-20220726T150453+0935/integration.test_data_store_mgr/test_ghost_job/fa0642be-0ca3-11ed-8690-cdc6e1fdd4bc/log/job/1/one/01"
environment: "{}"
directives: "{}"
param_var: "{}"
name: "one"
cycle_point: "1"
}
{}
{}

(added another to the first, just to be consistent)

Thanks
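
For anyone reading later, here is a rough sketch of why one extra yield can make the difference. It assumes "yield" here means handing control back to the asyncio event loop with `await asyncio.sleep(0)`; `fake_main_loop`, `store` and `demo` are hypothetical stand-ins, not cylc-flow code. The fake loop needs two passes before a job shows up, so a single yield from the test sees nothing.

```python
import asyncio


async def demo(n_yields):
    """Return what the 'test' sees in the store after yielding n_yields times."""
    store = []  # hypothetical stand-in for the data store's job list

    async def fake_main_loop():
        # First pass: pretend the task is only being prepared.
        await asyncio.sleep(0)
        # Second pass: pretend the job is submitted and stored.
        store.append("job")

    task = asyncio.create_task(fake_main_loop())
    for _ in range(n_yields):
        # Each zero-length sleep hands control back to the event loop once.
        await asyncio.sleep(0)
    seen = list(store)  # snapshot taken at the point the assertions would run
    await task          # let the fake loop finish so nothing is left pending
    return seen


print(asyncio.run(demo(1)))  # one yield: the job has not been stored yet
print(asyncio.run(demo(2)))  # two yields: the job is visible
```

The real scheduler does far more per pass than this, so the number of yields needed there is not something this sketch can predict.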

@dwsutherland dwsutherland force-pushed the no-preparing-store-jobs branch from 4164d1b to 2633b94 Compare July 26, 2022 05:42
@dwsutherland
Member Author

hmm.. new test seems flaky

@dwsutherland dwsutherland force-pushed the no-preparing-store-jobs branch from 2633b94 to 0d74179 Compare July 26, 2022 07:48
@dwsutherland
Member Author

dwsutherland commented Jul 26, 2022

> hmm.. new test seems flaky

Ok looks like that's done it (added a pause and sleep to the task)

@dwsutherland
Member Author

Also, just to confirm, with the decrement commented out:

            elif status == TASK_STATUS_PREPARING:
                # put back to be readied again.
                status = TASK_STATUS_WAITING
                # Re-prepare same submit.
                #itask.submit_num -= 1

the test fails

E           AssertionError: assert {'jobs': [{'c...bmitNum': 2}]} == {'jobs': [{'c...bmitNum': 1}]}
E             Differing items:
E             {'jobs': [{'cyclePoint': '1', 'name': 'one', 'submitNum': 2}]} != {'jobs': [{'cyclePoint': '1', 'name': 'one', 'submitNum': 1}]}
E             Full diff:
E             - {'jobs': [{'cyclePoint': '1', 'name': 'one', 'submitNum': 1}]}
E             ?                                                           ^
E             + {'jobs': [{'cyclePoint': '1', 'name': 'one', 'submitNum': 2}]}
E             ?

So we're good 👍
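
For readers following along, the restart behaviour being tested can be sketched roughly as below. This is a minimal stand-alone illustration, not the actual cylc-flow code: `FakeTask`, `restore_on_restart` and `next_submit` are hypothetical names, and the assumption that preparing a job increments the submit number is mine. Only the two status constants and `submit_num` come from the snippet above.

```python
from dataclasses import dataclass

TASK_STATUS_WAITING = "waiting"
TASK_STATUS_PREPARING = "preparing"


@dataclass
class FakeTask:
    """Hypothetical stand-in for a task proxy (not the cylc-flow class)."""
    status: str
    submit_num: int


def restore_on_restart(itask: FakeTask, decrement: bool = True) -> FakeTask:
    """Reset a task that was mid-preparation when the scheduler stopped."""
    if itask.status == TASK_STATUS_PREPARING:
        # Put back to be readied again.
        itask.status = TASK_STATUS_WAITING
        if decrement:
            # Re-prepare the same submit rather than starting a new one.
            itask.submit_num -= 1
    return itask


def next_submit(itask: FakeTask) -> int:
    """Assumed behaviour: preparing a job increments the submit number."""
    itask.submit_num += 1
    itask.status = TASK_STATUS_PREPARING
    return itask.submit_num


# With the decrement, the re-prepared job is still submit number 1.
print(next_submit(restore_on_restart(FakeTask(TASK_STATUS_PREPARING, 1))))
# Without it, the same job comes back as submit number 2.
print(next_submit(restore_on_restart(FakeTask(TASK_STATUS_PREPARING, 1), decrement=False)))
```

The second call mirrors the failing assertion above, where `submitNum` came back as 2 instead of 1.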

@MetRonnie
Member

I was hoping to avoid relying on sleep durations while waiting for the job to appear in the data store. We may need to revisit this post-8.0.0, otherwise the test could end up being flaky.

@dwsutherland
Member Author

dwsutherland commented Jul 26, 2022

> I was hoping to avoid relying on sleep durations while waiting for the job to appear in the data store. We may need to revisit this post-8.0.0, otherwise the test could end up being flaky.

Well, the problem is that the workflow needs time to run to the point where there's a job, and that's exactly what your asyncio sleep was doing (along with yielding control). But if you yield for too long, the workflow completes before the test assertions run, so I had to extend the run time of the workflow with a sleep.

Not ideal, I agree, but at least the ordinary sleep is in the task script. Otherwise I think you'd need to make it a functional test (where the task writes the GraphQL output to a file).
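
One way to reduce the dependence on fixed sleep durations is to poll for the condition with a timeout. The helper below is only a sketch: `wait_for`, `fake_store` and `fake_scheduler` are hypothetical names, not part of cylc-flow or its test fixtures. It also does nothing about the opposite problem, where the workflow finishes (and the job is deleted) before the check runs.

```python
import asyncio
import time


async def wait_for(predicate, timeout=5.0, interval=0.05):
    """Poll `predicate` until it returns True or `timeout` seconds pass.

    Each `asyncio.sleep` also yields control, so other coroutines
    (e.g. a scheduler main loop) keep making progress while we wait.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        await asyncio.sleep(interval)
    # One last check so a condition met right at the deadline still counts.
    return bool(predicate())


async def demo():
    fake_store = {"jobs": []}  # hypothetical stand-in for the data store

    async def fake_scheduler():
        # Pretend the job takes a little while to reach the store.
        await asyncio.sleep(0.2)
        fake_store["jobs"].append({"name": "one", "submitNum": 1})

    task = asyncio.create_task(fake_scheduler())
    found = await wait_for(lambda: bool(fake_store["jobs"]), timeout=2.0)
    await task
    return found, fake_store["jobs"]


print(asyncio.run(demo()))
```

The timeout only bounds how long a failing test waits, so it can be generous without slowing down the passing case.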

@MetRonnie
Member

MetRonnie commented Jul 26, 2022

OK, after wrestling with Bash syntax, I have managed to convert it into a non-flaky functional test: dwsutherland#9

Sorry for the revert

Update: I have gone ahead and pushed that onto this branch, to speed things along

Member

@MetRonnie MetRonnie left a comment


(I am happy with the codebase changes)

@oliver-sanders oliver-sanders merged commit 1954c58 into cylc:master Jul 26, 2022
@hjoliver
Member

🎉 nice work all

Labels
bug Something is wrong :( small
Development

Successfully merging this pull request may close these issues.

data store: preparing jobs appearing on restart
4 participants