Update DOCKER_IMAGE tag in testslurm.yml #697
Conversation
Codecov Report
Patch coverage:
Additional details and impacted files:
@@ Coverage Diff @@
## master #697 +/- ##
==========================================
- Coverage 83.31% 82.99% -0.32%
==========================================
Files 22 22
Lines 4873 4894 +21
Branches 1401 0 -1401
==========================================
+ Hits 4060 4062 +2
- Misses 809 832 +23
+ Partials 4 0 -4
btw, you may want to see if you can build this dockerfile: https://github.com/tazend/docker-centos7-slurm/blob/1cdc401df445ecf00e2db431a99c583eda950300/Dockerfile as it contains the latest slurm. You could also drop the python versions lower than 3.8.
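A minimal sketch of how such an image could be built from that Dockerfile and pushed so CI can pull it; the Docker Hub namespace is a placeholder and the 21.08.6 tag is taken from this PR's summary, not from a confirmed build command:

```bash
# Clone the suggested Dockerfile source (commit pinned above)
git clone https://github.com/tazend/docker-centos7-slurm.git
cd docker-centos7-slurm

# Build and tag the image (repository name is illustrative)
docker build -t <dockerhub-user>/docker-centos7-slurm:21.08.6 .

# Push it so the GitHub Actions workflow can pull it
docker push <dockerhub-user>/docker-centos7-slurm:21.08.6
```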
Sure, I'll start working on it.
I built the docker image; it can be found here.
Nice work @adi611 - it may be good to track down those specific tests to see what's going on, perhaps by executing and debugging why the slurm return doesn't have a job id. At least we have an up-to-date slurm container now! For this PR, perhaps change to your newly built slurm container.
@adi611 - Have you tried to repeat the run? Do you get the errors every time you run? I've run the test with your new image on my laptop and can't reproduce the errors... But I've run the GA twice in this PR and it seems to work fine. FYI, I still don't have access to the MIT slurm computers, so I can't compare.
Should I create a PR to run the tests using all the available python versions (3.8.16, 3.9.16, 3.10.9, 3.11.1) for the container, or using just the default version (3.11.1)?
Let's just get the default working. It may be overkill to try all of them at the moment; they are already being tested normally outside of slurm.
Yes, I did re-run the workflow but I still got the same errors.
Ok, sure.
Just want to confirm that with I don't understand why
I think there may be some confusion.
- I mentioned having issues with -n auto in #694
- I tried running the current GA workflow without -n auto and I still get the same errors
Yes, the errors exist even when limiting pytest to a single test from the list of failed tests, as can be seen here.
I was suggesting removing the `-n` option just based on what I see when running the test with the container on my laptop.
On Tue, Sep 12, 2023, 04:21 Aditya Agarwal wrote:
I think there may be some confusion.
- I mentioned having issues with -n auto in #694
- I tried running the current GA workflow without -n auto and I still get the same errors
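For reference, a hedged sketch of the two invocations being compared; `-n auto` comes from the pytest-xdist plugin, and the test path shown is illustrative rather than the exact command in the workflow:

```bash
# With pytest-xdist parallelism (as in the current GA workflow)
pytest -n auto pydra/engine/tests/test_workflow.py

# Without the -n option, tests run sequentially in a single process
pytest pydra/engine/tests/test_workflow.py
```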
@adi611 - I've just checked the GA reports and I see that there is
I ran it separately here. But I should update the pytest command in the current PR to check if it runs fine without the
Ok, it looks like for GA the option doesn't really make any difference... Do you also see the same error when running on your laptop?
Yes, I tried it on my laptop and I get the same error.
The issue seems to be more random, with the two
I am unable to find such a check.
Logs:
(The line numbers for
The same test:
Instead of just running it, you can run the test with the pdb option; you can also enable debug logging around the relevant parts of the code. That may give you more insight into the state of the system. I suspect this is a resource contention issue with the slurm database, and a couple of retries could help, or looking at stderr and stdout.
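A sketch of what that debugging workflow could look like; `--pdb` and `--log-cli-level` are standard pytest options, while the specific test path and name are assumptions used only for illustration:

```bash
# Drop into the Python debugger at the point of failure (test path is illustrative)
pytest --pdb pydra/engine/tests/test_submitter.py::test_slurm_wf

# Surface debug-level log messages from the code under test while it runs
pytest --log-cli-level=DEBUG pydra/engine/tests/test_submitter.py::test_slurm_wf
```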
@adi611 - I've noticed that some tests do not use pytest's fixture:
I checked and many of the failed tests, like
Currently I am unable to reproduce the issue for single tests.
This seems to be a Python 3.11.1-specific issue; should I try the newer 3.11 version, like 3.11.5, and see if it works? I have seen discussions on cpython with similar issues and maybe they have rolled out a fix for it.
If you're testing on the current Python branch, it is best to try out the latest published version first and then bisect with previous versions if you notice a regression. Could you reference the exact issues you think may be of interest on cpython?
This is one such issue.
Thanks for tracking this! Yes, please check for 3.11.5!
It looks like it works for 3.11.5! :) Just remove 3.11.1, and we will merge it.
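In GitHub Actions terms, that change would look roughly like the snippet below; the exact key names and structure inside testslurm.yml are an assumption, only the version swap is taken from the discussion:

```yaml
jobs:
  build:
    strategy:
      matrix:
        # 3.11.1 removed because of the cpython issue discussed above;
        # 3.11.5 kept as the tested 3.11 release
        python-version: ["3.11.5"]
```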
Is there a way to exclude a series of Python versions in pydra's python config? We should add that to the PR so we know why we did this.
Thanks!
I added
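If the goal is to exclude a specific interpreter release at the packaging level, PEP 440 version specifiers allow it; a hedged sketch only, since whether pydra's metadata lives in pyproject.toml or setup.cfg, and the exact lower bound, are assumptions here:

```toml
# pyproject.toml (illustrative)
[project]
name = "pydra"
# Excludes the 3.11.1 release affected by the cpython bug
# while still allowing other 3.11.x versions
requires-python = ">=3.8, !=3.11.1"
```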
The Slurm workflow for 3.11.5 is failing at the
Is it possible this is due to the
I've just restarted and it seems to work; if we have this issue again, we can increase the time
@adi611 - I've realized that your name is not in the zenodo file; please open a PR if you want your name to be included!
Thanks, I'll do that, but I need some help since I've never done it before. Is there a preferred order for including my name? Also, what should I specify as my affiliation?
We should think about the order; for now we don't use any rule except that Satra is at the end, so you can put your name before him.
@adi611 - just a quick thing: can you post the slurm dockerfile somewhere?
Thank you, perhaps you can add this to
Types of changes
Summary
Update the DOCKER_IMAGE tag in testslurm.yml from latest to 21.08.6.
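A sketch of what that tag bump might look like inside the workflow file; the image repository name and the surrounding env block are assumptions, only the tag change from latest to 21.08.6 is taken from this PR:

```yaml
# .github/workflows/testslurm.yml (structure is illustrative)
env:
  # Previously the tag was "latest"; pin to the Slurm release in the new image
  DOCKER_IMAGE: "<dockerhub-user>/docker-centos7-slurm:21.08.6"
```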
Checklist