Avoid the effect of external environment on tasks #436
Conversation
|
Machine: hera |
|
Machine: jet |
MichaelLueken
left a comment
These changes look good to me!
It would probably be a good idea to keep @mark-a-potts, @natalie-perlin, and @ulmononian in the loop. Will this method be the one used to move forward?
In the meantime, I'll kick off the Jenkins tests to see if they encounter any issues.
mark-a-potts
left a comment
These changes look good to me, and when combined with Natalie's PR, they seem to fix some problems I was seeing on Jet. Assuming they pass the Jenkins tests, I propose we merge these and then Natalie and I can pull these changes into hers to get the miniconda updated as well.
|
@MichaelLueken There is an issue with this PR and cheyenne now that |
|
@danielabdi-noaa I have killed the Jenkins CI testing for Cheyenne. I'll resubmit once you have implemented a fix. |
|
@MichaelLueken I have made a fix that will hopefully make it work, so you can re-run Jenkins. I am not sure if an empty pbspro line like this |
|
@danielabdi-noaa It looks like your modification has corrected the issue on Cheyenne (the workflow is actually running without PBS job submission failures). Once the tests wrap up, I'll move forward with merging this work. |
|
@danielabdi-noaa and @mark-a-potts Bad news. It looks like the Cheyenne Intel run was successful, but the Cheyenne GNU run is still encountering the same issue as before:
10/28/22 14:08:19 MDT :: FV3LAM_wflow.xml :: Submission of make_grid failed! qsub: directive error: None
Looking in the FV3LAM_wflow.xml files for the Intel tests, the
Stopping the Jenkins pipeline now. |
|
@MichaelLueken I feel like there is still some kind of clash between simultaneous runs on the same machine on Jenkins. In the past there were clashes between GNU and Intel runs on Cheyenne because they were being assigned the same experiment directory. The other PR, which fixes a bug on Hera, failed on Cheyenne too, but it really shouldn't have, since nothing changed there. It was not able to access the directory, so maybe the other PR deleted it because the Intel run was successful there. Maybe @jessemcfarland can take a look? |
|
@MichaelLueken More on this. There can't be a conflict between different PRs, but it looks like there is still the issue of GNU and Intel using the same directories.
For PR-438 run-1
For Build both have the same directories, which is not right, but the build will work fine because the build directories are named
For Test Intel failed because there is no |
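To make the directory clash described above concrete, here is a minimal sketch of one way to keep GNU and Intel runs apart: include the compiler name in the experiment path, by analogy with compiler-specific build directories. The function name `experiment_dir`, the `expt_dirs_<compiler>` layout, and the example paths are all hypothetical and are not taken from the Jenkins pipeline or from the workflow scripts.

```python
from pathlib import Path


def experiment_dir(workspace: str, test_name: str, compiler: str) -> Path:
    """Return an experiment directory that is unique per compiler.

    Hypothetical layout: <workspace>/expt_dirs_<compiler>/<test_name>.
    """
    return Path(workspace) / f"expt_dirs_{compiler}" / test_name


# Two simultaneous runs of the same test on the same machine (made-up paths).
intel_dir = experiment_dir("/tmp/workspace", "grid_test", "intel")
gnu_dir = experiment_dir("/tmp/workspace", "grid_test", "gnu")

# The paths differ, so cleaning up one run cannot delete the other's files.
print(intel_dir)
print(gnu_dir)
```

With a layout like this, a successful Intel run that removes its own experiment directory would leave the GNU directory untouched.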
|
@danielabdi-noaa Thanks for looking into this more. I saw that the PR #436 Jenkins working directory also had both build_intel and build_gnu in the same directory. @jessemcfarland has opened PR #441 to try and correct the behavior so that doesn't continue happening. |
|
@danielabdi-noaa As a follow-up, both a manual run of your branch on Cheyenne with GNU and the Jenkins CI with GNU only are submitting jobs properly. Once either the manual or the Jenkins test is complete, I will go ahead and merge this work. |
|
@danielabdi-noaa @mark-a-potts Both the manual and Jenkins CI tests for GNU have successfully passed. This PR is ready to be merged. |
DESCRIPTION OF CHANGES:
This PR mainly addresses issue
Detailed list of changes
SCHED_NATIVE_CMD be used on all systems, not only for gaea
--export=NONE --export=ALL
load_modules_run_task.sh to work first time (yet to be tested)
Type of change
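As an illustration of the `--export=NONE` versus `--export=ALL` item in the detailed list of changes, and assuming it refers to the Slurm option of that name, the sketch below shows the general idea in plain Python: a child process either inherits the caller's whole environment or receives only an explicit allowlist. This is an analogy, not code from this PR; `LEAKY_SETTING` and the `ALLOWED` tuple are made up for the example.

```python
import os
import subprocess
import sys

# A variable set in the submitting shell that a task should not depend on
# (hypothetical name, used only for this demonstration).
os.environ["LEAKY_SETTING"] = "from-login-shell"

# The child simply reports whether it can see the variable.
child_code = "import os; print(os.environ.get('LEAKY_SETTING', 'unset'))"

# Default behaviour: the child inherits the full parent environment
# (comparable in spirit to --export=ALL).
inherited = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Isolated behaviour: pass only an explicit allowlist of variables
# (comparable in spirit to --export=NONE).
ALLOWED = ("PATH", "HOME", "LD_LIBRARY_PATH", "SYSTEMROOT")
clean_env = {key: os.environ[key] for key in ALLOWED if key in os.environ}
isolated = subprocess.run(
    [sys.executable, "-c", child_code],
    env=clean_env, capture_output=True, text=True, check=True,
).stdout.strip()

print("inherited:", inherited)
print("isolated: ", isolated)
```

The second call behaves the same no matter what the submitting shell had exported, which is the property the PR title asks for.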
TESTS CONDUCTED:
To be conducted ...
DEPENDENCIES:
None
DOCUMENTATION:
None
ISSUE:
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):