Release/P7b: add fix for omp reproducibility issue for P7b #689
Merged
Conversation
DeniseWorthen
approved these changes
Jul 13, 2021
Collaborator
Author
|
@DeniseWorthen @jiandewang would you please run a short P7b test to confirm this fixes the reproducibility issue? Thanks |
Collaborator
|
I tested 24hr for 20131001. I compiled and ran and then re-compiled and ran a second time. The coupler history file at the end of the 24hrs is identical for the two runs. |
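A minimal sketch of this kind of run-to-run check, assuming a byte-for-byte comparison is what is wanted. The thread does not say which tool was used, and the helper name `files_identical` and the stand-in files are hypothetical. In a real check the two arguments would be the coupler history files written by the two runs.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the two files are byte-for-byte identical.
files_identical() {
  cmp -s "$1" "$2"
}

# Stand-in files for demonstration only; the real history file paths are
# not given in this thread.
tmpdir=$(mktemp -d)
printf 'sample output\n' > "$tmpdir/run1.out"
printf 'sample output\n' > "$tmpdir/run2.out"

if files_identical "$tmpdir/run1.out" "$tmpdir/run2.out"; then
  result="identical"
else
  result="different"
fi
echo "$result"
rm -rf "$tmpdir"
```

A byte-level comparison is the strictest option. If the output files embed metadata that changes between runs, such as creation timestamps, a field-by-field comparison would be needed instead.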
Collaborator
|
Moorthi,
I assume this is the issue you mentioned. Thanks for fixing it.
--
*Fanglin Yang, Ph.D.*
*Chief, Model Physics Group*
*Modeling and Data Assimilation Branch*
*NOAA/NWS/NCEP Environmental Modeling Center*
*https://www.emc.ncep.noaa.gov/gmb/wx24fy/fyang/*
|
Collaborator
Author
|
Denise, thanks for doing the testing. |
Collaborator
My two runs just finished; they generated identical results. |
Collaborator
Author
|
Thanks Jiande for testing. Now the P7b branch is updated.
|
Contributor
|
Yes.
Moorthi
--
Dr. Shrinivas Moorthi
Research Meteorologist
Modeling and Data Assimilation Branch
Environmental Modeling Center / National Centers for Environmental
Prediction
5830 University Research Court - (W/NP23), College Park MD 20740 USA
Tel: (301)683-3718
e-mail: ***@***.***
Phone: (301) 683-3718 Fax: (301) 683-3718
|
epic-cicd-jenkins
pushed a commit
that referenced
this pull request
Apr 17, 2023
## DESCRIPTION OF CHANGES:
Cleaning up bugs in the machine files. The first bug prompted this PR, and the rest were found subsequently. The bugs (and their fixes) are as follows:

1) A space is missing after the `print_info_msg` and `print_err_msg_exit` function calls in the `file_location` functions. Inserting a space gets past this bug, but subsequent issues were found as described below.

   **For machine files that call the `print_info_msg` function in `file_location` (`cheyenne.sh`, `hera.sh`, `jet.sh`, and `orion.sh`):** Fixing this bug leads to other failures because when the "*" stanza is encountered in the `file_location` function, the `EXTRN_MDL_SYSBASEDIR_ICS|LBCS` variable gets set to the message that `file_location` returns. Since that message contains spaces, it leads to other failures in downstream scripts (the ex-scripts). Simply removing the printing of the message (thus causing `EXTRN_MDL_SYSBASEDIR_ICS|LBCS` to be set to a null string) fixes the failures, so this was the fix implemented. If desired, a message for an empty value for `EXTRN_MDL_SYSBASEDIR_ICS|LBCS` can be placed in another script (where those variables are used).

   **For machine files that use `print_err_msg_exit` in `file_location` (`stampede.sh` and `wcoss_dell_p3.sh`):** These should not exit if the file location is not available since the experiment can still complete successfully. So just removing the `print_err_msg_exit` call should work (and make the behavior of these machine files consistent with the set above).

2) In all the machine files, the variable `FV3GFS_FILE_FMT_ICS` should be changed to `FV3GFS_FILE_FMT_LBCS` in the definition of `EXTRN_MDL_SYSBASEDIR_LBCS`. This was fixed in all the files.

3) In `stampede.sh`, a variable named `SYSBASEDIR_ICS` is defined. This is a typo. Modify to `EXTRN_MDL_SYSBASEDIR_ICS`.
## TESTS CONDUCTED:
Ran the WE2E test `grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_GSD_SAR` on:

* Hera -- successful
* Jet -- successful except for UPP tasks
* Cheyenne -- successful except for UPP tasks

The UPP task failures are new and being experienced by other PRs as well (e.g. #689). The original issue with machine files seems resolved.

## CONTRIBUTORS (optional):
@JeffBeck-NOAA encountered and reported the original error.
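The failure mode described in item 1 of the commit message above can be illustrated with a small sketch. It assumes the machine files set the variable through command substitution, which the description implies but does not show. The function names `file_location_buggy` and `file_location_fixed`, the message text, and the path are all hypothetical stand-ins, not the actual machine-file code.

```shell
#!/bin/sh
# Hypothetical stand-in for a machine file's file_location function.
# Buggy variant: the "*" stanza prints an informational message on stdout,
# so command substitution captures the message as if it were a path.
file_location_buggy() {
  case "$1" in
    "FV3GFS") echo "/path/to/fv3gfs" ;;   # placeholder path
    *) echo "File location is not set for this external model" ;;
  esac
}

# Fixed variant: the "*" stanza prints nothing, so the caller gets a null string.
file_location_fixed() {
  case "$1" in
    "FV3GFS") echo "/path/to/fv3gfs" ;;   # placeholder path
    *) ;;
  esac
}

basedir_buggy=$(file_location_buggy "OTHER")
basedir_fixed=$(file_location_fixed "OTHER")

# An unquoted expansion word-splits the captured message, which is how a
# downstream script could end up with several arguments instead of one path.
set -- $basedir_buggy
echo "buggy value splits into $# words"
set -- $basedir_fixed
echo "fixed value splits into $# words"
```

If a diagnostic is still wanted inside such a function, writing it to stderr (`>&2`) would keep it out of the captured value. The commit message instead suggests reporting the empty value from the script that uses the variable.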
epic-cicd-jenkins
pushed a commit
that referenced
this pull request
Apr 17, 2023
* Tweaks for running with containers on azure
* added config.sh for GST on azure
* added AWS to load_modules_run_task.sh
* working on bare metal now
* Changing to azure, aws, and singularity
* updates for singularity
* tweaks for running using singularity exec
* tweaks for running using singularity exec
* Converting to a single noaacloud type
* slight changes to config.sh for aws
* update machine file
* added missing slash to namelist
* changes for intel
* more cleanup
* cleaned up commented lines
PR Checklist
This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.
This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR
An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.
If new or updated input data is required by this PR, it is clearly stated in the text of the PR.
Instructions: All subsequent sections of text should be filled in as appropriate.
The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.
Description
This PR fixes the run-to-run reproducibility issue in the release/P7b branch.
Issue(s) addressed
Testing
How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)
Dependencies
If testing this branch requires non-default branches in other repositories, list them. Those branches should have matching names (ideally).
Do PRs in upstream repositories need to be merged first?