wmmdatmd.F90: update MTAG parameters for 2022 compilers #825
@benoitp-cmc, can you try this out to see if it helps with the case you reported in #711?

We're not using multi-grid for E3SM, so this won't affect our configurations.

@JessicaMeixner-NOAA I will.
Thank you for reporting this, @sbrus89.

Note: I want to clarify that this is currently a draft PR only because I am still adding text and log files for the testing performed. The code fix itself (and my testing of it) is complete.

That would be great, @ukmo-ccbunney. Thanks!

It's not working for my real case (560 CPUs). I've revisited the simplified test case I provided in #711 by adding the missing 3rd grid. With 10 CPUs, this test case works, but with 80 CPUs it gives me:
Without the patch, I have:

@benoitp-cmc :( We did know that this would not work for every case, but we were certainly hoping it would work for most. Have you had any success changing the MTAG parameters to get successful runs, or have you just used the MPI environment variables to succeed with the newest Intel? We're trying to figure out a near-term path forward that will let us move to the latest Intel for our WW3 regression testing, which this PR would allow, but we don't want to break other people's cases. So I guess I'm asking: does this break things "more" for you? Could this be another interim state until we figure out a longer-term solution, which might require new MPI route handles so that we're not reusing the same ones and going over the itag limits? (That idea came from @ukmo-ccbunney.)

@JessicaMeixner-NOAA I have tested this change and it doesn't affect our multigrid configurations. I don't know how the tag system works, so I'll pass on experimenting with the MTAG values. For now, using the MPI environment variables is a suitable workaround.
@benoitp-cmc, thank you for these confirmations.

@MatthewMasarik-NOAA I believe that for this PR we're still waiting on @ukmo-ccbunney / @ukmo-jianguo-li to test their SMC multi-grid case, and on word from @thesser1 and @mickaelaccensi confirming that there aren't any unintended consequences for them. Given that this does not completely address #711, let's update the PR description to remove "fix" from the issue mention as we move forward. Hopefully we can have reviews from others by the end of next week at the latest before moving forward on this PR.
@JessicaMeixner-NOAA, copy that, thanks for the status update. I have removed 'fixes' #711 from the description (and added ufs-community/ufs-weather-model/issues/1237).
**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_03/./work_PR2_UNO_MPI_d2 (12 files differ)
mww3_test_03/./work_PR1_MPI_d2 (16 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c (12 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c (15 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2 (12 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2 (15 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2 (15 files differ)
ww3_ta1/./work_UPD0F_U (0 files differ)
ww3_tp2.10/./work_MPI_OMPH (5 files differ)
ww3_tp2.16/./work_MPI_OMPH (3 files differ)
ww3_tp2.17/./work_a (1 files differ)
ww3_tp2.17/./work_c (1 files differ)
ww3_tp2.17/./work_b (1 files differ)
ww3_tp2.6/./work_ST0 (1 files differ)
ww3_tp2.6/./work_ST4 (1 files differ)
ww3_tp2.6/./work_pdlib (1 files differ)
ww3_ufs1.3/./work_a (1 files differ)
matrixCompFull.txt
matrixCompSummary.txt
matrixDiff.txt
Just waiting on @ukmo-ccbunney @mickaelaccensi and @thesser1 to see if they have any objections at this point.

Thanks for the update, @JessicaMeixner-NOAA. Glad to see your tests passed as well.

All tests passed and the differences seem as usual:
********************* non-identical cases ****************************
mww3_test_03/./work_PR2_UQ_MPI_d2 (9 files differ)

Thank you for providing your results, @mickaelaccensi!

Although we have not gotten affirmative answers from everyone we were waiting for, at this point we plan to merge this PR at the end of the workday here unless we hear something in the meantime.

Sorry I am a bit late in reporting this, but for the record, this ran fine on our Cray XC40 and EXZ systems using the Cray compiler.

Great to hear, @ukmo-ccbunney, thanks for the report.

orion UFS RT |
|
wcoss2 (cactus) UFS RT |
Pull Request Summary
Modifies the MTAG1/2 parameters, which determine the itag values available for MPI communication between multiple grids.
Description
In wmmdatmd.F90, the parameters MTAG1 and MTAG2 allow you to adjust the total number of itag values, which serve as IDs for MPI messages between multiple grids. When tuning MTAG1/2, you are effectively splitting the total number of available IDs (itag values) between IDs used for (1) grids of equal rank and (2) grids of non-equal rank.
What bug does it fix, or what feature does it add?
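The underlying constraint can be illustrated with a small, hypothetical Python sketch. The tag layout assumed here (equal-rank tags in [0, MTAG1), non-equal-rank tags in [MTAG1, MTAG2)) and the helper names `tag_ranges`, `make_tag` and `fits_tag_ub` are illustrative assumptions, not the actual scheme in wmmdatmd.F90. What the sketch does reflect is the general MPI rule: every message tag must lie between 0 and the library's `MPI_TAG_UB` attribute. The MPI standard only guarantees that bound to be at least 32767, and it can differ between MPI implementations and versions.

```python
# Hypothetical sketch of an MTAG1/MTAG2-style tag partition. It is NOT the
# actual scheme in wmmdatmd.F90; it only illustrates why every tag must stay
# at or below the MPI library's tag upper bound (the MPI_TAG_UB attribute).

MPI_GUARANTEED_TAG_UB = 32767  # the MPI standard requires MPI_TAG_UB >= 32767


def tag_ranges(mtag1: int, mtag2: int) -> tuple[range, range]:
    """Assumed layout: equal-rank tags in [0, mtag1), non-equal-rank in [mtag1, mtag2)."""
    if not 0 < mtag1 < mtag2:
        raise ValueError("expected 0 < MTAG1 < MTAG2")
    return range(0, mtag1), range(mtag1, mtag2)


def make_tag(mtag1: int, mtag2: int, equal_rank: bool, index: int) -> int:
    """Return the index-th tag from the chosen range, refusing to overflow it."""
    equal, non_equal = tag_ranges(mtag1, mtag2)
    pool = equal if equal_rank else non_equal
    if not 0 <= index < len(pool):
        raise ValueError(f"tag index {index} exceeds a range of {len(pool)} IDs")
    return pool[index]


def fits_tag_ub(mtag1: int, mtag2: int, tag_ub: int) -> bool:
    """True if the largest tag either range can produce (mtag2 - 1) is <= tag_ub."""
    tag_ranges(mtag1, mtag2)  # validate the ordering of the parameters
    return mtag2 - 1 <= tag_ub


if __name__ == "__main__":
    # Illustrative values only, not the values chosen in this PR.
    print(make_tag(20000, 32000, equal_rank=False, index=5))     # 20005
    print(fits_tag_ub(20000, 32000, MPI_GUARANTEED_TAG_UB))      # True
    print(fits_tag_ub(2000000, 4000000, MPI_GUARANTEED_TAG_UB))  # False
```

Enlarging one range in this sketch necessarily shrinks the other or pushes the largest tag upward, which mirrors the trade-off described above. A real MPI program would query the bound at run time (for example with `MPI_Comm_get_attr` and the `MPI_TAG_UB` key) rather than hard-coding it.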
The itag values must be updated for newer (2022) compiler versions.
Is a change of answers expected from this PR?
Please also include the following information:
Add any suggestions for a reviewer
Mention any labels that should be added:
Are answer changes expected from this PR? Please describe the changes and the reason why in addition to which of the following labels would apply:
Issue(s) addressed
dev/ufs-weather-model.
Commit Message
wmmdatmd.F90: update MTAG1/2 parameters for 2022 compilers
Co-authors: @JessicaMeixner-NOAA, @aliabdolali
Check list
Testing
How were these changes tested?
A number of WW3 standalone and UFS weather model coupled regression tests were performed.
WW3 standalone
ensure consistency in a way that is apples-to-apples.
UFS WM coupled
hera, orion, wcoss2 to ensure the current regtests aren't broken by the new values. Additionally, to confirm that control_c384gdas_wav can be turned back on, this test was run on hera and a baseline was created and successfully matched.
Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
Have the matrix regression tests been run (if yes, please note HPC and compiler)?
Please indicate the expected changes in the regression test output, (Note the list of known non-identical tests.)
(1) Only known non-identicals (and unstructured mod_defs).
(2) Only known non-identicals (and unstructured mod_defs).
(3) For this set of tests we would not expect multi-grid tests to pass without differences. For the itags:curr/comp:new runs that finished without error, the changes are: the known non-identicals, unstructured mod_defs, and additional multi-grid tests (mww3) as mentioned.
(1)
(2)
(3)
UFS RT Logs
hera - for a normal run, all tests pass except the compile_011 tests and control_c384gdas_wav (expected). When a baseline is created for control_c384gdas_wav, it successfully matches.
orion - all current runs pass (initially 6 tests stall, though they pass when re-run; records are concatenated at the end).
wcoss2 - all current runs pass.