optimize do loops #979
* use npa when filling va at the end of pdlib_explicit_block
* re-indent w3wavemd after removal of if-block (from wise pr)
* trailing whitespace cleanup (from wise pr)
More exciting work, thank you @DeniseWorthen. The matrix.comp for #975 is running now, so that PR should be ready for merging later this morning if there are no issues. I'm looking forward to seeing the speed improvements from this PR!
@DeniseWorthen, the dependency #975 was just merged. I'll start the regtests for this now.
@DeniseWorthen could you please update your branch?
aronroland
left a comment
Hey Denise,
missed that one, thanks for optimizing this part!
Cheers
Aron
I re-ran my tests using 4x as many tasks for the WAV model. The improvement is still there, but is reduced (as expected):
current code:
The ESMF Profile gives the following values for the WAV RunPhase1 (min time, mean time, max time):
current code: 385.2281 187.5005 806.7036
So these are reductions of ~4.5% in wall clock, and ~10% in mean run phase time.
Thanks for the update @DeniseWorthen. The testing will finish this morning, and I'll then complete the review this afternoon if there are no problems.
MatthewMasarik-NOAA
left a comment
Code review Pass
Regtests Pass
matrixCompFull.txt
matrixDiff.txt
matrixCompSummary.txt
- Summary of non-identicals shows only the known non-b4b cases with unstructured mod_defs.
**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_03/./work_PR1_MPI_e (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e_c (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e (1 files differ)
mww3_test_03/./work_PR2_UQ_MPI_e (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2 (15 files differ)
mww3_test_03/./work_PR1_MPI_d2 (14 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c (17 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c (16 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2 (10 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2 (14 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e_c (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2 (16 files differ)
ww3_ta1/./work_UPD0F_U (0 files differ)
ww3_tp2.10/./work_MPI_OMPH (6 files differ)
ww3_tp2.16/./work_MPI_OMPH (4 files differ)
ww3_tp2.17/./work_ma (1 files differ)
ww3_tp2.17/./work_a (1 files differ)
ww3_tp2.17/./work_mc1 (1 files differ)
ww3_tp2.17/./work_mb (1 files differ)
ww3_tp2.17/./work_mc (1 files differ)
ww3_tp2.17/./work_ma1 (1 files differ)
ww3_tp2.17/./work_c (1 files differ)
ww3_tp2.17/./work_b (1 files differ)
ww3_tp2.19/./work_1B_a (1 files differ)
ww3_tp2.19/./work_1A_a (1 files differ)
ww3_tp2.19/./work_1C_a (1 files differ)
ww3_ufs1.3/./work_a (3 files differ)
**********************************************************************
************************ identical cases *****************************
Thanks again, @DeniseWorthen, for this excellent catch.
@MatthewMasarik-NOAA Thanks. Could you or Jessica please create a PR to update the ufs-weather-model branch?
Yes, we will do that today or tomorrow at the latest.
Pull Request Summary
Reorders the loops in PDLIB_EXPLICIT_BLOCK for efficiency.
Description
Reorders several loops across ip=1,npa or ip=1,np so that the innermost index varies the fastest.
Note: this PR was built on top of the changes in PR #975 and so depends on that PR being merged first.
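For readers less familiar with why loop order matters here: Fortran stores arrays in column-major order, so the first index is the one that is contiguous in memory. The following is only a minimal sketch of the general idea, not the actual code in PDLIB_EXPLICIT_BLOCK. The array names `va`/`vb`, the sizes `nspec` and `npa`, and the fill expression are assumptions chosen for illustration.

```fortran
program loop_order_sketch
  use iso_fortran_env, only: int64
  implicit none

  ! Hypothetical sizes and names for illustration only (not taken from WW3).
  integer, parameter :: nspec = 64, npa = 20000
  real, allocatable :: va(:,:), vb(:,:)
  integer :: ip, isp
  integer(int64) :: t0, t1, t2, rate

  allocate(va(nspec, npa), vb(nspec, npa))

  call system_clock(t0, rate)

  ! Strided ordering: the inner loop runs over the SECOND index, so each
  ! iteration jumps nspec elements ahead in memory.
  do isp = 1, nspec
    do ip = 1, npa
      va(isp, ip) = real(isp) + 0.5 * real(ip)
    end do
  end do

  call system_clock(t1)

  ! Contiguous ordering: the inner loop runs over the FIRST index, which
  ! varies fastest in Fortran's column-major layout.
  do ip = 1, npa
    do isp = 1, nspec
      vb(isp, ip) = real(isp) + 0.5 * real(ip)
    end do
  end do

  call system_clock(t2)

  ! Both orderings must produce identical arrays; only the memory access
  ! pattern differs.
  if (any(va /= vb)) error stop "loop orderings disagree"

  print '(a,f10.6,a)', 'strided inner loop:    ', real(t1 - t0) / real(rate), ' s'
  print '(a,f10.6,a)', 'contiguous inner loop: ', real(t2 - t1) / real(rate), ' s'
end program loop_order_sketch
```

The measured difference depends on the compiler, optimization flags, and array sizes. Some compilers interchange such simple loops on their own, so the gap may be small or absent in this toy case.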
Issue(s) addressed
Commit Message
Reorders some loops in PDLIB_EXPLICIT_BLOCK for efficiency.
Check list
Testing