Scal xyb#556
Conversation
…following the quest
|
@aronroland Thanks for these nice updates - I can definitely see how this could impact any high-resolution case. I'm curious if you or @aliabdolali looked into the impact of this change (beyond not changing answers) for structured multi-grid test cases that are already memory intensive, for example, the fully resolution test case example in ww3_ufs1.1. Does this dramatically affect the memory usage for that test? This is important information as double versus single precision is not an option in the way this change is introduced. |
JessicaMeixner-NOAA
left a comment
There was a problem hiding this comment.
Is there a reason that build_utils.sh and ft2src.sh were changed? This seems unrelated to this PR along with the additional comp/link files.
| DEALLOCATE(MX1, MXX, MYY, MXY, MAPOUT) | ||
| DEALLOCATE(MX1R, MXXR, MYYR, MXYR) | ||
| DEALLOCATE(AUX1) | ||
| !AR: This needs to be cleaned for the scalability part .. |
There was a problem hiding this comment.
Clean up comment?
|
@aliabdolali @aronroland there are still a few unresolved issues with this PR from the comments I made in the review last week. There are also the open question as to why the permissions are being changed on a few files. Lastly, there is the question on the impact of the memory since this is not added as an option. Please let me know if I need to look into that myself for impacts to the operations of the global systems here at NCEP. |
@JessicaMeixner-NOAA the way @aronroland and I isolated and tested the scal_xyb does not affect the runtime and performance/outputs of the structured part of the code. As a sanity check, I checked the ufs1.* cases and there is no diff in the runtime. The reason we isolated the work is even one single change in the unstructured case made some changes in the structured part which sometimes takes hours to track and understand, sometimes impossible, therefore delaying the scalability reintegration and further development. |
|
@aliabdolali it's great to hear runtime is unaffected. How about the memory usage in the ufs1.1 with the high resolution grids? Again, if I need to do this work, I can I just need to know. |
|
@JessicaMeixner-NOAA <https://github.com/JessicaMeixner-NOAA> , @aliabdolali <https://github.com/aliabdolali> ,
Sorry for the late reply. I was busy this week and will have just time at the weekend to clean the code and reply to the comments from <https://github.com/JessicaMeixner-NOAA> @JessicaMeixner-NOAA, many thanks for this in advance. Since we have deleted one array I do not expect big surprise. How do you <https://github.com/JessicaMeixner-NOAA> @JessicaMeixner-NOAA check memory usage?
Cheers
Aron
Von: Ali.Abdolali ***@***.***>
Gesendet: Mittwoch, 15. Dezember 2021 15:53
An: NOAA-EMC/WW3 ***@***.***>
Cc: Aron Roland ***@***.***>; Mention ***@***.***>
Betreff: Re: [NOAA-EMC/WW3] Scal xyb (PR #556)
@aliabdolali <https://github.com/aliabdolali> @aronroland <https://github.com/aronroland> there are still a few unresolved issues with this PR from the comments I made in the review last week. There are also the open question as to why the permissions are being changed on a few files. Lastly, there is the question on the impact of the memory since this is not added as an option. Please let me know if I need to look into that myself for impacts to the operations of the global systems here at NCEP.
@JessicaMeixner-NOAA <https://github.com/JessicaMeixner-NOAA> the way @aronroland <https://github.com/aronroland> and I isolated and tested the scal_xyb does not affect the runtime and performance/outputs of the structured part of the code. As a sanity check, I checked the ufs1.* cases and there is no diff in the runtime. The reason we isolated the work is even one single change in the unstructured case made some changes in the structured part which sometimes takes hours to track and understand, sometimes impossible, therefore delaying the scalability reintegration and further development.
Aron can comment more here.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#556 (comment)> , or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB2S43W2YESWUGR3SEQW5ADURCTWJANCNFSM5JR5BF4Q> . <https://github.com/notifications/beacon/AB2S43TZHERACYI75HXB4S3URCTWJA5CNFSM5JR5BF42YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOHNGHSVY.gif>
|
|
@aronroland there are a variety of ways to check the memory, including using the MEMCHECK switch you implemented. Some HPCs give memory output and we can also check the resulting size of files. Looks like I should just do these checks myself at this point. I hope to have information for you in a week if not before. |
|
Hi Jessica,
ok, so I guess that Ali and me can do that as well, as he is back. I would appreciate if could have a discussion with you and Ali in order to organize further steps.
Cheers
Aron
Von: Jessica Meixner ***@***.***>
Gesendet: Mittwoch, 15. Dezember 2021 16:17
An: NOAA-EMC/WW3 ***@***.***>
Cc: Aron Roland ***@***.***>; Mention ***@***.***>
Betreff: Re: [NOAA-EMC/WW3] Scal xyb (PR #556)
@aronroland <https://github.com/aronroland> there are a variety of ways to check the memory, including using the MEMCHECK switch you implemented. Some HPCs give memory output and we can also check the resulting size of files. Looks like I should just do these checks myself at this point. I hope to have information for you in a week if not before.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#556 (comment)> , or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB2S43TDTKALZIASWT4KPQ3URCWOVANCNFSM5JR5BF4Q> . <https://github.com/notifications/beacon/AB2S43UBLYMIIYU2CMOTMODURCWOVA5CNFSM5JR5BF42YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOHNGNLRI.gif>
|
|
For the ww3_tr1 files that are different, it's the mod_def files. I wonder if something is written to the mod_def files that are changed by the changes made here or alternatively if this is a case of the memory issues that we seem to occasionally get. One question would be if we run this branch twice, does that test give different answers? |
|
Dear Jessica,
this is a bit strange stuff. So basically the mod_def file is different for Ali not for me and I cannot reproduce the difference, for which the final run results are the same. As we have looked at Ali’s binary difference using vdiffbin, the results have very tiny difference.
I would have fixed it but I cannot reproduce it since using the intel compiler in our system (ifort version 2021.2.0), I could not find any difference in the binary.
Code wise there should not be any difference but u never know. So what we wanted to do is to push this to Ty and see what he gets. I can also check on the French super computer. Hunting the differences down for the mod_def, would basically now result for me to introduce some possibility to write everything out in ascii and see where it differs.
We will discuss this issue more on Monday I guess.
Cheers
Aron
Von: Jessica Meixner ***@***.***>
Gesendet: Donnerstag, 16. Dezember 2021 23:32
An: NOAA-EMC/WW3 ***@***.***>
Cc: Aron Roland ***@***.***>; Mention ***@***.***>
Betreff: Re: [NOAA-EMC/WW3] Scal xyb (PR #556)
For the ww3_tr1 files that are different, it's the mod_def files. I wonder if something is written to the mod_def files that are changed by the changes made here or alternatively if this is a case of the memory issues that we seem to occasionally get. One question would be if we run this branch twice, does that test give different answers?
—
Reply to this email directly, view it on GitHub <#556 (comment)> , or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB2S43VCEN42ZEDPAHJE5Y3URJSGVANCNFSM5JR5BF4Q> .
You are receiving this because you were mentioned. <https://github.com/notifications/beacon/AB2S43QJEMVKINFXRND2ADLURJSGVA5CNFSM5JR5BF42YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOHNQXYPY.gif> Message ID: ***@***.*** ***@***.***> >
|
|
Hi @JessicaMeixner-NOAA and @aronroland Seeing this jogged my memory - I thought a fix had been merged for that Issue, but I don't believe it has. Cheers. |
|
This is great that u have reported that. Many thanks for this! |
|
The size of the output files did not change, additionally the total memory usage on all of the ww3_ufs* tests increase with this PR, but not so much so that I think this warrants needing to put in an option for this variable. If anyone is concerned about total memory usage for their applications please test this PR and make sure it's okay (@mickaelaccensi @ukmo-ccbunney & anyone else). |
|
Hi Jessica,
can u please specify how u have tested the memory usage and post the certain numbers u have obtained?
Cheers
Aron
Von: Jessica Meixner ***@***.***>
Gesendet: Freitag, 17. Dezember 2021 17:15
An: NOAA-EMC/WW3 ***@***.***>
Cc: Aron Roland ***@***.***>; Mention ***@***.***>
Betreff: Re: [NOAA-EMC/WW3] Scal xyb (PR #556)
The size of the output files did not change, additionally the total memory usage on all of the ww3_ufs* tests increase with this PR, but not so much so that I think this warrants needing to put in an option for this variable. If anyone is concerned about total memory usage for their applications please test this PR and make sure it's okay ***@***.*** <https://github.com/mickaelaccensi> @ukmo-ccbunney <https://github.com/ukmo-ccbunney> & anyone else).
—
Reply to this email directly, view it on GitHub <#556 (comment)> , or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB2S43RHJZAB6LBSHZYDDX3URNOYLANCNFSM5JR5BF4Q> .
You are receiving this because you were mentioned. <https://github.com/notifications/beacon/AB2S43RPX4VZRSSPQ2MV7EDURNOYLA5CNFSM5JR5BF42YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOHNVKMBQ.gif> Message ID: ***@***.*** ***@***.***> >
|
|
@aronroland I looked at the resulting file sizes and I used slurm commands to see the total memory usage of the regression test jobs for the ww3_ufs* tests. The increase in total memory used was about .1GB or less. I didn't save output. |
I checked it this morning, running tr1 twice ended with a different mod_def :( |
|
After ww3_tr1 fix and resolving comments, I reran the regtests and here is the summary: pre-known different cases: Expected changes in mod_def files for unstructured cases: |
Pull Request Summary
A short overview of the PR
Remove XYB and replaced by double precision XGRD, YGRD and single precision ZB
https://github.com/NOAA-EMC/WW3/projects/2
Description
XYB is keeping the XY-coordinates and the depths, this array is deleted for the sake of memory saving and towards the integration of the scalability branch into develop. We rather use XGRD, YGRD and ZB instead. XGRD and YGRD are converted to double precision since this is needed by the unstructured part of WW3 for the accurate calculation of the geometric quantities of the unstructured triangular grid. The changes have been implemented in such a way that the structured part of the code and the non-decomposed part of the unstructured grid remains constant. The use of single precision in lat/lon coordinates may also be a bug for the structured, curvilinear, triangular or smc grid if applied to very high resolution.
Please also include the following information:
enhancement
Issue(s) addressed
Commit Message
Remove XYB and replaced by double precision XGRD, YGRD and single precision ZB
Check list
Testing
These two tests are not expected. Aron has them identical with Intel and Ali has them different on both Orion and Hera using Intel. @thesser1 could you test this test on ERDC HPC or your computer? One does not need mpi.
pre-known different cases:
Expected changes in mod_def files for unstructured cases:
matrixCompFull.txt
matrixCompSummary.txt
matrixDiff.txt