Feature/memcheck#848
Conversation
* move implicit none to module level in edited files
* lengthen some short lines
|
@DeniseWorthen thank you for your very nice work streamlining this important utility for memory monitoring. I'm working on reviewing it now. |
|
Good morning @DeniseWorthen, @JessicaMeixner-NOAA and I have done an initial run of the regression tests and we get some errors popping up in ww3_shel/ww3_multi. They seem to be related to the use of |
|
@MatthewMasarik-NOAA Do we have any regtest to check the MEMCHECK switch? |
|
@MatthewMasarik-NOAA Can you point me to a run directory where the errors are being reported? |
@DeniseWorthen sure. On hera, |
|
@MatthewMasarik-NOAA Does IAPROC have a value yet? It doesn't look like it is assigned until after the call to the MPI_COM_RANK (if MPI) or after the first call to print_memcheck (if SHRD). A quick test would be to comment out the first print_memcheck to see if that is the issue. |
|
@DeniseWorthen I tried this: And I get past the ww3_shel error Matt reported, and then start getting one in ww3_multi/w3init and I think it's the same type of issue that IAPROC is being used before being set. Could memunit just be set in the print_memcheck routine? IAPROC should be a module variable you could add to the routine. Otherwise, I think its just a matter of changing where memunit is defined and/or where the early print_memcheck routines are being called from. There isn't a regtest using this feature, so we haven't run into this issue before because its' just simply always behind a switch. Honestly not sure how some of the calls worked before given some of these errors. |
|
@DeniseWorthen, yes @JessicaMeixner-NOAA effectively did that by moving that first call past the initializations of NAPROC/IAPROC in the SHRD ifdef, and that does take care of the first error. Other's then come up in multi. @JessicaMeixner-NOAA may be able to point you to that run she did. |
|
@aliabdolali , yes it would be great if @aronroland and @MathieuDutSik would like to review/comment here since they are the original authors. |
|
@MatthewMasarik-NOAA @JessicaMeixner-NOAA I found a similar issue in the yowpdlibmain, where myrank is not assigned until after the call to initMPI. In that case though, myrank=0 so did not cause an issue. I'm not sure why that doesn't seem to be the case w/ iaproc. What value does it have prior to initialization? The problem w/ assigning memunit in print_memcheck itself is that there are multiple choices for where the file is written. memunit could be converted back to the hard-coded values |
|
From testing w/in the meshcap, it appears that after the call to w3seto, |
|
@DeniseWorthen I think we have to have this change to move to this new paradigm. @aronroland might be able to chime in with how he uses this, if it breaks the functionality he needs, but we can't run the code without some sort of change. Also we have an issue in w3init where https://github.com/DeniseWorthen/WW3/blob/feature/memcheck/model/src/w3initmd.F90#L522 we have memunit = 10000+IAPROC but IAPROC is not necessarily set until https://github.com/DeniseWorthen/WW3/blob/feature/memcheck/model/src/w3initmd.F90#L558 |
|
Does the existing code work if W3_MEMCHECK is used in the regression tests? |
|
There is not a regression test that tests MEMCHECK and it's not part of our WW3 regression testing as of right now. |
|
Understood. I meant if W3_MEMCHECK is enabled, do any of the failing tests (w/ this branch) run successfully? |
|
@DeniseWorthen I have not tried that. I just tried a few tests to see if that fix might work. |
|
@MatthewMasarik-NOAA @JessicaMeixner-NOAA I've made an attempt to run a regression test from the develop branch with MEMCHECK enabled
This job fails because MEMCHECK is trying to open a file (fort.5) that is readonly. The printMallInfo should be sent to 740+IAPROC, not IAPROC. I can continue to try to work through the failures, but I don't believe the current develop branch would pass the RTs with MEMCHECK enabled. |
|
@DeniseWorthen Thanks for reporting this. This doesn't seem super surprising given that MEMCHECK is just used in certain cases and is not included in any regression testing. Right now the biggest barrier to this PR being merge is the issue that it's breaking the ww3 regression tests without MEMCHECK on. @aronroland or the other developers of MEMCHECK might be able to address the failure you reported. |
|
Hello, I was asked to comment on this:
Finally, let me comment that I regard the elimination of |
@MathieuDutSik, there may be a misunderstanding here. The changes regarding |
Ok, but why mix those two issues in the same PR? Jessica indicated us that PR have to be atomic. And in particular, why limit the elimination of |
|
@MathieuDutSik We do prefer PRs to be as single focused as possible. Most importantly we are a community code and I appreciate your acceptance that sometimes we might not always all agree on what is the best path forward but that we all are working towards the same goal to improve WW3. We ask everyone to be respectful of others ideas, opinions and contributions. Please email me privately with any follow up questions/comments/issues with this PR. I know I personally am very grateful for @DeniseWorthen's contributions here! |
* iaproc must be initialized and/or set for use in memunit to 1) and then again after
|
@MatthewMasarik-NOAA I believe this latest commit will resolve the issue w/ the regtest failing even when memcheck is not set. I'm not experienced w/ running the regtests for WW3, but I think I did it correctly. Note that the fix will still mean that some memcheck calls will write results only to the first file (ie, 740+1) because they come prior to obtaining (or resetting) iaproc with a call to MPI_COMM_RANK. |
|
Thanks for your continued work on this @DeniseWorthen. I'll run the regtests to confirm the effects of this update. |
|
@DeniseWorthen I tried running the updated branch but still was getting seg faults in the same areas, |
|
For the record, the solution that I would have liked to implement is the following: (A) The first thing would be in w3servmd, the following code (that is without ifdef) end subroutine print_memcheck (B) The second thing would in the w3macros.h define MACRO_PRINT_MEMCHECK(a, b) print_memcheck(a, b)#else define MACRO_PRINT_MEMCHECK(a,b)#endif (C) The function calls then become instead of The advantage I see are the following:
Of course, you are free to do as you want, but I just wanted to point out that alternative solutions exist. |
|
Hi All, sorry for being absent few days. I just read through all the comments and yes definitely this is a good way to proceed and to wrap it that way it was done now. I have optimized the location and the number of the memcheck part in the latest branch. I never commit it since there are some open questions. One of the u found e.g. the not defined IAPROC. If this is not defined it was writing the STDOUT, I used it at that time for looking at the memory and it was good enough. Another part, which I did not liked is that everything is written in the 740+IAPROC files. I started to subdivide it for my purposes to the IAPROC+10000, 20000, 30000, 40000 files for each of the parts in WW3. Now this is also not optimal. We need some brainstorming on the question where to write what and define file handles, which is dedicated to memcheck. As for now and your merge I would just delete the memcheck part, which is not properly intialized. I suggest that we discuss the above issues and I can then streamline my changes ... Denise, Jessica I like to thank u both for bringing this forward and cleaning the code. Mathieu's suggestion is really great, thanks for pointing this out. So what would be now from your side the favorable way to proceed? Aron |
|
I think one issue I see w/ the macro option is that each file that implements a memcheck call would need to have that macro defined, correct? It also doesn't align with a path forward of removing ifdefs when possible. Yes, the feature/memcheck branch still requires the use of W3_MEMCHECK ifdefs but those can be removed as the code moves towards implementation of namelist settings. For example, a logical flag, set via namelist, can be passed to the print_memcheck routine; when this flag is false, the routine simply returns w/o doing anything. I didn't make that change here, although I probably should have explained that would be a further required step. |
Dear Denise, I only wanted here to explain that there is another solution and that ifdef-based code is not necessarily awful. |
|
@MatthewMasarik-NOAA I was able to run these two tests in debug mode with commit dd80c1d. Can you re-try your failed cases now? |
|
Hi All, there's been some posts over the holiday weekend, I'd like to address each. It is nice to have the input from @aronroland who wrote the code initially and @MathieuDutSik who is also involved. @MathieuDutSik I think your solution is a clever macro addition to Denise's function call (and takes the @aronroland I'm curious about the unit indexing you mentioned: IAPROC + 10000, 20000, etc. Did you find it useful to separate the output based on section, or should we consolidate and just have one file for each proc? |
@DeniseWorthen , yes. I'll let you know how they go. |
Right now, the "w3macros.h" is included in 124 of the 136 Fortran F90 files of the source code so I think the issue inclusion is a non-issue. I guess that following the chosen path of eliminating of CPP pragmas the "w3macros.h" file will be removed from the source code. |
|
@DeniseWorthen the regtests came back with no errors. I'm running develop now to confirm no answers changed. Did your runs have the memcheck output in the expected IAPROC files? |
|
@MatthewMasarik-NOAA I tested this in the mesh cap by turning on memcheck in the structured and pdlib cases for both the original code and this feature branch. The same files were produced in both cases. I also did some basic checking on the numbers I obtained when turning the feature on (see PR for notes I made). |
|
Awesome. Thank you for that testing and checking the numbers between the two, @DeniseWorthen. |
|
@DeniseWorthen I wanted to give you a status update here. I needed to re-run the regtests as the first time around had some flaky behavior, but now everything looks well. All that remains before acceptance is some timing comparisons. I'm currently compiling timing output from the regtests and am going to submit some larger test cases this afternoon. I see both those being complete by Wed. |
MatthewMasarik-NOAA
left a comment
There was a problem hiding this comment.
Code review
The code edits contain:
- definition of new subroutine
print_memcheck - replacement of existing
W3_MEMCHECKifdef-blocks with calls to the new routine - whitespace removal
- alignment formatting
- wrapped lines formatting
- and removal of duplicated code.
The encapsulation of the MEMCHECK functionality in print_memcheck provides a substantial reduction in code (5 lines replaced with 1 for each instance), improving readability.
Timing
The timing for the PR was carefully check against current
develop. The develop commit used for this testing as well as for future testing is: 616bf79. Note, the established acceptable threshold for runtime increase is 5%.
Testing was done in two ways:
- comparing regtest output from WW3
run_testusing option-Tto provide timing information. [performed on:hera/intel] - running larger tests cases (in time period, grid resolution) and comparing. [performed on:
wcoss2/intel]
Summary of regtest timing
There are a large number of regression tests (661) and they are lightweight by design, with many of the tests running in a matter of seconds, or even fractions of seconds. The performance of longer test cases are our primary concern, though the results from these shorter regtests also provide a useful check for the many different configurations.
96%(635/661) were less than the5%threshold. In fact, many of the values were closer to~1%, with40%of the values actually being negative. This likely hints that in these cases runtime increases are near the level of noise.4%(26/661) exceeded the5%threshold. However, most of these were tests lasting~1sec, with differences in the fractions of seconds. The longest test in this group lasted~1minwith a %change of7%.
Summary of larger tests cases
A 1/4 deg tripolar grid was used to run 4 test cases which varied: length of simulation, resources (MPI tasks, threads), and global timestep/input frequency. These simulations were much longer (15min -- 1hr+), and they all had runtime increases of only a fraction of a %, (~< 0.6%).
GRID: 1/4 deg tripolar
TEST01
* total tasks: 240
* threads: 4
* forcing dt / global dt: 300sec
* sim length: 1day
> timing develop: 903sec (~15.05min)
> timing memcheck: 902sec
> %change: -0.1%
TEST02
* total tasks: 480
* threads: 2
* forcing dt / global dt: 300sec
* sim length: 5days
> timing develop: 4314sec (~1hr 11min)
> timing memcheck: 4334sec
> %change: 0.5%
TEST03
* total tasks: 480
* threads: 2
* forcing dt / global dt: 1800sec
* sim length: 5days
> timing develop: 1220sec (~20.33min)
> timing memcheck: 1224sec
> %change: 0.3%
TEST04
* total tasks: 240
* threads: 4
* forcing dt / global dt: 1800sec
* sim length: 10days
> timing develop: 2584sec (~43.06min)
> timing memcheck: 2599sec
> %change: 0.6%
Regression Tests
The regtest matrix was run and compared with current develop. The tests all passed, with exception of only the expected non-b4b tests. These are shown in output of the matrixCompSummary.txt:
**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_03/./work_PR1_MPI_e (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e_c (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_e (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2 (8 files differ)
mww3_test_03/./work_PR1_MPI_d2 (6 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c (11 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c (15 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2 (15 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2 (16 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e_c (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2 (15 files differ)
ww3_ta1/./work_UPD0F_U (0 files differ)
ww3_tp2.10/./work_MPI_OMPH (7 files differ)
ww3_tp2.16/./work_MPI_OMPH (4 files differ)
ww3_tp2.6/./work_ST0 (1 files differ)
ww3_tp2.6/./work_ST4 (1 files differ)
ww3_tp2.6/./work_pdlib (1 files differ)
ww3_ts4/./work_ug_MPI (1 files differ)
ww3_ufs1.3/./work_a (3 files differ)
**********************************************************************
************************ identical cases *****************************
**********************************************************************The full output files are attached:
Conclusion
- Code review - Improves code readability by encapsulating
MEMCHECKfunctionality in a new routine. Additional edits pertain to related code clean up (whitespace, alignment, etc). PASS. - Timing - Regest timing showed
96%were under the5%runtime increase threshold, with those remaining4%being mostly simulations lasting~1sec, with the longest, only~1min. Results from the larger test cases (1/4 deg tripolar), having much longer runtimes (15min -- 1hr+), showed that in each case the runtime increase was~< 0.6%. These larger test cases are the most important here and the runtime increases are consistently well below the5%increase threshold. PASS. - Regtests - All tests passed with only the expected non-b4b tests as exceptions. PASS.
Based on these considerations I approve this PR.
|
Please let me know of any questions or concerns on the timing results. |
|
Thank you @DeniseWorthen for your hard work. The general code clean up, and encapsulation of |
Pull Request Summary
Replaces W3_MEMCHECK ifdef and associated lines with utility routine.
Description
with
where the
print_memcheckroutine is added tow3servmd.F90. The variablememunitis assigned locally.implicit noneat the module level for files which were edited.Issue(s) addressed
Related issue #800 and Discussion #551.
Commit Message
Check list
Testing
The changes were tested in UWM and all tests were b4b. The changes were also tested in UWM by turning on the
MEMCHECKswitch, running on a small number of tasks (5) and for only 2 hours. The same files were generated by the existing MEMCHECK code as well as the changes in this branch. The values of VM and RSS were not identical. The VM values across files were generally <1% different between existing code and these changes. The values of RSS were also not identical (larger differences were seen). To investigate, the current code was built and run twice and similar variation in RSS values was seen.