Merging feature/sharedobject to develop #29
Merging Hotfixes branch HF_ounfpart into the WW3 master. * Bugfix in ww3_ounf.ftn: NBIPART check against NOSWLL was not accounting for zeroth wind partition. See Issue #2. * Updating link.wcoss_phase2 for grib compile, removing !$ lines in ww3_uprstr that broke compilation of codes with OMPH OMPG switch flags, added new global wave model switch. Added option for writing to fixed netcdf gridded output file when using NCO switch (the issue will be fixed more adequately in v7 following Issue NOAA-EMC#8). * DB1 depth-breaking_bugfix: added proper scaling coefficients for radian freq. [radHz]. See issue NOAA-EMC#7. Merge will lead to v6.07.1, tag and release will be updated accordingly.
Feature/cmakebased
Stelios' develop to master
…p ${CMAKE_SOURCE_DIR}/WW3/model -c theia_so -s NCEP_st4" line was updated
model/bin/cmplr.env : "optc='-c -module $path_m -no-fma -fPIC -ip -p -g -i4 -real-size 32 -fp-model precise -assume byterecl -convert big_endian -fno-alias -fno-fnalias'" was updated, "-fPIC"
make_makefile.sh: libww3.so was added
w3_automake: libww3.so was added
w3_make: libww3.so was added
|
Hi @flampouris, we've found that building shared libraries has doubled the compilation time for some of the programs in our package (eg, using the -fPIC option instead of -ip). Having the option to build shared libraries is a wanted feature, would it be possible for you to build the shared library option without affecting the compilation time? This is critical for us in running multiple regression tests. We are reverting your changes to the NOAA-EMC/WW3 repo, and will be ready to reintegrate when you have a solution for reducing compile time. Thanks! |
This reverts commit 04ed64b.
|
Hi @ajhenrique, No, it is impossible to build shared objects "without affecting the compilation time," especially if the rest of the compilation flags remain the same. I do not understand the comment about the "-ip". The -ip is not related to shared objects, and according to version control, I did not touch it. |
|
Hey @flampouris, thanks for the reply. I'm sure about that, and compile time is really important in our case: when pushing code from feature branches to the master/develop branches we run a matrix of regtests with more than 500 cases, and the increase in compilation time while building shared objects was adding > 16h to the process, which made it impractical. Unfortunately, when your change was submitted our regtests matrix did not have a way to monitor compile time effectively, and I was unable to determine how much additional time this added to the build, so this went unnoticed (I am now adding a compile time tracker so we can make this part of our certification process). To fix the issue, I suggest making the build of shared objects an option, which we can easily incorporate into cmplr.env by adding the intel_so label. I'll clone your branch and make the changes, if you agree that is an option. Sorry about the -ip option comment, I was under the wrong impression that you had replaced it with -fPIC by accident. |
|
@ajhenrique I agree total compile time is important and adding an "intel.so" option to the cmplr.env could be an option. How can we improve our regression testing so that we catch this sooner the next time? @flampouris another thought of a fix: If the goal for the so is for JEDI, do you use cmake/ecBuild? And if so, can you pass the opt, and other things that are going into cmplr.env from the top level build to WW3? We'd have to maybe update cmplr.env to accept predefined values, but this could be a nice feature for anyone who wants to pass this information (compiler and options) down from an overall build system. I can't remember if @mickaelaccensi already added that feature or not? |
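The idea of letting cmplr.env accept predefined values from a top-level build could be sketched with the shell default-expansion idiom. This is only an illustration: WW3_COMP and WW3_OPTC are hypothetical variable names, the fallback values are placeholders, and none of it reflects what cmplr.env actually does.

```sh
#!/bin/sh
# Sketch only. WW3_COMP and WW3_OPTC are hypothetical names, not existing WW3 variables.
# If the caller (for example a top-level CMake step) exports them, they take precedence;
# otherwise the script falls back to its own defaults.
set_build_defaults() {
  comp="${WW3_COMP:-ifort}"        # placeholder default compiler
  optc="${WW3_OPTC:--c -O3}"       # placeholder default flags, not the real cmplr.env list
}

set_build_defaults
echo "compiler: $comp"
echo "options:  $optc"
```

A caller that wants position-independent objects would then export something like WW3_OPTC='-c -O3 -fPIC' before invoking the build scripts, while regression tests that leave the variables unset keep the defaults.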
|
Hello @ajhenrique and @JessicaMeixner-NOAA. The modification of cmplr.env is not the issue. The problem was the design of the compiling procedure, specifically w3_make and, to a lesser degree, the list of progs in make_makefile.sh. I solved the problem by "grepping" the fPIC flag from the comp; in principle, I violated the design of w3_make. See the pull request for details. BTW, I ran a couple of experiments: the difference between the master and the branch with shared objects is about 0.2 seconds, so I don't think that the problem was the compilation time but the running time. About cmake: I am not a guru, but the best option is to forget the recycling of the w3_make_family_scripts and the customized preprocessor and start from scratch. In my case, I just created a "poor man's" CMakeLists.txt based on "execute_process" that serves my development. |
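The check described above (only treat the shared object as buildable when the compile flags contain -fPIC) could look roughly like the following. It is a sketch, not the code from the pull request: has_flag is a hypothetical helper, and it uses a shell case pattern rather than grep so that the flag is matched as a whole word.

```sh
#!/bin/sh
# Sketch only: has_flag is a hypothetical helper, not part of w3_make or comp.
# has_flag FLAGS FLAG: succeed if FLAG appears as a whole word in FLAGS.
has_flag() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Shortened from the cmplr.env line quoted in this thread; $path_m is kept literal here.
optc='-c -module $path_m -no-fma -fPIC -ip -p -g -i4'

if has_flag "$optc" '-fPIC'; then
  echo "objects are position independent: libww3.so can be linked"
else
  echo "no -fPIC in the compile flags: skip libww3.so"
fi
```

Padding both strings with spaces keeps a longer flag that merely starts with -fPIC from matching.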
|
@flampouris thanks for the feedback. We (@aliabdolali + myself) inspected the ~500 cases output from the regtests matrix, and in all of them the run time was not significantly different between the master and develop branch which had included the fPIC option: in our experiments the time burden was indeed in the code compilation. The latter was confirmed by running w3_make for individual programs: compilation times were significantly reduced (by > 2x) by removing the fPIC option. It would be interesting to compare notes in person, since we sit not that far from each other. We can take that chance to discuss options to move forward, and clarify issues that may not seem evident to each other. Thanks! |
|
Hi all, I would rather prefer to add it as a prefix, so_intel, to keep it consistent with options added in specific cases or for specific machines, as was done for zeus and datarmor. Mickael |
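A prefix-based label such as so_intel could select the extra flag along these lines. This is a minimal sketch under stated assumptions: flags_for is a hypothetical function, the base flag list is a placeholder, and the snippet does not reproduce the actual structure of cmplr.env.

```sh
#!/bin/sh
# Sketch only: flags_for and the flag lists are illustrative, not taken from cmplr.env.
# flags_for LABEL: print the compile flags for a compiler label.
# Labels starting with "so_" get -fPIC appended; all others keep the base flags.
flags_for() {
  base='-c -O3 -ip'
  case "$1" in
    so_*) printf '%s\n' "$base -fPIC" ;;
    *)    printf '%s\n' "$base" ;;
  esac
}

flags_for intel      # prints: -c -O3 -ip
flags_for so_intel   # prints: -c -O3 -ip -fPIC
```

With this layout the regression matrix keeps using the plain label and pays no extra compile time, while users who need libww3.so opt in through the prefixed label.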
|
@ajhenrique When compiling for regtests, what level of compile time optimization do you set? I ask because I found that I was able to dramatically decrease the total time required for regtests by using a reduced optimization level. I wonder if this would help mitigate the longer compile times you are experiencing when compiling with the -fPIC flag? For our system (Cray XC), I found that using -O1 gave the best results: the compilation is significantly quicker than our operational settings (-O3) with only a small impact on the runtime. Using no optimization (-O0) resulted in really quick compile times, but the runtime suffered dramatically (compilers are very good at optimizing code!) Level 1 optimizations gave the best balance... Just a thought.... Chris |
|
Thanks @ukmo-ccbunney this is good to know. The added option to use or not the fPIC does the job of reducing compile time (which seems to be a platform-dependent issue as well). We are trying to keep the compile flags as close to our operational config as possible, but perhaps in some cases if we can fit matrix runtime under 8h (our wall-time limit), using -O1 may come in handy. I'll run some tests to figure if this is the case. Cheers, Henrique. |
|
For completeness, I just finished updating the run_test script to display metrics indicating compile and run time for all programs under any regtest. I tested the new run_test with the code added by @flampouris in #29 and without the fPIC flag. The modules loaded on NCEP's R&D machine Theia were:
Results below indicate a substantial increase in compile time, particularly for ww3_grid. For the first regtest below (ww3_tp1.1), ww3_grid compile time went from 60s to 100s. For the second regtest used in these examples, ww3_grid compile time went from 56s to 102s. Other codes in these cases were not affected. However, the differences noted explain the significant impact of using fPIC with the above modules while running our regression tests matrix. Please note that this issue now has a proposed solution in pull request #47 by @flampouris.
Results for test
A) Without fPIC
B) With fPIC
Results for test
A) Without fPIC
B) With fPIC |
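Per-program timing of the kind reported above can be collected with a small wrapper. The following is a sketch and not the actual run_test change: time_step is a hypothetical helper, it relies on `date +%s` (available in GNU and BSD date, though not required by POSIX), and it only resolves whole seconds.

```sh
#!/bin/sh
# Sketch only: time_step is a hypothetical helper, not the code added to run_test.
# time_step LABEL COMMAND...: run COMMAND, print its wall-clock time in whole
# seconds, and return the command's exit status.
time_step() {
  label=$1
  shift
  start=$(date +%s)
  status=0
  "$@" || status=$?
  end=$(date +%s)
  echo "$label: $((end - start))s (exit $status)"
  return "$status"
}

# "sleep 1" stands in for a real build step such as compiling ww3_grid.
time_step "ww3_grid compile" sleep 1
```

Running the same step once with and once without -fPIC in the compile flags and comparing the two printed times gives the A/B comparison described in the comment.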
Merge latest dev/ufs-weather-model
For some applications, the ww3 library is needed as a shared object; therefore, the "make" scripts were updated accordingly. The main modification is the addition of the -fPIC flag for compiling the code. The shared object, named libww3.so, has been added as a target in the makefile; it is created in the same way as the other executables and libraries of WW3 (./w3_make libww3.so), and "libww3.so" resides in the [...]/obj folder.
The code has been tested with intel17 and gnu7.3.