Skip to content

Build workflow utilities with CMake.#264

Merged
KateFriedman-NOAA merged 8 commits into
developfrom
feature/cmake
Feb 19, 2021
Merged

Build workflow utilities with CMake.#264
KateFriedman-NOAA merged 8 commits into
developfrom
feature/cmake

Conversation

@aerorahul

Copy link
Copy Markdown
Contributor

This PR enables the utilities under sorc/ to be built using cmake. modulefiles for hera, orion and wcoss_dell_p3 are added that load hpc-stack.

These workflow utilities were compiled on hera and orion, but not on wcoss_dell_p3
These workflow utilities were compiled on macOS, EMC Linux box (RHEL7), Docker with GNU and IntelOneAPI containers.
All utilities built as expected. They were not tested against the hand-written makefiles on the supported HPC's for reproducibility.

Minor changes in 3 Fortran source codes syndat_qctropcy.fd/qctropcy.f, gfs_bufr.fd/meteorg.f, and fv3nc2nemsio.fd/fv3_main.f90 were required to get these utilities to be compiled with GNU Fortran.

@KateFriedman-NOAA

Copy link
Copy Markdown
Contributor

I cloned the feature/cmake branch on Hera and ran build_workflow_utils.sh without doing anything else. I got this error:

-bash-4.2$ pwd
/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
-bash-4.2$ git branch
* feature/cmake
-bash-4.2$ sh build_workflow_utils.sh
++ uname -s
+ [[ Linux == Darwin ]]
++ which readlink
+ cmd=/usr/bin/readlink
++++ /usr/bin/readlink -f -n build_workflow_utils.sh
+++ dirname /scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/build_workflow_utils.sh
++ cd /scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
++ pwd -P
+ readonly UTILS_DIR=/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
+ UTILS_DIR=/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
+ target=NULL
+ modulefile=/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/../modulefiles/workflow_utils.NULL
+ [[ -f /scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/../modulefiles/workflow_utils.NULL ]]
+ [[ -d nceplibs-ncio ]]
+ git clone -b develop https://github.com/noaa-emc/nceplibs-ncio
Cloning into 'nceplibs-ncio'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 198 (delta 11), reused 16 (delta 6), pack-reused 170
Receiving objects: 100% (198/198), 107.73 KiB | 4.90 MiB/s, done.
Resolving deltas: 100% (102/102), done.
+ cd nceplibs-ncio
+ mkdir -p build
+ cd build
+ cmake -DCMAKE_INSTALL_PREFIX=../install
CMake Error: The source directory "/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/nceplibs-ncio/build" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.

I noticed it didn't know the target (other build scripts source machine-setup.sh) so I first added "target=hera" to test and then tried adding a line to source machine-setup.sh. It got "hera" as the target either way but it still error:

-bash-4.2$ sh build_workflow_utils.sh
++ uname -s
+ [[ Linux == Darwin ]]
++ which readlink
+ cmd=/usr/bin/readlink
++++ /usr/bin/readlink -f -n build_workflow_utils.sh
+++ dirname /scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/build_workflow_utils.sh
++ cd /scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
++ pwd -P
+ readonly UTILS_DIR=/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
+ UTILS_DIR=/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc
+ source ./machine-setup.sh
+ target=hera
+ modulefile=/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/../modulefiles/workflow_utils.hera
+ [[ -f /scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/../modulefiles/workflow_utils.hera ]]
+ set +x

The following have been reloaded with a version change:
  1) impi/2018.4.274 => impi/2018.0.4


Currently Loaded Modules:
  1) cmake/3.16.1       4) hpc-intel/18.0.5.274   7) jasper/2.0.22  10) w3emc/2.7.3  13) bacio/2.4.1   16) nemsiogfs/2.5.3  19) bufr/11.4.0
  2) hpc/1.1.0          5) impi/2018.0.4          8) zlib/1.2.11    11) sp/2.3.3     14) w3nco/2.4.1   17) sigio/2.3.2      20) hdf5/1.10.6
  3) intel/18.0.5.274   6) hpc-impi/2018.0.4      9) png/1.6.35     12) ip/3.3.3     15) nemsio/2.5.2  18) g2/3.4.1         21) netcdf/4.7.4



+ [[ -d nceplibs-ncio ]]
+ rm -rf nceplibs-ncio
+ git clone -b develop https://github.com/noaa-emc/nceplibs-ncio
Cloning into 'nceplibs-ncio'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 198 (delta 11), reused 16 (delta 6), pack-reused 170
Receiving objects: 100% (198/198), 107.73 KiB | 5.67 MiB/s, done.
Resolving deltas: 100% (102/102), done.
+ cd nceplibs-ncio
+ mkdir -p build
+ cd build
+ cmake -DCMAKE_INSTALL_PREFIX=../install
CMake Warning:
  No source or binary directory provided.  Both will be assumed to be the
  same as the current working directory, but note that this warning will
  become a fatal error in future CMake releases.


CMake Error: The source directory "/scratch1/NCEPDEV/global/Kate.Friedman/git/feature-cmake/sorc/nceplibs-ncio/build" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.

I'm not experienced with cmake...am I missing something? Thanks!

… compiled files. bugfix in build_workflow_utils.sh hack. nceplibs-ncio now creates the module ncio and not fv3gfs_ncio.
@aerorahul

Copy link
Copy Markdown
Contributor Author

@KateFriedman-NOAA
My apologies. The hack was missing a path to the source directory for nceplibs-ncio.
It has been fixed. I tested on Orion:
target=orion ./build_workflow_utils.sh

@KateFriedman-NOAA

Copy link
Copy Markdown
Contributor

@aerorahul Thanks for the fix! I updated my clone of feature/cmake on Hera and all workflow util codes built, woohoo!

These workflow utilities were compiled on hera and orion, but not on wcoss_dell_p3

Question...why not on wcoss_dell_p3? Are we waiting for a library update there?

@aerorahul

Copy link
Copy Markdown
Contributor Author

I did not build on wcoss_dell_p3. I was lazy. But, it should build.
When hpc-stack gets nceplibs-ncio installed on all machines within hpc-stack, we can remove the hack that is in build_workflow_utils.sh and uncomment the line in modulefiles/workflow_utils.<target> that loads the ncio module.

@KateFriedman-NOAA

Copy link
Copy Markdown
Contributor

@aerorahul Gotcha, I'll try the build on WCOSS-Dell and report any issues.

I ran a cycle test overnight on Hera with feature/cmake. The gdasechgres job segfaulted:

/scratch1/NCEPDEV/stmp4/Kate.Friedman/comrot/cmake/logs/2020090118/gdasechgres.log.1

I reran with module_base.hera using hpc-stack to test that but it segfaulted similarly:

/scratch1/NCEPDEV/stmp4/Kate.Friedman/comrot/cmake/logs/2020090118/gdasechgres.log

I see mention of module_ncio, do we need to load an additional module at runtime? Thanks!

Also, there is some additional updating/cleanup/decommissioning work to be done on feature/cmake (e.g. remove unneeded build scripts). I had to do some updating to get the branch ready to run so I'll finish that and commit those updates.

@aerorahul

Copy link
Copy Markdown
Contributor Author

@KateFriedman-NOAA
module_ncio is a fortran module and not a modulefile.
I will investigate the failure.

The cleanup/decommissioning should be done in a separate branch and not in feature/cmake and in a follow up PR.
You can push changes that you need to make the branch ready to run.

@aerorahul

Copy link
Copy Markdown
Contributor Author

@KateFriedman-NOAA
I suspect there is an issue in the ipolates routine coming from ip/3.3.3.
Can you confirm that your branch feature/hpc-stack works on hera for enkf_chgres_recenter_nc.fd?
I will tag @GeorgeGayno-NOAA and @kgerheiser as they have been in the weeds of ip lately.

…_nc.fd. When going from nemsio to netcdf, the linking of ip, sp and w3nco changed from _d to _4.
@aerorahul

Copy link
Copy Markdown
Contributor Author

@KateFriedman-NOAA
I found the copy/paste error from enkf_chgres_recenter.fd to enkf_chgres_recenter_nc.fd in CMakeLists.txt. When going from nemsio to netcdf, the linking of ip, sp and w3nco changed from _d to _4.
I tested the program and it now terminates normally.

@KateFriedman-NOAA

Copy link
Copy Markdown
Contributor

@aerorahul Thanks for the fix! I updated my clone on Hera, reran build_workflow_utils.sh and then rewound the failed gdasechgres job. The job completed successfully!

/scratch1/NCEPDEV/stmp4/Kate.Friedman/comrot/cmake/logs/2020090118/gdasechgres.log

I also ran the gdasechgres job in my feature/hpc-stack test (symlinked workflow execs to feature/cmake execs) and it also completed successfully:

/scratch1/NCEPDEV/stmp4/Kate.Friedman/comrot/stacktest1/logs/2020090118/gdasechgres.log

I am letting my cmake test on Hera continue forward another cycle. I want to run a cycle on both Orion and WCOSS-Dell, and repeat the test on Hera for a reproducibility check.

I'll have a commit shortly to add workflow_utils to the build_all and linking scripts so it builds/links out of the box again. Further cleanup and decommissioning will happen in a follow-up PR.

- add build_workflow_utils.sh to build_all.sh
- add workflow utils exec symlinks to link_fv3gfs.sh since these execs no longer land in /exec folder
- add workflow_utils build to partial_build.sh and fv3gfs_build.cfg for use by build_all
- turn off builds in fv3gfs_build.cfg that are now covered by workflow_utils build
- move machine-setup source in build_workflow_utils.sh before target is needed

Further cleanup and decommissioning of consolidated builds will come in follow-up commit/PR.

Refs: #264
Comment thread sorc/build_workflow_utils.sh Outdated
Comment thread sorc/fv3gfs_build.cfg
Comment thread sorc/build_all.sh Outdated
- Add $target to build_workflow_utils.sh command in build_all.sh
- Move machine-setup source back down into if-block in build_workflow_utils.sh

Refs: #264
@KateFriedman-NOAA

Copy link
Copy Markdown
Contributor

@aerorahul The analcalc jobs are failing in my test of feature/cmake on Hera. Can you take a look? Thanks!

/scratch1/NCEPDEV/stmp4/Kate.Friedman/comrot/cmake/logs/2020090200/gdasanalcalc.log
/scratch1/NCEPDEV/stmp4/Kate.Friedman/comrot/cmake/logs/2020090200/gfsanalcalc.log

@aerorahul

Copy link
Copy Markdown
Contributor Author

@WalterKolczynski-NOAA @KateFriedman-NOAA
Can you as part of the review check and make sure that all the compiler flags and library linking was accurately transferred from the Makefile to CMakeLists.txt.

@WalterKolczynski-NOAA

Copy link
Copy Markdown
Contributor

@WalterKolczynski-NOAA @KateFriedman-NOAA
Can you as part of the review check and make sure that all the compiler flags and library linking was accurately transferred from the Makefile to CMakeLists.txt.

Will do

@WalterKolczynski-NOAA WalterKolczynski-NOAA left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did what I could to cross-check against existing makefiles/build scripts with my rudimentary knowledge of cmake and the library dependencies. I suspect most of my comments will be either covered by cmake picking up library dependencies or just inconsequential changes. It's also possible I missed a flag being set somewhere, though I tried to check both the build script and the makefile.

Comment thread sorc/fv3nc2nemsio.fd/CMakeLists.txt Outdated
Comment thread sorc/fv3nc2nemsio.fd/fv3_main.f90
Comment thread modulefiles/workflow_utils.hera Outdated
Comment thread modulefiles/workflow_utils.orion Outdated
Comment thread modulefiles/workflow_utils.wcoss_dell_p3 Outdated
Comment thread sorc/syndat_maksynrc.fd/CMakeLists.txt
Comment thread sorc/tave.fd/CMakeLists.txt
Comment thread sorc/tocsbufr.fd/CMakeLists.txt
Comment thread sorc/vint.fd/CMakeLists.txt
Comment thread sorc/CMakeLists.txt
@aerorahul

Copy link
Copy Markdown
Contributor Author

@WalterKolczynski-NOAA
Thank you for a careful review and cross-examination of compiler flags and dependencies.
As you can see, the ingredients are sprinkled across files and there is no "convention" or "consistency".
Please examine the responses and mark the ones that you are satisfied as "resolved".
I will work on what is left and request a re-review.

@WalterKolczynski-NOAA

Copy link
Copy Markdown
Contributor

Sorry for the hold-up. I've resolved all the ones that are definitely taken care of. I will leave it to your discretion about what to do with the remainder and whether they are impactful or not.

@aerorahul

Copy link
Copy Markdown
Contributor Author

@WalterKolczynski-NOAA Thanks again for the careful review.
There are no more changes from me on this PR.

@WalterKolczynski-NOAA

Copy link
Copy Markdown
Contributor

@KateFriedman-NOAA Go ahead and merge if you are satisfied.

@KateFriedman-NOAA KateFriedman-NOAA left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cmake-ifying these codes @aerorahul ! Tested branch on Hera, Orion, and WCOSS-Dell. Ran fine. Duplicate run on Hera reproduced output.

@KateFriedman-NOAA

Copy link
Copy Markdown
Contributor

Reran build_workflow_utils.sh on WCOSS-Dell, Hera, and Orion after commit at 1d7776d. Build ran fine.

@KateFriedman-NOAA KateFriedman-NOAA merged commit f6aa693 into develop Feb 19, 2021
@aerorahul aerorahul deleted the feature/cmake branch February 19, 2021 15:16
JessicaMeixner-NOAA pushed a commit to JessicaMeixner-NOAA/global-workflow that referenced this pull request Nov 7, 2025
* Fb oasis t+0 (#13)

In the original code a coupling lag had to be used, coupling took place at the last time step but not at the first, and the coupling fields had to be written in an oasis restart file. With this ticket, the program can run in a different way by not using a coupling lag, coupling taking place at the first time step but not the last, and the coupling fields are written in the wave restart file - no extra oasis restart file needed.

* Fb uprstr inp (#15)

Enhancement to improve the way in which input data is read in and logged by the ww3_uprstr program:
  * read in variables specific to the update process selected
  * output the values provided in the ww3_uprstr.out log file
  * update the .inp template file and regtests to improve clarity and work with the changes
  * add capability to read inputs from a namelist (ww3_uprstr.nml) file

* Bf unconforming where in coupling routines (#17)

* Fix non-conforming WHERE statements in coupled routines
* Stop comparing history lines in OASIS rmp files

* Changes for efficient SMC grid coupling (#16)

* Changes for efficient SMC grid coupling
* Ensure consistency between SMC coupled test nml and inp files

Co-authored-by: Juan Manuel Castillo Sanchez <48921434+ukmo-juan-castillo@users.noreply.github.com>
Co-authored-by: Andy Saulter <48921142+ukmo-ansaulter@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants