Faster, safer, compilation options targeted for HAFS operational upgrade#1554
Conversation
|
It would be great if @BinLiu-NOAA @ZhanZhang-NOAA and @BijuThomas-NOAA could review this, as they're the HAFS developers that helped test these options on WCOSS2. |
|
It looks like you ran RT with new baselines. Since it does not look like baselines are changed can you please run against current baselines Also with any new RT, we need ORT to be run with |
You are incorrect. I ran against old baselines by rsyncing the BL_DATE baseline directory into my directory. Only the new test has a new directory. |
OK great. It didn't appear that way with your log, which confused me. Thanks. |
|
@jkbk2004 @DeniseWorthen Please review code before commit queue assignment. |
|
So, performance impact is mostly expected on wcoss2, right? |
No. It'll be faster everywhere. |
|
Wasn't quite awake in my last post. There is no FV3 PR that I forgot to make. All of the build options are set at the ufs-weather-model level; nothing below that overrides them (I checked). |
|
@SamuelTrahanNOAA Have your run the reproducibility tests with the fast compile options? Do we have run-to-run reproducibility? Will the results change with different threads? If I remember correctly, the hafs test does not reproduce with different MPI tasks or in restart runs yet. |
I did, and so did other people.
Yes.
No, it still doesn't. Several people tried several approaches to get this working, but we've run out of time for the implementation.
I don't even know how to set up a coupled wave-ocean-atmosphere restart test.
I don't think this has been tested with any optimization options (or lack thereof) because the main focus was MPI task decomposition. Someone else would have to do this test on Dogwood, since Cactus is unusable at the moment, and Acorn is too small. |
|
The automatic github tests are failing due to an internal error:
I manually checked, and my repo is up-to-date with develop. |
@junwang-noaa Are you asking me to run all of the tests in rt.conf this way? I don't even know if they'll work. Some may fail to build, getting the dreaded "internal compiler error," which happens more often at higher optimization levels. My only target for this was the two operational HAFS configurations, due to lack of time; the model took 50% more resources than it should, but now it runs in its window with a minute or three to spare. If people want to use this for other configurations, they'll have to test it first. I'll try an "all suites" build with the SUPAFAST=ON and see what happens. |
|
I also ran #1558 branch code on Gaea and cpld_control_c192_p8 test passed against the current baseline. |
|
Yeah, it's running on my side. Hopefully, we can finish today. |
|
@SamuelTrahanNOAA I like to confirm one thing. this pr only changes affect hafs_regional_storm_following_1nest_atm_ocn_wav. no imapct on all other hafs tests, right? |
|
@SamuelTrahanNOAA @BrianCurtis-NOAA All tests are done. the gaea issue was my mistake. All ok now. This pr affects only on hafs_regional_storm_following_1nest_atm_ocn_wav. We are ready to merge the pr. |
BrianCurtis-NOAA
left a comment
There was a problem hiding this comment.
@SamuelTrahanNOAA @BrianCurtis-NOAA All tests are done. the gaea issue was my mistake. All ok now. This pr affects only on hafs_regional_storm_following_1nest_atm_ocn_wav. We are ready to merge the pr.
@jkbk2004 We haven't confirmed from @SamuelTrahanNOAA why the hafs tests don't fail comparing to previous baselines. We need that to continue.
|
Compared on WCOSS2 to previous baselines and hafs_regional_atm_ocn, hafs_regional_atm_ocn_wav, hafs_regional_storm_following_1nest_atm_ocn, and hafs_regional_storm_following_1nest_atm_ocn_wav all do not compare sucessfully. They all required new baselines. |
@BrianCurtis-NOAA cmake/Intel.cmake lines 23-27 same as develop branch. so all cases reproduce except hafs_regional_storm_following_1nest_atm_ocn_wav |
|
@jkbk2004, @BrianCurtis-NOAA, there is a HYCOM build option change taking out "-xSSE4.2". So some other HAFS HYCOM coupled RTs could also get result changing. |
Let me confirm again with some hafs tests on hera. Weird thing is all other hafs cases not affacted. |
|
@BinLiu-NOAA Doesn't the removal of -xSSE4.2 only happen for Gaea? Meaning that testing on Hera won't tell you anything about how the tests will compare on Gaea? |
@DeniseWorthen, the removal of "-xSSE4.2" is meant to happen to at least on WCOSS2 (maybe also on Hera/Orion/Jet). I think @SamuelTrahanNOAA made Gaea to keep using "-xSSE4.2" (since it failed on Gaea after taking it out). |
|
@BinLiu-NOAA Thanks, I guess I had it reversed! |
|
What is the status of this PR? |
|
@SamuelTrahanNOAA, I think we are waiting for @jkbk2004 to double check if the HAFS HYCOM coupling RTs also got result changing on Hera (which may or may not be the case). But, otherwise, it looks all the regression tests have gone through. |
|
Confirms hafs_regional_atm_ocn is not changing on hera. So all hafs cases on hera/orion/gaea/jet/cheyenne same results except hafs_regional_storm_following_1nest_atm_ocn_wav. So, we can move one at least if "-xSSE4.2" handling behaves as expected on wcoss2. @BinLiu-NOAA what do you think? |
|
@jkbk2004, @BrianCurtis-NOAA, @SamuelTrahanNOAA, we would like to remove using "-xSSE4.2" option for all platforms (except on Gaea, because it fails). Even if it turns out to change results, we can just accept the fact of result changing for those HAFS-HYCOM coupled RTs. Thanks! |
|
I agree. we can investigate more across machines. I will create an issue. |
|
@BrianCurtis-NOAA let's go ahead to merge. |
@jkbk2004, Thanks for confirming all HAFS HYCOM coupling RTs (except the one except hafs_regional_storm_following_1nest_atm_ocn_wav which modified to use the FASTER build option) can reproduce the previous baseline results on hera/orion/gaea/jet/cheyenne. And the HAFS HYCOM RTs are expected to change results on WCOSS2 (since on WCOSS2 we are not recommended to this -xSSE4.2 option). And that's why we are taking out this -xSSE4.2 option in this PR. So, for me, all the result changing the corresponding HAFS RTs are expected/understandable. Thus, I do not see any issue to be concerned/created. Thanks! |
|
created an issue #1565 |
Description
Adds a set of faster, safer, optimization options planned for the HAFS operational upgrade whose code freeze is in January 2023. The FV3 uses the operational HRRR optimization options, without -ip, since that option doesn't speed up the model on WCOSS2, and it does slow down the compilation. To enable this, use
-DFASTER=ONwhen running the ufs-weather-model cmake.Also, regardless of cmake options, the HYCOM no longer uses
-xSSE4.2option that shouldn't be turned on in any WCOSS2 build.One test,
hafs_regional_storm_following_1nest_atm_ocn_wav, runs with an executable that uses the new compilation options. That means the results from that test will change. The default optimization options, and the actions of other-Dwhateverflags, are not changed at all. Therefore, no other tests should change results.FV3 Compilation Option Change
The new FV3 compilation options use value-safe optimization via
-fp-model precise, rather than-fp-model fastlike the default FV3 optimization options. They are far faster because just about every optimization is turned on, other than the ones that are turned off automatically due to not being value safe. Most importantly,-fp-model consistentis turned off. That option is often misunderstood; the only place it serves any real purpose is Jet, a heterogeneous cluster, where it is possible to get the same results across several machines. The cost of that is a massive performance penalty, even with -fp-model fast. Any other floating-point consistency safety already comes in -fp-model precise.Oh, what was that? You don't see -fp-model fast in the current optimization options?
It is on by default. The Intel compiler has four categories of fp-model options. One of them is the value safety vs. speed trade-off: precise, fast, and fast2. If you don't specify any of those options, you get -fp-model fast. The -fp-model consistent is another category, and really serves no purpose if fp-model precise is enabled. (If you're wondering, the often used -fp-model strict turns on several options, including -fp-model precise.)
These options speed up the HAFS by 12% on WCOSS2.
HYCOM Compilation Option Change
HYCOM uses AMD chips with the Intel compiler, and none of the Intel compiler settings do a great job of optimizing for AMD. The craype module already loads the best platform target settings. Specifying additional ones will slow down the model, or cause it to crash, if they have any effect at all.
This HYCOM compilation option change has only a small effect on the runtime since FV3 is the slow component in our current configuration.
Top of commit queue on: TBD
There are no sub-component PRs.
Input data additions/changes
There will be new input data.Input data will be updated.Anticipated changes to regression tests:
The HYCOM compilation options have changed, so all tests that use HYCOM will have new results. That includes all tests with "hafs" in their name, but no other tests.
Subcomponents involved:
Combined with PR's (If Applicable):
Commit Queue Checklist:
Linked PR's and Issues:
Fixes #1553
Testing Day Checklist:
Move new/updated input data on RDHPCS Hera and propagate input data changes to all supported systems.Testing Log (for CM's):