Compile CICE with safe CPU instructions #1563
Conversation
|
@DavidHuber-NOAA Are any performance impacts expected from this fix? |
|
@DeniseWorthen I would expect some minor performance changes, but nothing terribly significant. Having run the full RT suite on xjet, kjet, and Hera, I can say that no tests failed due to time limit. |
|
There are no changes in the CICE repo; this is strictly a build change in UFSWM, so this should be ready for the commit queue. @DavidHuber-NOAA or @DeniseWorthen, if you can double check reproducibility (create baselines, then check against those baselines) on one of the changed tests, that would be helpful. We will attempt to start running RTs later this afternoon. |
|
@BrianCurtis-NOAA Sure, I will test cpld_control_c48. |
|
@BrianCurtis-NOAA I rebaselined cpld_control_c48 and then tested against it and results were identical. |
|
I am adding a new BL_DATE. |
|
We will start testing on Hera. |
on-behalf-of @ufs-community <brian.curtis@noaa.gov>
|
Automated RT Failure Notification |
|
We will skip Cheyenne; it is under maintenance for the whole week. With that, all tests are done. Can we have final approvals to start merging? |
|
FYI: we will resume testing with jenkins-ci starting with the next PR. |
|
@jkbk2004 CISL just sent a notification that all services have been restored.
Let me check.
Description
The CICE component is currently compiled with Intel using the -xHOST flag, which selects CPU instructions based on the node performing the compilation. On systems like Jet, this causes a problem: the head nodes use the Skylake architecture, which can interpret certain instructions like AVX-512, while the xjet partition consists of Haswell cores, which cannot interpret AVX-512 instructions. Thus, when a coupled forecast is run on xjet after compiling on the head node, the UFS crashes. Note that this could also be an issue for Hercules/Orion (if Hercules-compiled executables should run on Orion).
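The mismatch described above can be checked on a given node by looking at the CPU feature flags that Linux reports. The following is a minimal Python sketch, not part of this PR: the helper names `cpu_flags` and `supports` are hypothetical, and it assumes a Linux node where `/proc/cpuinfo` lists `avx2` and `avx512f` in its `flags` line.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (for example, empty input or a non-x86 node).
    return set()


def supports(cpuinfo_text, feature):
    """True if the named feature (e.g. 'avx2', 'avx512f') is in the flags line."""
    return feature in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not a Linux node; every check below reports False
    for feature in ("avx2", "avx512f"):
        print(feature, supports(text, feature))
```

Running this on both the compile node and a compute node shows whether an executable built with host-specific instructions could hit an unsupported instruction on the compute side.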
To fix this, -xHOST has been removed from the CICE interface CMake file so that the instructions specified in configure_jet.intel.cmake and Intel.cmake can be inherited. This does change regression test results for all coupled Intel tests, even on Hera, as -xHOST builds with, among other things, AVX-512 instructions, while the new configuration will generate AVX2 instructions.
Top of commit queue on: TBD
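As an illustration only, the effect of the change on a component's flag list can be sketched in Python. The function name `strip_host_flag` and the example flag list are hypothetical; they are not the contents of the CICE interface CMake file or of Intel.cmake, and matching the flag case-insensitively is an assumption of this sketch.

```python
def strip_host_flag(flags):
    """Return the flag list without any spelling of -xHost (case-insensitive match)."""
    return [flag for flag in flags if flag.lower() != "-xhost"]


# Hypothetical component flag list, for illustration only.
component_flags = ["-O2", "-xHOST", "-fp-model", "precise"]
remaining = strip_host_flag(component_flags)
print(remaining)
```

With the host-specific flag gone, whatever instruction-set options the platform-level configuration supplies are the only ones that apply to the component.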
Input data additions/changes
Anticipated changes to regression tests:
No GNU RTs should see a change. The following Intel, coupled RTs will change due to new compiler instructions:
cpld_control_p8_mixedmode
cpld_control_gfsv17
cpld_control_p8
cpld_2threads_p8
cpld_esmfthreads_p8
cpld_decomp_p8
cpld_mpi_p8
cpld_control_ciceC_p8
cpld_control_c192_p8
cpld_bmark_p8
cpld_control_noaero_p8
cpld_control_nowave_noaero_p8
cpld_control_noaero_p8_agrid
cpld_control_c48
cpld_warmstart_c48
datm_cdeps_control_cfsr
datm_cdeps_control_gefs
datm_cdeps_iau_gefs
datm_cdeps_stochy_gefs
datm_cdeps_ciceC_cfsr
datm_cdeps_bulk_cfsr
datm_cdeps_bulk_gefs
datm_cdeps_mx025_cfsr
datm_cdeps_mx025_gefs
datm_cdeps_multiple_files_cfsr
datm_cdeps_3072x1536_cfsr
datm_cdeps_gfs
Subcomponents involved:
Combined with PR's (If Applicable):
Commit Queue Checklist:
Linked PR's and Issues:
Closes #1262
Testing Day Checklist:
Testing Log (for CM's):