Test Ufuk's feature/era5 refactor#11
Conversation
When there were NaN values on the dataset, these checks were generating "floating invalid". At least part of the reason this is probably appearing with ifx is because ifx apparently doesn't do short-circuit conditional evaluation: in these cases, stream%mapalgo is *not* 'none', so many other compilers probably were never actually evaluating the comparison between the data and the fillvalue, but ifx is. So I'm fixing this by effectively doing the short-circuit evaluation manually - i.e., only doing the data comparison if stream%mapalgo is 'none'.
Before this change, SMS_D_Ld3_PS.f09_t232.I1850Clm60SpCrujraNoAnthro.derecho_intel.clm-decStart1851_noinitial--clm-nofireemis was failing on one of these changed lines, presumably because ifx doesn't do short-circuit evaluation of conditions, so it was failing when trying to do a comparison on a NaN. This commit refactors conditionals to avoid doing comparisons if we have found that the value is NaN.
Fixes for ifx compiler ### Description of changes The ifx compiler seems not to do short-circuit evaluation of conditionals. This was leading to failures in some debug tests when there were NaN values on an input file: checks like `if (.not. isnan(x) .and. x /= fillval)` were okay if short-circuit evaluation was being done, but are failing now. This PR refactors some conditionals to effectively do this short-circuit evaluation via splitting the conditionals. ### Specific notes Contributors other than yourself, if any: CDEPS Issues Fixed (include github issue #): Are there dependencies on other component PRs (if so list): no Are changes expected to change answers (bfb, different to roundoff, more substantial): bfb Any User Interface Changes (namelist or namelist defaults changes): no Testing performed (e.g. aux_cdeps, CESM prealpha, etc): Ran aux_cdeps plus the following tests in the context of cesm3_0_alpha08e: ``` SMS_D_Ld1.ne30pg3_t232.I1850Clm50BgcSpinup.derecho_intel.clm-cplhist SMS_D.TL319_t232.G_JRA_RYF.derecho_intel SMS_D_Ld3.f10_f10_mg37.I1850Clm50BgcCrop.derecho_intel.clm-default ERP_D_Ln9.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s SMS_D_Ld1_PS.f09_g17.I2000Clm50BgcCru.derecho_intel.clm-datm_bias_correct_cruv7 SMS_D_Ld5.f10_f10_mg37.ISSP245Clm60BgcCropCrujra.derecho_intel.clm-datm_rcp45_anom_forc ``` All tests passed and were bfb except for one expected failure from the aux_cdeps suite (which also failed in alpha08e: `FAIL SMS_Ld3.f19_g17_rx1.2000_SATM_SLND_SICE_SOCN_DROF%NYF_SGLC_SWAV.derecho_intel CREATE_NEWCASE`) Also ran `SMS_D_Ld3_PS.f09_t232.I1850Clm60SpCrujraNoAnthro.derecho_intel.clm-decStart1851_noinitial--clm-nofireemis` with updated ccs_config that switches to ifx; that test failed before these changes and passes with these changes. Hashes used for testing: cesm3_0_alpha08e
There was a problem hiding this comment.
Code Review
This pull request introduces a skip_field_check flag to bypass field validation in the DATM component, adds support for several new ERA5 stream variables, and refactors unit conversion and stream reading logic. The review feedback highlights several critical issues in datm_datamode_era5_mod.F90, including a scaling bug in Faxa_swnet unit conversion, a missing error status argument in ESMF_MeshGet, a memory leak from an undeallocated pointer, and flawed logic for determining stream size. Additionally, the reviewer noted potential segmentation faults and undefined behavior due to missing association checks on wind components and uninitialized temperature/pressure variables, an omitted unit conversion for Faxa_snow, and an unused cosfactor calculation.
| if (associated(strm_Faxa_swdn)) then | ||
| if (.not. associated(strm_Faxa_swnet) .and. associated(Faxa_swnet)) then | ||
| avg_alb = ( 0.069 - 0.011*cos(2.0_R8*yc(n)*degtorad ) ) | ||
| Faxa_swnet(n) = strm_Faxa_swdn(n)*(1.0_R8 - avg_alb) | ||
| Faxa_swnet(n) = Faxa_swnet(n)*3600.0_r8 ! to J/m^2 | ||
| end if | ||
| end if |
There was a problem hiding this comment.
Multiplying Faxa_swnet(n) by 3600.0_r8 on line 635 is incorrect when lwmax < 1.0e4_r8 (i.e., when flux variables are already in W/m^2). In that case, the division by 3600.0_r8 in the unit conversion section is skipped, leaving Faxa_swnet 3600 times too large. Since strm_Faxa_swdn already has the correct units (either W/m^2 or J/m^2), Faxa_swnet naturally inherits the correct units from it. Removing line 635 completely resolves this bug for both cases.
if (associated(strm_Faxa_swdn)) then
if (.not. associated(strm_Faxa_swnet) .and. associated(Faxa_swnet)) then
avg_alb = ( 0.069 - 0.011*cos(2.0_R8*yc(n)*degtorad ) )
Faxa_swnet(n) = strm_Faxa_swdn(n)*(1.0_R8 - avg_alb)
end if
end if
| if (ChkErr(rc,__LINE__,u_FILE_u)) return | ||
| allocate(ownedElemCoords(spatialDim*numOwnedElements)) | ||
| allocate(yc(numOwnedElements)) | ||
| call ESMF_MeshGet(sdat%model_mesh, ownedElemCoords=ownedElemCoords) |
There was a problem hiding this comment.
The call to ESMF_MeshGet does not pass the rc argument, but the subsequent line checks rc using ChkErr. This means the error check will not detect a failure of this call and will instead check the status of the previous ESMF call. Please add rc=rc to the ESMF_MeshGet call.
call ESMF_MeshGet(sdat%model_mesh, ownedElemCoords=ownedElemCoords, rc=rc)
| integer :: n | ||
| integer :: spatialDim ! number of dimension in mesh | ||
| integer :: numOwnedElements ! size of mesh | ||
| real(r8), pointer :: ownedElemCoords(:) ! mesh lat and lons |
There was a problem hiding this comment.
ownedElemCoords is declared as a local pointer and allocated on line 192, but it is never deallocated, leading to a memory leak. Since it is only used locally to populate yc, it is highly recommended to declare it as allocatable instead of pointer. This ensures it is automatically deallocated when the subroutine exits, even in case of early returns.
real(r8), allocatable :: ownedElemCoords(:) ! mesh lat and lons
| if (associated(Sa_z)) then | ||
| lsize = size(Sa_z) | ||
| else | ||
| if (mainproc) write(logunit,*) subname,' Sa_z is not given. Try with Sa_pslv to get stream size' | ||
| end if | ||
| if (associated(Sa_pslv)) then | ||
| lsize = size(Sa_pslv) | ||
| else | ||
| if (mainproc) write(logunit,*) subname,' Sa_pslv is also not given. Exiting!' | ||
| call shr_log_error(subname//'ERROR: Sa_z and/or Sa_pslv must be associated for era5 datamode', rc=rc) | ||
| return | ||
| end if |
There was a problem hiding this comment.
The logic to determine lsize will always fail and exit if Sa_pslv is not associated, even if Sa_z is associated and has been used to set lsize. If the intention is that at least one of Sa_z or Sa_pslv must be associated, the logic should be refactored to only fail if both are unassociated.
lsize = -1
if (associated(Sa_z)) then
lsize = size(Sa_z)
else if (associated(Sa_pslv)) then
lsize = size(Sa_pslv)
end if
if (lsize < 0) then
if (mainproc) write(logunit,*) subname,' Neither Sa_z nor Sa_pslv is given. Exiting!'
call shr_log_error(subname//'ERROR: Sa_z and/or Sa_pslv must be associated for era5 datamode', rc=rc)
return
end if
| !--- calculate wind components if wind speed is provided --- | ||
| if (associated(strm_Sa_wspd)) then | ||
| Sa_u(n) = strm_Sa_wspd(n)/sqrt(2.0_r8) | ||
| Sa_v(n) = Sa_u(n) | ||
| end if | ||
| if (associated(strm_Sa_wspd10m)) then | ||
| Sa_u10m(n) = strm_Sa_wspd10m(n)/sqrt(2.0_r8) | ||
| Sa_v10m(n) = Sa_u10m(n) | ||
| end if |
There was a problem hiding this comment.
The wind components Sa_u, Sa_v, Sa_u10m, and Sa_v10m are accessed and assigned to without checking if they are associated. Since these pointers are retrieved from the export state with allowNullReturn=.true., they can be unassociated, which would cause a segmentation fault. Please add association checks before accessing them.
!--- calculate wind components if wind speed is provided ---
if (associated(strm_Sa_wspd)) then
if (associated(Sa_u)) Sa_u(n) = strm_Sa_wspd(n)/sqrt(2.0_r8)
if (associated(Sa_v)) Sa_v(n) = strm_Sa_wspd(n)/sqrt(2.0_r8)
end if
if (associated(strm_Sa_wspd10m)) then
if (associated(Sa_u10m)) Sa_u10m(n) = strm_Sa_wspd10m(n)/sqrt(2.0_r8)
if (associated(Sa_v10m)) Sa_v10m(n) = strm_Sa_wspd10m(n)/sqrt(2.0_r8)
end if
| !--- calculate specific humidity from dew point temperature --- | ||
| if (associated(strm_Sa_tdew)) then | ||
| if (associated(strm_Sa_t2m)) then | ||
| tbot = strm_Sa_t2m(n) | ||
| else if (associated(strm_Sa_tbot)) then | ||
| tbot = strm_Sa_tbot(n) | ||
| end if | ||
|
|
||
| if (associated(strm_Sa_pslv)) then | ||
| pbot = strm_Sa_pslv(n) | ||
| else if (associated(strm_Sa_pbot)) then | ||
| pbot = strm_Sa_pbot(n) | ||
| end if | ||
|
|
||
| if (td2max < 50.0_r8) strm_Sa_tdew(n) = strm_Sa_tdew(n) + tkFrz | ||
| e = datm_eSat(strm_Sa_tdew(n), tbot) | ||
| qsat = (0.622_r8 * e)/(pbot - 0.378_r8 * e) | ||
| if (associated(Sa_q2m)) Sa_q2m(n) = qsat | ||
| if (associated(Sa_shum)) Sa_shum(n) = qsat | ||
| end if |
There was a problem hiding this comment.
If neither strm_Sa_t2m nor strm_Sa_tbot is associated, tbot remains uninitialized. Similarly, if neither strm_Sa_pslv nor strm_Sa_pbot is associated, pbot remains uninitialized. This can lead to undefined behavior, division by zero, or floating-point exceptions when calculating e and qsat. Please initialize them to safe default values or guard the calculation block to ensure they are associated.
!--- calculate specific humidity from dew point temperature ---
if (associated(strm_Sa_tdew)) then
tbot = 0.0_r8
if (associated(strm_Sa_t2m)) then
tbot = strm_Sa_t2m(n)
else if (associated(strm_Sa_tbot)) then
tbot = strm_Sa_tbot(n)
end if
pbot = 0.0_r8
if (associated(strm_Sa_pslv)) then
pbot = strm_Sa_pslv(n)
else if (associated(strm_Sa_pbot)) then
pbot = strm_Sa_pbot(n)
end if
if (tbot > 0.0_r8 .and. pbot > 0.0_r8) then
if (td2max < 50.0_r8) strm_Sa_tdew(n) = strm_Sa_tdew(n) + tkFrz
e = datm_eSat(strm_Sa_tdew(n), tbot)
qsat = (0.622_r8 * e)/(pbot - 0.378_r8 * e)
if (associated(Sa_q2m)) Sa_q2m(n) = qsat
if (associated(Sa_shum)) Sa_shum(n) = qsat
end if
end if
| if (associated(Faxa_rain)) Faxa_rain(:) = Faxa_rain(:)/3600.0_r8*rhofw | ||
| if (associated(Faxa_rainc)) Faxa_rainc(:) = Faxa_rainc(:)/3600.0_r8*rhofw | ||
| if (associated(Faxa_rainl)) Faxa_rainl(:) = Faxa_rainl(:)/3600.0_r8*rhofw | ||
| if (associated(Faxa_snowc)) Faxa_snowc(:) = Faxa_snowc(:)/3600.0_r8*rhofw | ||
| if (associated(Faxa_snowl)) Faxa_snowl(:) = Faxa_snowl(:)/3600.0_r8*rhofw |
There was a problem hiding this comment.
Faxa_snow is missing from the unit conversion section. When precmax >= 0.01_r8, all other precipitation variables (Faxa_rain, Faxa_rainc, Faxa_rainl, Faxa_snowc, Faxa_snowl) are converted from m to kg/m^2/s, but Faxa_snow is left unconverted. Please add Faxa_snow to the conversion list.
if (associated(Faxa_rain)) Faxa_rain(:) = Faxa_rain(:)/3600.0_r8*rhofw
if (associated(Faxa_rainc)) Faxa_rainc(:) = Faxa_rainc(:)/3600.0_r8*rhofw
if (associated(Faxa_rainl)) Faxa_rainl(:) = Faxa_rainl(:)/3600.0_r8*rhofw
if (associated(Faxa_snow)) Faxa_snow(:) = Faxa_snow(:)/3600.0_r8*rhofw
if (associated(Faxa_snowc)) Faxa_snowc(:) = Faxa_snowc(:)/3600.0_r8*rhofw
if (associated(Faxa_snowl)) Faxa_snowl(:) = Faxa_snowl(:)/3600.0_r8*rhofw
| call shr_cal_date2julian(target_ymd, target_tod, rday, model_calendar) | ||
| rday = mod((rday - 1.0_R8),365.0_R8) | ||
| cosfactor = cos((2.0_R8*SHR_CONST_PI*rday)/365 - phs_c0) |
Description of changes
Specific notes
Contributors other than yourself, if any:
CDEPS Issues Fixed (include github issue #):
Are there dependencies on other component PRs (if so list):
Are changes expected to change answers (bfb, different to roundoff, more substantial):
Any User Interface Changes (namelist or namelist defaults changes):
Testing performed (e.g. aux_cdeps, CESM prealpha, etc):
Hashes used for testing: