Fix time offset issue on ICS with GFS nemsio and netcdf files and add new archive file name on HPSS #457
@christinaholtNOAA @danielabdi-noaa, if you have a better idea, please let me know.
danielabdi-noaa
left a comment
Looks good to me.
    if cla.external_model == "FV3GFS" and cla.ics_or_lbcs == "LBCS":
        del file_templates['nemsio']['fcst'][1]
        del file_templates['netcdf']['fcst'][1]
There was a problem hiding this comment.
I think you may need to add a similar entry to the "nemsio" format for this to work. When I change the format to nemsio for the new test case you added, I get this error:
Traceback (most recent call last):
  File "/scratch2/BMC/gsd-hpcs/Daniel.Abdi/ufs-srweather-app/ush/retrieve_data.py", line 1012, in <module>
    main(sys.argv[1:])
  File "/scratch2/BMC/gsd-hpcs/Daniel.Abdi/ufs-srweather-app/ush/retrieve_data.py", line 814, in main
    file_templates = get_file_templates(
  File "/scratch2/BMC/gsd-hpcs/Daniel.Abdi/ufs-srweather-app/ush/retrieve_data.py", line 296, in get_file_templates
    del file_templates['nemsio']['fcst'][1]
IndexError: list assignment index out of range
There was a problem hiding this comment.
I think this is happening because the gfs_file_names anchor is shared for all sources, so when it tries hpss and doesn't succeed it deletes the sfc entry. Then when it tries aws, the entry is already gone. I think you can solve this by not using anchors in the yaml file (bad) or by making a deep copy of the dictionary before deleting the entries.
There was a problem hiding this comment.
@danielabdi-noaa, nemsio is not available for that date (08/2022); only netcdf and grib2 are. Do we need to add any other conditions for this?
There was a problem hiding this comment.
@danielabdi-noaa, I've added if-statements to check the availability of netcdf and nemsio for the cycle date in jobs/JREGIONAL_GET_EXTRN_MDL_FILES. If nemsio (or netcdf) is not available, the get_extrn_ics/lbcs will fail with an error message.
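The check described here lives in a shell job script. As a rough Python sketch of the same idea, assuming a hypothetical table of last-available cycle dates per format (the function name is made up and the dates are placeholders, not the real cutoffs), it could look like this:

```python
from datetime import datetime

# HYPOTHETICAL cutoffs: placeholder values for illustration only,
# not the real last-available cycle dates for either format.
PLACEHOLDER_MAX_CDATE = {
    "nemsio": "2021010100",
    "netcdf": "2099010100",
}


def check_format_available(cdate, file_fmt, max_cdates=PLACEHOLDER_MAX_CDATE):
    """Raise ValueError if cdate (YYYYMMDDHH) is past the cutoff for file_fmt."""
    if file_fmt not in max_cdates:
        return  # no cutoff listed for this format (e.g. grib2)
    cycle = datetime.strptime(cdate, "%Y%m%d%H")
    cutoff = datetime.strptime(max_cdates[file_fmt], "%Y%m%d%H")
    if cycle > cutoff:
        raise ValueError(
            f"{file_fmt} files are not available for cycle {cdate} "
            f"(assumed last available cycle: {max_cdates[file_fmt]})"
        )
```

Failing early with a clear message keeps a wrong format/date combination from surfacing later as a confusing retrieval error.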
There was a problem hiding this comment.
@chan-hoo I still think we need a safeguard against multiple deletions of the sfc file. Best would be to find another way that does not delete entries in the data_locations dictionary. If hpss does not have the file for some reason (maybe it is down) and we want to try aws next, it will fail because the sfc entry was already deleted during the hpss attempt. The invalid nemsio date run I did (although wrong) shows what could happen if the file can't be found on hpss but is available on aws. I added this before the if statement containing the del so that it tries aws successfully when it cannot find the file on hpss:
file_templates = deepcopy(file_templates)
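To make the failure mode concrete, here is a minimal sketch. It assumes the YAML anchor ends up as one Python list shared by both sources; the dictionary layout and the helper name are simplified stand-ins, not the real data_locations.yml or retrieve_data.py code.

```python
from copy import deepcopy

# One list object shared by two sources, mimicking a YAML anchor/alias
# (simplified, hypothetical structure).
shared_fcst = ["gfs.t{hh}z.atmf{fcst_hr:03d}.nemsio",
               "gfs.t{hh}z.sfcf{fcst_hr:03d}.nemsio"]
data_locations = {
    "hpss": {"nemsio": {"fcst": shared_fcst}},
    "aws": {"nemsio": {"fcst": shared_fcst}},
}


def templates_for_lbcs(source, copy_first):
    """Return the templates for one source with the second (sfc) entry deleted."""
    file_templates = data_locations[source]
    if copy_first:
        file_templates = deepcopy(file_templates)
    del file_templates["nemsio"]["fcst"][1]
    return file_templates["nemsio"]["fcst"]


# With the copy, each source starts from the full two-entry list.
hpss_safe = templates_for_lbcs("hpss", copy_first=True)
aws_safe = templates_for_lbcs("aws", copy_first=True)

# Without the copy, the first call mutates the shared list ...
templates_for_lbcs("hpss", copy_first=False)
try:
    templates_for_lbcs("aws", copy_first=False)
    second_call_failed = False
except IndexError:
    # ... so the second call finds only one entry left.
    second_call_failed = True
```

The copy costs little for a dictionary this small, but it only hides the order assumption behind the index-based del; it does not remove it.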
There was a problem hiding this comment.
@danielabdi-noaa, yes, I agree with you. Can you help me? I've sent an invitation to you.
There was a problem hiding this comment.
@chan-hoo No big deal, I've pushed the change that I think will avoid multiple deletions of the sfc file. It is not an ideal solution, so let's wait for @christinaholtNOAA.
…fs-srweather-app into bugfix/timeoffset_gfs
MichaelLueken
left a comment
@chan-hoo These changes look good to me. I have built your branch and submitted the new nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_GFS_v16 WE2E test on Hera. The new error message when setting a file format that is past the maximum CDATE is a great addition! I will go ahead and launch the Jenkins tests for this work now. Once @christinaholtNOAA has had the opportunity to review this work, I will approve.
@MichaelLueken, thank you!
Tests passed on Hera. It looks good to me!
@chan-hoo The |
@danielabdi-noaa, updated.
MichaelLueken
left a comment
@chan-hoo With the exception of the MET_verification test on Cheyenne (which your changes wouldn't affect), all of the Jenkins tests have successfully passed.
I would like to note that the new nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_GFS_v16 test fails on Cheyenne. It appears as though the test data needs to be staged on that machine. Since the data isn't available, the test will fail. Before this test can be moved to either the fundamental or comprehensive test sets, the new ICs and LBCs will need to be brought over to Cheyenne.
Pending no additional feedback from @christinaholtNOAA, I will merge this PR before 5 pm EST.
Actually, I just now saw this. Give me just a little while to take a look?
@christinaholtNOAA No rush. I'll wait until you have approved the PR before merging.
@danielabdi-noaa It looks like there is an issue with the metplus installation on Cheyenne:
I'll run a test using @natalie-perlin's Cheyenne HPC-stack location and see if this error persists.
@MichaelLueken Thanks for looking into the issue. There are some more issues to address with EPIC modulefiles which I've summarized in this issue #458 |
christinaholtNOAA
left a comment
@chan-hoo I'm sorry for the late response. These emails didn't immediately grab my attention.
I left a couple of comments below with some concerns and suggestions.
    # Remove sfc files from fcst in file_names of FV3GFS for LBCs
    # sfc files needed in fcst when time_offset is not zero.
    if cla.external_model == "FV3GFS" and cla.ics_or_lbcs == "LBCS":
There was a problem hiding this comment.
Can this just be if "lbcs" and "fcst"? That way we don't tightly couple the tool to specifics of the models we use as inputs?
There was a problem hiding this comment.
@christinaholtNOAA, I am afraid that it will cause other unexpected errors with other models. For example, 'GDAS', 'GSMGFS', 'RAP', 'HRRR', and 'NAM' have only one file name for 'fcst', so they would hit the same index error. For 'GEFS', we should not remove the second entry from the file names. We just want to remove the 'sfc' file from the list, and only 'FV3GFS' has this issue. What do you think?
There was a problem hiding this comment.
This logic is still pretty brittle. It assumes that there must be at least 2 and that the 2nd one is the surface file. I think that if we want to protect that assumption, we need a functional test.
I also think it's probably safer to do something like delete the entry if "sfc" is in the name of the file. That type of logic may get us the additional functionality we'd need to use offset GDAS netcdf/nemsio as ICS and expand the logic to "lbcs" and "fcst" instead of FV3GFS-specific logic. In general that says "if we're looking for lbcs from forecasts and a surface file is listed, don't try to get it".
As best I can tell, sfc files only show up for GDAS, FV3GFS, and GSMGFS.
I'd prefer to see something like this:
    if cla.ics_or_lbcs == "LBCS":
        for format in ['netcdf', 'nemsio']:
            for i, tmpl in enumerate(file_templates.get('format', {}).get('fcst', [])):
                if "sfc" in tmpl:
                    del file_templates[format]['fcst'][i]
It reduces the assumptions about the order and length of the list provided. It ensures that we're only removing surface files, and not atm files accidentally. It also helps us if we want to expand our experiments run with offset GDAS files as ICs: we can just add the sfc file to the GDAS fcst entry.
There was a problem hiding this comment.
@christinaholtNOAA, I tested your script, but sfc file was not removed from the list for LBCs: DEBUG: Looking for files like ['gfs.t{hh}z.atmf{fcst_hr:03d}.nemsio', 'gfs.t{hh}z.sfcf{fcst_hr:03d}.nemsio']. Any suggestion?
There was a problem hiding this comment.
I found a typo there: 'format' => format. I am testing it again.
There was a problem hiding this comment.
Apologies...I was just writing directly in the text box here and didn't do a full test.
There was a problem hiding this comment.
@christinaholtNOAA , the tests were completed successfully!! The part has been replaced with your script. Thank you!
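For reference, a sketch of the suggested logic with the 'format' typo corrected. It is a variation, not necessarily what was merged: it rebuilds each list instead of deleting while enumerating, because deleting during iteration can skip an entry when two matches sit next to each other. The function name and the sample dictionary are hypothetical.

```python
def drop_sfc_templates(file_templates, ics_or_lbcs):
    """Remove surface-file templates from the fcst lists when fetching LBCs."""
    if ics_or_lbcs != "LBCS":
        return file_templates
    for fmt in ("netcdf", "nemsio"):
        fcst = file_templates.get(fmt, {}).get("fcst")
        if fcst is None:
            continue
        # Slice assignment filters the existing list in place.
        fcst[:] = [tmpl for tmpl in fcst if "sfc" not in tmpl]
    return file_templates


# Hypothetical sample shaped like the templates quoted in this thread.
sample = {
    "nemsio": {"fcst": ["gfs.t{hh}z.atmf{fcst_hr:03d}.nemsio",
                        "gfs.t{hh}z.sfcf{fcst_hr:03d}.nemsio"]},
    "netcdf": {"fcst": ["gfs.t{hh}z.atmf{fcst_hr:03d}.nc"]},
}
drop_sfc_templates(sample, "LBCS")
```

A single-entry list such as the netcdf one above passes through untouched, so the IndexError seen earlier cannot occur with this form.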
    data:
      GSMGFS: compath.py ${envir}/gsmgfs/${gsmgfs_ver}/gsmgfs.${PDY}
      FV3GFS: compath.py ${envir}/gfs/${gfs_ver}/gfs.${PDY}
      FV3GFS: compath.py ${envir}/gfs/${gfs_ver}/gfs.${PDY}/${cyc}/atmos
There was a problem hiding this comment.
Why not use ${hh} here instead of ${cyc}? This is a bit misleading because this ${cyc} is not necessarily aligned with the cyc set for the RRFS config.
Then, there's no need to rename cyc in the ex-script.
There was a problem hiding this comment.
Yes, I agree. It has been replaced with 'hh'.
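A small sketch of why the GFS hour and the experiment cyc can differ once a time offset is applied. The helper name and the example date are hypothetical; the template is the one quoted earlier in this conversation, and the forecast hour is assumed to equal the offset, as for ICs.

```python
from datetime import datetime, timedelta


def external_file_name(cdate, time_offset_hrs, template):
    """Fill a file-name template for the external-model cycle that started
    time_offset_hrs before the experiment cycle cdate (YYYYMMDDHH)."""
    ext_cycle = datetime.strptime(cdate, "%Y%m%d%H") - timedelta(hours=time_offset_hrs)
    return template.format(hh=ext_cycle.strftime("%H"), fcst_hr=time_offset_hrs)


# Experiment cycle at 12z with a 6-hour offset: the GFS cycle hour is 06, not 12.
name = external_file_name("2021010112", 6, "gfs.t{hh}z.atmf{fcst_hr:03d}.nemsio")
# name == "gfs.t06z.atmf006.nemsio"
```

Keeping the external-model hour under its own name (hh) leaves the experiment's cyc untouched.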
    dd=${yyyymmddhh:6:2}
    hh=${yyyymmddhh:8:2}

    cyc=${hh}
There was a problem hiding this comment.
This is potentially a problem. It resets an NCO required variable. This is the GFS cyc that we're looking for, which is a namespace collision with the existing cyc for RRFS.
@christinaholtNOAA, no problem at all. I understand. We are receiving so many emails from SRW :) I've replaced 'cyc' with 'hh'. However, I'd like to hear from you about the if-statement condition as I mentioned above.
@chan-hoo I have gone ahead and relaunched the Jenkins test for Cheyenne with the GNU compiler. All tests successfully passed. Once @christinaholtNOAA has approved, I will go ahead and merge this PR.
DESCRIPTION OF CHANGES:
file_names of nemsio/fcst and netcdf/fcst in parm/data_locations.yml.
make_ics task reads gfs.t12z.atmanl.nemsio(netcdf) and gfs.t12z.sfcanl.nemsio(netcdf)
make_ics task reads gfs.t06z.atmf006.nemsio(netcdf) and gfs.t06z.sfcf006.nemsio(netcdf). The flag anl is switched to fcst.
gfs.t{hh}z.sfcf{fcst_hr:03d} is missing in fcst of FV3GFS.
gfs.t{hh}z.sfcf{fcst_hr:03d} should be removed for make_lbcs to avoid any unnecessary work with the files. Therefore, the flag ics_or_lbcs is added to ush/retrieve_data.py.
parm/data_locations.yml.
_prod_): com_gfs_prod_gfs.{yyyymmdd}_{hh}.gfs_pgrb2.tar
_v16.2_): com_gfs_v16.2_gfs.{yyyymmdd}_{hh}.gfs_pgrb2.tar
Type of change
TESTS CONDUCTED:
New WE2E test:
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_GFS_v16
WE2E tests:
community_ensemble_2mems_stoch
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson_mynn_lam3km
hera.intel
orion.intel
cheyenne.intel
cheyenne.gnu
gaea.intel
jet.intel
wcoss2.intel
NOAA Cloud (indicate which platform)
Jenkins
fundamental test suite
comprehensive tests (specify which if a subset was used)
ISSUE:
Fixes issue mentioned in #456
CHECKLIST
CONTRIBUTORS