Commit

Merge pull request #87 from noaa-oar-arl/feature/timestep
Adding new Feature/timestep
drnimbusrain authored Sep 25, 2023
2 parents 5c71949 + daaec36 commit 019aaac
Showing 26 changed files with 12,631 additions and 4,049 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/ci.yml
@@ -39,6 +39,10 @@ jobs:
- name: Check that default input is nc
run: |
f90nml -g filenames -v file_vars="'input/gfs.t12z.20220701.sfcf000.canopy.nc'" \
input/namelist.canopy input/namelist.canopy
f90nml -g userdefs -v ntime=1 \
input/namelist.canopy input/namelist.canopy
python -c '
import f90nml
with open("input/namelist.canopy") as f:
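The two `f90nml` CLI calls above patch the namelist in place (same input and output path). For reference, a minimal Python sketch of the equivalent `ntime` patch, assuming only that the `f90nml` package is installed (the `.patched` output name here is illustrative, not from the workflow):

```python
import f90nml

# Force a single time step for the CI check; equivalent to:
#   f90nml -g userdefs -v ntime=1 input/namelist.canopy input/namelist.canopy
# (written to a separate file here rather than patched in place)
f90nml.patch(
    "input/namelist.canopy",
    {"userdefs": {"ntime": 1}},
    "input/namelist.canopy.patched",
)
```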
6 changes: 5 additions & 1 deletion README.md
@@ -123,7 +123,7 @@ Namelist Option : `file_vars` Full name of input file (Supports either text or

- See the example file inputs for variables and format (`gfs.t12z.20220701.sfcf000.canopy.txt` or `gfs.t12z.20220701.sfcf000.canopy.nc`). The example surface met/land/soil inputs are based on NOAA's UFS-GFSv16 output initialized on July 01, 2022 @ 12 UTC (forecast hour 000). Other external inputs for canopy-related and other calculated variables come from numerous sources; see [Table 2](#table-2-canopy-app-required-input-variables) below for more information. **Note:** The example GFSv16 domain has been cut to the southeast U.S. region to limit file size and runtime.
- Canopy-App assumes the NetCDF input files follow the CF conventions; the test file is based on UFS-GFSv16. Double or float precision is recommended for real variables, and input data must contain valid values.
- Canopy-App can also be run with a single point of 1D input data in a text file (e.g. `input_variables_point.txt`).
- Canopy-App can also be run with a single point of 1D input data in a text file (e.g. `point_file_20220701.sfcf000.txt`).

The Canopy-App input data in [Table 2](#table-2-canopy-app-required-input-variables) below is based on NOAA's UFS operational Global Forecast System Version 16 (GFSv16) gridded met data, supplemented with external canopy data (from numerous sources) and other external and calculated input variables.

@@ -187,6 +187,10 @@ You can also [generate global inputs using Python (see python/global_data_proces
| Namelist Option | Namelist Description and Units |
| --------------- | ---------------------------------------------------------------------------------- |
| `infmt_opt` | integer for choosing 1D text (= `1`) or 2D NetCDF input file format (= `0`, default) |
| `time_start` | Start/initial time stamp in YYYY-MM-DD-HH:MM:SS.SSSS for simulation/observation inputs |
| `time_end` | End time stamp in YYYY-MM-DD-HH:MM:SS.SSSS for simulation/observation inputs |
| `ntime` | Number of time steps for simulation/observation inputs |
| `time_intvl` | Integer time interval for simulation/observation input time steps in seconds (default = 3600) |
| `nlat` | number of latitude cells (must match # of LAT in `file_vars` above) |
| `nlon` | number of longitude cells (must match # of LON in `file_vars` above) |
| `modlays` | number of model (below and above canopy) layers |
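The four new time options above define the input time axis. A sketch of the implied axis (assuming `ntime` steps of `time_intvl` seconds beginning at `time_start`; values mirror the updated example namelist later in this diff):

```python
import pandas as pd

time_start = pd.Timestamp("2022-07-01 11:00:00")  # from time_start
ntime = 3
time_intvl = 3600  # seconds

# 2022-07-01 11:00, 12:00, 13:00 UTC -> last step matches time_end 13:00:00
times = pd.date_range(time_start, periods=ntime, freq=pd.Timedelta(seconds=time_intvl))
print(times)
```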
Binary file added input/gfs.t12z.20220630.sfcf023.canopy.nc
Binary file not shown.
3,699 changes: 3,699 additions & 0 deletions input/gfs.t12z.20220630.sfcf023.canopy.txt

Large diffs are not rendered by default.

Binary file modified input/gfs.t12z.20220701.sfcf000.canopy.nc
Binary file not shown.
7,396 changes: 3,698 additions & 3,698 deletions input/gfs.t12z.20220701.sfcf000.canopy.txt

Large diffs are not rendered by default.

Binary file added input/gfs.t12z.20220701.sfcf001.canopy.nc
Binary file not shown.
3,699 changes: 3,699 additions & 0 deletions input/gfs.t12z.20220701.sfcf001.canopy.txt

Large diffs are not rendered by default.

2 changes: 0 additions & 2 deletions input/input_variables_point.txt

This file was deleted.

17 changes: 14 additions & 3 deletions input/namelist.canopy
@@ -1,11 +1,22 @@
&FILENAMES
file_vars = 'input/gfs.t12z.20220701.sfcf000.canopy.nc'
! file_vars = 'input/gfs.t12z.20220701.sfcf000.canopy.txt'
file_out = 'output/test_out_southeast_us'
!2D Text and NCF Examples
! Recommended: set the file_out prefix to the initial 'YYYY-MM-DD-HH-MMSS_region_identifier'
file_vars = 'input/gfs.t12z.20220630.sfcf023.canopy.nc' 'input/gfs.t12z.20220701.sfcf000.canopy.nc' 'input/gfs.t12z.20220701.sfcf001.canopy.nc'
! file_vars = 'input/gfs.t12z.20220630.sfcf023.canopy.txt' 'input/gfs.t12z.20220701.sfcf000.canopy.txt' 'input/gfs.t12z.20220701.sfcf001.canopy.txt'
file_out = 'output/2022-07-01-11-0000_southeast_us'

!1D Point Example
! Recommended: set the file_out prefix to the initial 'YYYY-MM-DD-HH-MMSS_point_identifier'
! file_vars = 'input/point_file_20220630.sfcf023.txt' 'input/point_file_20220701.sfcf000.txt' 'input/point_file_20220701.sfcf001.txt'
! file_out = 'output/2022-07-01-11-0000_point'
/

&USERDEFS
infmt_opt = 0
time_start = '2022-07-01-11:00:00.0000'
time_end = '2022-07-01-13:00:00.0000'
ntime = 3
time_intvl = 3600
nlat = 43
nlon = 86
modlays = 100
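The example pairs three hourly input files with `ntime = 3`. A quick consistency check (a sketch; the one-file-per-time-step pairing is an assumption based on this example):

```python
import f90nml

nml = f90nml.read("input/namelist.canopy")
files = nml["filenames"]["file_vars"]
if isinstance(files, str):  # a single entry parses as a scalar
    files = [files]
assert nml["userdefs"]["ntime"] == len(files), "expected one input file per time step"
```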
2 changes: 2 additions & 0 deletions input/point_file_20220630.sfcf023.txt
@@ -0,0 +1,2 @@
lat,lon,fh,ugrd10m,vgrd10m,clu,lai,vtype,ffrac,fricv,csz,sfcr,mol,frp,href,sotyp,pressfc,dswrf,shtfl,tmpsfc,tmp2m,spfh2m,hpbl,prate_ave
34.97,270.00,7.0925,-0.0897,2.2551,0.7206,0.8677,4,0.2156,0.1949,0.0236,0.2500,41.9993,7.3748,10.00,4,100620.8125,3.7858,-15.6539,293.3365,293.8452,0.0148,102.2751,0.0000
2 changes: 2 additions & 0 deletions input/point_file_20220701.sfcf000.txt
@@ -0,0 +1,2 @@
lat,lon,fh,ugrd10m,vgrd10m,clu,lai,vtype,ffrac,fricv,csz,sfcr,mol,frp,href,sotyp,pressfc,dswrf,shtfl,tmpsfc,tmp2m,spfh2m,hpbl,prate_ave
34.97,270.00,7.0925,-0.1842,2.5479,0.7134,1.1774,4,0.2156,0.2770,0.2127,0.2500,-130.1519,0.0000,10.00,4,100694.0859,112.6779,14.5789,295.7195,295.4205,0.0156,162.2376,0.0000
2 changes: 2 additions & 0 deletions input/point_file_20220701.sfcf001.txt
@@ -0,0 +1,2 @@
lat,lon,fh,ugrd10m,vgrd10m,clu,lai,vtype,ffrac,fricv,csz,sfcr,mol,frp,href,sotyp,pressfc,dswrf,shtfl,tmpsfc,tmp2m,spfh2m,hpbl,prate_ave
34.97,270.00,7.0925,0.0331,2.8131,0.7117,0.9407,4,0.2156,0.3292,0.4080,0.2500,-42.9638,0.0000,10.00,4,100735.8750,309.3322,74.7410,298.9967,297.7534,0.0160,313.8891,0.0000
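
Each of the three new point files is a one-row CSV of surface inputs for a single location, so they can be inspected directly with pandas (a minimal sketch):

```python
import pandas as pd

df = pd.read_csv("input/point_file_20220701.sfcf000.txt")
# One point: lat 34.97, lon 270.00, 2-m temperature ~295.4 K, LAI ~1.18
print(df.loc[0, ["lat", "lon", "tmp2m", "lai"]])
```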
5 changes: 3 additions & 2 deletions python/README.md
@@ -28,7 +28,7 @@ You can override default namelist options by passing a dictionary to `run()`.
# Point setup
ds = run(
config={
"filenames": {"file_vars": "../input/input_variables_point.txt"},
"filenames": {"file_vars": "../input/point_file_20220701.sfcf000.txt"},
"userdefs": {"infmt_opt": 1, "nlat": 1, "nlon": 1},
},
)
@@ -40,8 +40,9 @@ There are also helper functions for running sets of experiments with different n
from canopy_app import config_cases, run_config_sens

cases = config_cases(
file_vars="../input/input_variables_point.txt",
file_vars="../input/point_file_20220701.sfcf000.txt",
infmt_opt=1,
ntime=1,
nlat=1,
nlon=1,
z0ghc=[0.001, 0.01],
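A multi-time-step point run should follow the same pattern, passing a list of files and a matching `ntime` (a sketch based on the list handling added to `canopy_app.py` below, not an example from this README):

```python
from canopy_app import run

# Hypothetical three-step point run using the new hourly point files
ds = run(
    config={
        "filenames": {
            "file_vars": [
                "../input/point_file_20220630.sfcf023.txt",
                "../input/point_file_20220701.sfcf000.txt",
                "../input/point_file_20220701.sfcf001.txt",
            ]
        },
        "userdefs": {"infmt_opt": 1, "ntime": 3, "nlat": 1, "nlon": 1},
    },
)
```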
165 changes: 127 additions & 38 deletions python/canopy_app.py
@@ -29,14 +29,20 @@ def _load_default_config() -> f90nml.Namelist:
with open(REPO / "input" / "namelist.canopy") as f:
config = f90nml.read(f)

# Make input paths absolute in default config
for k, v in config["filenames"].items():
p0 = Path(v)
def as_abs(p_str: str) -> str:
p0 = Path(p_str)
if not p0.is_absolute():
p = REPO / p0
else:
p = p0
config["filenames"][k] = p.as_posix()
return p.as_posix()

# Make input paths absolute in default config
for k, v in config["filenames"].items():
if isinstance(v, list):
config["filenames"][k] = [as_abs(v_) for v_ in v]
else:
config["filenames"][k] = as_abs(v)

return config

@@ -58,15 +64,15 @@ def out_and_back(p: Path, *, finally_: Callable | None = None):


DEFAULT_POINT_INPUT = pd.read_csv(
REPO / "input" / "input_variables_point.txt", index_col=False
REPO / "input" / "point_file_20220701.sfcf000.txt", index_col=False
)

_TXT_STEM_SUFFS = {
"wind": "_output_canopy_wind",
"waf": "_output_waf",
"eddy": "_output_eddy_Kz",
"phot": "_output_phot",
"bio": "_output_bio",
"wind": "_canopy_wind",
"waf": "_waf",
"eddy": "_eddy",
"phot": "_phot",
"bio": "_bio",
}


@@ -120,20 +126,48 @@ def run(
ofp_stem = output_dir / "out"
full_config["filenames"]["file_out"] = ofp_stem.relative_to(case_dir).as_posix()

# Check input file
ifp = Path(full_config["filenames"]["file_vars"])
if not ifp.is_file():
raise ValueError(f"Input file {ifp.as_posix()!r} does not exist")
if not ifp.is_absolute():
full_config["filenames"]["file_vars"] = ifp.resolve().as_posix()
if ifp.suffix in {".nc", ".nc4", ".ncf"}: # consistent with sr `canopy_check_input`
nc_out = True
assert full_config["userdefs"]["infmt_opt"] == 0
elif ifp.suffix in {".txt"}:
nc_out = False
assert full_config["userdefs"]["infmt_opt"] == 1
# Check input file(s)
input_files_setting = full_config["filenames"]["file_vars"]
if isinstance(input_files_setting, list):
ifps_to_check = [Path(s) for s in input_files_setting]
else:
raise ValueError(f"Unexpected input file type: {ifp.suffix}")
ifps_to_check = [Path(input_files_setting)]
nc_outs = []
for ifp in ifps_to_check:
if not ifp.is_file():
raise ValueError(f"Input file {ifp.as_posix()!r} does not exist")
if ifp.suffix in {
".nc",
".nc4",
".ncf",
}: # consistent with sr `canopy_check_input`
nc_out = True
assert full_config["userdefs"]["infmt_opt"] == 0
elif ifp.suffix in {".txt"}:
nc_out = False
assert full_config["userdefs"]["infmt_opt"] == 1
else:
raise ValueError(f"Unexpected input file extension: {ifp.suffix}")
nc_outs.append(nc_out)
if not len(set(nc_outs)) == 1:
raise ValueError(
f"Expected all input files to be of the same type (nc or txt). "
f"filenames.file_vars: {input_files_setting}."
)
nc_out = nc_outs[0]
if isinstance(input_files_setting, list):
abs_path_strs = []
for s in input_files_setting:
ifp = Path(s)
if not ifp.is_absolute():
abs_path_strs.append(ifp.resolve().as_posix())
else:
abs_path_strs.append(ifp.as_posix())
full_config["filenames"]["file_vars"] = abs_path_strs
else:
ifp = Path(input_files_setting)
if not ifp.is_absolute():
full_config["filenames"]["file_vars"] = ifp.resolve().as_posix()

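# At this point, file_vars is either a single path or a list of paths, all
# absolute, and all of one format family (.nc/.nc4/.ncf with infmt_opt=0,
# or .txt with infmt_opt=1).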
# Write namelist
if verbose:
@@ -153,7 +187,17 @@

# Load nc
if nc_out:
ds0 = xr.open_dataset(ofp_stem.with_suffix(".nc"))
# Should be just one file, even if multiple output time steps
patt = f"{ofp_stem.name}*.nc"
cands = sorted(output_dir.glob(patt))
if not cands:
raise ValueError(
f"No matches for pattern {patt!r} in directory {output_dir.as_posix()!r}. "
f"Files present are: {[p.as_posix() for p in output_dir.glob('*')]}."
)
if len(cands) > 1:
print("Taking the first nc file only.")
ds0 = xr.open_dataset(cands[0], decode_times=True)
ds = (
ds0.rename_dims(grid_xt="x", grid_yt="y")
.swap_dims(level="z")
@@ -180,22 +224,40 @@
f"warning: skipping {which!r} ({ifcan}) output since stem suffix unknown."
)
continue
df = read_txt(
ofp_stem.with_name(f"{ofp_stem.name}{stem_suff}").with_suffix(".txt")
)
# NOTE: Separate file for each time
patt = f"{ofp_stem.name}_*{stem_suff}.txt"
cands = sorted(output_dir.glob(patt))
if not cands:
raise ValueError(
f"No matches for pattern {patt!r} in directory {output_dir.as_posix()!r}. "
f"Files present are: {[p.as_posix() for p in output_dir.glob('*')]}."
)
if verbose:
print(f"detected output files for {ifcan}:")
print("\n".join(f"- {p.as_posix()}" for p in cands))
dfs_ifcan = []
for cand in cands:
df_t = read_txt(cand)
df_t["time"] = df_t.attrs["time"]
dfs_ifcan.append(df_t)
df = pd.concat(dfs_ifcan, ignore_index=True)
df.attrs.update(which=which)
df.attrs.update(df_t.attrs)
dfs.append(df)

# Merge
units: dict[str, str] = {}
dss = []
for df in dfs:
if {"lat", "lon", "height"}.issubset(df.columns):
ds_ = df.set_index(["height", "lat", "lon"]).to_xarray().squeeze()
elif {"lat", "lon"}.issubset(df.columns):
ds_ = df.set_index(["lat", "lon"]).to_xarray().squeeze()
if {"time", "lat", "lon", "height"}.issubset(df.columns):
ds_ = df.set_index(["time", "height", "lat", "lon"]).to_xarray().squeeze()
elif {"time", "lat", "lon"}.issubset(df.columns):
ds_ = df.set_index(["time", "lat", "lon"]).to_xarray().squeeze()
else:
raise ValueError("Expected df to have columns 'lat', 'lon' [,'height'].")
raise ValueError(
"Expected df to have columns 'time', 'lat', 'lon' [,'height']. "
f"Got: {sorted(df)}."
)
units.update(df.attrs["units"])
for vn in ds_.data_vars:
assert isinstance(vn, str)
@@ -230,22 +292,30 @@ def read_txt(fp: Path) -> pd.DataFrame:
with open(fp) as f:
for i, line in enumerate(f):
if i == 0:
pattern = r" *time stamp\: *([0-9\.\:\-]*)"
m = re.match(pattern, line)
if m is None:
raise ValueError(
f"Unexpected file format. Line {i} failed to match regex {pattern!r}."
)
time_stamp = pd.Timestamp(m.group(1))
elif i == 1:
pattern = r" *reference height, h\: *([0-9\.]*) m"
m = re.match(pattern, line)
if m is None:
raise ValueError(
f"Unexpected file format. Line {i} failed to match regex {pattern!r}."
)
href = float(m.group(1))
elif i == 1:
elif i == 2:
pattern = r" *number of model layers\: *([0-9]*)"
m = re.match(pattern, line)
if m is None:
raise ValueError(
f"Unexpected file format. Line {i} failed to match regex {pattern!r}."
)
nlay = int(m.group(1))
elif i == 2:
elif i == 3:
# Column names (some with units)
heads = re.split(r" {2,}", line.strip())
names: list[str] = []
@@ -264,17 +334,25 @@
break
else:
raise ValueError(
"Unexpected file format. Expected 3 header lines followed by data."
"Unexpected file format. Expected 4 header lines followed by data."
)

df = pd.read_csv(fp, index_col=False, skiprows=3, header=None, delimiter=r"\s+")
df = pd.read_csv(
fp,
index_col=False,
skiprows=4,
header=None,
delimiter=r"\s+",
dtype=np.float32,
)
if len(names) != len(df.columns):
raise RuntimeError(
f"Unexpected file format. Detected columns names {names} ({len(names)}) "
f"are of a different number than the loaded dataframe ({len(df.columns)})."
)
df = df.replace(np.float32(-9.0e20), np.nan) # fill value, defined in const mod
df.columns = names # type: ignore[assignment]
df.attrs.update(href=href, nlay=nlay, units=units)
df.attrs.update(href=href, nlay=nlay, units=units, time=time_stamp)

return df

@@ -367,6 +445,16 @@ def config_cases(*, product: bool = False, **kwargs) -> list[dict[str, Any]]:
f"scalar, list[scalar], or list[list[scalar]], got: {type(v)}."
)

if k == "file_vars":
# Only support single time step runs for now
if type(v) is list:
assert type(v[0]) is str
mults[k] = v
else:
assert type(v) is str
sings[k] = v
continue

if (np.isscalar(DEFAULT_CONFIG[_k_sec(k)][k]) and np.isscalar(v)) or (
type(DEFAULT_CONFIG[_k_sec(k)][k]) is list
and type(v) is list
@@ -414,8 +502,9 @@

if __name__ == "__main__":
cases = config_cases(
file_vars="../input/input_variables_point.txt",
file_vars="../input/point_file_20220701.sfcf000.txt",
infmt_opt=1,
ntime=1,
nlat=1,
nlon=1,
z0ghc=[0.001, 0.01],
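For reference, the updated `read_txt` expects four header lines (time stamp, reference height, number of model layers, column names) before the data. A sketch of the first-line parse, with an illustrative time stamp value and whitespace (real files may differ):

```python
import re

line = " time stamp: 2022-07-01-11:00:00"  # hypothetical header line
pattern = r" *time stamp\: *([0-9\.\:\-]*)"  # same regex as in read_txt
m = re.match(pattern, line)
assert m is not None and m.group(1) == "2022-07-01-11:00:00"
```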
(Diffs for the remaining changed files are not shown.)
