Commit

Merge pull request #87 from noaa-oar-arl/feature/timestep
Adding new Feature/timestep
drnimbusrain authored Sep 25, 2023
2 parents 5c71949 + daaec36 commit 019aaac
Showing 26 changed files with 12,631 additions and 4,049 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/ci.yml
@@ -39,6 +39,10 @@ jobs:
- name: Check that default input is nc
run: |
f90nml -g filenames -v file_vars="'input/gfs.t12z.20220701.sfcf000.canopy.nc'" \
input/namelist.canopy input/namelist.canopy
f90nml -g userdefs -v ntime=1 \
input/namelist.canopy input/namelist.canopy
python -c '
import f90nml
with open("input/namelist.canopy") as f:
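The two `f90nml` CLI calls above patch the namelist in place (same input and output path). For reference, a minimal Python sketch of the equivalent `ntime` patch, assuming only that the `f90nml` package is installed (the `.patched` output name here is illustrative, not from the workflow):

```python
import f90nml

# Force a single time step for the CI check; equivalent to:
#   f90nml -g userdefs -v ntime=1 input/namelist.canopy input/namelist.canopy
# (written to a separate file here rather than patched in place)
f90nml.patch(
    "input/namelist.canopy",
    {"userdefs": {"ntime": 1}},
    "input/namelist.canopy.patched",
)
```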
6 changes: 5 additions & 1 deletion README.md
@@ -123,7 +123,7 @@ Namelist Option : `file_vars` Full name of input file (Supports either text or

- See the example file inputs for variables and format (`gfs.t12z.20220701.sfcf000.canopy.txt` or `gfs.t12z.20220701.sfcf000.canopy.nc`). The example surface met/land/soil inputs are based on NOAA's UFS-GFSv16 output initialized on July 01, 2022 @ 12 UTC (forecast hour 000). Other external inputs for canopy-related and other calculated variables come from numerous sources; see [Table 2](#table-2-canopy-app-required-input-variables) below for more information. **Note:** The example GFSv16 domain has been cut to the southeast U.S. region to limit file size and runtime.
- Canopy-App assumes the NetCDF input files follow the CF conventions; the test file is based on UFS-GFSv16. Double or float precision is recommended for real variables, and input data must contain valid values.
- Canopy-App can also be run with a single point of 1D input data in a text file (e.g. `input_variables_point.txt`).
- Canopy-App can also be run with a single point of 1D input data in a text file (e.g. `point_file_20220701.sfcf000.txt`).

The Canopy-App input data in [Table 2](#table-2-canopy-app-required-input-variables) below is based on NOAA's UFS operational Global Forecast System Version 16 (GFSv16) gridded met data, supplemented with external canopy data (from numerous sources) and other external and calculated input variables.

@@ -187,6 +187,10 @@ You can also [generate global inputs using Python (see python/global_data_proces
| Namelist Option | Namelist Description and Units |
| --------------- | ---------------------------------------------------------------------------------- |
| `infmt_opt` | integer for choosing 1D text (= `1`) or 2D NetCDF input file format (= `0`, default) |
| `time_start` | Start/initial time stamp in YYYY-MM-DD-HH:MM:SS.SSSS for simulation/observation inputs |
| `time_end` | End time stamp in YYYY-MM-DD-HH:MM:SS.SSSS for simulation/observation inputs |
| `ntime` | Number of time steps for simulation/observation inputs |
| `time_intvl` | Integer time interval for simulation/observation input time steps in seconds (default = 3600) |
| `nlat` | number of latitude cells (must match # of LAT in `file_vars` above) |
| `nlon` | number of longitude cells (must match # of LON in `file_vars` above) |
| `modlays` | number of model (below and above canopy) layers |
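The four new time options above define the input time axis. A sketch of the implied axis (assuming `ntime` steps of `time_intvl` seconds beginning at `time_start`; values mirror the updated example namelist later in this diff):

```python
import pandas as pd

time_start = pd.Timestamp("2022-07-01 11:00:00")  # from time_start
ntime = 3
time_intvl = 3600  # seconds

# 2022-07-01 11:00, 12:00, 13:00 UTC -> last step matches time_end 13:00:00
times = pd.date_range(time_start, periods=ntime, freq=pd.Timedelta(seconds=time_intvl))
print(times)
```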
Binary file added input/gfs.t12z.20220630.sfcf023.canopy.nc
Binary file not shown.
3,699 changes: 3,699 additions & 0 deletions input/gfs.t12z.20220630.sfcf023.canopy.txt

Large diffs are not rendered by default.

Binary file modified input/gfs.t12z.20220701.sfcf000.canopy.nc
Binary file not shown.
7,396 changes: 3,698 additions & 3,698 deletions input/gfs.t12z.20220701.sfcf000.canopy.txt

Large diffs are not rendered by default.

Binary file added input/gfs.t12z.20220701.sfcf001.canopy.nc
Binary file not shown.
3,699 changes: 3,699 additions & 0 deletions input/gfs.t12z.20220701.sfcf001.canopy.txt

Large diffs are not rendered by default.

2 changes: 0 additions & 2 deletions input/input_variables_point.txt

This file was deleted.

17 changes: 14 additions & 3 deletions input/namelist.canopy
@@ -1,11 +1,22 @@
&FILENAMES
file_vars = 'input/gfs.t12z.20220701.sfcf000.canopy.nc'
! file_vars = 'input/gfs.t12z.20220701.sfcf000.canopy.txt'
file_out = 'output/test_out_southeast_us'
!2D Text and NCF Examples
! Recommended: set the file_out prefix to the initial 'YYYY-MM-DD-HH-MMSS_region_identifier'
file_vars = 'input/gfs.t12z.20220630.sfcf023.canopy.nc' 'input/gfs.t12z.20220701.sfcf000.canopy.nc' 'input/gfs.t12z.20220701.sfcf001.canopy.nc'
! file_vars = 'input/gfs.t12z.20220630.sfcf023.canopy.txt' 'input/gfs.t12z.20220701.sfcf000.canopy.txt' 'input/gfs.t12z.20220701.sfcf001.canopy.txt'
file_out = 'output/2022-07-01-11-0000_southeast_us'

!1D Point Example
! Recommended: set the file_out prefix to the initial 'YYYY-MM-DD-HH-MMSS_point_identifier'
! file_vars = 'input/point_file_20220630.sfcf023.txt' 'input/point_file_20220701.sfcf000.txt' 'input/point_file_20220701.sfcf001.txt'
! file_out = 'output/2022-07-01-11-0000_point'
/

&USERDEFS
infmt_opt = 0
time_start = '2022-07-01-11:00:00.0000'
time_end = '2022-07-01-13:00:00.0000'
ntime = 3
time_intvl = 3600
nlat = 43
nlon = 86
modlays = 100
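The example pairs three hourly input files with `ntime = 3`. A quick consistency check (a sketch; the one-file-per-time-step pairing is an assumption based on this example):

```python
import f90nml

nml = f90nml.read("input/namelist.canopy")
files = nml["filenames"]["file_vars"]
if isinstance(files, str):  # a single entry parses as a scalar
    files = [files]
assert nml["userdefs"]["ntime"] == len(files), "expected one input file per time step"
```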
2 changes: 2 additions & 0 deletions input/point_file_20220630.sfcf023.txt
@@ -0,0 +1,2 @@
lat,lon,fh,ugrd10m,vgrd10m,clu,lai,vtype,ffrac,fricv,csz,sfcr,mol,frp,href,sotyp,pressfc,dswrf,shtfl,tmpsfc,tmp2m,spfh2m,hpbl,prate_ave
34.97,270.00,7.0925,-0.0897,2.2551,0.7206,0.8677,4,0.2156,0.1949,0.0236,0.2500,41.9993,7.3748,10.00,4,100620.8125,3.7858,-15.6539,293.3365,293.8452,0.0148,102.2751,0.0000
2 changes: 2 additions & 0 deletions input/point_file_20220701.sfcf000.txt
@@ -0,0 +1,2 @@
lat,lon,fh,ugrd10m,vgrd10m,clu,lai,vtype,ffrac,fricv,csz,sfcr,mol,frp,href,sotyp,pressfc,dswrf,shtfl,tmpsfc,tmp2m,spfh2m,hpbl,prate_ave
34.97,270.00,7.0925,-0.1842,2.5479,0.7134,1.1774,4,0.2156,0.2770,0.2127,0.2500,-130.1519,0.0000,10.00,4,100694.0859,112.6779,14.5789,295.7195,295.4205,0.0156,162.2376,0.0000
2 changes: 2 additions & 0 deletions input/point_file_20220701.sfcf001.txt
@@ -0,0 +1,2 @@
lat,lon,fh,ugrd10m,vgrd10m,clu,lai,vtype,ffrac,fricv,csz,sfcr,mol,frp,href,sotyp,pressfc,dswrf,shtfl,tmpsfc,tmp2m,spfh2m,hpbl,prate_ave
34.97,270.00,7.0925,0.0331,2.8131,0.7117,0.9407,4,0.2156,0.3292,0.4080,0.2500,-42.9638,0.0000,10.00,4,100735.8750,309.3322,74.7410,298.9967,297.7534,0.0160,313.8891,0.0000
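
Each of the three new point files is a one-row CSV of surface inputs for a single location, so they can be inspected directly with pandas (a minimal sketch):

```python
import pandas as pd

df = pd.read_csv("input/point_file_20220701.sfcf000.txt")
# One point: lat 34.97, lon 270.00, 2-m temperature ~295.4 K, LAI ~1.18
print(df.loc[0, ["lat", "lon", "tmp2m", "lai"]])
```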
5 changes: 3 additions & 2 deletions python/README.md
@@ -28,7 +28,7 @@ You can override default namelist options by passing a dictionary to `run()`.
# Point setup
ds = run(
config={
"filenames": {"file_vars": "../input/input_variables_point.txt"},
"filenames": {"file_vars": "../input/point_file_20220701.sfcf000.txt"},
"userdefs": {"infmt_opt": 1, "nlat": 1, "nlon": 1},
},
)
@@ -40,8 +40,9 @@ There are also helper functions for running sets of experiments with different n
from canopy_app import config_cases, run_config_sens

cases = config_cases(
file_vars="../input/input_variables_point.txt",
file_vars="../input/point_file_20220701.sfcf000.txt",
infmt_opt=1,
ntime=1,
nlat=1,
nlon=1,
z0ghc=[0.001, 0.01],
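A multi-time-step point run should follow the same pattern, passing a list of files and a matching `ntime` (a sketch based on the list handling added to `canopy_app.py` below, not an example from this README):

```python
from canopy_app import run

# Hypothetical three-step point run using the new hourly point files
ds = run(
    config={
        "filenames": {
            "file_vars": [
                "../input/point_file_20220630.sfcf023.txt",
                "../input/point_file_20220701.sfcf000.txt",
                "../input/point_file_20220701.sfcf001.txt",
            ]
        },
        "userdefs": {"infmt_opt": 1, "ntime": 3, "nlat": 1, "nlon": 1},
    },
)
```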
165 changes: 127 additions & 38 deletions python/canopy_app.py
@@ -29,14 +29,20 @@ def _load_default_config() -> f90nml.Namelist:
with open(REPO / "input" / "namelist.canopy") as f:
config = f90nml.read(f)

# Make input paths absolute in default config
for k, v in config["filenames"].items():
p0 = Path(v)
def as_abs(p_str: str) -> str:
p0 = Path(p_str)
if not p0.is_absolute():
p = REPO / p0
else:
p = p0
config["filenames"][k] = p.as_posix()
return p.as_posix()

# Make input paths absolute in default config
for k, v in config["filenames"].items():
if isinstance(v, list):
config["filenames"][k] = [as_abs(v_) for v_ in v]
else:
config["filenames"][k] = as_abs(v)

return config

@@ -58,15 +64,15 @@ def out_and_back(p: Path, *, finally_: Callable | None = None):


DEFAULT_POINT_INPUT = pd.read_csv(
REPO / "input" / "input_variables_point.txt", index_col=False
REPO / "input" / "point_file_20220701.sfcf000.txt", index_col=False
)

_TXT_STEM_SUFFS = {
"wind": "_output_canopy_wind",
"waf": "_output_waf",
"eddy": "_output_eddy_Kz",
"phot": "_output_phot",
"bio": "_output_bio",
"wind": "_canopy_wind",
"waf": "_waf",
"eddy": "_eddy",
"phot": "_phot",
"bio": "_bio",
}


@@ -120,20 +126,48 @@ def run(
ofp_stem = output_dir / "out"
full_config["filenames"]["file_out"] = ofp_stem.relative_to(case_dir).as_posix()

# Check input file
ifp = Path(full_config["filenames"]["file_vars"])
if not ifp.is_file():
raise ValueError(f"Input file {ifp.as_posix()!r} does not exist")
if not ifp.is_absolute():
full_config["filenames"]["file_vars"] = ifp.resolve().as_posix()
if ifp.suffix in {".nc", ".nc4", ".ncf"}: # consistent with sr `canopy_check_input`
nc_out = True
assert full_config["userdefs"]["infmt_opt"] == 0
elif ifp.suffix in {".txt"}:
nc_out = False
assert full_config["userdefs"]["infmt_opt"] == 1
# Check input file(s)
input_files_setting = full_config["filenames"]["file_vars"]
if isinstance(input_files_setting, list):
ifps_to_check = [Path(s) for s in input_files_setting]
else:
raise ValueError(f"Unexpected input file type: {ifp.suffix}")
ifps_to_check = [Path(input_files_setting)]
nc_outs = []
for ifp in ifps_to_check:
if not ifp.is_file():
raise ValueError(f"Input file {ifp.as_posix()!r} does not exist")
if ifp.suffix in {
".nc",
".nc4",
".ncf",
}: # consistent with sr `canopy_check_input`
nc_out = True
assert full_config["userdefs"]["infmt_opt"] == 0
elif ifp.suffix in {".txt"}:
nc_out = False
assert full_config["userdefs"]["infmt_opt"] == 1
else:
raise ValueError(f"Unexpected input file extension: {ifp.suffix}")
nc_outs.append(nc_out)
if not len(set(nc_outs)) == 1:
raise ValueError(
f"Expected all input files to be of the same type (nc or txt). "
f"filenames.file_vars: {input_files_setting}."
)
nc_out = nc_outs[0]
if isinstance(input_files_setting, list):
abs_path_strs = []
for s in input_files_setting:
ifp = Path(s)
if not ifp.is_absolute():
abs_path_strs.append(ifp.resolve().as_posix())
else:
abs_path_strs.append(ifp.as_posix())
full_config["filenames"]["file_vars"] = abs_path_strs
else:
ifp = Path(input_files_setting)
if not ifp.is_absolute():
full_config["filenames"]["file_vars"] = ifp.resolve().as_posix()

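# At this point, file_vars is either a single path or a list of paths, all
# absolute, and all of one format family (.nc/.nc4/.ncf with infmt_opt=0,
# or .txt with infmt_opt=1).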
# Write namelist
if verbose:
@@ -153,7 +187,17 @@

# Load nc
if nc_out:
ds0 = xr.open_dataset(ofp_stem.with_suffix(".nc"))
# Should be just one file, even if multiple output time steps
patt = f"{ofp_stem.name}*.nc"
cands = sorted(output_dir.glob(patt))
if not cands:
raise ValueError(
f"No matches for pattern {patt!r} in directory {output_dir.as_posix()!r}. "
f"Files present are: {[p.as_posix() for p in output_dir.glob('*')]}."
)
if len(cands) > 1:
print("Taking the first nc file only.")
ds0 = xr.open_dataset(cands[0], decode_times=True)
ds = (
ds0.rename_dims(grid_xt="x", grid_yt="y")
.swap_dims(level="z")
@@ -180,22 +224,40 @@
f"warning: skipping {which!r} ({ifcan}) output since stem suffix unknown."
)
continue
df = read_txt(
ofp_stem.with_name(f"{ofp_stem.name}{stem_suff}").with_suffix(".txt")
)
# NOTE: Separate file for each time
patt = f"{ofp_stem.name}_*{stem_suff}.txt"
cands = sorted(output_dir.glob(patt))
if not cands:
raise ValueError(
f"No matches for pattern {patt!r} in directory {output_dir.as_posix()!r}. "
f"Files present are: {[p.as_posix() for p in output_dir.glob('*')]}."
)
if verbose:
print(f"detected output files for {ifcan}:")
print("\n".join(f"- {p.as_posix()}" for p in cands))
dfs_ifcan = []
for cand in cands:
df_t = read_txt(cand)
df_t["time"] = df_t.attrs["time"]
dfs_ifcan.append(df_t)
df = pd.concat(dfs_ifcan, ignore_index=True)
df.attrs.update(which=which)
df.attrs.update(df_t.attrs)
dfs.append(df)

# Merge
units: dict[str, str] = {}
dss = []
for df in dfs:
if {"lat", "lon", "height"}.issubset(df.columns):
ds_ = df.set_index(["height", "lat", "lon"]).to_xarray().squeeze()
elif {"lat", "lon"}.issubset(df.columns):
ds_ = df.set_index(["lat", "lon"]).to_xarray().squeeze()
if {"time", "lat", "lon", "height"}.issubset(df.columns):
ds_ = df.set_index(["time", "height", "lat", "lon"]).to_xarray().squeeze()
elif {"time", "lat", "lon"}.issubset(df.columns):
ds_ = df.set_index(["time", "lat", "lon"]).to_xarray().squeeze()
else:
raise ValueError("Expected df to have columns 'lat', 'lon' [,'height'].")
raise ValueError(
"Expected df to have columns 'time', 'lat', 'lon' [,'height']. "
f"Got: {sorted(df)}."
)
units.update(df.attrs["units"])
for vn in ds_.data_vars:
assert isinstance(vn, str)
@@ -230,22 +292,30 @@ def read_txt(fp: Path) -> pd.DataFrame:
with open(fp) as f:
for i, line in enumerate(f):
if i == 0:
pattern = r" *time stamp\: *([0-9\.\:\-]*)"
m = re.match(pattern, line)
if m is None:
raise ValueError(
f"Unexpected file format. Line {i} failed to match regex {pattern!r}."
)
time_stamp = pd.Timestamp(m.group(1))
elif i == 1:
pattern = r" *reference height, h\: *([0-9\.]*) m"
m = re.match(pattern, line)
if m is None:
raise ValueError(
f"Unexpected file format. Line {i} failed to match regex {pattern!r}."
)
href = float(m.group(1))
elif i == 1:
elif i == 2:
pattern = r" *number of model layers\: *([0-9]*)"
m = re.match(pattern, line)
if m is None:
raise ValueError(
f"Unexpected file format. Line {i} failed to match regex {pattern!r}."
)
nlay = int(m.group(1))
elif i == 2:
elif i == 3:
# Column names (some with units)
heads = re.split(r" {2,}", line.strip())
names: list[str] = []
@@ -264,17 +334,25 @@
break
else:
raise ValueError(
"Unexpected file format. Expected 3 header lines followed by data."
"Unexpected file format. Expected 4 header lines followed by data."
)

df = pd.read_csv(fp, index_col=False, skiprows=3, header=None, delimiter=r"\s+")
df = pd.read_csv(
fp,
index_col=False,
skiprows=4,
header=None,
delimiter=r"\s+",
dtype=np.float32,
)
if len(names) != len(df.columns):
raise RuntimeError(
f"Unexpected file format. Detected columns names {names} ({len(names)}) "
f"are of a different number than the loaded dataframe ({len(df.columns)})."
)
df = df.replace(np.float32(-9.0e20), np.nan) # fill value, defined in const mod
df.columns = names # type: ignore[assignment]
df.attrs.update(href=href, nlay=nlay, units=units)
df.attrs.update(href=href, nlay=nlay, units=units, time=time_stamp)

return df

@@ -367,6 +445,16 @@ def config_cases(*, product: bool = False, **kwargs) -> list[dict[str, Any]]:
f"scalar, list[scalar], or list[list[scalar]], got: {type(v)}."
)

if k == "file_vars":
# Only support single time step runs for now
if type(v) is list:
assert type(v[0]) is str
mults[k] = v
else:
assert type(v) is str
sings[k] = v
continue

if (np.isscalar(DEFAULT_CONFIG[_k_sec(k)][k]) and np.isscalar(v)) or (
type(DEFAULT_CONFIG[_k_sec(k)][k]) is list
and type(v) is list
@@ -414,8 +502,9 @@

if __name__ == "__main__":
cases = config_cases(
file_vars="../input/input_variables_point.txt",
file_vars="../input/point_file_20220701.sfcf000.txt",
infmt_opt=1,
ntime=1,
nlat=1,
nlon=1,
z0ghc=[0.001, 0.01],
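For reference, the updated `read_txt` expects four header lines (time stamp, reference height, number of model layers, column names) before the data. A sketch of the first-line parse, with an illustrative time stamp value and whitespace (real files may differ):

```python
import re

line = " time stamp: 2022-07-01-11:00:00"  # hypothetical header line
pattern = r" *time stamp\: *([0-9\.\:\-]*)"  # same regex as in read_txt
m = re.match(pattern, line)
assert m is not None and m.group(1) == "2022-07-01-11:00:00"
```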
(Diffs for the remaining changed files are not shown.)
