Add initial version of ccpp_track_variables.py by mkavulich · Pull Request #419 · NCAR/ccpp-framework

mkavulich · 2021-11-19T00:35:26Z

Initial version of script for tracking which variables are modified by which schemes in a given suite.

This script takes in as input the following arguments:

A suite definition file
A path to scheme metadata files
A CCPP config file
A variable name

The script output changes based on the input:

If the given variable is found in any of the schemes for the given SDF, the script outputs a list of those schemes (in their calling order, including duplicates) along with the variable's intent in each scheme
If the given variable is not found, but it partially matches any of the found variables, the script outputs a list of schemes that contain partially matching variables (along with those variables)
If the given variable is not found and there are no partial matches either, the script exits with an error saying so
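
The three outcomes above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function names `track_variable` and `report` and the plain dictionary of scheme variables are hypothetical stand-ins for the metadata objects the script actually parses.

```python
def track_variable(call_tree, scheme_vars, target):
    """Return (exact, partial) matches for `target`.

    call_tree   -- list of scheme names in calling order (duplicates kept)
    scheme_vars -- hypothetical dict: scheme name -> {standard_name: intent}
    """
    exact = []    # list of (scheme, intent) tuples; a list preserves duplicates
    partial = {}  # scheme -> standard names that merely contain `target`
    for scheme in call_tree:
        variables = scheme_vars.get(scheme, {})
        if target in variables:
            exact.append((scheme, variables[target]))
        else:
            hits = [name for name in variables if target in name]
            if hits:
                partial[scheme] = hits
    return exact, partial


def report(call_tree, scheme_vars, target):
    """Format the result, or raise if nothing matches at all."""
    exact, partial = track_variable(call_tree, scheme_vars, target)
    if exact:
        return [f"{scheme} (intent {intent})" for scheme, intent in exact]
    if partial:
        return [f"In {scheme} found variable(s) {names}"
                for scheme, names in partial.items()]
    raise LookupError(f"Variable {target} not found")
```

Exact matches are kept in a list of tuples rather than a dictionary so that a scheme called twice is reported twice.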

This is a brand new script, so it does not change any existing interfaces. It does add a new variable and method to the Suite class: call_tree is a list of schemes in order (including duplicates) for the given Suite. The method make_call_tree is used to populate the call_tree list; it is not called by the write() method so must be called separately if desired.
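
For readers unfamiliar with the idea of a call tree, here is a minimal standalone sketch. It assumes an SDF layout of `group` elements containing `subcycle` elements (with a `loop` attribute) containing `scheme` elements; the free function below is hypothetical and is not the `make_call_tree` method added to the Suite class.

```python
import xml.etree.ElementTree as ET


def make_call_tree(sdf_text):
    """Return scheme names in calling order, repeating subcycle loops."""
    root = ET.fromstring(sdf_text)
    call_tree = []
    for group in root.findall("group"):
        for subcycle in group.findall("subcycle"):
            # Assumption: a missing loop attribute means a single pass
            loop = int(subcycle.get("loop", "1"))
            schemes = [s.text.strip() for s in subcycle.findall("scheme")]
            # Repeat the whole block once per subcycle iteration
            call_tree.extend(schemes * loop)
    return call_tree
```

With this approach a scheme inside a subcycle with `loop="2"` appears twice in the list, which is what "including duplicates" refers to.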

Please provide feedback on the format and structure of the program!

I started with roughly the ccpp_prebuild script as a guide, but I am not tied to any particular style or philosophy here.

Testing:
test removed: None
unit tests: None
system tests: None
manual testing: Ran script with a variety of inputs, gives expected output. Examples below:

Running examples

No new Python packages or other prerequisites are needed, so you can run on any platform that CCPP can already run on. It is also not necessary to build any code or run prebuild scripts; all you need is the ccpp-framework and ccpp-physics repositories, a model config file (this requirement can hopefully be removed in the future), and the XML suite definition files for the suites you wish to analyze.

Here is an example setup based on the ufs-weather-model on Hera:

git clone --recurse-submodules https://github.com/ufs-community/ufs-weather-model
cd ufs-weather-model/FV3/ccpp/framework/
git remote add mkavulich git@github.com:mkavulich/ccpp-framework
git fetch mkavulich
git checkout feature/track_variables_through_suite
cd ..

From here, you can run the following example commands:

$ framework/scripts/ccpp_track_variables.py --help
usage: ccpp_track_variables.py [-h] -s SDF -m METADATA_PATH -c CONFIG -v VARIABLE [--debug]

optional arguments:
  -h, --help            show this help message and exit
  -s SDF, --sdf SDF     suite definition file to parse
  -m METADATA_PATH, --metadata_path METADATA_PATH
                        path to CCPP scheme metadata files
  -c CONFIG, --config CONFIG
                        path to CCPP prebuild configuration file
  -v VARIABLE, --variable VARIABLE
                        variable to track through CCPP suite
  --debug               enable debugging output

Successful output: prints list of schemes that use the specified variable, along with the variable's intent

$ framework/scripts/ccpp_track_variables.py --config=config/ccpp_prebuild_config.py -s=suites/suite_FV3_GFS_v16_noahmp.xml -v canopy_water_amount -m ./physics/physics/
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_gas_optics_rrtmgp
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_arry
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_1scl
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_2str
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_nstr
For suite suites/suite_FV3_GFS_v16_noahmp.xml, the following schemes (in order) use the variable canopy_water_amount:
GFS_phys_time_vary_init (intent in)
noahmpdrv_run (intent inout)
noahmpdrv_run (intent inout)

Unknown variable: script exits with descriptive error message

$ framework/scripts/ccpp_track_variables.py --config=config/ccpp_prebuild_config.py -s=suites/suite_FV3_GFS_v16_noahmp.xml -v volcano -m ./physics/physics/
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_gas_optics_rrtmgp
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_arry
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_1scl
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_2str
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_nstr
ERROR: Variable volcano not found in any suites for sdf suites/suite_FV3_GFS_v16_noahmp.xml

Partial match for variable: outputs list of partial matches for each scheme

$ framework/scripts/ccpp_track_variables.py --config=config/ccpp_prebuild_config.py -s=suites/suite_FV3_GFS_v16_noahmp.xml -v latent_heat -m ./physics/physics/
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_gas_optics_rrtmgp
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_arry
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_1scl
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_2str
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_nstr
ERROR: Variable latent_heat not found in any suites for sdf suites/suite_FV3_GFS_v16_noahmp.xml

Did find partial matches that may be of interest:

In GFS_suite_interstitial_2_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c']
In GFS_surface_generic_post_run found variable(s) ['surface_upward_potential_latent_heat_flux', 'soil_upward_latent_heat_flux', 'canopy_upward_latent_heat_flux', 'snow_deposition_sublimation_upward_latent_heat_flux', 'snow_freezing_rain_upward_latent_heat_flux', 'cumulative_soil_upward_latent_heat_flux_multiplied_by_timestep', 'cumulative_canopy_upward_latent_heat_flu_multiplied_by_timestep', 'cumulative_snow_deposition_sublimation_upward_latent_heat_flux_multiplied_by_timestep', 'cumulative_snow_freezing_rain_upward_latent_heat_flux_multiplied_by_timestep', 'cumulative_surface_upward_potential_latent_heat_flux_multiplied_by_timestep', 'multiplicative_tuning_parameter_for_reduced_latent_heat_flux_due_to_canopy_heat_storage']
In GFS_surface_composites_pre_run found variable(s) ['surface_upward_potential_latent_heat_flux_over_ice']
In GFS_surface_composites_post_run found variable(s) ['surface_upward_potential_latent_heat_flux', 'surface_upward_potential_latent_heat_flux_over_water', 'surface_upward_potential_latent_heat_flux_over_land', 'surface_upward_potential_latent_heat_flux_over_ice', 'kinematic_surface_upward_latent_heat_flux_over_water', 'kinematic_surface_upward_latent_heat_flux_over_land', 'kinematic_surface_upward_latent_heat_flux_over_ice', 'multiplicative_tuning_parameter_for_reduced_latent_heat_flux_due_to_canopy_heat_storage']
In sfc_nst_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c', 'latent_heat_of_fusion_of_water_at_0c', 'kinematic_surface_upward_latent_heat_flux_over_water', 'surface_upward_potential_latent_heat_flux_over_water']
In noahmpdrv_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c', 'latent_heat_of_fusion_of_water_at_0c', 'kinematic_surface_upward_latent_heat_flux_over_land', 'surface_upward_potential_latent_heat_flux_over_land', 'soil_upward_latent_heat_flux', 'canopy_upward_latent_heat_flux', 'snow_deposition_sublimation_upward_latent_heat_flux', 'snow_freezing_rain_upward_latent_heat_flux']
In sfc_sice_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c', 'surface_upward_potential_latent_heat_flux_over_ice', 'kinematic_surface_upward_latent_heat_flux_over_ice', 'kinematic_surface_upward_latent_heat_flux_over_water']
In GFS_PBL_generic_post_run found variable(s) ['instantaneous_surface_upward_latent_heat_flux', 'cumulative_surface_upward_latent_heat_flux_for_coupling_multiplied_by_timestep', 'surface_upward_latent_heat_flux_for_coupling', 'cumulative_surface_upward_latent_heat_flux_for_diag_multiplied_by_timestep', 'instantaneous_surface_upward_latent_heat_flux_for_diag', 'latent_heat_of_vaporization_of_water_at_0c', 'surface_upward_latent_heat_flux_from_coupled_process', 'kinematic_surface_upward_latent_heat_flux_over_water']
In satmedmfvdifq_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c', 'latent_heat_of_fusion_of_water_at_0c', 'instantaneous_surface_upward_latent_heat_flux']
In samfdeepcnv_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c']
In samfshalcnv_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c']

…le, variable name, and directory with metadata files, and outputs a list of schemes that use the given variable, along with their intent. Will convert into a more "graph-like" output later, along with other cleanup that is needed before review/testing by the group.

…ive a graphical representation of the calling tree
Also some more cleanup:
- Change default logging level to WARNING to get rid of unnecessary log messages (debug level remains unchanged)
- Change more strings to f strings
- Remove more leftover debug printouts
- Raise exception if create_metadata_filename_dict fails (esp for no .meta files found)
- Remove check_var function (might re-introduce in the future)

ligiabernardet · 2021-11-19T00:54:22Z

Thank you for this PR. This is a very helpful new addition to the CCPP. Going forward, we would like to have a visualization capability but having the script is the foundational step.
Do we have any mechanism to make sure ccpp_track_variables does not break when new development is submitted?

mkavulich · 2021-12-02T18:46:26Z

@ligiabernardet I haven't included any unit or regression tests yet, but I can make that a priority before opening this PR for review. I'll need to familiarize myself with the existing testing framework first.

christinaholtNOAA

I've heard an awful lot about this tool. Thought I'd pop in and see what it was all about. :)

christinaholtNOAA · 2022-01-11T03:57:02Z

+    if not success:
+        logging.error(f'Parsing suite definition file {sdf} failed.')
+        success = False
+        return (success, suite)


In this type of error handling situation, it could be useful to use a try/except block to get more information about how the SDF parsing failed instead of only reporting that it didn't work.

This is also a nice example of how one who isn't regularly checking the value of success in the caller of this function could just power through the rest of main with an un-parsable Suite object.

This is a limitation of how errors are handled in the existing object definitions; since they handle errors through a "success" flag I don't want to try to mess with that, especially since this code will be superseded in the (hopefully) near-future and need updating regardless.
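
One way to reconcile the two positions, without touching the existing classes, is a thin wrapper that converts the "success" flag into an exception at the call site. The sketch below is an assumption-laden illustration: `parse_or_raise`, `SuiteParseError`, and `FakeSuite` are hypothetical names, and it assumes only that the suite object has a `parse()` method returning a boolean and an `sdf_name` attribute.

```python
class SuiteParseError(Exception):
    """Raised when a suite definition file cannot be parsed."""


def parse_or_raise(suite):
    """Call suite.parse() and turn a False return value into an exception."""
    try:
        success = suite.parse()
    except Exception as err:  # keep the original error attached as __cause__
        raise SuiteParseError(f"Parsing {suite.sdf_name} raised {err!r}") from err
    if not success:
        raise SuiteParseError(f"Parsing suite definition file {suite.sdf_name} failed.")
    return suite


class FakeSuite:
    """Hypothetical stand-in for the framework's Suite class."""

    def __init__(self, sdf_name, ok=True):
        self.sdf_name = sdf_name
        self._ok = ok

    def parse(self):
        return self._ok
```

A caller that forgets to check the result can no longer continue with an unparsed suite, because the failure surfaces as an exception instead of a flag.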

christinaholtNOAA · 2022-01-11T04:14:00Z

+        raise Exception('Call to import_config failed.')
+
+    # Variables defined by the host model
+    (success, _, _) = gather_variable_definitions(config['variable_definition_files'], config['typedefs_new_metadata'])


A comment about this magic could be helpful. I assume given its name, it will update the config data structure in place, but it's not obvious. You are also opting (I'm assuming intentionally) to not store the output, which throws a wrench in the obvious nature of this call.

I'll be honest, I'm not 100% sure exactly what this step is doing, but it is necessary for converting metadata from an old format. I've added a comment trying to be as informative as possible.

mkavulich · 2022-02-24T06:19:21Z

After addressing some initial concerns (still a few to go) and updating for the most recent commits on main, this PR is ready for a proper review.

There are a few known issues/limitations right now. I think they can be addressed later on, but I figured I'd point them out off the bat:

The extraneous statements of WARNING: Encountered closing statement "end type" without type name come from the metadata parser, and I believe are due to problems with the metadata itself.
The script can only be run from above the top-level framework directory in the ufs-weather-model examples as shown above; I believe this is a limitation specific to the ufs-weather-model ccpp_prebuild_config.py file and hopefully can be eliminated in the future when the config file is no longer needed.

mkavulich · 2022-03-03T20:34:01Z

@christinaholtNOAA I believe I have resolved or responded to all your comments, please let me know if you have any further comments or questions.

christinaholtNOAA

This looks pretty nice. Just one remaining thing I'm really not sure about.

christinaholtNOAA · 2022-03-03T22:00:29Z

+        raise Exception('Call to import_config failed.')
+
+    # Variables defined by the host model; this call is necessary because it converts some old
+    # metadata formats so they can be used later in the script


I looked at gather_variable_definitions and it seems to be a function that doesn't act on any input data structures in place, or have any side effects. Perhaps you are calling this only for the data checking, but I really don't think that this call is doing anything to metadata that is used downstream, so the comment seems misleading. To have it convert metadata, I think you'd need to save its outputs in a data structure. Perhaps update one of the config entries? I'm not sure how that would work since I haven't dug into the details of the objects being acted on here.

climbfuji

Nice job, I've got a few questions and suggestions ...

climbfuji · 2022-03-04T03:11:42Z

+       with that scheme"""
+
+    metadata_dict = {}
+    scheme_filenames=glob.glob(os.path.join(metapath, "*.meta"))


Is there a reason why you don't get those from ccpp_prebuild_config.py?

I don't really like using glob, and also the schemes may not all sit in the same directory. For example, the gsl folks have a chemistry fork where the chemistry metadata is in a separate subdirectory chemistry in the ccpp-physics repo.

Regarding getting the metadata files from ccpp_prebuild_config.py, I was originally hoping I could eliminate the dependency on that file, and allow users to specify whichever metadata path(s) they would like.

Re: glob, is the issue with using glob specifically, or that the current design does not allow users to specify multiple directories? If it is the former, I will have to re-think how the metadata files are specified, but if it is the latter, I can easily put in a loop to allow multiple directories to be specified on the command line. Let me know if you think the second approach is acceptable.

Re. 1: that's ok, even if the path(s) contain schemes that are not in use by this model (i.e. not in the prebuild config), the suites won't have those schemes.

Re. 2. There is a general aversion to glob, more so for cmake than for other tools, but nonetheless. If it can be avoided, fine, but if you need it then that's ok for me. Yes, you will need a capability to have one or more metadata paths if you don't use the prebuild config. Note though that you put the burden on the user to know where all the files are located that may be used in the suite.
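
The "one or more metadata paths" idea could look something like the sketch below. It is an illustration only, not what was merged: it assumes the `-m/--metadata_path` option is changed to accept several directories, and `collect_meta_files` is a hypothetical helper name.

```python
import argparse
import glob
import os


def parse_arguments(argv=None):
    """Accept one or more metadata directories after -m."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--metadata_path", nargs="+", required=True,
                        help="one or more paths to CCPP scheme metadata files")
    return parser.parse_args(argv)


def collect_meta_files(metadata_paths):
    """Gather *.meta files from every directory, failing loudly on bad input."""
    meta_files = []
    for path in metadata_paths:
        if not os.path.isdir(path):
            raise NotADirectoryError(f"{path} is not a directory")
        # sorted() keeps the per-directory order deterministic
        meta_files.extend(sorted(glob.glob(os.path.join(path, "*.meta"))))
    if not meta_files:
        raise FileNotFoundError(f"No .meta files found in {metadata_paths}")
    return meta_files
```

The user would then pass, for example, both a physics directory and a separate chemistry directory after a single `-m`.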

climbfuji · 2022-03-04T03:18:47Z

+            success = False
+            return success
+
+        # Call tree of all schemes in SDF (with duplicates and subcycles)


The first 20 lines are identical with the first twenty lines of the parse routine. Can this be combined to avoid code duplication?

Maybe have an optional (or mandatory) argument create_call_tree for the parse routine, that switches between what is done in the second half of the parse routine?

This is a good idea. I initially had the naive idea to try to not modify any existing routines so that I wouldn't have to run as many tests, but it makes more sense the way you suggested. I have implemented this change, let me know how it looks

This looks great, thanks for making the change.
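
The shape of that change, reduced to a toy example: the class below is a hypothetical miniature, not the framework's Suite class, and the keyword name `make_call_tree` is an assumption made for illustration. The point is that a default of `False` leaves existing calls to `parse()` untouched.

```python
class Suite:
    """Toy stand-in: `schemes` replaces the contents read from an SDF."""

    def __init__(self, schemes):
        self._schemes = schemes
        self.call_tree = []   # empty list rather than None
        self.parsed = False

    def parse(self, make_call_tree=False):
        # Shared first half: validation that both code paths need
        if not self._schemes:
            return False
        # Second half: switch on the optional argument
        if make_call_tree:
            self.call_tree = list(self._schemes)
        else:
            self.parsed = True
        return True
```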

climbfuji · 2022-03-27T03:01:08Z

My main comments have been addressed, so I pass the baton to @gold2718 ...

gold2718

I have a couple of questions that would help me understand the code but I don't see anything that should hold up getting it in. Sorry this took so long.

gold2718 · 2022-04-12T23:06:45Z

+            the name of the scheme and the intent of the variable within that scheme"""
+
+    # Create a list of tuples that will hold the in/out information for each scheme
+    var_graph=[]


Is there a pattern where you use spaces around the = symbol and where you omit them?

Just regular sloppiness :) I've standardized this to use spaces except when in keyword arguments; I believe that's consistent with PEP8

gold2718 · 2022-04-12T23:41:44Z

+
+    # Loop through call tree, find matching filename for scheme via dictionary schemes_in_files,
+    # then parse that metadata file to find variable info
+    partial_matches = {}


Do you have examples of partial matches? How does tracking them help?

This case is for when a user inputs something like "latent_heat" as their variable, which matches multiple standard names (e.g. latent_heat_of_vaporization_of_water_at_0c, surface_upward_potential_latent_heat_flux, etc.). This makes it easier for users who might not know the exact standard name of the variable they are looking for, or for something like, for example, any variable containing the word "temperature".

The final section in my PR message ("Partial match for variable") gives an example of this.

grantfirl · 2022-05-05T18:27:46Z

@mkavulich Would you please update this to the latest main in anticipation of merging?

mkavulich added 19 commits November 12, 2021 04:20

Initial shell version of ccpp_track_variables.py

f2619d9

A few structure changes, add function for parsing arguments and checking variable against standard names

2f209a5

Change "xml" to "sdf" to better reflect other script conventions, flesh out argument parsing routine more

5715006

Starting to make use of existing objects

4160be2

Create new method and attribute for Suite class that creates a call tree for that Suite: in other words, a list of schemes in the order that they are called, including duplicates and subcycle loops

e9da740

Changing directions a little: user must provide path to directory with metadata files

31245e1

Add logging routines; debug flag (not utilized yet)

83f733f

working on getting dictionary of schemes <--> meta filenames

6cb14c9

Read in config file instead of metadata_path (maybe can revisit this concept later)

0ceb853

Need to add call to gather_variable_definitions in order to get new metadata converted properly (?)

a332867

Finally got to where I thought I should be! I now have a calling tree of schemes for a given SDF, and a dictionary that associates those schemes with their corresponding .meta files. Last step is to simply parse those .meta files for their variables!

e64dbe3

Getting close to finished now; using parse_metadata_file to return MetadataTable objects for each scheme's .meta file; these objects can then be parsed to get all the information we need for the final step!

64dad53

Find if variable matches in subroutine, if so, output variable name, subroutine name, and intent. If partial match, output list of partial matches

67229d2

Code cleanup with feedback from pylint

5f9e029

Remove unneeded debug changes

5e03756

Convert var_graph from Ordered Dictionary to list of tuples to preserve duplicate calls

6c7cfe9

Improve function descriptions, remove bits of draw routine until later

9fa53bb

christinaholtNOAA reviewed Jan 11, 2022

mkavulich added 7 commits February 3, 2022 11:45

Don't raise exception if partial matches found

d64a0fb

Merge remote-tracking branch 'origin/main' into feature/track_variables_through_suite

c2012ba

Explicitly shebang to python3, adopt new environment and logging structures from capgen

08c2027

Remove unnecessary "success" variables and handle exceptions in the appropriate locations

566e485

modify --> use; this script tracks variables that are both intent(in) and intent(inout)

9e6b59a

Convert remaining old-format strings to f-strings

bc6963a

Incorporate reviewer suggestion for more robust directory name parsing

1193e44

mkavulich marked this pull request as ready for review February 24, 2022 06:14

mkavulich requested review from climbfuji, gold2718 and grantfirl as code owners February 24, 2022 06:14

mkavulich added 4 commits March 3, 2022 12:50

A few more fixes from pylint, remove redundant "action='store_true'" from add_argument calls

c7cc6e4

Move parsing of command-line arguments to "parse_arguments" function, remove re-assignment of arguments and just manage them with "args" object

12d9078

Add a bit more information about call to "gather_variable_definitions" function from ccpp_prebuild

ee99e7e

Assign "call_tree" attribute as an empty list rather than Nonetype in __init__

b8654c4

restore accidentally removed store_true action from debug argument

5f545d7

christinaholtNOAA reviewed Mar 3, 2022

climbfuji reviewed Mar 4, 2022

Suggestion from Dom: move creation of scheme call tree to "parse" method of Suite class with an optional argument to avoid duplicating existing code while not affecting existing calls to parse method.

c27f06f

mkavulich mentioned this pull request Apr 6, 2022

Add labels to "end type" statements to eliminate warnings in CCPP pre… earth-system-radiation/rte-rrtmgp#164

Closed

gold2718 approved these changes Apr 12, 2022

Standardize spaces around = character

3f1dd5b

climbfuji approved these changes Apr 21, 2022

This was referenced May 5, 2022

NSSL ccpp-physics bugfixes and new ccpp-framework debugging feature NOAA-EMC/ufsatm#529

Merged

NSSL ccpp-physics bugfixes and new ccpp-framework debugging feature ufs-community/ufs-weather-model#1202

Merged

Merge remote-tracking branch 'origin/main' into feature/track_variables_through_suite

948b620

grantfirl merged commit 1968d57 into NCAR:main May 11, 2022
