Python package to manage the database backend to support the UFS-RNR workflow experiment meta data and analyses data tracking and reporting. The score-db database uses the PostgreSQL database system which is hosted by AWS on an RDS instance (currently administered by PSL).
Tools and utilities to catalogue and evaluate observation files.
- Clone the score-db and score-hv repos. Both of these repos are required. score-db is responsible for managing the backend database which stores diagnostics data related to UFS-RNR experiments. score-db has several APIs to help users insert and collect data from the score-db database. score-hv, on the other hand, is responsible for harvesting data from the diagnostic
$ git clone https://github.com/noaa-psl/score-db.git
$ git clone https://github.com/noaa-psl/score-hv.git
- For testing and development, we recommend creating a new python environment (e.g., using mamba as shown below or other options such as conda). To install the required dependencies into a new environment using the micromamba command-line interface, run the following after installing mamba/micromamba:
$ micromamba create -f environment.yml; micromamba activate score-db-default-env
- Install score-hv using pip. From the score-hv directory, run the following:
$ pip install . # default installation into active environment
- Depending on your use case, you can install score-db using one of three methods using pip,
$ pip install . # default installation into active environment`
$ pip install -e . # editable installation into active enviroment, useful for development`
$ pip install -t [TARGET_DIR] --upgrade . # target installation into TARGET_DIR, useful for deploying for cylc workflows (see https://cylc.github.io/cylc-doc/stable/html/tutorial/runtime/introduction.html#id3)`
- Configure the PostgreSQL credentials and settings for the score-db by
creating a
.env
file and by inserting the text shown below (note: this text is taken straight from the file.env_example
). You will need to acquire the database password from the administrator (Sergey Frolov).
from the score-db repo top level, cat the example file
$ cat .env_example
SCORE_POSTGRESQL_DB_NAME = 'rnr_scores_db'
SCORE_POSTGRESQL_DB_PASSWORD = '[score_db_password]'
SCORE_POSTGRESQL_DB_USERNAME = 'ufsrnr_user'
SCORE_POSTGRESQL_DB_ENDPOINT = 'psl-score-db.cmpwyouptct1.us-east-2.rds.amazonaws.com'
SCORE_POSTGRESQL_DB_PORT = 5432
Each of the APIs is structured in a similar way and are meant to be accessible via either a direct library call or via a command line call with a yaml file or a python dictionary as the only accepted arguments.
Every action made through the API requires a name and method. Valid methods are GET or PUT. The name must be a registered value in the db_request_regsitry. Currently valid registry options are:
'region' : 'Add or get regions'
'experiment' : 'Add, get, or update experiment registration data'
'expt_metrics' : 'Add or get experiment metrics data'
'metric_types' : 'Add, get, or update metric types'
'harvest_innov_stats' : 'Gather and store innovation statistics from diagnostics files'
'plot_innov_stats' : 'Plot innovation statistics'
'file_types' : 'Add, get, or update file types'
'storage_locations' : 'Add, get, or update storage locations'
'expt_file_counts' : 'Add or get experiment file counts'
'harvest_metrics' : 'Harvest and store metrics'
'array_metric_types' : 'Add or get or update array metrics types'
'sat_meta' : 'Add or get or update sat meta data'
'expt_array_metrics' : 'Add or get experiment array metrics data'
'instrument_meta' : 'Add or get or update instrument meta data'
Example request dictionaries for each registry option are provided in the Appendix.
This API helps the user register an experiment's meta data into the score-db
experiments
table.
- Setup the experiment registration yaml file.
example experiment registration yaml
---
db_request_name: experiment
method: PUT
body:
name: EXAMPLE_EXPERIMENT_C96L64.UFSRNR.GSI_3DVAR.012016
datestr_format: "%Y-%m-%d %H:%M:%S"
cycle_start: '2016-01-01 00:00:00'
cycle_stop: '2016-01-31 18:00:00'
owner_id: [email protected]
group_id: gsienkf
experiment_type: C96L64.UFSRNR.GSI_3DVAR.012016
platform: pw_awv1
wallclock_start: '2021-07-22 09:22:05'
wallclock_end: '2021-07-24 05:31:14'
description: '{"unstructured_json": "data"}'
- Use the command line to execute the API and register the experiment.
$ python3 src/score_db/score_db_base.py tests/experiment_registration__valid.yaml
- Call the experiment registration task from a python script.
import json
import experiments as expts
from experiments import ExperimentData, Experiment, ExperimentRequest
def register_experiment():
with open(EXPERIMENT_CONFIG_FILE, 'r') as config_file:
data=config_file.read()
description = json.loads(data)
request_dict = {
'db_request_name': 'experiment',
'method': 'PUT',
'body': {
'name': 'C96L64.UFSRNR.GSI_3DVAR.012016',
'datestr_format': '%Y-%m-%d %H:%M:%S',
'cycle_start': '2016-01-01 00:00:00',
'cycle_stop': '2016-01-31 18:00:00',
'owner_id': '[email protected]',
'group_id': 'gsienkf',
'experiment_type': 'C96L64.UFSRNR.GSI_3DVAR.012016',
'platform': 'pw_awv1',
'wallclock_start': '2021-07-22 09:22:05',
'wallclock_end': '2021-07-24 05:31:14',
'description': json.dumps(description)
}
}
er = ExperimentRequest(request_dict)
response = er.submit()
return response
response = register_experiment()
Note: the EXPERIMENT_CONFIG_FILE
is meant to contain a json version of the
experiment configuration. PostgreSQL allows searching through unstructured
JSON (JSON Functions and Operators and PostgreSQL JSON query tutorial should
help the user understand how to query data stored in the json description
column)
This API will collect innovation statistics diagnostic data produced by the UFS-RNR workflow. This diagnostics data consists of bias, rmsd, and count statistics of temperature, specific humidity, and uv wind and is stored in netcdf files. The harvester engine pulls this data out of the netcdf files and inserts it into the score-db database. The user must know the experiment name and start wallclock time so that the API can associate the statistics data with the correct registered experiment. Note: the experiment must be registered prior to any attempt to harvest the data. Additionally, the netcdf files must also already be downloaded and stored locally.
- Setup the yaml input argument.
example of the input yaml (one can cut and paste this yaml data and)
date_range:
datetime_str: '%Y-%m-%d %H:%M:%S'
end: '2016-01-31 18:00:00'
start: '2016-01-01 00:00:00'
db_request_name: harvest_innov_stats
expt_name: C96L64.UFSRNR.GSI_3DVAR.012016
expt_wallclk_strt: '2021-07-22 09:22:05'
files:
- cycles:
- 0
- 21600
- 43200
- 64800
elevation_unit: plevs
filename: innov_stats.metric.%Y%m%d%H.nc
filepath: /home/slawrence/Development/experiment_data/uncoupled_c96_3dvar/gsi/
harvester: innov_stats_netcdf
metrics:
- temperature
- spechumid
- uvwind
stats:
- bias
- count
- rmsd
output_format: tuples_list
- Call the harvester engine on the command line (see below). Both command
lines require mostly the same syntax
python3 src/score_db/score_db_base.py [command argument yaml]
$ python3 src/score_db/score_db_base.py tests/netcdf_harvester_config__valid.yaml
or one could issue the command from within a python script.
from score_db.harvest_innov_stats import HarvestInnovStatsRequest
harvester_control_dict = {
"date_range": {
"datetime_str": "%Y-%m-%d %H:%M:%S",
"end": "2016-01-31 18:00:00",
"start": "2016-01-01 00:00:00"
},
"db_request_name": "harvest_innov_stats",
"expt_name": "C96L64.UFSRNR.GSI_3DVAR.012016",
"expt_wallclk_strt": "2021-07-22 09:22:05",
"files": [
{
"cycles": [
0,
21600,
43200,
64800
],
"elevation_unit": "plevs",
"filename": "innov_stats.metric.%Y%m%d%H.nc",
"filepath": "/home/slawrence/Development/experiment_data/uncoupled_c96_3dvar/gsi/",
"harvester": "innov_stats_netcdf",
"metrics": [
"temperature",
"spechumid",
"uvwind"
],
"stats": [
"bias",
"count",
"rmsd"
]
}
],
"output_format": "tuples_list"
}
hc = HarvestInnovStatsRequest(harvester_control_dict)
hc.submit()
This API will collect innovation statistics produced by one or more UFS-RNR
experiments. The statistics must already be inserted into the expt_metrics
table and the experiments must already be registered.
- Setup the yaml input argument.
example of the input yaml (one can cut and paste this yaml data and)
date_range:
datetime_str: '%Y-%m-%d %H:%M:%S'
end: '2016-01-31 18:00:00'
start: '2016-01-01 00:00:00'
db_request_name: plot_innov_stats
experiments:
- graph_color: blue
graph_label: C96L64 GSI Uncoupled 3DVAR Experiment
name: C96L64.UFSRNR.GSI_3DVAR.012016
wallclock_start: '2021-07-22 09:22:05'
- graph_color: red
graph_label: C96L64 GSI and SOCA Coupled 3DVAR Experiment
name: C96L64.UFSRNR.GSI_SOCA_3DVAR.012016
wallclock_start: '2021-07-24 11:31:16'
fig_base_fn: C96L64_GSI_3DVAR_VS_GSI_SOCA_3DVAR
stat_groups:
- cycles:
- 0
- 21600
- 43200
- 64800
elevation_unit: plevs
metrics:
- temperature
- spechumid
- uvwind
regions:
- equatorial
- global
- north_hemis
- south_hemis
- tropics
stat_group_frmt_str: innov_stats_{metric}_{stat}
stats:
- bias
- rmsd
work_dir: /absolute/path/to/desired/figure/location
- Call the harvester engine on the command line (see below). Both command
lines require mostly the same syntax
python3 src/score_db/score_db_base.py [command argument yaml]
$ python3 src/score_db/score_db_base.py tests/plot_innov_stats_config__valid.yaml
or one could issue the command from within a python script.
from score_db.plot_innov_stats import PlotInnovStatsRequest
def plot_innov_stats_for_date_range():
plot_control_dict = {
'db_request_name': 'plot_innov_stats',
'date_range': {
'datetime_str': '%Y-%m-%d %H:%M:%S',
'start': '2016-01-01 00:00:00',
'end': '2016-01-31 18:00:00'
},
'experiments': [
{
'name': 'C96L64.UFSRNR.GSI_3DVAR.012016',
'wallclock_start': '2021-07-22 09:22:05',
'graph_label': 'C96L64 GSI Uncoupled 3DVAR Experiment',
'graph_color': 'blue'
},
{
'name': 'C96L64.UFSRNR.GSI_SOCA_3DVAR.012016',
'wallclock_start': '2021-07-24 11:31:16',
'graph_label': 'C96L64 GSI and SOCA Coupled 3DVAR Experiment',
'graph_color': 'red'
}
],
'stat_groups': [
{
'cycles': CYCLES,
'stat_group_frmt_str': 'innov_stats_{metric}_{stat}',
'metrics': ['temperature','spechumid','uvwind'],
'stats': ['bias', 'rmsd'],
'elevation_unit': 'plevs',
'regions': [
'equatorial',
'global',
'north_hemis',
'south_hemis',
'tropics'
]
}
],
'work_dir': '/absolute/path/to/desired/figure/location',
'fig_base_fn': 'C96L64_GSI_3DVAR_VS_GSI_SOCA_3DVAR'
}
plot_request = PlotInnovStatsRequest(plot_control_dict)
plot_request.submit()
plot_innov_stats_for_date_range()
The score-db backend package is comprised of four tables and several APIs
which are meant to help the user insert and select data from the score-db
database. Instructions for API use are shown below. This section describes
what information is stored in each table.
cmd_results
.
experiments
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String(128), nullable=False)
cycle_start = Column(DateTime, nullable=False)
cycle_stop = Column(DateTime, nullable=False)
owner_id = Column(String(64), nullable=False)
group_id = Column(String(16))
experiment_type = Column(String(64))
platform = Column(String(16), nullable=False)
wallclock_start = Column(DateTime, nullable=False)
wallclock_end = Column(DateTime)
description = Column(JSONB(astext_type=sa.Text()), nullable=True)
created_at = Column(DateTime)
updated_at = Column(DateTime)
regions
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String(79), nullable=False)
min_lat = Column(Float, nullable=False) # degrees north of equator [-90, 90)
max_lat = Column(Float, nullable=False) # degrees north of equator (-90, 90]
east_lon = Column(Float, nullable=False) # degrees east of prime meridian [0, 360)
west_lon = Column(Float, nullable=False) # degrees east of prime meridian (0, 360]
created_at = Column(DateTime, nullable=False)
updated_at = Column(DateTime)
metrics = relationship('ExperimentMetric', back_populates='region')
metric_types
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String(128), nullable=False)
long_name = Column(String(128))
measurement_type = Column(String(64), nullable=False)
measurement_units = Column(String(64))
stat_type = Column(String(64))
description = Column(JSONB(astext_type=sa.Text()), nullable=True)
created_at = Column(DateTime, nullable=False)
updated_at = Column(DateTime)
metrics = relationship('ExperimentMetric', back_populates='metric_type')
expt_metrics
id = Column(Integer, primary_key=True, autoincrement=True)
experiment_id = Column(Integer, ForeignKey('experiments.id'))
metric_type_id = Column(Integer, ForeignKey('metric_types.id'))
region_id = Column(Integer, ForeignKey('regions.id'))
elevation = Column(Float, nullable=False)
elevation_unit = Column(String(32))
value = Column(Float)
time_valid = Column(DateTime, nullable=False)
forecast_hour = Column(Float)
ensemble_member = Column(Integer)
created_at = Column(DateTime, default=datetime.utcnow())
experiment = relationship('Experiment', back_populates='metrics')
metric_type = relationship('MetricType', back_populates='metrics')
region = relationship('Region', back_populates='metrics')
storage_locations
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String(128), nullable=False)
platform = Column(String(128), nullable=False)
bucket_name = Column(String(128), nullable=False)
key = Column(String(128))
platform_region = Column(String(64))
created_at = Column(DateTime)
updated_at = Column(DateTime)
file_counts = relationship('ExptStoredFileCount' back_populates='storage_location')
file_types
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String(128), nullable=False)
file_template = Column(String(64), nullable=False)
file_format = Column(String(64))
description = Column(JSONB(astext_type=sa.Text()), nullable=True)
created_at = Column(DateTime, nullable=False)
updated_at = Column(DateTime)
file_counts = relationship('ExptStoredFileCount', back_populates='file_type')
expt_stored_file_counts
id = Column(Integer, primary_key=True, autoincrement=True)
storage_location_id = Column(Integer, ForeignKey('storage_locations.id'))
file_type_id = Column(Integer, ForeignKey('file_types.id'))
experiment_id = Column(Integer, ForeignKey('experiments.id'))
count = Column(Float, nullable=False)
folder_path = Column(String(255))
cycle = Column(DateTime)
time_valid = Column(DateTime)
forecast_hour = Column(Float)
file_size_bytes = Column(BigInteger)
created_at = Column(DateTime, default=datetime.utcnow())
experiment = relationship('Experiment', back_populates='file_counts')
file_type = relationship('FileType', back_populates='file_counts')
storage_location = relationship('StorageLocation', back_populates='file_counts')
expt_array_metrics
id = Column(Integer, primary_key=True, autoincrement=True)
experiment_id = Column(Integer, ForeignKey('experiments.id'))
array_metric_type_id = Column(Integer, ForeignKey('array_metric_types.id'))
region_id = Column(Integer, ForeignKey('regions.id'))
sat_meta_id = Column(Integer, ForeignKey('sat_meta.id'), nullable=True)
value = Column(ARRAY(Float))
assimilated = Column(Boolean)
time_valid = Column(DateTime)
forecast_hour = Column(Float)
ensemble_member = Column(Integer)
created_at = Column(DateTime, default=datetime.utcnow())
experiment = relationship('Experiment', back_populates='array_metrics')
array_metric_type = relationship('ArrayMetricType', back_populates='array_metrics')
region = relationship('Region', back_populates='array_metrics')
sat_meta = relationship('SatMeta', back_populates='array_metrics')
array_metric_types
id = Column(Integer, primary_key=True, autoincrement=True)
instrument_meta_id = Column(Integer, ForeignKey('instrument_meta.id'), nullable=True)
obs_platform = Column(String(128))
name = Column(String(128), nullable=False)
long_name = Column(String(128))
measurement_type = Column(String(64), nullable=False)
measurement_units = Column(String(64))
stat_type = Column(String(64))
array_coord_labels = Column(ARRAY(String))
array_coord_units = Column(ARRAY(String))
array_index_values = Column(ARRAY(String))
array_dimensions = Column(ARRAY(String))
description = Column(JSONB(astext_type=sa.Text()), nullable=True)
created_at = Column(DateTime, nullable=False)
updated_at = Column(DateTime)
array_metrics = relationship('ExptArrayMetric', back_populates='array_metric_type')
instrument_meta = relationship('InstrumentMeta', back_populates='array_metric_type')
sat_meta
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String(256))
sat_id = Column(Integer)
sat_name = Column(String(128))
short_name = Column(String(64))
created_at = Column(DateTime, nullable=False)
updated_at = Column(DateTime)
array_metrics = relationship('ExptArrayMetric', back_populates='sat_meta')
instrument_meta
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String(256))
num_channels = Column(Integer)
scan_angle = Column(String(256), nullable=True)
created_at = Column(DateTime, nullable=False)
updated_at = Column(DateTime)
array_metric_type = relationship('ArrayMetricType', back_populates='instrument_meta')
All requests require either a dictionary or a YAML file to configure the call to the database. GET calls, e.g., are requests to download subsets of data using filters that are specified with a nested dictionary or within a YAML file configuration hierarchy. PUT calls must similarly (via defining a key:value structured hierarchy) specify which data to upload to the database. Other calls must also be configured with a dictionary or YAML file. The following example configuration dictionaries (which could be similarly defined in a YAML file with the same hierarchy) are provided as templates for basic use cases.
Example format of request dictionaries for 'experiment' calls.
GET:
request_dict = {
'db_request_name': 'experiment',
'method': 'GET',
'params': {
'filters': {
'name': {
'exact': 'C96L64.UFSRNR.GSI_SOCA_3DVAR.012016'
},
'cycle_start': {
'from': '2015-01-01 00:00:00',
'to': '2018-01-01 00:00:00'
},
'cycle_stop': {
'from': '2015-01-01 00:00:00',
'to': '2018-01-01 00:00:00'
},
'owner_id': {
'exact': '[email protected]'
},
'experiment_type': {
'like': '%COUPLED%'
},
'platform': {
'exact': 'pw_awv1'
},
'wallclock_start': {
'from': '2022-01-01 00:00:00',
'to': '2022-07-01 00:00:00'
},
},
'ordering': [
{'name': 'group_id', 'order_by': 'desc'},
{'name': 'created_at', 'order_by': 'desc'}
],
'record_limit': 4
}
}
PUT:
request_dict = {
'db_request_name': 'experiment',
'method': 'PUT',
'body': {
'name': 'C96L64.UFSRNR.GSI_3DVAR.012016',
'datestr_format': '%Y-%m-%d %H:%M:%S',
'cycle_start': '2016-01-01 00:00:00',
'cycle_stop': '2016-01-31 18:00:00',
'owner_id': '[email protected]',
'group_id': 'gsienkf',
'experiment_type': 'C96L64.UFSRNR.GSI_3DVAR.012016',
'platform': 'pw_awv1',
'wallclock_start': '2021-07-22 09:22:05',
'wallclock_end': '2021-07-24 05:31:14',
'description': #JSON VALUE OF DESCRIPTION
}
}
Values which can be null or not provided: group_id, experiment_type, wallclock_end, description
Example format of request dictionaries for 'expt_metrics' calls.
GET:
request_dict = {
'db_request_name': 'expt_metrics',
'method': 'GET',
'params': {
'datestr_format': '%Y-%m-%d %H:%M:%S',
'filters': {
'experiment': {
'name': {
'exact': 'UFSRNR_GSI_SOCA_3DVAR_COUPLED_122015_HC44RS_lstr_tst',
},
'wallclock_start': {
'from': '2022-08-03 02:00:00',
'to': '2022-08-03 06:00:00'
}
},
'metric_types': {
'name': {
'exact': ['innov_stats_temperature_rmsd']
},
'stat_type': {
'exact': ['rmsd']
}
},
'regions': {
'name': {
'exact': ['global']
},
},
'time_valid': {
'from': '2015-01-01 00:00:00',
'to': '2016-01-03 00:00:00',
},
},
'ordering': [
{'name': 'time_valid', 'order_by': 'asc'}
]
}
}
PUT:
request_dict = {
'db_request_name' : 'expt_metrics',
'method': 'PUT',
'body' : {
'expt_name': experiment_name,
'expt_wallclock_start': experiment_wallclock,
'metrics': {
'name': name,
'region_name': region,
'elevation': elevation,
'elevation_unit': elevation_unit,
'value': value,
'time_valid': time_valid,
'forecast_hour' : forecast_hour,
'ensemble_member' : ensemble_member
},
'datestr_format': '%Y-%m-%d %H:%M:%S',
}
}
Values which can be null or not provided: elevation_unit, forecast_hour, ensemble_member
Note: for a successful PUT call, the experiment, region, and metric type referenced in the body must already be registered using the score_db_base.py. See the first example above on How To Register an Experiment. The process is the same for the other data types.
Harvest metrics only accepts PUT calls, therefore a method is not required. Any GET call for metrics should be through 'expt_metrics'.
request_dict = {
'db_request_name' : 'harvest_metrics',
'body' : {
'expt_name': experiment_name,
'expt_wallclock_start': experiment_wallclock,
'datestr_format': '%Y-%m-%d %H:%M:%S',
},
'hv_translator': hv_translator,
'harvest_config': #Dictionary configuration for harvest
}
Note the format of 'harvest_config' is required to be a valid config for score-hv calls. 'hv_translator' needs to be a string value for a registered harvester translator in the hv_translator_registry.py.
Harvest innov stats only accepts PUT calls, therefore a method is not required. Any GET call for metrics should be through 'expt_metrics'.
request_dict = {
'db_request_name': 'harvest_innov_stats',
'date_range': {
'datetime_str': '%Y-%m-%d %H:%M:%S',
'start': '2015-12-01 0:00:00',
'end': '2015-12-01 0:00:00'
},
'files': [
{
'filepath': 'path/to/file/to/harvest',
'filename': 'innov_stats.metric.%Y%m%d%H.nc',
'cycles': CYCLES,
'harvester': 'innov_stats_netcdf',
'metrics': ['temperature','spechumid','uvwind'],
'stats': ['bias', 'count', 'rmsd'],
'elevation_unit': 'plevs'
},
],
'output_format': 'tuples_list'
}
Example format of request dictionaries for 'metric_types' calls.
GET:
request_dict = {
'db_request_name': 'metric_type',
'method': 'GET',
'params': {
'filters': {
'name': {
'exact': 'innov_stats_temperature_rmsd',
'in': ['innov_stats_temperature_rmsd', 'innov_stats_uvwind_rmsd']
},
'measurement_type': {
'exact': 'temperature'
},
'measurement_unit': {
'like': 'celsius'
},
'stat_type': {
'exact': 'rmsd'
},
'long_name': {
'exact': 'rmsd of innov stats temperature'
}
},
'ordering': [
{'name': 'name', 'order_by': 'desc'},
{'name': 'created_at', 'order_by': 'desc'}
],
'record_limit': 4
}
}
PUT:
request_types = {
'db_request_name' : 'metric_types',
'method': 'PUT',
'body' : {
'name': name,
'long_name': long_name,
'measurement_type': measurement_type,
'measurement_units': units,
'stat_type': stat_type,
'description': #JSON FORMAT OF DESCRIPTION
}
}
Values which can be null or not provided: measurement_units, stat_type, description
Example format of request dictionary for 'regions' calls.
GET:
request_dict = {
'db_request_name': 'region',
'method': 'GET',
'params': {'filter_type': 'by_name'},
'body': {
'regions': [
'global',
'equatorial',
]
}
}
PUT:
request_dict = {
'db_request_name': 'region',
'method': 'PUT',
'body': {
'regions': [
{'name': 'global', 'min_lat': -90.0, 'max_lat': 90.0, 'east_lon': 0.0, 'west_lon': 360.0},
{'name': 'equatorial', 'min_lat': -5.0, 'max_lat': 5.0, 'east_lon': 0.0, 'west_lon': 360.0},
]
}
}
Example format of request dictionary for 'storage_locations' calls.
GET:
request_dict = {
'db_request_name': 'storage_locations',
'method': 'GET',
'params': {
'datestr_format': '%Y-%m-%d %H:%M:%S',
'filters': {
'name': {
'exact': 's3_example_bucket'
},
}
}
}
PUT:
request_dict = {
'db_request_name': 'storage_locations',
'method': 'PUT',
'body': {
'name': 's3_example_bucket',
'platform': 'aws_s3',
'bucket_name': 'noaa-example-score-db-bucket',
'key': 'reanalysis',
'platform_region': 'n/a'
}
}
Values which can be null or not provided: key, platform_region
Example request dictionaries for the 'file_types' calls.
GET:
request_dict = {
'db_request_name': 'file_types',
'method': 'GET',
'params' : {
'filters': {
'name' :{
'exact' : 'example_type'
}
}
}
}
PUT:
request_dict = {
'db_request_name': 'file_types',
'method' : 'PUT',
'body' :{
'name': 'example_type',
'file_template': '*.example',
'file_format': 'text',
'description': json.dumps({"name": "example"})
}
}
Values which can be null or not provided: file_format, description
Example request dictionaries for the 'expt_file_counts' calls.
GET:
request_dict = {
'db_request_name' : 'expt_file_counts',
'method': 'GET',
'params' : {
'filters': {
'experiment': {
'experiment_name': {
'exact': 'C96L64.UFSRNR.GSI_3DVAR.012016'
}
},
'file_types': {
'file_type_name': {
'exact': 'example_type',
},
},
'storage_locations': {
'storage_loc_name' :{
'exact': 's3_example_bucket',
},
},
}
}
}
PUT:
request_dict = {
'db_request_name': 'expt_file_counts',
'method': 'PUT',
'body': {
'experiment_name': 'C96L64.UFSRNR.GSI_3DVAR.012016',
'wallclock_start': '2021-07-22 09:22:05',
'file_type_name': 'example_type',
'file_extension': '.example',
'time_valid': '2023-02-05 06:00:00',
'forecast_hour' : 120,
'file_size_bytes' : 1234567890,
'bucket_name' : 'noaa-example-score-db-bucket',
'platform': 'aws_s3',
'key': 'reanalysis',
'count': 1230,
'folder_path': 'noaa-example-score-db-bucket/reanalysis/2023/02/23/2023022306',
'cycle': '2023-02-03 06:00:00'
}
}
Values which can be null or not provided: folder_path, cycle, time_valid, forecast_hour, file_size_bytes
Note: the associated experiment, file type, and storage locations referenced in the body values must already be registered for a successful file count PUT call using the score_db_base.py. See the first example above on How To Register an Experiment. The process is the same for the other data types.
Example dictionaries for 'sat_meta' calls.
GET:
request_dict = {
'db_request_name': 'sat_meta',
'method': 'GET',
'params' : {
'filters': {
'name' :{
'exact' : 'example_sat_meta'
}
}
}
}
PUT:
request_dict = {
'db_request_name': 'sat_meta',
'method' : 'PUT',
'body' :{
'name': 'example_sat_meta',
'sat_id': 123456789,
'sat_name': 'Example Sat Name',
'short_name': 'esm_1',
}
}
Each value can be null as necessary. A combination of sat_name, sat_id, and short_name is a unique sat meta.
Example dictionaries for 'instrument_meta' calls.
GET:
request_dict = {
'db_request_name': 'instrument_meta',
'method': 'GET',
'params' : {
'filters' : {
'name' : {
'exact' : 'example_instrument'
}
}
}
}
PUT:
request_dict = {
'db_request_name': 'instrument_meta',
'method' : 'PUT',
'body' : {
'name': 'example_instrument',
'num_channels': 34,
'scan_angle': 'The example of scan angle'
}
}
Values which can be null: scan_angle and num_channels
Example dictionaries for 'array_metric_types' calls.
GET:
request_dict = {
'db_request_name': 'array_metric_types',
'method':'GET',
'params':{
'filters':{
'name':{
'exact':'vertical_example_metric'
},
'instrument_meta_name':{
'exact':'example_instrument'
},
'stat_type':{
'exact':'example_stat'
}
},
'limit':1
}
}
To search based on instrument_meta use instrument_meta_name or innstrument_name.
PUT:
request_dict = {
'db_request_name': 'array_metric_types',
'method': 'PUT',
'body': {
'name': 'vertical_example_metric',
'longname': 'vertical long name example',
'obs_platform': 'satellite',
'measurement_type': 'example_measurement_type',
'measurement_units': 'example_measurement_units',
'stat_type': 'example_stat',
'array_coord_labels': ['temperature', 'elevation'],
'array_coord_units': ['K', 'feet'],
'array_index_values': [[10, 20, 30],[1000, 5000, 10000]],
'array_dimensions': [3, 3],
'description': json.dumps("example array metric type for testing purposes"),
'instrument_meta_name': 'example_instrument',
}
}
The 'instrument_meta_name' is only required when obs_platform is satellite to associate the appropriate instrument_meta value to the type. The instrument_meta value must be pre-registered before registering the array metric type.
Example dictionaries for 'expt_array_metrics' calls.
GET:
request_dict = {
'db_request_name': 'expt_array_metrics',
'method': 'GET',
'params': {
'datestr_format': '%Y-%m-%d %H:%M:%S',
'filters': {
'experiment': {
'name': {
'exact': 'C96L64.UFSRNR.GSI_3DVAR.012016',
},
'wallclock_start': {
'from': '2021-07-22 02:22:05',
'to': '2021-07-22 10:22:05'
}
},
'array_metric_types': {
'name': {
'exact': ['vertical_example_metric']
},
},
'regions': {
'name': {
'exact': ['global']
},
},
'time_valid': {
'from': '2015-01-01 00:00:00',
'to': '2016-01-03 00:00:00',
},
},
'ordering': [
{'name': 'time_valid', 'order_by': 'asc'}
]
}
}
PUT:
request_dict = {
'db_request_name': 'expt_array_metrics',
'method': 'PUT',
'body': {
'expt_name': 'C96L64.UFSRNR.GSI_3DVAR.012016',
'expt_wallclock_start': '2021-07-22 09:22:05',
'array_metrics': [
ExptArrayMetricInputData('vertical_example_metric','global',[[1, 2, 3],[4, 5, 6],[7, 8, 9]], True, '2015-12-02 06:00:00', None, None, 'example_sat_meta', None, None, None),
ExptArrayMetricInputData('vertical_example_metric2','global',[[111, 222, 333],[444, 555, 666],[777, 888, 999]], True, '2015-12-02 18:00:00', 24, 12, 'example_sat_meta', 123456789, 'Example Sat Name', 'esm_1')
],
'datestr_format': '%Y-%m-%d %H:%M:%S'
}
}
Multiple array metrics can be provided as a list at one time as seen in this example.
Note: the associated array metric type, provided via the name, must be pre-registered for a successful put call.
Example request dictionary of a call to 'plot_innov_stats'.
plot_control_dict = {
'db_request_name': 'plot_innov_stats',
'date_range': {
'datetime_str': '%Y-%m-%d %H:%M:%S',
'start': '2016-01-01 00:00:00',
'end': '2016-01-31 18:00:00'
},
'experiments': [
{
'name': 'C96L64.UFSRNR.GSI_3DVAR.012016',
'wallclock_start': '2021-07-22 09:22:05',
'graph_label': 'C96L64 GSI Uncoupled 3DVAR Experiment',
'graph_color': 'blue'
},
{
'name': 'C96L64.UFSRNR.GSI_SOCA_3DVAR.012016',
'wallclock_start': '2021-07-24 11:31:16',
'graph_label': 'C96L64 GSI and SOCA Coupled 3DVAR Experiment',
'graph_color': 'red'
}
],
'stat_groups': [
{
'cycles': CYCLES,
'stat_group_frmt_str': 'innov_stats_{metric}_{stat}',
'metrics': ['temperature','spechumid','uvwind'],
'stats': ['bias', 'rmsd'],
'elevation_unit': 'plevs',
'regions': [
'equatorial',
'global',
'north_hemis',
'south_hemis',
'tropics'
]
}
],
'work_dir': '/absolute/path/to/desired/figure/location',
'fig_base_fn': 'C96L64_GSI_3DVAR_VS_GSI_SOCA_3DVAR'
}
The code base is structured using separate files to handle interactions for separate tables as well as extra functionalities, i.e., harvester interactions. All of the code functionality resides with in the src folder.
The main file is score_db_base.py which is the launch point for all calls made via command line or outside script. This file contains the code which parses the input yaml or dictionary into a useful dictionary format and assigns the call to the appropriate registry item.
The db_request_registry.py contains the registration for all valid 'name' values for inputs. The registry is used to direct the call to the correct set of code and table interactions, i.e. experiments vs storage_locations.
The db_utils.py, time_utils.py, file_utils.py and yaml_utils.py files contains utility functions that are referenced frequently throughout the code base for their respective topics (database interactions, handling datetimes, files, and yaml format files).
The db_action_response.py file defines the response code structure that is expected by all return calls for values called via the registry.
The score_table_models.py file defines all the of the tables in the database using SQLAlchemy and contains the code to build those repsective tables.
Majority of the files are used for calls to manipulate specific database tables. The relationship between those files and the tables is as follows:
- experiments: experiments.py
- regions: regions.py
- metric_types: metric_types.py
- expt_metrics: expt_metrics.py
- storage_locations: storage_locations.py
- file_types: file_types.py
- expt_stored_file_counts: expt_file_counts.py
- sat_meta: sat_meta.py
- array_metric_types: array_metric_types.py
- expt_array_mterics: expt_array_metrics.py
The general purpose of all of these files is to define a code structure based on the columns of the database, handle the processing of input data into the appropriate values, handle any input filters or order_by statements for GET calls, and finally to handle the GET and PUT calls using SQLAlchemy to input or retrieve data from the database.
The expt_metrics.py, expt_file_counts.py, and expt_array_metrics.py files have interactions with the other files for the relevant tables related to the id relationships between those tables, for example, both call the experiments.py code to get an experiment id. See the Table Schemas for the full set of table interactions.
The harvest_metrics.py and harvest_innov_stats.py files handle calls which require harvesting of data via score-hv and then inputs that info into the expt_metrics table via calls to the expt_metrics.py code. harvest_metrics.py is a more generic version of harvest_innov_stats.py and can process any type of harvested data if an appropriate translator is provided in harvest_translator.py. Only innov_stats files can be used with harvest_innov_stats.py.
The harvest_translator.py file contains the code for translation between score-hv output values and the expected values necessary for inserting into the metrics table.
The hv_translator_registry.py contains the registered translator, found in harvest_translator.py, that can be called and is used to process calls appropriately by the specific translator in calls to harvest_metrics.
The plot_innov_stats.py file is used for creating plots about innov stats data. The innov_stats_plot_attrs.py file is used for defining attributes used in this plotting.
The tests folder contains all the pytests files used for testing. Inside there is a configs folder and experiment-data folder which contain configurations and data that are necessary for testing purposes and referenced by certain test files.
Each test_{name}.py file is used for testing the code referenced by the name. In general for the database table files it will make a get and a put call in the tests.
The .env_example file is used to show the required inputs for database connections. A .env file with the appropriate information, including database password, is required for running.
The score_db_utils.sh bash script is used as a utility for setting up the connection between score-db and score-hv code repositories. It should be run prior to making calls to score_db_base.py. The script assumes score-hv is downloaded at the same place as score-db is at a repo level.