Initial testing framework #2095

Merged
islas merged 12 commits into wrf-model:release-v4.6.1 from islas:initial-testing-framework
Sep 19, 2024
@islas (Collaborator) commented Aug 9, 2024

TYPE: enhancement

KEYWORDS: testing, regression, test framework

SOURCE: internal

DESCRIPTION OF CHANGES:
Problem:
The current regression suite code is complex, requires maintaining multiple separate repositories, and makes adding a new test laborious, which limits community contribution. The complexity also discourages independent local testing of changes, leading to a development cycle of one-off commits pushed only to re-trigger testing and see whether the meaningful commits fixed the issues.

Solution:
This new proposed regression suite addresses these shortcomings in a number of discrete ways:

  1. Modularize the testing framework into a generalized, independent repo usable by any repo seeking to set up tests that can run locally, on HPC systems, and within any CI/CD framework
  2. Write WRF-specific test scripts inside the WRF repo and in a manner that does not rely on specific layouts/hardware/etc., so long as WRF can compile and run on the intended system (i.e. able to be run locally)
  3. Write CI/CD tests in a simple and generally CI/CD framework-agnostic method where definitions of these also reside within the WRF repo
  4. Utilize HPC resources in a safe manner to broaden testing, covering many more compilers on hardware similar to that of the general WRF use case

As a first pass at demonstrating this solution, this PR implements a simple set of compilation tests using GNU x86 configurations testing serial, sm, dm, and sm+dm selections. The CI/CD portion is done via GitHub workflow actions on a specific trigger event. The values and trigger methods are configurable, but this initial implementation will use the labeled trigger, which will initiate tests when compile-tests or all-tests is added as a label to a pull request.
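
The label gating described above lives in a GitHub workflow in this PR. Purely as an illustration of the decision it makes, the logic can be sketched in POSIX sh. The function name is hypothetical; only the two label names come from the description.

```shell
#!/bin/sh
# Hypothetical sketch: the real check is done by the GitHub workflow trigger,
# not by a shell function. Only the label names are taken from the PR text.

# label_triggers_compile_tests LABEL
# Succeeds when adding LABEL to a pull request should start the compile tests.
label_triggers_compile_tests() {
  case "$1" in
    compile-tests|all-tests) return 0 ;;
    *)                       return 1 ;;
  esac
}
```

Any other label, such as a documentation label, would leave the tests untouched.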

TESTS CONDUCTED:

  1. This GitHub workflow was tested in a separate fork, with the tests also run on Derecho. Both positive and negative tests were used to confirm that the output is useful in each case.

RELEASE NOTE:
Introduce a modularized testing framework that lives within the WRF repository and allows testing locally and natively on HPC systems

islas added 7 commits August 8, 2024 14:41
In order to run test scripts outside of a testing framework, the handling of
environment setup should not be solely dependent on running within a dedicated
test framework. This has the added benefit of compartmentalizing the duties of
environment and dependency solving from running the tests.

These environment scripts allow for the selection of a particular environment
with the default being the fqdn of the current host. From there, arguments are
routed using standard POSIX-sh to a respective script. In the case of Derecho
(applicable to any system using lmod), all subsequent arguments are treated as
modules to load into the current session.

The hostenv.sh script relies on one "argument" $AS_HOST being passed in via
variable setting to facilitate selection.
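
A minimal sketch of this selection step, assuming hypothetical function names and a simple pattern match on the host name (the actual hostenv.sh may be organized differently):

```shell
#!/bin/sh
# Hypothetical sketch; function names and the "*derecho*" pattern are assumptions.

# Print the host name used for selection: $AS_HOST when it is set,
# otherwise the fully qualified domain name of the current machine.
select_host() {
  if [ -n "${AS_HOST:-}" ]; then
    printf '%s\n' "$AS_HOST"
  else
    hostname -f
  fi
}

# Map a host name to the environment script that should handle it.
# Prints nothing when no configuration exists for the host, in which
# case the current environment would be used as-is.
env_script_for() {
  case "$1" in
    *derecho*) printf '%s\n' "derecho.sh" ;;
    *)         : ;;
  esac
}
```

Setting `AS_HOST=derecho` before calling `select_host` selects the Derecho environment even on another machine, which is the point of passing the host in as a variable.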

The helpers.sh script provides convenience features for substring checking in sh,
delayed environment variable expansion via eval, and quick banner creation.
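
The three kinds of helper can be sketched in plain POSIX sh as follows. The function names and signatures are illustrative assumptions, not the contents of helpers.sh.

```shell
#!/bin/sh
# Hypothetical sketch; names and signatures are assumptions for illustration.

# contains HAYSTACK NEEDLE
# Succeeds when NEEDLE occurs anywhere in HAYSTACK (no bashisms needed).
contains() {
  case "$1" in
    *"$2"*) return 0 ;;
    *)      return 1 ;;
  esac
}

# delayed_expand STRING
# Expands variable references in STRING at call time rather than when the
# string was written. eval runs arbitrary code, so use trusted input only.
delayed_expand() {
  eval "printf '%s\n' \"$1\""
}

# banner WIDTH TEXT
# Prints TEXT between two rules made of WIDTH '=' characters.
banner() {
  i=0
  line=""
  while [ "$i" -lt "$1" ]; do
    line="${line}="
    i=$((i + 1))
  done
  printf '%s\n%s\n%s\n' "$line" "$2" "$line"
}
```

For example, a string stored as `'${WORKDIR}/logs'` can be written before `WORKDIR` is known and resolved later with `delayed_expand`.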

The derecho.sh script is included as the first supported environment.

This script will facilitate the first tests. There are only three requirements
of any given test script with the planned testing framework. If a different
testing framework is used in the future, these requirements of the test scripts
can and should be re-evaluated.

The test script should:
1. Take the intended host / configuration environment as the first argument
2. Take the working directory to immediately change to as the second argument
3. Output some key phrase at the end of the test to denote success, anything else
   (non-zero exit code, no phrase but return zero) is a failure
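
The three requirements can be illustrated with a skeleton test and a matching pass/fail check. This is a sketch only: the key phrase `TEST PASSED` and both function names are assumptions, not what the framework or the compilation script actually uses.

```shell
#!/bin/sh
# Hypothetical sketch; the key phrase and function names are assumptions.

KEY_PHRASE="TEST PASSED"

# run_test HOST WORKDIR
# The body runs in a subshell (note the parentheses) so the cd stays local.
run_test() (
  host="$1"      # 1. intended host / configuration environment
  workdir="$2"   # 2. working directory to change to immediately
  cd "$workdir" || exit 1
  echo "Testing for host '$host' in $(pwd)"
  # ... the real work (for example, a build) would go here ...
  echo "$KEY_PHRASE"   # 3. key phrase printed last to denote success
)

# check_result LOGFILE EXIT_CODE
# Success requires a zero exit code AND the key phrase on the last line;
# a zero exit code without the phrase still counts as a failure.
check_result() {
  [ "$2" -eq 0 ] && [ "$(tail -n 1 "$1")" = "$KEY_PHRASE" ]
}
```

Requiring both conditions guards against scripts that exit early with status zero before the test has actually completed.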

This particular compilation test script satisfies the above while also providing
enough flexibility to select compile target, stanza configuration, parallel jobs,
and other command-line options into the make build.

Additionally, for convenience, environment variables can be passed in as command-line
options to the test script to modularize certain inputs.

Following the documentation of the hpc-workflows testing framework and the
testing structure found in .ci/, a JSON file for a GNU compilation test was added.
This test will compile the em_real core using the GNU Linux x86 stanza configuration.

All other options are left as default. If this test is run using the derecho
configuration the appropriate modules will attempt to be loaded. For non-derecho
environments, per the testing structure under .ci/, if no configuration exists in
.ci/hostenv.sh then the current environment will be used verbatim.

This reusable workflow balances quick setup with github actions-specific features.
It assumes that the tests can be controlled via a label being set in a PR.

To coordinate PR vs primary branch testing, a suffix is generated using either
the PR number or the branch name. This suffix is then used to relocate log files
to an archival location in an organized fashion. Github artifacts are still used
for failed test capture, but logs will also be moved to the archive location for
quicker access if one has access to where these tests execute.
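
As a rough sketch of the suffix and archival steps, assuming a hypothetical `PR<number>` naming scheme and `.log` file extension (the workflow's actual naming and paths are not shown in this description):

```shell
#!/bin/sh
# Hypothetical sketch; the "PR<number>" format, the ".log" extension, and the
# function names are assumptions for illustration.

# log_suffix PR_NUMBER BRANCH_NAME
# Prefer the PR number; fall back to the branch name with '/' made path-safe.
log_suffix() {
  if [ -n "$1" ]; then
    printf 'PR%s\n' "$1"
  else
    printf '%s\n' "$2" | tr '/' '_'
  fi
}

# archive_logs SRC_DIR ARCHIVE_ROOT SUFFIX
# Move all log files from SRC_DIR into ARCHIVE_ROOT/SUFFIX.
archive_logs() {
  dest="$2/$3"
  mkdir -p "$dest" && mv "$1"/*.log "$dest"/
}
```

Keying the archive directory on the suffix keeps logs from different pull requests and branches from overwriting each other.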

To allow for parallelized testing available from hpc-workflows, the workflow can
make duplicate directories of the repository that can each run their own test
instance without clobbering files.

Once tests are run, results are gathered, relocated to archival location,
reported and printed to the screen, summarized into the actions summary page,
and then packaged into an artifact if failure occurred.

Finally, the test label is removed if the named tests and label match.

This pipeline is triggered if any pushes occur on master or develop OR if a PR
is labeled with an appropriate tag as specified by the tests within this
workflow. Additionally, a specific label to trigger all tests can be used that
will be removed from the PR when all tests finish, regardless of exit status.

The pipeline makes extensive use of the reusable test_workflow.yml to
instantiate tests on runners.

This pipeline currently only includes the definition for one test to be run on
a github runner with tags that satisfy "derecho". Likewise, other hard-coded
values appearing in here assume a particular runner setup and environment.
@islas islas requested a review from a team as a code owner August 9, 2024 21:20
@islas (Collaborator, Author) commented Aug 9, 2024

I'm using the approach we're using in MPAS to set up testing, beginning with a minimal setup (simple compilation tests) to get something started.

The idea would be to then gradually translate the current tests into a format usable by this framework.

@weiwangncar (Collaborator)

The regression test results:

Test Type              | Expected | Received | Failed
= = = = = = = = = = = = = = = = = = = = = = = = = = =
Number of Tests        :       23         24
Number of Builds       :       60         57
Number of Simulations  :      158        150        0
Number of Comparisons  :       95         86        0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

@mgduda mgduda self-requested a review September 16, 2024 23:21
@mgduda mgduda self-requested a review September 17, 2024 00:32
@islas islas merged commit 958ce12 into wrf-model:release-v4.6.1 Sep 19, 2024
islas added a commit that referenced this pull request Feb 19, 2026
This PR introduces a set of tests that allows replication of the
[WRF Coop Tests](https://github.com/kkeene44/wrf-coop/blob/update-v16/build.csh),
which are normally run as regression tests for PRs.

TYPE: enhancement

KEYWORDS: testing, cicd, continuous integration

SOURCE: internal

DESCRIPTION OF CHANGES:
Problem:
The current regression tests found in the WRF Coop repository suffer
from a few key design points:
1. located in a separate repository allowing code divergence and extra
maintenance burden
2. confusing layout due to multiple repositories and data file locations
3. test logic obfuscated because the code actually executed is
auto-generated
4. limited execution tightly coupled to a containerized environment

PR #2095 tried to remedy this using `hpc-workflows`, however the
framework likewise suffered from issues:
1. manual unconventional environment management
2. duplication of effort between tests and lack of support for
dependencies between common actions (e.g. re-using builds across
multiple tests)
3. limited support for extensibility outside of argument manipulation

Solution:
This PR does not aim to entirely replace PR #2095 (notably the CI/CD
GitHub workflow) and instead leverages this point in PR #2095:
> 3. Write CI/CD tests in a simple and generally CI/CD
framework-agnostic method where definitions of these also reside _within
the WRF repo_

These tests follow this same mantra of _"CI/CD framework-agnostic"_ such
that they can more or less be a drop in replacement only for the
`hpc-workflows`-based tests.

The tests will cover the WRF Coop Test Cases (provided is a default
configuration for Derecho):
| Tests  |  |
| ------------- | ------------- |
| em_real  | em_realG  |
| em_realA  | em_realH  |
| em_realB  | em_realI  |
| em_realC  | em_realJ  |
| em_realD  | em_realK  |
| em_realE  | em_realL  |
| em_realF  | various build tests  |


The tests are now written in the [SANE
Workflows](https://github.com/islas/sane_workflows) framework, which
solves most of the issues faced by the other two setups. Data is still
spread across multiple locations, but that is separate from the testing
code.

The structure of the tests is as follows:
```
.sane/                          #< The root directory in WRF where the testing code is kept
└── wrf                         #< A subfolder to make all python-imports look like `import wrf`
    ├── custom_actions
    │   └── run_wrf.py          #< A module that has our custom reusable classes to setup initial conditions and model runs
    ├── hosts
    │   ├── derecho_envs.jsonc  #< The environments that derecho.jsonc has
    │   └── derecho.jsonc       #< Definition of derecho HPC system for this framework
    ├── scripts                 #< A subfolder to house all our shell helper scripts that do the bulk of the work
    │   ├── buildCMake.sh
    │   ├── buildMake.sh
    │   ├── compare_wrf.sh      #< Use diffwrf to compare two runs
    │   ├── run_init.sh         #< Configurable to run initial conditions (em_real.exe or ideal.exe)
    │   ├── run_wrf_restart.sh  #< Runs wrf.exe again in previous run folder and compares history
    │   └── run_wrf.sh          #< Runs wrf.exe
    └── tests                   #< Where our tests live
        ├── builds
        │   └── builds.py       #< Python module that sets up ALL our compilation tests (make + cmake)
        └── regtests
            └── wrf_coop.py     #< Python module that sets up the WRF Coop em_real* tests
```
Documentation for this new framework can be found at:
https://sane-workflows.readthedocs.io/en/latest/

One could run these tests on Derecho using the following commands
(inside a WRF repo clone):
```bash
python3 -m venv .venv/wrf_testing
source .venv/wrf_testing/bin/activate
python3 -m pip install --pre sane-workflows
# Runs the em_real test case
sane_runner --path .sane/ --actions em_real --run
```