Benchmark with DaCe cpu and gpu backends #50
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,8 @@ | ||
| [submodule "external/gt4py"] | ||
| path = external/gt4py | ||
| url = https://github.com/gridtools/gt4py.git | ||
| [submodule "external/dace"] | ||
|
|
||
| [submodule "dacefix"] | ||
| path = external/dace | ||
| url = https://github.com/spcl/dace.git | ||
| url = https://github.com/FlorianDeconinck/dace.git | ||
| branch = fix/gcc_dies_on_dacecpu | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,96 @@ | ||
| # Benchmarking README | ||
|
|
||
| The tests contained in this archive are for benchmarking purposes only. Any | ||
| distribution beyond those personnel performing the tests needs explicit approval | ||
| from NOAA/GFDL (Seth Underwood or Rusty Benson). | ||
|
|
||
| ## Cloning benchmark repository and generating conda environment | ||
|
|
||
| Pace requires GCC > 9.2, MPI, and Python 3.8 on your system, and CUDA is required to run with a GPU backend. You will also need the headers of the boost libraries in your `$PATH` (boost itself does not need to be installed). | ||
|
|
||
| ```shell | ||
| cd BOOST/ROOT | ||
| wget https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz | ||
| tar -xzf boost_1_79_0.tar.gz | ||
| mkdir -p boost_1_79_0/include | ||
| mv boost_1_79_0/boost boost_1_79_0/include/ | ||
| export BOOST_ROOT=BOOST/ROOT/boost_1_79_0 | ||
| ``` | ||
|
|
||
| To clone the benchmark branch use the command: | ||
|
|
||
| ```shell | ||
| git clone --recursive -b benchmark git@github.com:NOAA-GFDL/pace.git | ||
| ``` | ||
|
|
||
| or if you have already cloned the repository: | ||
|
|
||
| ```shell | ||
| git submodule update --init --recursive | ||
| ``` | ||
|
|
||
| After cloning, change into the directory containing the clone. To generate the conda environment use the following commands: | ||
|
|
||
| ```shell | ||
| conda create -y --name <desired_name> python=3.8 | ||
| conda activate <desired_name> | ||
| pip3 install --upgrade pip setuptools wheel | ||
| pip3 install -r requirements_dev.txt -c constraints.txt | ||
| ``` | ||
|
|
||
| ## Benchmarking configurations | ||
|
|
||
| There are four configurations of the PACE application contained within the branch to be used for benchmarking: | ||
|
|
||
| ```shell | ||
| driver/examples/configs/baroclinic_c384_cpu.yaml | ||
| driver/examples/configs/baroclinic_c384_gpu.yaml | ||
| driver/examples/configs/baroclinic_c3072_cpu.yaml | ||
| driver/examples/configs/baroclinic_c3072_gpu.yaml | ||
| ``` | ||
|
|
||
| ## Building | ||
|
|
||
| To build with the DaCe backends, set the following environment variables: | ||
|
|
||
| ```shell | ||
| FV3_DACEMODE=Build | ||
| PACE_FLOAT_PRECISION=64 | ||
| PACE_LOGLEVEL=INFO | ||
| PYTHONOPTIMIZE=1 | ||
| OMP_NUM_THREADS=1 | ||
| ``` | ||
|
|
||
| Set the run duration in the configuration being built so that the build covers exactly one timestep, i.e. the duration equals `dt_atmos`. For example: | ||
|
|
||
| ```yaml | ||
| dt_atmos: 450 | ||
| seconds: 450 | ||
| ``` | ||
| ## Running | ||
| To run with the DaCe backends, set the following environment variables: | ||
|
|
||
| ```shell | ||
| FV3_DACEMODE=Run | ||
| PACE_FLOAT_PRECISION=64 | ||
| PACE_LOGLEVEL=INFO | ||
| PYTHONOPTIMIZE=1 | ||
| OMP_NUM_THREADS=1 | ||
| ``` | ||
|
|
||
| Adjust the run duration in the configuration to the desired length. For example: | ||
|
|
||
| ```yaml | ||
| dt_atmos: 450 | ||
| days: 9 | ||
| ``` | ||
|
|
||
| The time for the build or run can be set with units of seconds, minutes, hours, or days. | ||
|
|
||
| An example command to start the build or run process with MPI using the DaCe CPU backend for the c384 configuration: | ||
|
|
||
| ```shell | ||
| mpirun -n 1536 python3 -m pace.driver.run driver/examples/configs/baroclinic_c384_cpu.yaml | ||
| ``` | ||
|
|
||
| The build or run requires 1536 ranks: the layout is 16x16 ranks per tile, and there are 6 tiles (16 x 16 x 6 = 1536). |
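
The rank count passed to `mpirun -n` follows directly from the layout. A minimal sketch of that arithmetic, assuming a grid with 6 tiles as stated in the README and the `layout` pair from the YAML config; the helper name `total_ranks` is hypothetical and is not part of Pace:

```python
def total_ranks(layout, tiles=6):
    """Return the number of MPI ranks implied by a per-tile layout.

    `layout` is the (x, y) rank decomposition of a single tile, as given
    by the `layout:` entry of the YAML config. `tiles` defaults to 6,
    the tile count stated in the README. This helper is illustrative
    only and is not taken from the Pace code base.
    """
    ranks_x, ranks_y = layout
    return ranks_x * ranks_y * tiles


# 16x16 layout of the c384 benchmark configuration.
print(total_ranks((16, 16)))  # 1536
# 1x1 layout, one rank per tile.
print(total_ranks((1, 1)))  # 6
```

If the `layout` in a configuration is changed, the value passed to `mpirun -n` would need to change to match.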
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,98 @@ | ||
| stencil_config: | ||
| compilation_config: | ||
| backend: numpy | ||
| rebuild: false | ||
| validate_args: true | ||
| format_source: false | ||
| device_sync: false | ||
| initialization: | ||
| type: analytic | ||
| config: | ||
| case: baroclinic | ||
| performance_config: | ||
| collect_performance: true | ||
| experiment_name: c12_baroclinic | ||
| nx_tile: 12 | ||
| nz: 79 | ||
| dt_atmos: 225 | ||
| minutes: 15 | ||
| layout: | ||
| - 1 | ||
| - 1 | ||
| diagnostics_config: | ||
| path: output | ||
| output_format: netcdf | ||
| names: | ||
| - u | ||
| - v | ||
| - ua | ||
| - va | ||
| - pt | ||
| - delp | ||
| - qvapor | ||
| - qliquid | ||
| - qice | ||
| - qrain | ||
| - qsnow | ||
| - qgraupel | ||
| z_select: | ||
| - level: 65 | ||
| names: | ||
| - pt | ||
| dycore_config: | ||
| a_imp: 1.0 | ||
| beta: 0. | ||
| consv_te: 0. | ||
| d2_bg: 0. | ||
| d2_bg_k1: 0.2 | ||
| d2_bg_k2: 0.1 | ||
| d4_bg: 0.15 | ||
| d_con: 1.0 | ||
| d_ext: 0.0 | ||
| dddmp: 0.5 | ||
| delt_max: 0.002 | ||
| do_sat_adj: true | ||
| do_vort_damp: true | ||
| fill: true | ||
| hord_dp: 6 | ||
| hord_mt: 6 | ||
| hord_tm: 6 | ||
| hord_tr: 8 | ||
| hord_vt: 6 | ||
| hydrostatic: false | ||
| k_split: 1 | ||
| ke_bg: 0. | ||
| kord_mt: 9 | ||
| kord_tm: -9 | ||
| kord_tr: 9 | ||
| kord_wz: 9 | ||
| n_split: 1 | ||
| nord: 3 | ||
| nwat: 6 | ||
| p_fac: 0.05 | ||
| rf_cutoff: 3000. | ||
| rf_fast: true | ||
| tau: 10. | ||
| vtdm4: 0.06 | ||
| z_tracer: true | ||
| do_qa: true | ||
| tau_i2s: 1000. | ||
| tau_g2v: 1200. | ||
| ql_gen: 0.001 | ||
| ql_mlt: 0.002 | ||
| qs_mlt: 0.000001 | ||
| qi_lim: 1.0 | ||
| dw_ocean: 0.1 | ||
| dw_land: 0.15 | ||
| icloud_f: 0 | ||
| tau_l2v: 300. | ||
| tau_v2l: 90. | ||
| fv_sg_adj: 0 | ||
| n_sponge: 48 | ||
|
|
||
| physics_config: | ||
| hydrostatic: false | ||
| nwat: 6 | ||
| do_qa: true | ||
| schemes: | ||
| - GFS_microphysics |
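
For the configuration above, `dt_atmos: 225` with `minutes: 15` would correspond to four timesteps. The sketch below illustrates that arithmetic under an assumption the diff does not confirm: that the run length is the sum of the duration fields (`days`, `hours`, `minutes`, `seconds`) and is divided by `dt_atmos`. The helper name `n_timesteps` is hypothetical and is not part of Pace.

```python
from datetime import timedelta


def n_timesteps(dt_atmos, days=0, hours=0, minutes=0, seconds=0):
    """Return how many timesteps of `dt_atmos` seconds fit the duration.

    Assumption: the total run length is the sum of the duration fields.
    Raises ValueError if the duration is not a whole number of timesteps.
    """
    total = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    step = timedelta(seconds=dt_atmos)
    # divmod on two timedeltas yields (int quotient, timedelta remainder).
    steps, remainder = divmod(total, step)
    if remainder:
        raise ValueError("duration is not a multiple of dt_atmos")
    return steps


print(n_timesteps(225, minutes=15))   # 4
print(n_timesteps(450, seconds=450))  # 1, the one-timestep build case
print(n_timesteps(450, days=9))       # 1728
```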
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,11 +15,10 @@ performance_config: | |
| nx_tile: 384 | ||
| nz: 79 | ||
| dt_atmos: 450 | ||
| minutes: 7 | ||
| seconds: 30 | ||
| days: 9 | ||
| layout: | ||
| - 1 | ||
| - 1 | ||
| - 16 | ||
| - 16 | ||
| diagnostics_config: | ||
| path: output | ||
| output_format: netcdf | ||
|
|
@@ -72,7 +71,7 @@ dycore_config: | |
| nwat: 6 | ||
| p_fac: 0.1 | ||
| rf_cutoff: 800. | ||
| rf_fast: false | ||
| rf_fast: true | ||
|
Collaborator
Has rf_fast=True been implemented?
Contributor
Author
It has not, but running with it set to false throws a NotImplementedError when tau != 0. Should this option be removed instead, or should tau be set to 0?
Contributor
You could cherry-pick the not-implemented PR from Florian and remove the options from the yaml config.
Collaborator
I think I'd rather change tau than set something in the config that we haven't actually implemented in the model, at least for now. I'll create an issue for it as well, though, so we can implement a config that's internally consistent. |
||
| tau: 5. | ||
| vtdm4: 0.06 | ||
| z_tracer: true | ||
|
|
||
I really do not love doing this, cannot wait to have it resolved