fix issue with scaling over instances #329

Merged
jedwards4b merged 2 commits into ESCOMP:master from jedwards4b:ninst_scaling
Dec 15, 2022

@jedwards4b
Collaborator

Description of changes

Remove a performance funnel in ensemble_driver that caused initialization time to scale linearly with the number of ensemble members.

Specific notes

Contributors other than yourself, if any:

CMEPS Issues Fixed (include github issue #): Fixes #326

Are changes expected to change answers? (specify if bfb, different at roundoff, more substantial)

Any User Interface Changes (namelist or namelist defaults changes)?

Testing performed

Testing performed if application target is CESM:

  • (recommended) CIME_DRIVER=nuopc scripts_regression_tests.py
    • machines:
    • details (e.g. failed tests):
  • (recommended) CESM testlist_drv.xml
    • machines and compilers:
    • details (e.g. failed tests):
  • (optional) CESM prealpha test
    • machines and compilers
    • details (e.g. failed tests):
  • (other) please describe in detail
    • machines and compilers
    • details (e.g. failed tests):

Testing performed if application target is UFS-coupled:

  • (recommended) UFS-coupled testing
    • description:
    • details (e.g. failed tests):

Testing performed if application target is UFS-HAFS:

  • (recommended) UFS-HAFS testing
    • description:
    • details (e.g. failed tests):

Hashes used for testing:

  • CESM:
  • UFS-coupled, then umbrella repository to check out and associated hash:
    • repository to check out:
    • branch/hash:
  • UFS-HAFS, then umbrella repository to check out and associated hash:
    • repository to check out:
    • branch/hash:

@jedwards4b jedwards4b self-assigned this Dec 14, 2022
Comment thread cesm/driver/ensemble_driver.F90
@jedwards4b
Collaborator Author

I tested the C5 case and saw considerable improvement over what @fischer-ncar reported. I would like you and @fischer-ncar to confirm this result.

Region               PETs   PEs    Count    Mean (s)    Min (s)     Min PET Max (s)     Max PET
[ESMF]               540    540    1        217.2501    215.4383    176     217.9992    5
[ensemble] Init 1    540    540    1        204.3150    202.3066    206     205.1041    96
[ESM0004] IPDv02p3   108    108    1        166.4674    166.4671    324     166.4693    395
[LND] IPDv01p3       108    108    1        120.8491    120.8489    344     120.8492    388

@fischer-ncar
Collaborator

I'm cleaning out my office this morning, so I'll give it a try this afternoon.

@fischer-ncar
Collaborator

This is what I'm seeing for C2 and C5. So I can confirm a considerable improvement.

Region                                                                       PETs   PEs    Count    Mean (s)    Min (s)     Min PET Max (s)     Max PET
[ensemble] Init 1                                                            216    216    1        226.9003    226.8974    109     226.9062    18   
[ensemble] Init 1                                                            540    540    1        222.5143    222.5101    113     222.5175    216
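
As a rough sanity check on the two rows above, the sketch below compares the observed ratio of mean "[ensemble] Init 1" times with the ratio that linear scaling would predict. It assumes 108 PETs per instance (suggested by the 108-PET component rows reported earlier in this thread), so that the 216-PET row is C2 and the 540-PET row is C5; that mapping and the names `PETS_PER_INSTANCE`, `init_mean_s`, and `instances` are illustrative, not taken from the test cases themselves.

```python
# Mean "[ensemble] Init 1" times in seconds, copied from the table above.
# Assumption: each instance uses 108 PETs, so 216 PETs = 2 instances (C2)
# and 540 PETs = 5 instances (C5).
PETS_PER_INSTANCE = 108

init_mean_s = {216: 226.9003, 540: 222.5143}  # total PETs -> mean init time


def instances(total_pets: int) -> int:
    """Number of ensemble instances implied by the total PET count."""
    return total_pets // PETS_PER_INSTANCE


# Order the two runs by PET count: smaller ensemble first.
(small_pets, small_t), (large_pets, large_t) = sorted(init_mean_s.items())

# Ratio actually measured vs. the ratio expected if init time grew
# in proportion to the number of instances.
observed_ratio = large_t / small_t
linear_ratio = instances(large_pets) / instances(small_pets)

print(f"instances: {instances(small_pets)} -> {instances(large_pets)}")
print(f"observed init-time ratio: {observed_ratio:.3f}")
print(f"ratio expected under linear scaling: {linear_ratio:.1f}")
```

Under these assumptions the observed ratio stays close to 1 while the linear prediction is 2.5, which is consistent with the improvement confirmed above.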

@jedwards4b jedwards4b merged commit 78448ba into ESCOMP:master Dec 15, 2022
@jedwards4b jedwards4b deleted the ninst_scaling branch December 15, 2022 22:32

Development

Successfully merging this pull request may close these issues.

Multi-instances init time scaling linearly.
