Disable TUV diagnostics unless debugging by zxdawn · Pull Request #1243 · wrf-model/WRF

zxdawn · 2020-07-08T05:30:59Z

TYPE: bug

KEYWORDS: photolysis scheme, TUV, diagnostics, debug_level

SOURCE: Xin Zhang (NUIST)

DESCRIPTION OF CHANGES:
Problem:
When using WRF-Chem with the new Photolysis option (phot_opt = 4) activated, the model spends too much time
writing the TUV.diags file. These diagnostics should only be used for debugging photolysis rates, so they should be
enabled only for an explicitly requested debug value.

Solution:
Enable TUV diagnostics only when debug_level >= 100.
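
For illustration, a namelist.input fragment that would turn the diagnostics back on for a debugging run might look like the sketch below. It assumes the usual WRF layout, with debug_level in &time_control and phot_opt in &chem; the values shown are only an example, not a recommended configuration.

```fortran
 &time_control
 debug_level = 100,   ! >= 100 enables the TUV diagnostics after this change
 /

 &chem
 phot_opt = 4,        ! new TUV photolysis option discussed in this PR
 /
```

With debug_level left at its default of 0, the diagnostics call is skipped.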

ISSUE:
Fixes #1242 Speed up writing the TUV.diags file

LIST OF MODIFIED FILES:
M chem/rxn.F

TESTS CONDUCTED:

TUV init is much quicker and the WRF-Chem one-domain simulation test is successful.
The times below are for the diagnostics call only.

Here's the table if debug_level = 0:

| | Before Commit | After Commit |
|---|---|---|
| 24 cores | 1 s | skip |
| 120 cores | 10 min | skip |

If I change debug_level to 100, this is the result:

| | Before Commit | After Commit |
|---|---|---|
| 24 cores | 1 s | 1 s |
| 120 cores | 10 min | 10 min |

Jenkins test is successful.

RELEASE NOTE: When using WRF-Chem with the new Photolysis option (phot_opt = 4) activated, the model spent too much time looping over and writing the TUV.diags file, which is only used for debugging photolysis rates. The diagnostics are now enabled only for debug_level >= 100.

davegill · 2020-07-09T01:02:51Z

@zxdawn
Xin,
Thank you for getting this PR ready so quickly.

Would you please explain WHY this change has been made. Does this impact all of WRF Chem? Is it only for certain options? Are the files really large? For the uninitiated, also explain what TUV is. This only needs to be a few sentences. Then take that text and use it in the release notes.

zxdawn · 2020-07-09T01:42:52Z

@davegill OK. Added the RELEASE NOTE above.

davegill · 2020-07-09T04:29:37Z

@zxdawn
Xin,

Would you post the Jenkins test email in these comments?
You mention that the processing is faster. Can you quantify that with before vs. after timings?

zxdawn · 2020-07-09T12:58:14Z

jenkins test email

Please find result of the WRF regression test cases in the attachment. This build is for Commit ID: 3eac9fe5c4680a704e5172180b4c0f81a0a052a9, requested by: zxdawn for PR: https://github.com/wrf-model/WRF/pull/1243. For any query please send e-mail to David Gill.

    Test Type              | Expected  | Received |  Failed
    = = = = = = = = = = = = = = = = = = = = = = = =  = = = =
    Number of Tests        : 19           18
    Number of Builds       : 48           46
    Number of Simulations  : 166           164        0
    Number of Comparisons  : 105           104        0

    Failed Simulations are: 
    None
    Which comparisons are not bit-for-bit: 
    None

Time used for TUV init

I just found that with 24 cores, it takes < 1 second.
But when I switch to 120 cores, it takes ~10 minutes, judging from the log in rsl.error.0000 (followed with tail -f rsl.error.0000).
BTW, the start_datetime and end_datetime values printed by the code below are identical ...

      IF ( 100 .LE. debug_level ) THEN 
        call date_and_time(VALUES=values)
        write(emsg,*)'start_datetime',values
        call wrf_message(trim(emsg))
        call wrf_message('Xin: call diagnostics')
        call diagnostics
        write(emsg,*)'end_datetime',values
        call wrf_message(trim(emsg))
      ENDIF
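
One reason the two timestamps match is visible in the snippet itself: `values` is filled by a single `date_and_time` call before `diagnostics`, and the same array is printed again afterwards. A minimal standalone sketch of a timing wrapper that queries the clock a second time is shown below. The program name and `diagnostics_stub` are hypothetical stand-ins for the real `diagnostics` routine in chem/rxn.F, and plain `print` is used here in place of `wrf_message` so the example compiles on its own.

```fortran
program time_diagnostics_call
  implicit none
  integer :: start_values(8), end_values(8)
  integer :: count_start, count_end, count_rate
  character(len=256) :: emsg

  ! Record wall-clock date/time and a tick counter before the call.
  call date_and_time(VALUES=start_values)
  call system_clock(count_start, count_rate)

  call diagnostics_stub()

  ! Query the clock again; reusing the first array would print the start time twice.
  call date_and_time(VALUES=end_values)
  call system_clock(count_end)

  write(emsg,*) 'start_datetime', start_values
  print '(a)', trim(emsg)
  write(emsg,*) 'end_datetime', end_values
  print '(a)', trim(emsg)

  ! count_rate is zero on systems without a usable clock, so guard the division.
  if (count_rate > 0) then
    write(emsg,'(a,f12.3)') 'elapsed seconds:', &
          real(count_end - count_start) / real(count_rate)
    print '(a)', trim(emsg)
  end if

contains

  ! Hypothetical placeholder for the real diagnostics routine.
  subroutine diagnostics_stub()
    integer :: i
    real :: total
    total = 0.0
    do i = 1, 1000000
      total = total + sqrt(real(i))
    end do
    print *, 'stub checksum', total
  end subroutine diagnostics_stub

end program time_diagnostics_call
```

Using `system_clock` for the elapsed time avoids having to subtract two date/time arrays by hand; the `date_and_time` output is kept only so the log lines resemble the original ones.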

davegill · 2020-07-09T15:05:58Z

@zxdawn
Xin,
When you use 24 cores, the job takes < 1 s. When you use 120 cores, the job takes 10 minutes.

I have a couple of questions about this paraphrased statement:

Is this the time of the entire job, or the time of the diagnostics call?
Is there an improvement in timing when using the new code that bypasses the diagnostics?

Would you fill in the time (s) in a table such as this for the commit message:

| | Before Commit | After Commit |
|---|---|---|
| 24 cores | | |
| 120 cores | | |

davegill · 2020-07-14T21:58:15Z

@zxdawn @jordanschnell
Xin,
Your commit message is a bit confusing; that is why I asked for the table to be filled in.

Jordan,
Please review

zxdawn · 2020-07-15T05:17:24Z

@davegill Sorry for the late reply. I have been preparing for field observations over the past few days.

The times below are for the diagnostics call only.

Here's the table if debug_level = 0:

| | Before Commit | After Commit |
|---|---|---|
| 24 cores | 1 s | skip |
| 120 cores | 10 min | skip |

If I change debug_level to 100, this is the result:

| | Before Commit | After Commit |
|---|---|---|
| 24 cores | 1 s | 1 s |
| 120 cores | 10 min | 10 min |

I guess there is still some room for improvement in writing the TUV.diags file?
I'm not sure what exactly causes the slow writing when more cores are used.
I have tested the default I/O setting:

 &namelist_quilt
 nio_tasks_per_group = 0,
 nio_groups = 1,
 /

The elapsed time is the same as with my own setting:

 &namelist_quilt
 nio_tasks_per_group = 5,
 nio_groups = 2,
 /

So the slowdown is not caused by quilting, right?

davegill · 2020-07-15T05:26:10Z

@zxdawn

> I have tested the default I/O setting:
> The cost time is as same as my personal setting:
> So, that's not caused by quilting, right?

Xin,
I agree. This does not appear to be a quilting issue. The focus of this PR was to reduce the non-debug time. Mission accomplished.

davegill · 2020-07-15T05:26:29Z

@jordanschnell
Jordan,
I am good with this PR

jordanschnell · 2020-07-15T15:00:27Z

@davegill @zxdawn - Maybe @stacywalters can help provide some insight into the slow write speed, but I am good with the PR as is as well.

jordanschnell

Approved by Chem

stacywalters · 2020-07-15T16:09:03Z

I did not realize the diagnostic caused performance problems. The diagnostic really was intended for development only. In all honesty it should have been removed from the repository code. Stacy

Enable TUV diags for higher debug_level

3eac9fe

zxdawn requested a review from a team as a code owner July 8, 2020 05:30

davegill added bug WRF Chem release-v4.2.1 labels Jul 9, 2020

davegill changed the title ~~Disable TUV diags for low debug_level~~ Disable TUV diagnostics unless debugging Jul 9, 2020

jordanschnell approved these changes Jul 15, 2020

kkeene44 merged commit 7248de4 into wrf-model:release-v4.2.1 Jul 15, 2020

kkeene44 merged 1 commit into wrf-model:release-v4.2.1 from zxdawn:tuv_diag
