Skip to content

Disable TUV diagnostics unless debugging#1243

Merged
kkeene44 merged 1 commit intowrf-model:release-v4.2.1from
zxdawn:tuv_diag
Jul 15, 2020
Merged

Disable TUV diagnostics unless debugging#1243
kkeene44 merged 1 commit intowrf-model:release-v4.2.1from
zxdawn:tuv_diag

Conversation

@zxdawn
Copy link
Contributor

@zxdawn zxdawn commented Jul 8, 2020

TYPE: bug

KEYWORDS: photolysis scheme, TUV, diagnostics, debug_level

SOURCE: Xin Zhang (NUIST)

DESCRIPTION OF CHANGES:
Problem:
When using WRF-Chem with the new Photolysis option (phot_opt = 4) activated, the model spends too much time
writing the TUV.diags. These diagnostics should only used for debugging photolysis rates. The diagnostics should be
enabled only for an explicitly requested debug value.

Solution:
Enable TUV diagnostics only when debug_level >= 100.

ISSUE:
Fixes #1242 Speed up writing the TUV.diags file

LIST OF MODIFIED FILES:
M chem/rxn.F

TESTS CONDUCTED:

  1. TUV init is much quicker and the WRF-Chem one domain simulation test is successful.
    The time is the time of the diagnostics call.

Here's the table if debug_level = 0:

Before Commit After Commit
24 cores 1s skip
120 cores 10 min skip

If I change debug_level to 100, this is the result:

Before Commit After Commit
24 cores 1s 1s
120 cores 10 min 10 min
  1. Jenkins test is successful.

RELEASE NOTE: When using WRF-Chem with the new Photolysis option (phot_opt = 4) activated, the model spent too much time on looping and writing the TUV.diags which is only used for debugging photolysis rates. The diagnostics are now enabled only for debug_level >=100.

@zxdawn zxdawn requested a review from a team as a code owner July 8, 2020 05:30
@davegill
Copy link
Contributor

davegill commented Jul 9, 2020

@zxdawn
Xin,
Thank you for getting this PR ready so quickly.

Would you please explain WHY this change has been made. Does this impact all of WRF Chem? Is it only for certain options? Are the files really large? For the uninitiated, also explain what TUV is. This only needs to be a few sentences. Then take that text and use it in the release notes.

@zxdawn
Copy link
Contributor Author

zxdawn commented Jul 9, 2020

@davegill OK. Added the RELEASE NOTE above.

@davegill
Copy link
Contributor

davegill commented Jul 9, 2020

@zxdawn
Xin,

  1. Would you post the jenkins test email in these comments.
  2. You mention that the processing is faster. Can you quantify that with before vs after timings?

@zxdawn
Copy link
Contributor Author

zxdawn commented Jul 9, 2020

  1. jenkins test email
Please find result of the WRF regression test cases in the attachment. This build is for Commit ID: 3eac9fe5c4680a704e5172180b4c0f81a0a052a9, requested by: zxdawn for PR: https://github.com/wrf-model/WRF/pull/1243. For any query please send e-mail to David Gill.

    Test Type              | Expected  | Received |  Failed
    = = = = = = = = = = = = = = = = = = = = = = = =  = = = =
    Number of Tests        : 19           18
    Number of Builds       : 48           46
    Number of Simulations  : 166           164        0
    Number of Comparisons  : 105           104        0

    Failed Simulations are: 
    None
    Which comparisons are not bit-for-bit: 
    None
  1. Time used for TUV init

I just found that if I use 24 cores, the time just cost < 1 second.
But, when I switch to 120 cores, the time costs ~10 minutes when I check the log of rsl.error.0000 by tail -f rsl.error.0000.
BTW, the outputs of start_datetime and end_datetime defined below are same ...

      IF ( 100 .LE. debug_level ) THEN 
        call date_and_time(VALUES=values)
        write(emsg,*)'start_datetime',values
        call wrf_message(trim(emsg))
        call wrf_message('Xin: call diagnostics')
        call diagnostics
        write(emsg,*)'end_datetime',values
        call wrf_message(trim(emsg))
      ENDIF

@davegill
Copy link
Contributor

davegill commented Jul 9, 2020

@zxdawn
Xin,
When you use 24 cores, the job takes < 1 s. When you use 120 cores, the job takes 10 minutes.

I have a couple of questions about this paraphrased statement:

  1. Is this the time of the entire job, or the time of the diagnostics call?
  2. Is there an improvement in timing when using the new code that bypasses the diagnostics?

Would you fill in the time (s) in a table such as this for the commit message:

Before Commit After Commit
24 cores
120 cores

@davegill davegill changed the title Disable TUV diags for low debug_level Disable TUV diagnostics unless debugging Jul 9, 2020
@davegill
Copy link
Contributor

@zxdawn @jordanschnell
Xin,
Your commit message is a bit confusing, that is why I asked for the table to be filled in.

Jordan,
Please review

@zxdawn
Copy link
Contributor Author

zxdawn commented Jul 15, 2020

@davegill Sorry for the late reply. I was preparing some field observations these days.

The time is the time of the diagnostics call.

Here's the table if debug_level = 0:

Before Commit After Commit
24 cores 1s skip
120 cores 10 min skip

If I change debug_level to 100, this is the result:

Before Commit After Commit
24 cores 1s 1s
120 cores 10 min 10 min

I guess there's still some space of improvement for writing the TUV.diags file?
I'm not sure what's the exact problem causing slow writing when using more cores.
I have tested the default I/O setting:

 &namelist_quilt
 nio_tasks_per_group = 0,
 nio_groups = 1,
 /

The cost time is as same as my personal setting:

 &namelist_quilt
 nio_tasks_per_group = 5,
 nio_groups = 2,
 /

So, that's not caused by quilting, right?

@davegill
Copy link
Contributor

@zxdawn

I have tested the default I/O setting:
The cost time is as same as my personal setting:
So, that's not caused by quilting, right?

Xin,
I agree. This does not appear to be a quilting issue. The focus of this PR was to reduce the non-debug time. Mission accomplished.

@davegill
Copy link
Contributor

@jordanschnell
Jordan,
I am good with this PR

@jordanschnell
Copy link
Contributor

@davegill @zxdawn - Maybe @stacywalters can help provide some insight into the slow write speed, but I am good with the PR as is as well.

Copy link
Contributor

@jordanschnell jordanschnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved by Chem

@stacywalters
Copy link
Contributor

stacywalters commented Jul 15, 2020 via email

@kkeene44 kkeene44 merged commit 7248de4 into wrf-model:release-v4.2.1 Jul 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants