Small improvements #51
69786ba to 83d103d
83d103d to e9fcecf
|
rebased on top of other open pull requests |
|
Made sure some of the loops in btstep are submitted as a single kernel. No differences in double_gyre, but I got some nice speedups on benchmark, probably because benchmark has 22 vertical layers versus double_gyre's 2. I get similar improvements on the A100s at Monash. Pretty significant given that I only touched a handful of the loops. Not sure how well this scales if we increase the domain size laterally, though. vertvisc could probably benefit from a similar treatment. Will look tomorrow. |
|
Fused a few loops in vertical viscosity (204106f) on gadi. Not sure I want to spend too much more time on fusing loops in vertvisc, since there's a columnar rewrite in dev/gfdl that we'll port eventually. |
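For readers unfamiliar with the term, here is a minimal sketch of what "fusing" loops means. It is not MOM6 code: the function names, arrays, and update rules are hypothetical, and it is written in Python purely for illustration. On a GPU, each separate loop nest is typically launched as its own kernel, so merging nests that cover the same index range reduces the number of launches. Fusion is only valid when no iteration depends on a value written by another iteration, which holds here because both updates read only the original inputs.

```python
# Illustrative only: not taken from btstep or vertvisc.
# u, v are hypothetical 2D fields stored as lists of rows (layers x points).

def unfused(u, v, dt):
    """Two separate loop nests over the same index range."""
    nk, ni = len(u), len(u[0])
    u_new = [row[:] for row in u]
    v_new = [row[:] for row in v]
    # First nest: would be one kernel launch if offloaded.
    for k in range(nk):
        for i in range(ni):
            u_new[k][i] = u[k][i] + dt * v[k][i]
    # Second nest: a second kernel launch over the same range.
    for k in range(nk):
        for i in range(ni):
            v_new[k][i] = v[k][i] - dt * u[k][i]
    return u_new, v_new


def fused(u, v, dt):
    """The same two updates performed inside a single loop nest."""
    nk, ni = len(u), len(u[0])
    u_new = [row[:] for row in u]
    v_new = [row[:] for row in v]
    # Both updates read only the original u and v, so they can share one nest.
    for k in range(nk):
        for i in range(ni):
            u_new[k][i] = u[k][i] + dt * v[k][i]
            v_new[k][i] = v[k][i] - dt * u[k][i]
    return u_new, v_new


if __name__ == "__main__":
    u = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    v = [[0.5, 0.5, 0.5], [1.5, 1.5, 1.5]]
    # The fused version should reproduce the unfused result exactly.
    print(unfused(u, v, 0.1) == fused(u, v, 0.1))
```

Both versions do the same arithmetic per element, so the results should match bit for bit; only the number of passes over the index range changes.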
|
Are you able to rebase this? The content in #48 has been merged. |
|
The switch to j/k/i could be quite significant for the CPU as we eventually port to dev/gfdl. I'm becoming convinced that it may be the best path forward, but we should probably test these in some production runs before merging. |
a2cde1e to d05618e
rebased! |
|
This has effectively become our baseline for performance, so I think it's time to merge this in. Although the j/k/i swaps do take us away from the live code (dev/gfdl or main), we can come back and sort it out down the road. There are a lot of commits, but there's also quite a variety of changes, so I'll merge without any squashing.
Hi @marshallward @JorgeG94
This is a pull request that fixes a few small things:
These changes should be compatible with the other pull requests open at the moment.
Feel free to leave this until you're back, Marshall.