MEKE: No-division implementation of src_btm_drag#9
Merged
ElizabethYankovsky merged 2 commits intoSep 17, 2024
Merged
Conversation
This patch modifies the calculation of `src_btm_drag` to avoid
additional division operations.
We first note the following variable renames:
* `Isfac` is now `damping`, since it describes the net damping or
reduction of any damped fields. `sfac` has been removed since it no
longer appears in the expressions.
* `ldamping` is now `damp_rate`. This is primarily to avoid confusion
with `damping`, but also to more accurately describe its role.
* `ldamping_Strang1` is replaced to `damp_rate_s1`. It is used for a
similar purpose (cacheing of values from the first stage) but now
holds a slightly different value.
The following modifications were made.
1. The conditional split of `sdt` into `sdt_damping` substeps is
quantified by a new term, `damp_step` which is equal to 0.5 or 1.
`sdt_damp` is computed as `sdt * damp_step`.
The presence of `sdt_damp / sdt` in our expressions is replaced with
`damp_step`, which avoids an extra division.
2. The first expression (using new notation) was originally
sfac = 1 + sdt_damp * damp_rate
D = M * (1 - sfac) / (sdt * sfac)
where `D` is `src_btm_drag` and `M` is `MEKE_current(i,j)`
This has been transformed to
D = - M * (sdt_damp * damp_rate)
/ (sdt * (1 + sdt_damp * damp_rate))
= - M * (sdt_damp / sft) * damp_rate
* (1 / (1 + sdt_damp * damp_rate))
= - M * damp_step * damp_rate * damping
This new expression for `D` no longer requires a division.
3. In the second stage expression for `src_btm_drag`, we again use
`damp_rate` to replace `ldamping`. We also use `damp_rate1` to
denote the original value of `ldaming_Strang1`.
sfac = (1 + sdt_damp * damp_rate1) * (1 + sdt_damp * damp_rate)
D = M * (1 - sfac) / (sdt * sfac)
Using `damping1` to denote the damping from the first stage, `D` can
be transformed as follows.
D = -M * (sdt_damp * damp_rate + sdt_damp * damp_rate1
+ sdt_damp**2 * damp_rate * damp_rate1)
/ (sdt * (1 + sdt_damp * damp_rate) * (1 + sdt_damp * damp_rate1))
= -M * (sdt_damp / sdt) * (1 / (1 + sdt_damp * damp_rate))
* (damp_rate + damp_rate1 + sdt_damp * damp_rate * damp_rate1)
/ (1 + sdt_damp * damp_rate1)
= -M * damp_step * damping
* (damp_rate * (1 + sdt_damp * damp_rate1) + damp_rate1)
/ (1 + sdt_damp * damp_rate1)
= -M * damp_step * damping * (damp_rate + damp_rate1 * damping1)
We now store `damp_rate1 * damping1` in `damp_rate_s1(:,:)` rather
than just `damping1`, and can reduce this expression to
D = -M * damp_step * damping * (damp_rate + damp_rate_s1(i,j))
which requires no division. It also requires no additional
read or writes.
Assuming there are no mistakes here, this should only cause negligible
bitwise differences in the results.
Also, comments have been added which explain that we cannot modify the
existing expressions for `MEKE%MEKE`, since they would alter the bitwise
results of existing runs.
In the second stage of Strang splitting, there is a check for `sdt > sdt_damping`, following a check for nonzero `KH` or `K4`. However, `sdt_damp` can only be less than `sdt` when these are nonzero. Otherwise, `sdt_damping` equals `sdt`. Best as I can tell, this check is unnecessary and potentially problematic for optimization. This patch removes the redundant check.
Author
|
There are some significant changes to the equations here, so probably needs a thorough review from @ElizabethYankovsky or @Wendazhang33 if possible. Also: I was able to confirm that replacing |
Author
|
I think these regressions are not a problem. We would expect that replacement of the divisions would shuffle the bits around. Seems reassuring that the answers are mostly the same, however. (Even more reassuring that the testing suite detected and caught the regression!) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR contains two commits. The first eliminates divisions from the calculation of
src_btm_dragin Strang splitting. The second removes a redundant if-block in the second-stage of Strang splitting.Re-indentation of the second commit can make the PR a bit more complicated to read, so it might be easier to inspect the individual commits.
Commit messages are posted below.
MEKE: No-division implementation of src_btm_drag
This patch modifies the calculation of
src_btm_dragto avoidadditional division operations.
We first note the following variable renames:
Isfacis nowdamping, since it describes the net damping orreduction of any damped fields.
sfachas been removed since it nolonger appears in the expressions.
ldampingis nowdamp_rate. This is primarily to avoid confusionwith
damping, but also to more accurately describe its role.ldamping_Strang1is replaced todamp_rate_s1. It is used for asimilar purpose (cacheing of values from the first stage) but now
holds a slightly different value.
The following modifications were made.
The conditional split of
sdtintosdt_dampingsubsteps isquantified by a new term,
damp_stepwhich is equal to 0.5 or 1.sdt_dampis computed assdt * damp_step.The presence of
sdt_damp / sdtin our expressions is replaced withdamp_step, which avoids an extra division.The first expression (using new notation) was originally
where
Dissrc_btm_dragandMisMEKE_current(i,j)This has been transformed to
This new expression for
Dno longer requires a division.In the second stage expression for
src_btm_drag, we again usedamp_rateto replaceldamping. We also usedamp_rate1todenote the original value of
ldaming_Strang1.Using
damping1to denote the damping from the first stage,Dcanbe transformed as follows.
We now store
damp_rate1 * damping1indamp_rate_s1(:,:)ratherthan just
damping1, and can reduce this expression towhich requires no division. It also requires no additional
read or writes.
Assuming there are no mistakes here, this should only cause negligible
bitwise differences in the results.
Also, comments have been added which explain that we cannot modify the
existing expressions for
MEKE%MEKE, since they would alter the bitwiseresults of existing runs.
In the second stage of Strang splitting, there is a check for
sdt > sdt_damping, following a check for nonzeroKHorK4.However,
sdt_dampcan only be less thansdtwhen these are nonzero.Otherwise,
sdt_dampingequalssdt.Best as I can tell, this check is unnecessary and potentially
problematic for optimization. This patch removes the redundant check.