Multigrid MPI Bug Fix #966
Conversation
LGTM
LGTM (just two clarification comments/questions)
Apart from the bugfix, I like that we don't rely on implicit block[0] info for MeshData any more (this ties MeshData closer to the Mesh than to the MeshBlock).
int mg_reg_dep_id = 0;
TaskRegion &region = tc.AddRegion(num_partitions);
for (int i = 0; i < num_partitions; ++i) {
  TaskList &tl = region[i];
  AddMultiGridTasksLevel(region, tl, none, i, mg_reg_dep_id, max_level, min_level,
                         max_level);
}
Just double checking: mg_reg_dep_id is passed in as a reference and then incremented in that function, so that for multiple partitions the same logical function is being put in different regions, correct?
And is that also the intended behavior (rather than having all blocks in each partition be logically in the same region, so that all blocks on the mesh progress region by region)?
I had just copied what was done in some other code for this, where the reg_dep_id was initialized outside of the loop over partitions and then incremented for each partition. It is not clear to me how the regional dependence ids work under the hood, so maybe it is not correct as is. The desired behavior is to have "all blocks in each partition being logically in the same region so that all blocks on the mesh progress region by region".
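If my reading of the mechanism is right, the desired behavior would correspond to something like the sketch below, with the id reset per partition so that the same logical task carries the same id in every task list of the region. This is only a sketch of my understanding: DoSmooth and DoProlongate are hypothetical placeholders (assumed to have the form TaskStatus DoSmooth(int)), tc and num_partitions come from the surrounding driver, and the assumption that a shared reg_dep_id groups tasks across a region's lists is exactly the part I'm unsure about.

```cpp
// Sketch only (not this PR's code) of the grouping I have in mind.
// Assumption: tasks registered with the same reg_dep_id across the task lists
// of a region form one synchronization group, so nothing that depends on them
// proceeds until every partition has finished that task.
TaskID none(0);
TaskRegion &region = tc.AddRegion(num_partitions);
for (int i = 0; i < num_partitions; ++i) {
  int reg_dep_id = 0;  // reset per partition so the ids line up across lists
  TaskList &tl = region[i];

  auto smooth = tl.AddTask(none, DoSmooth, i);
  // Sync point 0: the same id in every partition -> one cross-partition group.
  region.AddRegionalDependencies(reg_dep_id, i, smooth);
  reg_dep_id++;

  // Anything depending on smooth now waits for smooth in *all* partitions.
  auto prolongate = tl.AddTask(smooth, DoProlongate, i);
  region.AddRegionalDependencies(reg_dep_id, i, prolongate);
  reg_dep_id++;
}
```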
In that case, I suggest deferring the question to @jdolence, who IIRC wrote the regional dependencies, so that we don't run into unexpected issues with pack_size != 1 or != -1.
PR Summary
It turns out I never bothered to actually test MG with MPI. This PR makes changes to the MG PR so that MG works correctly with MPI. First, it makes some small changes so that empty MeshData objects don't cause errors, and it adds MPI communicators for GMG fields that are involved in restriction. Second, and more importantly, it corrects a bug in the task list that allowed certain parts of composite GMG levels to run ahead of coarser levels in the hierarchy. This resulted in boundary data not being up to date at coarse-fine boundaries when any rank only owned coarse blocks on a composite grid. The multigrid task list now includes task list dependencies and regional dependencies to ensure that the coarser levels of the hierarchy have finished before the post-prolongation steps take place on finer two-level composite grids. To ensure the correct ordering within task lists, AddMultiGridTasksLevel is now called recursively. I did not need to add extra buffers, which I had mentioned as a possible solution in the Parthenon sync last Thursday.

Once this is reviewed, I will merge it into the main multi-grid PR and then merge that PR into develop.
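For reviewers, here is a rough, hypothetical sketch of the recursive structure. The signature is inferred from the call site quoted above; the return type, the placeholder tasks (PreSmooth, Restrict, Prolongate, PostSmooth, CoarsestSolve, assumed to have the form TaskStatus PreSmooth(int, int)), and the exact placement of the regional dependency are illustrative only, not the actual implementation.

```cpp
// Hypothetical sketch, not the actual diff: each partition's single task list
// gets the tasks for all levels, with the coarser level emitted by recursion
// between restriction and prolongation, so ordinary task dependencies enforce
// the ordering within a list.
TaskID AddMultiGridTasksLevel(TaskRegion &region, TaskList &tl, TaskID dependence,
                              int partition, int &reg_dep_id, int level,
                              int min_level, int max_level) {
  // Pre-smooth on this level; it waits on whatever the caller passed in.
  auto pre = tl.AddTask(dependence, PreSmooth, level, partition);

  if (level == min_level) {
    // Coarsest level: just solve/smooth and hand the task id back up.
    return tl.AddTask(pre, CoarsestSolve, level, partition);
  }

  // Restrict to the next coarser level.
  auto restrict_task = tl.AddTask(pre, Restrict, level, partition);

  // Recurse: the coarser level's tasks land in the *same* task list, ordered
  // after restriction and before this level's post-prolongation work.
  auto coarse_done = AddMultiGridTasksLevel(region, tl, restrict_task, partition,
                                            reg_dep_id, level - 1, min_level,
                                            max_level);

  // Regional dependency so that ranks/partitions owning only coarse blocks
  // cannot be outrun before finer composite levels proceed (this is the
  // reg_dep_id pattern discussed in the conversation above).
  region.AddRegionalDependencies(reg_dep_id, partition, coarse_done);
  reg_dep_id++;

  // Post-prolongation work on this level starts only once the coarser level
  // (and its regional dependency) has completed.
  auto prolongate = tl.AddTask(coarse_done, Prolongate, level, partition);
  return tl.AddTask(prolongate, PostSmooth, level, partition);
}
```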
PR Checklist