-
Notifications
You must be signed in to change notification settings - Fork 37
param restarts #887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
param restarts #887
Conversation
Issue #683 discusses this path vs an alternative path adding variables at the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is passing tests, it LGTM. That being said, I am a little queasy about the general philosophy of putting more and more stuff into params, especially things that may be better thought of as simulation state (e.g. quantities that are evolved but are not directly tied to the grid). I haven't thought carefully about how it would work, but it seems like it would make more logical sense to have a way of registering fields at the mesh level of arbitrary shape and being able to write those as hdf5 datasets (instead of as metadata). It seems worth it to have a discussion at the Parthenon meeting about this.
Edit: Hadn't carefully read #683 before writing this. I think I generally agree with @pgrete suggestion there.
It is not currently passing tests. Both for the regression test reason mentioned above, and for some other reason on the cuda machines. I need to debug. The fully general functionality for writing/reading arbitrary shapes was tricky to get right.
This is fair. Yes I'd like to discuss at the parthenon meeting. I think my solution is significantly more flexible than @pgrete 's suggestion, as it's type aware, capable of looking at shape, and knows how to handle both host- and device-side data, but this may or may not be a good thing. I do think my solution could be relatively straightforwardly translated into a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the AddParams
from the documentation needs to be added to StateDescriptor
template <typename T, REQUIRES(implements<scalar(T)>::value)> | ||
void HDF5WriteAttribute(const std::string &name, const T &value, hid_t location) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it this what might enable writing arbitrary types to as attributes? I'm not sure how to understand REQUIRES(implements<scalar(T)>::value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's using @lroberts36 concepts. It means it knows how to write things like a single double or an integer, but it doesn't do arbitrary types. You'll see I also have specializations for std::vector
, ParArray
and Kokkos::View
. To do other types, one would need to add the appropriate template specialization.
Restarts with AthenaPK seem to work fine, AthenaPK tests are passing so far |
Big picture are you happy with the design @forrestglines ? |
I think it's good, I see it as an extension of the existing design and not any significant departure. If it's possible to add our own custom types to Param restarts in the future then it might make some of the usage of Params in my cluster pgen cleaner. |
I think this is possible, but I'd rather leave it to a future PR. I think it requires an additional interface to register custom I/O ops |
@forrestglines how do things look now? I fixed the missing function in Also, because I always forget how to add regression tests, I added a doc page for it. And I fixed some errors I got when building the docs locally. |
|| ncycle_out_mesh || 0 || int || Number of cycles between printing the mesh structure (e.g., total number of MeshBlocks and number of MeshBlocks per level) to standard out. Use a negative number to also print every time the mesh was modified. Default: 0 (i.e, off). | | ||
|| ncrecv_bdry_buf_timeout_sec || -1.0 || Real || Timeout in seconds for the `ReceiveBoundaryBuffers` tasks. Disabed (negative) by default. Typically no need in production runs. Useful for debugging MPI calls. | | ||
+------------------------------+---------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ||
+------------------------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sphinx didn't seem to like this ultrawide table. I had to shrink it to get it to render.
Just to make sure comment is here for posterity: We discussed in the last parthenon call and decided to go with this design ans there are open questions about a variable based approach, and it was decided an in-hand solution was desired. That said the whole team agreed with the concern about putting more stuff into params. |
Clang tests currently failing due to #630 . |
Discussed with @bprather and currently he's happy. Calling it good assuming tests pass after the fix and @forrestglines approves. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Mostly I'm happy that this is opt-in (at least, the reading -- iiuc it will make restart files bigger regardless, which I think is a small enough price to pay).
I don't think I carry around the kind of global state that would benefit from this, but it may prove useful to me at some point.
The merge seems to have broken compilation for my build of Parthenon within Athenapk:
EDIT: It's a simple fix, only happens when building without sparse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonahm-LANL I pushed a tiny fix, I think it's ready to merge
Oops good catch. Thanks for the fix @forrestglines ! |
Forcing merge since tests pass and the cuda/clang tests are currently broken. Thanks @pgrete for confirming. |
PR Summary
We've discussed previously that we now use
Params
for many purposes, including storing mesh-level data. In this PR, I implement scaffolding for this use case. In particular, I add:Kokkos::View
andParArray*D
params to diskImmutable
,Mutable
, orRestart
. HereMutable
is as it was before (though now this is an enum instead of a bool.)Restart
is a superset ofMutable
. ARestart
param is mutable, but also re-read from the restart file upon restart. It is first set to whatever the package initialization provides, then overwritten.This new machinery enables some important use-cases. For Phoebus, it lets us stash spacetime information in
Params
, which is useful in some cases. I imagine that for things like turbulent forcing, or derived integral quantities, the Athena-PK team may also be interested.Limitations, technical questions
Params
or readingParams
that differ on different MPI ranks is undefined behavior. This was already true, but I now document it. I think permitting differentParams
on different ranks would be a pretty big lift, and I recommend against. At least for now.@bprather I know I've talked to you about this design. Please take a look at this and let me know if it meets your constraints. If I recall, there was some use-case you had, where you did not want the infrastructure to modify your mutable params, which is why I added the three mutability levels.
PR Checklist