Volume Rendering Crash (Summit + WarpX) #825
WarpX Run path:
I can replicate with Replay, but only at scale. I tried with the latest Ascent. Best to try next with a newer VTK-m?

Same issue with the newer VTK-m (at scale).
Crash is reported in this functor:

The heavy lifting is clearly in:

We are using 80 nodes (480 GPUs). There should be plenty of memory -- the HDF5 size of this data (all domains) on disk is only 3.6 GB.
Some more context, the min and max of the input datasets:

The total across ranks in memory seems to be:

(I suspect lots of zeros, so those must be compressing well to HDF5 to get to 3.6 GB on disk.) Based on this, we shouldn't hit a memory problem? I think I worked out the data sizes on the bad ranks as well.
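For readers following the arithmetic, here is a minimal back-of-the-envelope sketch of why a small compressed HDF5 file can still imply a much larger in-memory footprint; the grid dimensions, field count, and domain count below are hypothetical placeholders, not the values from this run.

```python
# Hypothetical memory estimate for double-precision fields on uniform grids.
# All sizes below are placeholders, not the actual WarpX run's dimensions.
nx, ny, nz = 512, 512, 512      # assumed per-domain grid resolution
n_fields = 3                    # assumed number of scalar fields per domain
bytes_per_value = 8             # float64
n_domains = 480                 # e.g. one domain per GPU

per_domain_bytes = nx * ny * nz * n_fields * bytes_per_value
total_gib = per_domain_bytes * n_domains / 1024**3
print(f"uncompressed in-memory total: {total_gib:.1f} GiB")

# Fields that are mostly zeros compress extremely well, so a 3.6 GB HDF5
# file can correspond to a far larger uncompressed footprint in memory.
```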
Replay exe, actions, and Blueprint files: /gpfs/alpine/world-shared/csc340/2021_11_warpx_vol_rend_issue
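(For reference, Ascent's replay utilities are typically driven from the command line with a Blueprint root file and an actions file, along the lines of `replay_mpi --root=<blueprint root> --actions=<actions.json>`; the flag names here are recalled from the Ascent docs, so check the shipped usage text.)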
Thanks for all this great information. Did you try isolating domains 71, 74, or 52 and running with just one GPU? (I can do this, but just wanted to make sure you hadn't done it already.)
Yes - I pulled those out into a Blueprint HDF5 dataset and tried with Replay.
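For reference, a minimal sketch of one way to do that isolation, assuming the domains live as groups in a single multi-domain Blueprint HDF5 file; the file and group names here are hypothetical:

```python
# Hypothetical sketch: copy a few suspect domains out of a multi-domain
# Blueprint HDF5 file into a smaller file for a single-GPU Replay run.
import h5py

suspect = ["domain_000052", "domain_000071", "domain_000074"]  # assumed names

with h5py.File("warpx_full.hdf5", "r") as src, \
     h5py.File("warpx_suspect.hdf5", "w") as dst:
    for name in suspect:
        if name in src:
            src.copy(name, dst)  # deep-copies the whole domain subtree
        else:
            print(f"group not found: {name}")
```

If the run instead writes one file per domain, the equivalent step is copying the relevant domain files and generating a new root file that points at them.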
This was resolved by @goodbadwolf with fixes in VTK-m that are in the 1.7.1 release. Thanks again @goodbadwolf!