Error in dxtbx.image_average #207

Closed · phyy-nx opened this issue Aug 5, 2020 · 0 comments · Fixed by #208


phyy-nx commented Aug 5, 2020

From @dwpaley, after downloading a SACLA h5 file:

mpirun -n 32 dxtbx.image_average --mpi True --verbose ../data/run1/run371999-0.h5
Processing ../data/run1/run371999-0.h5: 0
Processing ../data/run1/run371999-0.h5: 32
Processing ../data/run1/run371999-0.h5: 64
Processing ../data/run1/run371999-0.h5: 96
Processing ../data/run1/run371999-0.h5: 128
Processing ../data/run1/run371999-0.h5: 160
Traceback (most recent call last):
  File "/net/viper/raid1/dwpaley/xfelgui2/w/dials-v3-0-4/build/../modules/dxtbx/command_line/image_average.py", line 435, in <module>
    sys.exit(run())
  File "/net/viper/raid1/dwpaley/xfelgui2/w/dials-v3-0-4/build/../modules/dxtbx/command_line/image_average.py", line 312, in run
    results = comm.gather(results, root=0)
  File "mpi4py/MPI/Comm.pyx", line 1262, in mpi4py.MPI.Comm.gather
  File "mpi4py/MPI/msgpickle.pxi", line 680, in mpi4py.MPI.PyMPI_gather
  File "mpi4py/MPI/msgpickle.pxi", line 685, in mpi4py.MPI.PyMPI_gather
  File "mpi4py/MPI/msgpickle.pxi", line 148, in mpi4py.MPI.Pickle.allocv
  File "mpi4py/MPI/msgpickle.pxi", line 139, in mpi4py.MPI.Pickle.alloc
SystemError: Negative size passed to PyBytes_FromStringAndSize

My testing finds it works with 10 processors but not 32. My googling indicates we are hitting a pickle limitation: https://groups.google.com/g/mpi4py/c/r95uGPcqXLA?pli=1. I think with 32 processors, we are sending 32 results back to rank 0, which apparently spills over the 2GB pickle limit.
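For a rough sense of scale (illustrative numbers I'm assuming, not measured from this run): if each rank's pickled result is a few tens of megabytes, the single buffer that rank 0 allocates for the gather crosses the signed 32-bit boundary right around 32 ranks, which is where PyBytes_FromStringAndSize ends up with a negative size.

```python
# Back-of-the-envelope check (illustrative numbers only; the real per-rank
# payload depends on the detector format and the accumulators being gathered).
n_ranks = 32
per_rank_payload = 70 * 1024 ** 2          # assume ~70 MiB of pickled sums per rank
gathered_total = n_ranks * per_rank_payload
print(gathered_total > 2 ** 31 - 1)        # True: overflows the 2 GB pickle buffer
```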

Anyway, the proper way to do this is not to use comm.gather at all and instead use comm.reduce (like I did in my original implementation, which used LCLS-only data). Seems doable.
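A minimal sketch of the reduce-based idea, not the actual implementation in #208: it assumes numpy arrays and mpi4py, with a made-up 512×512 image size and random data standing in for detector frames (dxtbx itself works with flex arrays).

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank accumulates a running sum of the images it was assigned.
local_sum = np.zeros((512, 512), dtype=np.float64)   # assumed image shape
local_count = 0
for i in range(rank, 160, size):                     # round-robin over 160 frames
    image = np.random.rand(512, 512)                 # stand-in for a detector frame
    local_sum += image
    local_count += 1

# comm.reduce combines the per-rank sums element-wise on the way to rank 0,
# so (with mpi4py's tree-based object reduction) no single message has to hold
# every rank's payload at once, unlike comm.gather, which concatenates all of
# the pickled results into one buffer on rank 0.
total_sum = comm.reduce(local_sum, op=MPI.SUM, root=0)
total_count = comm.reduce(local_count, op=MPI.SUM, root=0)

if rank == 0:
    average = total_sum / total_count
    print("averaged", total_count, "images; mean pixel value", average.mean())
```

For very large arrays the buffer-based comm.Reduce avoids pickling entirely; either way, the averaging only needs the summed contributions, not every rank's individual result.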

phyy-nx added a commit that referenced this issue Aug 5, 2020
phyy-nx added a commit that referenced this issue Aug 25, 2020
phyy-nx added a commit that referenced this issue Aug 28, 2020
phyy-nx added a commit that referenced this issue Sep 16, 2020
phyy-nx added a commit that referenced this issue Sep 16, 2020
ndevenish added a commit to dials/dxtbx that referenced this issue Sep 28, 2020
- ``dxtbx.image_average``: Better use of MPI to avoid errors and increase performance (cctbx#207)
- Update DLS I23 bad pixel mask after detector has been cleaned, fixing previously bad modules. (cctbx#220)
- Change default bit depth for DLS Eigers where header information is missing.