Tools for parallel slurm processing across positions #141
Comments
Hi @talonchandler, I'm still unfamiliar with …
When you're writing distinct chunks, there shouldn't be any issues (metadata is a different story, but most workflows don't need to update it more than once). The only problem that comes to mind is that …
This hasn't been an issue for me so far when using …
You could use SLURM dependencies to avoid race conditions. You can have a task A that creates the arrays and metadata, and additional tasks B that process and save the data (assuming disjoint chunks) and depend on A having finished. Using dask or MPI are other options, but I think they add much more complexity.
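As an illustration of that dependency pattern (not code from the thread; the script names and job parameters are placeholders), submitting from Python with `sbatch --dependency=afterok` might look like this:

```python
import subprocess


def sbatch(*args: str) -> str:
    """Submit a job with `sbatch --parsable` and return its job ID."""
    result = subprocess.run(
        ["sbatch", "--parsable", *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip().split(";")[0]


# Task A creates the empty arrays and all metadata up front.
init_job = sbatch("--wrap", "python create_store.py")  # hypothetical script

# Tasks B each write their own disjoint chunks and only start after A succeeds.
for fov in range(96):
    sbatch(
        f"--dependency=afterok:{init_job}",
        "--wrap", f"python process_fov.py --fov {fov}",  # hypothetical script
    )
```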
I have been testing and processing our datasets using our library slurmkit. No issues related to race conditions so far, but I always create the dataset/metadata before submitting the jobs, using the same Python script. Here is an example where I compute a registration model, apply it when fusing the data, and estimate a flow field from the fused volume.
Adding this for reference. It's basically what Jordao suggested: recOrder Slurm Scripts.
See the documentation. This is expected, as no channel metadata is written. Here you would need to create image arrays to generate position metadata (which needs information about the names, transformations, etc. of the arrays in this position).
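For example, a minimal sketch of creating an image array so that position metadata gets written; the array name, shape, and exact method signatures are assumptions based on iohub's documented NGFF API, not taken from this thread:

```python
import numpy as np
from iohub import open_ome_zarr

# Creating the plate and position alone is not enough; the per-position
# metadata (channel names, transformations, etc.) is written when an image
# array is created inside the position.
with open_ome_zarr(
    "plate.zarr", layout="hcs", mode="w-", channel_names=["GFP", "Phase"]
) as plate:
    pos = plate.create_position("A", "1", "0")
    pos.create_image("0", np.zeros((1, 2, 4, 256, 256), dtype=np.float32))
```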
Providing channel names to `open_ome_zarr` …
Another approach I can think of is to offer a utility to 'guess' HCS metadata from an existing directory structure. The workflow would be like so: …
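A rough sketch of that 'guessing' idea, i.e. discovering wells and FOVs from an existing row/column/fov directory layout, might look like the following; the function name and the directory convention are illustrative assumptions rather than an existing iohub utility:

```python
from collections import defaultdict
from pathlib import Path


def guess_hcs_layout(store_path: str) -> dict[str, dict[str, list[str]]]:
    """Scan an existing store for a row/column/fov directory structure and
    return {row: {column: [fov, ...]}} as a starting point for plate metadata."""
    layout: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    root = Path(store_path)
    for fov_dir in sorted(root.glob("*/*/*")):
        if not fov_dir.is_dir() or fov_dir.name.startswith("."):
            continue
        row, col, fov = fov_dir.relative_to(root).parts
        layout[row][col].append(fov)
    return {row: dict(cols) for row, cols in layout.items()}


# The discovered layout would then be written into the plate's .zattrs
# (rows, columns, wells, and per-well image lists) to complete the HCS metadata.
print(guess_hcs_layout("existing_plate.zarr"))
```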
Thanks for the careful and helpful answers @JoOkuma + @ziw-liu, and thanks for the in-person discussion today @edyoshikun + @mattersoflight. @ziw-liu, I think the utility you're describing would be very useful, and my attempts to create such a utility have led me to error messages that are difficult for me to understand. Can I suggest an example use case that will demonstrate SLURM processing of an HCS iohub .zarr? I would suggest an example like the following: …
I'm more than happy to collaborate on this, but I would really appreciate your leadership, @ziw-liu, because I've been struggling to get this to work in the way you're describing.
After some thought, this utility would have broader use if it did not require a certain directory structure beforehand. I can see how combining existing HCS stores or an arbitrary collection of FOVs could be useful, e.g. for pooling datasets for ML tasks.
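A minimal sketch of that pooling idea, copying an arbitrary collection of existing FOVs into one fresh plate, might look like this. The source paths, channel names, and the exact `create_position`/`create_image` calls are assumptions based on iohub's documented NGFF API, not a shipped utility:

```python
from iohub import open_ome_zarr

# Hypothetical inputs: any mix of FOVs from different stores (paths are placeholders).
fov_paths = [
    "dataset_a.zarr/A/1/0",
    "dataset_b.zarr/B/2/0",
]

with open_ome_zarr(
    "pooled.zarr", layout="hcs", mode="w-", channel_names=["GFP", "Phase"]
) as pooled:
    for i, src_path in enumerate(fov_paths):
        with open_ome_zarr(src_path, mode="r") as src:
            # Give each incoming FOV a synthetic (row, column, fov) address in
            # the pooled plate; assumes all sources share the same channels.
            dst = pooled.create_position("A", str(i + 1), "0")
            dst.create_image("0", src["0"][:])
```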
I would avoid creating a command just to do this. I think the API should "just work" and avoid these problems. In dexp, we took the CLI-first approach, and I'm not happy with the results, because it leads to complex command-line scripts or duplication of commands when there's a new use case, which a better API could solve.
This can be closed from my perspective. @edyoshikun has a working solution in the mantis repo, and gathering the FOVs into a plate is a very useful feature. Thanks @ziw-liu.
The mantis project is depending on `iohub`, in part because of the `zarr` format's ability to parallelize on SLURM. This issue is a request to clarify several behaviors of `iohub`, and to document best practices for parallel processing of HCS stores.

Question 1: When is plate-level and position-level metadata written?

When multiple jobs try to write metadata to a .zarr store at the same time, race conditions can occur. @edyoshikun and I have been struggling to find a way to use `iohub` that avoids these race conditions.

- How can we avoid `zarr.errors.ContainsGroupError: path '0/27/0' contains a group`-type errors?
- When is metadata written for `channel_names` supplied to `open_ome_zarr`? For example, is the following expected behavior?

Question 2: How should we use `iohub` to read/write to multiple positions at the same time? Does `iohub` provide tools to assist with this?
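For reference, a rough sketch of the pattern the answers above converge on: create the plate, every position, and empty arrays in a single setup step, then have each SLURM task open only its own, already existing position in read/write mode, so that no job ever creates groups or rewrites shared metadata. The paths, shapes, and command-line interface below are illustrative assumptions, not an official iohub recipe:

```python
import sys

import numpy as np
from iohub import open_ome_zarr

FOVS = [("A", "1", str(i)) for i in range(8)]  # placeholder plate layout
SHAPE = (1, 2, 4, 256, 256)                    # placeholder TCZYX shape


def setup(path: str) -> None:
    """Run once (e.g. SLURM task A): create all positions and empty arrays."""
    with open_ome_zarr(
        path, layout="hcs", mode="w-", channel_names=["GFP", "Phase"]
    ) as plate:
        for row, col, fov in FOVS:
            pos = plate.create_position(row, col, fov)
            pos.create_zeros("0", shape=SHAPE, dtype=np.float32)


def process_one(path: str, index: int) -> None:
    """Run per SLURM array task: open only this task's existing position."""
    row, col, fov = FOVS[index]
    with open_ome_zarr(f"{path}/{row}/{col}/{fov}", mode="r+") as pos:
        pos["0"][:] = np.random.rand(*SHAPE).astype(np.float32)  # stand-in compute


if __name__ == "__main__":
    if sys.argv[1] == "setup":
        setup(sys.argv[2])
    else:
        process_one(sys.argv[2], int(sys.argv[3]))
```

The setup step would map to the dependency job A described in the comments, and the per-FOV step to a SLURM job array whose tasks each pass their own array index, so every task writes to a disjoint position.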