Initial setup #1
Comments
I drafted the README. We should discuss the scope, with the goal of reaching an MVP and starting to use the library in other repos. The library will evolve based on how it is used.
One question about this item from the README
Would a deep learning dataset class (i.e. a |
I agree that this package should not provide a Dataset class. I was thinking of efficient thin wrappers around the data tree written by the library to enable succinct usage like this:
|
@ziw-liu, @JoOkuma, @talonchandler I recommend reading this preprint. Its discussion of the chunked TIFF format is particularly relevant to the problem of passing data to analysis pipelines such as CellPose: "Some success has been achieved with OME-TIFF, a 2D multi-resolution image format that captures acquisition metadata as OME-XML in the TIFF header 2,7,8. Reference software implementations are available in Java (https://github.com/ome/bioformats/), C++ (https://gitlab.com/codelibre/ome/ome-files-cpp) and Python (e.g., https://github.com/AllenCellModeling/aicsimageio, https://github.com/apeer-micro/apeer-ometiff-library, https://github.com/cgohlke/tifffile). OME-TIFF is supported by several commercial imaging companies (see https://www.openmicroscopy.org/commercial-partners/) and is the recommended format for public data projects like Image Data Resource (IDR) or Allen Institute of Cell Science, making their data available from https://open.quiltdata.com/b/allencell/. As our and others’ use of existing tools for conversion to OME-TIFF grew, TIFF’s linear binary layout became a bottleneck. Larger files took increasingly long to write. This problem was most obvious in projects that required the conversion of large numbers of whole slide images from PFFs to OME-TIFF for use in data lakes that are used for AI training sets (https://pathlake.org/; https://icaird.com/). The need for a scalable conversion motivated our development of two tools, bioformats2raw (https://github.com/glencoesoftware/bioformats2raw) and raw2ometiff (https://github.com/glencoesoftware/raw2ometiff). Together they provide a parallel pipeline using Bio-Formats to convert any supported PFF into multi-resolution OME-TIFF. This is achieved by breaking images into atomic “chunks”, writing them independently to disk, and generating subresolutions from them when none are available, whereupon a second process can efficiently write these chunks into TIFF (Figure 1b)."
If we need to convert existing data to TIFF, we can write scripts that use some of the above tools and share them via When the user wants to write data into TIFF, we can rely on tifffile. |
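To make the "atomic chunks" idea from the quoted passage concrete, here is a minimal sketch of how an image of a given shape could be split into independent chunk regions. It is not how bioformats2raw or raw2ometiff are implemented; the function name `chunk_slices` and the example shapes are made up for illustration.

```python
from itertools import product


def chunk_slices(shape, chunk_shape):
    """Yield one tuple of slices per chunk covering an array of `shape`.

    Hypothetical helper for illustration only. Chunks at the far edge
    of an axis are clipped so that they never run past the array bounds.
    """
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same length")
    if any(size <= 0 for size in chunk_shape):
        raise ValueError("chunk sizes must be positive")
    # Build the list of slices along each axis, then combine them.
    per_axis = [
        [slice(start, min(start + size, extent)) for start in range(0, extent, size)]
        for extent, size in zip(shape, chunk_shape)
    ]
    yield from product(*per_axis)


# A 5 x 4 image cut into chunks of at most 2 x 3 pixels.
chunks = list(chunk_slices((5, 4), (2, 3)))
for region in chunks:
    print(region)
```

Each region can be written to disk on its own, which is what allows the writing step to run in parallel.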
I feel that we now have a clear path towards these goals. Closing in favor of specific issues.
After migrating the io module of waveorder (at mehta-lab/waveorder@5f60f0a) to this new repository, the next steps may include:
WaveorderReader to ImageReader #46
iohub.Dataset class that offers an array-like interface), and make sure the existing feature set works under it. Refactor API for NGFF datasets #31, Universal entry points #40, Universal reader output base classes #132
Each of these can be elaborated on and debated in spin-off issues.
@mattersoflight @JoOkuma @royerloic @talonchandler @Christianfoley please feel free to add to or modify these objectives.
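As a starting point for the discussion, an "array-like interface" could look roughly like the sketch below. This is an assumption about what such a class might offer, not the actual iohub.Dataset API; the class name `LazyFrames`, its constructor arguments, and the fake loader are all hypothetical.

```python
class LazyFrames:
    """Hypothetical array-like view over frames that are loaded on demand.

    `load_frame` is any callable that takes a frame index and returns
    that frame's data; nothing is read until a frame is indexed.
    """

    def __init__(self, n_frames, frame_shape, load_frame):
        self._n_frames = n_frames
        self._frame_shape = tuple(frame_shape)
        self._load_frame = load_frame

    @property
    def shape(self):
        return (self._n_frames, *self._frame_shape)

    def __len__(self):
        return self._n_frames

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Resolve the slice against the number of frames, then load each.
            return [self._load_frame(i) for i in range(*index.indices(self._n_frames))]
        if index < 0:
            index += self._n_frames
        if not 0 <= index < self._n_frames:
            raise IndexError("frame index out of range")
        return self._load_frame(index)


loaded = []  # records which frames were actually read


def fake_loader(i):
    """Stand-in for a real file read: returns a 2 x 3 frame filled with i."""
    loaded.append(i)
    return [[i] * 3 for _ in range(2)]


frames = LazyFrames(4, (2, 3), fake_loader)
last = frames[-1]  # only frame 3 is loaded at this point
```

The point of the sketch is that callers index the object like an array while the reader decides when and how data is fetched.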