Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature(Write predictions to disk): Write strategies cache samples, to mirror source files organisation #324

Closed
wants to merge 43 commits into from

Conversation

melisande-c
Copy link
Member

@melisande-c melisande-c commented Dec 11, 2024

Description

This PR got quite big sorry

This is the improved version of the PredictionWriterCallback since the feedback on the initial PR #189, particularly the prediction file organisation should mirror the the source files. I.e. the all the 2D samples/slices in a source file should also be grouped in the prediction files.

The motivation for this set up is to unify writing to ZARR with writing to TIFF. The WriteZarrTiles class is still not implemented but once it is, it will allow for writing each tile directly to a ZARR file.

  • What:
    • Split the write strategies across 4 files:
      • protocol.py containing the interface
      • write_image.py
      • write_tiles.py
      • write_tiles_zarr.py
    • Extracted the caching functionality to two additional classes
      • TileCache: used by the WriteTiles strategy
      • SampleCache: used by both the WriteTiles and WriteImage strategies.
  • Maybe most importantly: the write strategies now need to be given the source filenames and the number of "samples" per file. This means an additional PR will be needed to create "metadata extractors" that retrieve this information at the start of CAREamist.predict_to_disk
  • Why: Progressing towards an abstracted way to write predictions to disk, that will unify writing to TIFF and ZARR.
  • How: Following the strategy pattern. PredictionWriterCallback contains a write strategy as an attribute that determines how predictions are written to disk. PredictionWriterCallback only interacts with write strategy interface defined in protocol.py as WriteStrategy protocol class.

Changes Made

  • Added:
    • caches.py
    • protocol.py
    • write_image.py
    • write_tiles.py
    • write_tiles_zarr.py
    • tests
    • README.md: contains some schematic mermaid diagrams
  • Modified:
    • write_strategy_factory.py: write_strategies need additional params filenames and n_samples_per_file
    • tests
  • Removed:
    • write_strategy.py (split into 4 files)

Related Issues

Additional Notes and Examples

Include any additional notes or context that reviewers should be aware of, including snippets of code illustrating your new feature.

There are a lot of classes so see below a class diagram for clarity (copied from the included README.md).

---
title: PredictionWriterCallback related classes
---
classDiagram
    PredictionWriterCallback*--WriteStrategy : composition
    WriteStrategy<--WriteTiles : implements
    WriteStrategy<--WriteImage : implements
    WriteStrategy<--WriteZarrTiles : implements
    WriteTiles*--TileCache : composition
    WriteTiles*--SampleCache : composition
    WriteImage*--SampleCache : composition
    class PredictionWriterCallback
    PredictionWriterCallback : +bool writing_predictions
    PredictionWriterCallback : +WriteStrategy write_strategy
    PredictionWriterCallback : +write_on_batch_end(...) 
    PredictionWriterCallback : +on_predict_epoch_end(...)
    class WriteStrategy
    <<interface>> WriteStrategy
    WriteStrategy : +write_batch(...)*
    WriteStrategy : +set_file_data(lists~str~ write_filenames, list~int~ n_samples_per_file)*
    WriteStrategy : +reset()*
    class WriteTiles
    WriteTiles : +WriteFunc write_func
    WriteTiles : +TileCache tile_cache
    WriteTiles : +SampleCache sample_cache 
    WriteTiles : +write_batch(...)
    WriteTiles : +set_file_data(lists~str~ write_filenames, list~int~ n_samples_per_file)
    WriteTiles : +reset()
    class WriteImage
    WriteImage : +WriteFunc write_func
    WriteImage : +SampleCache sample_cache
    WriteImage : +write_batch(...)
    WriteImage : +set_file_data(lists~str~ write_filenames, list~int~ n_samples_per_file)
    WriteImage : +reset()
    class WriteZarrTiles
    WriteZarrTiles : +write_batch(...) NotImplemented
    WriteZarrTiles : +set_file_data(lists~str~ write_filenames, list~int~ n_samples_per_file) NotImplemented
    WriteZarrTiles : +reset() NotImplemented
    class TileCache
    TileCache : +list~NDArray~ array_cache
    TileCache : +list~TileInformation~ tile_info_cache
    TileCache : +add(NDArray, list~TileInformation~ item)
    TileCache : +has_last_tile() bool
    TileCache : +pop_image_tiles() NDArray, list~TileInformation~
    TileCache : +reset()
    class SampleCache
    SampleCache : +list~int~ n_samples_per_file
    SampleCache : +Iterator n_samples_iter
    SampleCache : +int n_samples
    SampleCache : +sample_cache list~NDArray~
    SampleCache : +add(NDArray item)
    SampleCache : +has_all_file_samples() bool
    SampleCache : +pop_file_samples() list~NDArray~
    SampleCache : +reset()
Loading

Please ensure your PR meets the following requirements:

  • Code builds and passes tests locally, including doctests
  • New tests have been added (for bug fixes/features)
  • Pre-commit passes
  • PR to the documentation exists (for bug fixes / features)

@melisande-c melisande-c changed the title Mc/feat/cache pred samples Feature(Write predictions to disk): Write strategies cache samples, to mirror source files organisation Dec 11, 2024
@melisande-c melisande-c marked this pull request as ready for review December 12, 2024 09:15
@melisande-c melisande-c requested review from jdeschamps, a team and veegalinova December 12, 2024 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant