
Add video preprocessing (denoising) feature #83

Open · wants to merge 33 commits into main from feat-preprocess

Conversation

@t-sasatani (Collaborator) commented Dec 7, 2024

This PR adds a denoising module for neural recording videos. It can be used offline, and with some updates, it could be used in real time with the streamDaq.

It would be better to quickly merge this to the main branch to add other processing features in a relatively stable location. Also, nothing depends on this now anyway.

Processing features

  • Detect broken buffers by comparing each buffer with the buffer at the same position in the previous frame (only mean error is computed for now, but more statistical measures can be added).
  • Remove frames that contain broken buffers. The removed frames are stacked and tracked separately so you can examine which frames were dropped. Alternatively, patch broken buffers by copying the buffer at the same position from the previous frame.
  • Spatial frequency-based filtering.
  • Generate the minimum projection out of a stack of frames.
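The buffer-comparison step described above can be sketched roughly as follows. This is a hedged illustration, not the PR's actual implementation; the function names, the fixed buffer size, and the flat per-buffer mean-error threshold are all assumptions.

```python
import numpy as np

def buffer_mean_error(curr: np.ndarray, prev: np.ndarray) -> float:
    """Mean absolute error between a buffer and the same-position
    buffer in the previous frame."""
    return float(np.mean(np.abs(curr.astype(np.int32) - prev.astype(np.int32))))

def find_broken_buffers(
    curr_frame: np.ndarray,
    prev_frame: np.ndarray,
    buffer_size: int,
    threshold: float,
) -> list[int]:
    """Flag buffers whose mean error against the previous frame exceeds
    `threshold`. Frames are flattened into fixed-size buffers here for
    simplicity."""
    flat_curr = curr_frame.ravel()
    flat_prev = prev_frame.ravel()
    broken = []
    for i in range(0, flat_curr.size, buffer_size):
        err = buffer_mean_error(
            flat_curr[i:i + buffer_size], flat_prev[i:i + buffer_size]
        )
        if err > threshold:
            broken.append(i // buffer_size)
    return broken

prev = np.zeros((4, 8), dtype=np.uint8)
curr = prev.copy()
curr[2, :] = 255  # corrupt one row, i.e. one buffer when buffer_size == 8
broken = find_broken_buffers(curr, prev, buffer_size=8, threshold=10.0)
```

A frame with any index in `broken` would then be either dropped or patched, per the options above.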

The following example frame shows noisy regions being detected and patched with the corresponding buffer from the prior frame:
(example frame image)

Interface

You can run this with mio process denoise using an example video.
The denoising parameters and export settings can be defined with a YAML file.

mio process denoise -i .\user_dir\test.avi -c denoise_example
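For reference, a hedged sketch of what such a YAML config might look like. The section names below (interactive_display, noise_patch, frequency_masking) are inferred from the discussion in this PR; the exact fields of denoise_example may differ.

```yaml
# Illustrative only; not necessarily the exact schema of denoise_example.yml
noise_patch:
  enable: true
  threshold: 20        # hypothetical mean-error threshold per buffer
frequency_masking:
  enable: true
minimum_projection:
  enable: true
interactive_display:
  enable: true
  start_frame: 0
  end_frame: 100
```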

Minor changes

  • I moved VideoWriter to the io module because it's pretty generic, and I wanted to use it for denoising.

📚 Documentation preview 📚: https://miniscope-io--83.org.readthedocs.build/en/83/

@coveralls (Collaborator) commented Dec 7, 2024

Pull Request Test Coverage Report for Build 12208523886

Details

  • 17 of 346 (4.91%) changed or added relevant lines in 8 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-14.7%) to 62.478%

File                            Covered  Changed/Added  %
miniscope_io/cli/main.py        0        2              0.0%
miniscope_io/cli/process.py     0        12             0.0%
miniscope_io/io.py              15       32             46.88%
miniscope_io/models/process.py  0        32             0.0%
miniscope_io/models/frames.py   0        52             0.0%
miniscope_io/plots/video.py     0        59             0.0%
miniscope_io/process/video.py   0        155            0.0%

Totals: change from base Build 12208310258: -14.7%; covered lines: 1069; relevant lines: 1711

💛 - Coveralls

@t-sasatani t-sasatani force-pushed the feat-preprocess branch 3 times, most recently from 1385223 to bc9a268 Compare December 7, 2024 01:05
@sneakers-the-rat (Collaborator):

omfg you already did it!?!??!?!?! u are so fast. cant wait to check this out!!!!!

@coveralls (Collaborator) commented Dec 11, 2024

Coverage Status

coverage: 74.931% (-4.2%) from 79.112%
when pulling 11f0c19 on feat-preprocess
into 56397e9 on main.

@MarcelMB (Contributor) commented Jan 8, 2025

amazing!!!! Thanks, Takuya for working on this already!

  • I was wondering whether replacing a broken buffer with the one from the previous frame is a good strategy. Unsure. For a single broken frame that might be fine, but if there's a stretch of corrupted data, we'd basically be showing the same image for seconds.

  • By subtracting a minimum projection at the end, could low-intensity biological signals eventually get lost?

  • I need to understand the frequency mask a bit more to see what goes through and what is blocked.

@sneakers-the-rat (Collaborator):

I was wondering if replacing a broken buffer with the one from the previous frame is a good strategy. Unsure. For one broken frame that might be fine but if there is a stretch of corrupted data and we basically have the same image for seconds not sure.

agree, I think we want to drop data rather than copy data - it's scientific data after all and the numbers matter, so duplicating a frame is effectively fabricating data (even though this wouldn't be a bad strategy if we were streaming a movie and didn't care as much about the pixel values)

by subtracting a minimum projection at the end I was wondering if maybe not low-intensity biological signals could eventually get lost?

I need to understand the frequency mask a bit more to see what goes through and what is blocked.

this is a decent pair of thoughts that highlights that we might want to separate cosmetic/display-oriented processing stages from signal-repair stages: the frequency filtering repairs the image, while the minimum projection is just for display (right? i'm assuming that in analysis people will want the full signal).

been working through the backlog of issues after the holiday, and my first big chunk of work will be on mio, so i'll review this and related stuff this week. thanks again for this Takuya :)

@t-sasatani (Collaborator, Author):

Thanks for the comments! I'm stuck with other stuff at the moment but will make this ready for review and ping you, hopefully next week. Something I'd like to know from @sneakers-the-rat is whether the high-level structure seems mergeable with the pipeline refactor. I don't intend to make this fully compatible (we need this now, and it's an independent module), but the code structure is pretty arbitrary, so it would be good if there were something I could easily anchor to.

And yes, Phil also mentioned that dropping broken frames would be good, so that's planned (I might just add a drop option, since masking is already there). I was just too lazy to look into how processing pipelines handle this and track dropped buffers.

For the FFT and minimum projection, these are just modules, so I think they should simply be available whenever needed. The frequency mask might be used in actual preprocessing before CNMF-E pipelines, but I'm not sure.
https://github.com/Aharoni-Lab/Miniscope-v4/wiki/Removing-Horizontal-Noise-from-Recordings
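The linked wiki removes horizontal stripe noise by masking the corresponding frequency components. A hedged sketch of that idea (function names and parameters below are illustrative, not mio's actual API): horizontal stripes vary only along y, so their energy sits in the central column of a centered 2D FFT, which can be zeroed while keeping a small block around DC.

```python
import numpy as np

def horizontal_noise_mask(
    height: int, width: int, band: int = 2, keep_dc: int = 2
) -> np.ndarray:
    """Binary mask for a centered 2D FFT. Horizontal stripe noise lives
    in the central column of vertical frequencies, so zero a narrow band
    of columns around the center while preserving DC / low frequencies."""
    mask = np.ones((height, width), dtype=np.float32)
    cy, cx = height // 2, width // 2
    mask[:, cx - band:cx + band + 1] = 0.0          # suppress stripe band
    mask[cy - keep_dc:cy + keep_dc + 1,
         cx - keep_dc:cx + keep_dc + 1] = 1.0       # keep DC block
    return mask

def apply_freq_mask(img: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Filter an image by masking its centered FFT and inverting."""
    f = np.fft.fftshift(np.fft.fft2(img.astype(np.float32)))
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

# Horizontal stripes (rows alternating 0/200) get flattened to ~uniform 100.
img = np.zeros((16, 16), dtype=np.float32)
img[::2, :] = 200
out = apply_freq_mask(img, horizontal_noise_mask(16, 16))
```

The `band` and `keep_dc` widths are tuning knobs; too wide a band also removes real low-spatial-frequency signal, which connects to the concern above about losing biological signal.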

@MarcelMB (Contributor):

Adding a bug report here for the buffer patching: most likely multiple consecutive frames are corrupted, and the code currently always patches from the immediately preceding frame, so most of each frame ends up lost for the rest of the video.

files used to get this bug: invivo-20241203-1-003, invivo-20241203-3.avi

using commit:
9cd2bc3

(screenshot of the corrupted output)
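One way the cascade could be avoided (a sketch, not the PR's actual code): patch from the last known-clean frame instead of the immediately preceding, possibly already-patched, one. For simplicity this flags whole frames; the real code works per-buffer.

```python
import numpy as np

def patch_frames(
    frames: list[np.ndarray], is_broken: list[bool]
) -> list[np.ndarray]:
    """Patch broken frames from the last *clean* frame, so a run of
    consecutive corrupted frames does not compound the damage."""
    patched = []
    last_good = None
    for frame, broken in zip(frames, is_broken):
        if broken:
            if last_good is None:
                continue  # no clean reference yet: drop the frame
            patched.append(last_good.copy())
        else:
            last_good = frame
            patched.append(frame)
    return patched

frames = [np.full((2, 2), v, dtype=np.uint8) for v in (0, 9, 9, 3)]
patched = patch_frames(frames, [False, True, True, False])
```

Dropping (as discussed above) sidesteps the issue entirely; this only matters if patching stays available as an option.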

"""

@staticmethod
def denoise(
@raymondwjang (Member) commented Jan 13, 2025

Seems like there's an opportunity for improvement in terms of SOLID design, especially the single-responsibility principle. This denoise method, for example, appears to:

  1. check for module
  2. read video
  3. write video
  4. noise patch
  5. freq. filtering
  6. interactive display
  7. image stacking

This will make the code incredibly painful to debug, maintain, and extend. If we want to add a new processing step, for instance:
a. we wouldn't know where to add the new step
b. the whole processing function has to be modified
c. the test for the entire processing step has to be updated
d. if something goes wrong, we wouldn't know where, because we modified the entire processing step

And if we want to modify this function, everything downstream that uses it (possibly completely irrelevant to the changes) has to be modified as well. If we want to remove one of the processing steps from the saved video but keep seeing it in the interactive display, the entire function has to be completely overhauled.

The usual rule of thumb is that more than 2-3 if statements in a non-low-level logic function is a symptom that it can be improved for readability and maintainability. :)

I'd hazard a guess that a lot of these if blocks can be packaged into different classes and files with clearer names and more consistent formatting across methods, for better reusability.

Collaborator:
agreed. shouldn't be too tricky to split each of these into separate classes with an __init__ that takes config, a process() that applies the transform, and a finish() (or whatever we want to call it) that handles writing output if requested.
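A minimal sketch of that stage API (class and method names are placeholders, not the shipped mio interface):

```python
from typing import Protocol

import numpy as np

class FrameProcessor(Protocol):
    """Common API every processing stage would implement."""
    def process(self, frame: np.ndarray) -> np.ndarray: ...
    def finish(self) -> None: ...

class InvertStage:
    """Toy stage standing in for noise patching / frequency filtering."""
    def __init__(self, config: dict):
        self.enabled = config.get("enable", True)
        self.frames: list[np.ndarray] = []

    def process(self, frame: np.ndarray) -> np.ndarray:
        if not self.enabled:
            return frame  # disabled stages pass frames through untouched
        out = 255 - frame
        self.frames.append(out)
        return out

    def finish(self) -> None:
        # the real version would write self.frames to a video file here
        self.frames.clear()

processors: list[FrameProcessor] = [InvertStage({"enable": True})]
frame = np.zeros((2, 2), dtype=np.uint8)
for p in processors:
    frame = p.process(frame)
for p in processors:
    p.finish()
```

The video processor then only loops over stages, and adding a step means adding one class rather than editing one giant function.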

@t-sasatani (Collaborator, Author) commented Jan 15, 2025
Yeah, as you can see, it's literally just a raw script disguised as a class right now. I'll try to be more explicit about which parts aren't awaiting formatting review, or avoid uploading PRs in draft status, so we can work efficiently.

@sneakers-the-rat (Collaborator) left a review:
Alright, thanks for patience, finally got time to review this.

Three changes I think we should for sure make before merging this:

  • split out each of the processing methods into separate classes with a common API and then make the video processor class generic over those classes, details in comments.
  • unify the existing Frame and Frames classes with the new NamedFrame class, making separate classes for a single frame vs. collections of frames.
  • have tests for all of this

The rest are perf improvements and style comments

whether the high-level structure seems mergeable to the pipeline refactor. No intention to make this really compatible because we need this now and this is an independent module but the code structure is pretty arbitrary so it might be good if there is something I can easily anchor to.

If we split each of these processing stages into separate classes with a single API, then I can handle putting them into the pipeline system once it drops.

.gitignore Outdated
!user_dir/.gitkeep

~/.config/miniscope_io/logs/
Collaborator:
not too much harm in having extra entries in .gitignore, but this should hopefully be unnecessary (not in the repo directory, so shouldn't have an effect anyway)

@t-sasatani (Collaborator, Author) commented Jan 15, 2025
Oh, this is just an accident; I don't know how it got in.

.pdm-python
user_dir/*
Collaborator:
what are these directories? we should ideally keep everything we create underneath the single user_dir set up in the config

Collaborator Author:

You mean user_dir? I just wanted somewhere to keep data for convenience while working in the repo. I'll get rid of them before taking this out of draft.

Collaborator Author:

Oh, I forgot user_dir is used for config. I might keep a user_data dir just for dev convenience.

Collaborator:

up to you where you want to put it, no problem having an extra line in .gitignore.

if you wanted this to persist you would set user_dir in the global config for your machine (mio config global path)

mio/data/config/process/denoise_example.yml (outdated; comments resolved)
@@ -19,6 +19,107 @@
from mio.types import ConfigSource


class VideoWriter:
Collaborator:

sort of an odd object with a single staticmethod, especially since this one returns a cv2.VideoWriter while the VideoReader class wraps its reader. Fine for now because it's on me to get the pipeline system in place, where we can make this a predictably structured Sink class.

Collaborator Author:

Yeah, I just pulled this out of the stream daq without thinking much here.


if config.frequency_masking.enable:
freq_filtered_frame, frame_freq_domain = processor.apply_freq_mask(
img=patched_frame,
Collaborator:

if noise_patch.enable is False, patched_frame won't be defined here. We want to pass a single frame through several processing stages, and mutating that frame is a reasonable expectation within the processing pipeline; making a new copy for each processing stage would get pretty dang memory- and perf-intensive pretty fast.
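A sketch of one way to avoid the undefined-name problem (stand-in names, not the PR's code): thread a single `frame` variable through the stages, with disabled stages simply leaving it untouched.

```python
import numpy as np

# Hypothetical pass-through stand-ins for the real stages.
def patch_noise(frame: np.ndarray) -> np.ndarray:
    return frame  # placeholder for the real noise patching

def freq_filter(frame: np.ndarray) -> np.ndarray:
    return frame  # placeholder for the real frequency filtering

enable_patch, enable_mask = False, True
frame = np.zeros((4, 4), dtype=np.uint8)
# `frame` is always defined, even when an earlier stage is disabled.
if enable_patch:
    frame = patch_noise(frame)
if enable_mask:
    frame = freq_filter(frame)
```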

Comment on lines 302 to 306
minimum_projection = VideoProcessor.get_minimum_projection(output_frames)

subtract_minimum = [(frame - minimum_projection) for frame in output_frames]

subtract_minimum = VideoProcessor.normalize_video_stack(subtract_minimum)
Collaborator:
definitely a separate class as well
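Pulled out as its own stage, this hunk might look something like the sketch below (names are illustrative, not the refactored code; it folds the projection, subtraction, and normalization from the quoted lines into one class):

```python
import numpy as np

class MinimumProjection:
    """Sketch of the projection step as a standalone stage: subtract the
    per-pixel minimum over the stack, then normalize to 0-255."""

    @staticmethod
    def get_minimum_projection(frames: list[np.ndarray]) -> np.ndarray:
        return np.min(np.stack(frames), axis=0)

    @staticmethod
    def apply(frames: list[np.ndarray]) -> list[np.ndarray]:
        min_proj = MinimumProjection.get_minimum_projection(frames)
        subtracted = [f.astype(np.float64) - min_proj for f in frames]
        peak = max(float(s.max()) for s in subtracted) or 1.0  # avoid /0
        return [(s / peak * 255).astype(np.uint8) for s in subtracted]

stack = [
    np.full((2, 2), 10, dtype=np.uint8),
    np.full((2, 2), 60, dtype=np.uint8),
]
out = MinimumProjection.apply(stack)
```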

freq_filtered_frames.append(freq_filtered_frame)
output_frame = freq_filtered_frame
output_frames.append(output_frame)
finally:
Collaborator:

same thing here, if each of the processing classes had a finish() (or whatever name) method that handled its output routine then all we would do here is

for processor in processors:
    processor.finish()

Comment on lines 359 to 374
if config.interactive_display.enable:
videos = [
raw_video,
noise_patch,
patched_video,
freq_filtered_video,
freq_domain_video,
min_proj_frame,
freq_mask_frame,
]
VideoPlotter.show_video_with_controls(
videos,
start_frame=config.interactive_display.start_frame,
end_frame=config.interactive_display.end_frame,
)

Collaborator:

here's a good example of why we want all these things to be separable functions/classes that can work framewise. Ideally this would work on demand: we open the UI and compute results as requested in the context of a GUI (not something mio should try to provide), rather than precomputing everything and setting up a UI for it afterwards.

return min_projection

@staticmethod
def normalize_video_stack(image_list: list[np.ndarray]) -> list[np.ndarray]:
Collaborator:

This should be either a separate class like the other processor classes or a free function; it's strange to have it here in a different form than all the other video processing methods.
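The free-function option might look like this sketch (signature assumed, not mio's actual implementation): scale by the global min/max of the whole stack so relative brightness across frames is preserved.

```python
import numpy as np

def normalize_video_stack(image_list: list[np.ndarray]) -> list[np.ndarray]:
    """Normalize a stack of frames to 0-255 using the stack-wide
    min/max, preserving brightness relationships between frames."""
    stack = np.stack(image_list).astype(np.float64)
    lo, hi = float(stack.min()), float(stack.max())
    scale = (hi - lo) or 1.0  # guard against a constant stack
    norm = (stack - lo) / scale * 255.0
    return [f.astype(np.uint8) for f in norm]

frames = [
    np.array([[0, 50]], dtype=np.uint8),
    np.array([[100, 50]], dtype=np.uint8),
]
out = normalize_video_stack(frames)
```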

@raymondwjang (Member) commented Jan 14, 2025

Also, I'd like to add that (while I'm also a frequent violator of this 😭) each PR should ideally consist of ~10 commits, with fewer than 500 lines of code.

@t-sasatani: Here's something that might help that @sneakers-the-rat and I presented during the lab meeting that you couldn't attend (link)

@t-sasatani (Collaborator, Author):

Thanks. I'll look at these later, but could you hold off on formatting reviews while the PR is marked draft? I commented in this thread last week that I'll change it to ready-for-review and ping you. It's not that I'm bothered; I just prefer using drafts as drafts, so we can discuss experimental prototypes before thinking about formatting or structure.

@t-sasatani (Collaborator, Author):

And thanks for the info, @raymondwjang. I think I attended that meeting, but I sent a view request because I couldn't access the slides.

I can squash commits later if you prefer, but I suspect PRs for refactoring or adding new modules (like this one) are unlikely to fit in 500 lines of code. That rule makes sense for minor additions or fixes, but is it considered better for everything?

@raymondwjang (Member) commented Jan 14, 2025

Shared! Obviously there are reasonable exceptions to the rule, but simply seeing 1000+ lines on a PR is pretty threatening as a reviewer.

Even within enterprise-level projects, massive PRs are rare unless there's a sweeping feature change or a complete refactor, since they always introduce exponentially larger review overheads, bug risks, rollback costs, etc. Even with new features it's not common to see PRs approaching 1000 lines and above, and that's including tests and documentation. Also remember that those repos are generally hosted with an extensive suite of CI/CD tools that keep the code healthy (which we don't currently have).

At present this PR is 1000 lines, but with tests and documentation it might approach 2000. That can turn into hours of reviewing while still missing bugs and edge cases. I had a similar PR that grew uncontrollably and ended up having to chop it into 5 different PRs so that it was somewhat reviewable.

As @sneakers-the-rat can attest, I often fail to keep things nicely packaged for reviewers and future me as well. But from what I've learned, it's always recommended to keep things small, nimble, and modular. Do as I say, not as I do lmao

@sneakers-the-rat (Collaborator) commented Jan 14, 2025

I commented in this thread last week that I'll change it to ready for review and ping you. It's not like I'm bothered or anything, but I do prefer using the draft as draft so we can discuss experimental prototypes before thinking about formatting or structure.

no problem! i was just getting started working on mio again and wanted to catch up here. Draft status totally understood; just giving feedback that hopefully helps with ideas on different ways to finalize it :)

and Raymond, I super appreciate the advocacy for small PRs. There are a few gigantic refactors that need to get done, so I'll be guilty of a few very large PRs myself soon enough, but once we get to a roughly stable code structure we should be able to iterate much more cleanly :)

@t-sasatani (Collaborator, Author) commented Jan 14, 2025

OK, I'll try to keep these short, though I'll probably violate it sometimes. It might be easier for me to overlook review comments on draft PRs; sorry for that in advance.

from ..conftest import DATA_DIR, CONFIG_DIR

def test_interactive_display_config():
config = DenoiseConfig.from_id("denoise_test").interactive_display
Collaborator Author:

@sneakers-the-rat, it might be in the docs somewhere, but could you let me know how to make these configurations discoverable in the test dir? I'm putting the config in the same directory as the test config for the stream DAQ, but I'm getting this. Does it need registering or anything?

No config with id denoise_test found in /path/to/somewhere/Application Support/mio/config

Collaborator Author:

Just leaving a note here so I don't forget: I'm skipping these config tests for now.

Collaborator:

In general, I don't think we need these kinds of tests (if we wanted to ensure these values don't change, we could just copy the config into tests and assert that the copy equals the one in the package). But the fact that it isn't being found would be a bug in the config discovery methods, or in how we monkeypatch them to include the tests/data/config directory during testing. I'll check this out in the morning.

@t-sasatani t-sasatani marked this pull request as ready for review January 15, 2025 16:36
@t-sasatani (Collaborator, Author) commented Jan 15, 2025

I'm switching this to ready for review as I'm done for today.
I'm sorry this is becoming gigantic; it might be painful to rebase @MarcelMB's branch...

It's ready structure-wise, and I changed the "patch" frame function to "drop" frames and added a tracking/export thing.

I haven't gone through the review comments yet, more docs/tests need to be completed, and the interfaces (like the naming of the export paths) are unsettled, so I'll return to them later this week. In case anyone wants to take over or make a significant change in the meantime, feel free.

return mask


class ZStackHelper:
Collaborator:

i like this idea :)

self.height = height
self.width = width

def split_by_length(self, array: np.ndarray, segment_length: int) -> list[np.ndarray]:
Collaborator:

Something like this seems pretty general-purpose, but we don't have a good place to put general array-manipulation functions short of a utils junk drawer. I like this helper-class idea, bundling some operations together as staticmethods. I get that this comes from splitting off the previous implementation, so I'm fine with it, but in the future I might want to play with the helper-class idea, make it its own module, and build some generic "ArrayHelper"/"BufferHelper" kinds of classes.
