
Add video preprocessing (denoising) feature #83

Open · wants to merge 33 commits into main from feat-preprocess

Conversation

@t-sasatani (Collaborator) commented Dec 7, 2024

This PR adds a denoising module for neural recording videos. It can be used offline, and with some updates, it could be used in real time with the streamDaq.

It would be better to quickly merge this to the main branch to add other processing features in a relatively stable location. Also, nothing depends on this now anyway.

Processing features

  • Detect broken buffers by comparing each buffer with the buffer at the same position in the previous frame (only mean error is computed for now, but more statistical measures can be added).
  • Remove frames that contain broken buffers. The removed frames are stacked and tracked separately so you can examine which frames were dropped. Alternatively, patch broken buffers by copying the buffer at the same position from the previous frame.
  • Spatial frequency-based filtering.
  • Generate the minimum projection out of a stack of frames.
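The buffer-comparison step described above can be sketched roughly as follows. This is a hedged illustration, not the PR's actual implementation; the function names, the fixed buffer size, and the flat per-buffer mean-error threshold are all assumptions.

```python
import numpy as np

def buffer_mean_error(curr: np.ndarray, prev: np.ndarray) -> float:
    """Mean absolute error between a buffer and the same-position
    buffer in the previous frame."""
    return float(np.mean(np.abs(curr.astype(np.int32) - prev.astype(np.int32))))

def find_broken_buffers(
    curr_frame: np.ndarray,
    prev_frame: np.ndarray,
    buffer_size: int,
    threshold: float,
) -> list[int]:
    """Flag buffers whose mean error against the previous frame exceeds
    `threshold`. Frames are flattened into fixed-size buffers here for
    simplicity."""
    flat_curr = curr_frame.ravel()
    flat_prev = prev_frame.ravel()
    broken = []
    for i in range(0, flat_curr.size, buffer_size):
        err = buffer_mean_error(
            flat_curr[i:i + buffer_size], flat_prev[i:i + buffer_size]
        )
        if err > threshold:
            broken.append(i // buffer_size)
    return broken

prev = np.zeros((4, 8), dtype=np.uint8)
curr = prev.copy()
curr[2, :] = 255  # corrupt one row, i.e. one buffer when buffer_size == 8
broken = find_broken_buffers(curr, prev, buffer_size=8, threshold=10.0)
```

A frame with any index in `broken` would then be either dropped or patched, per the options above.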

The following example frame shows noisy regions being detected and patched with the corresponding buffer from the prior frame:
(example frame image)

Interface

You can run this with mio process denoise using an example video.
The denoising parameters and export settings can be defined with a YAML file.

mio process denoise -i .\user_dir\test.avi -c denoise_example
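For reference, a hedged sketch of what such a YAML config might look like. The section names below (interactive_display, noise_patch, frequency_masking) are inferred from the discussion in this PR; the exact fields of denoise_example may differ.

```yaml
# Illustrative only; not necessarily the exact schema of denoise_example.yml
noise_patch:
  enable: true
  threshold: 20        # hypothetical mean-error threshold per buffer
frequency_masking:
  enable: true
minimum_projection:
  enable: true
interactive_display:
  enable: true
  start_frame: 0
  end_frame: 100
```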

Minor changes

  • I moved VideoWriter to the io module because it's pretty generic, and I wanted to use it for denoising.

📚 Documentation preview 📚: https://miniscope-io--83.org.readthedocs.build/en/83/

@coveralls (Collaborator) commented Dec 7, 2024

Pull Request Test Coverage Report for Build 12208523886

Details

  • 17 of 346 (4.91%) changed or added relevant lines in 8 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-14.7%) to 62.478%

File                            Covered  Changed/Added  %
miniscope_io/cli/main.py        0        2              0.0%
miniscope_io/cli/process.py     0        12             0.0%
miniscope_io/io.py              15       32             46.88%
miniscope_io/models/process.py  0        32             0.0%
miniscope_io/models/frames.py   0        52             0.0%
miniscope_io/plots/video.py     0        59             0.0%
miniscope_io/process/video.py   0        155            0.0%

Totals: change from base Build 12208310258: -14.7%; covered lines: 1069; relevant lines: 1711

💛 - Coveralls

@t-sasatani t-sasatani force-pushed the feat-preprocess branch 3 times, most recently from 1385223 to bc9a268 Compare December 7, 2024 01:05
@sneakers-the-rat (Collaborator):

omfg you already did it!?!??!?!?! u are so fast. cant wait to check this out!!!!!

@coveralls (Collaborator) commented Dec 11, 2024

Coverage Status

coverage: 74.931% (-4.2%) from 79.112%
when pulling 11f0c19 on feat-preprocess
into 56397e9 on main.

@MarcelMB (Contributor) commented Jan 8, 2025

amazing!!!! Thanks, Takuya for working on this already!

  • I was wondering whether replacing a broken buffer with the one from the previous frame is a good strategy. Unsure. For a single broken frame that might be fine, but if there's a stretch of corrupted data, we'd basically be showing the same image for seconds.

  • By subtracting a minimum projection at the end, could low-intensity biological signals eventually get lost?

  • I need to understand the frequency mask a bit more to see what goes through and what is blocked.

@sneakers-the-rat (Collaborator):

I was wondering if replacing a broken buffer with the one from the previous frame is a good strategy. Unsure. For one broken frame that might be fine but if there is a stretch of corrupted data and we basically have the same image for seconds not sure.

agree, I think we want to drop data rather than copy data - it's scientific data after all and the numbers matter, so duplicating a frame is effectively fabricating data (even though this wouldn't be a bad strategy if we were streaming a movie and didn't care as much about the pixel values)

by subtracting a minimum projection at the end I was wondering if maybe not low-intensity biological signals could eventually get lost?

I need to understand the frequency mask a bit more to see what goes through and what is blocked.

this is a decent pair of thoughts that highlights that we might want to separate cosmetic/display-oriented processing stages from signal-repair stages: the frequency filtering repairs the image, while the minimum projection is just for display (right? i'm assuming that in analysis people will want the full signal).

been working through the backlog of issues after the holiday, and my first big chunk of work will be on mio, so i'll review this and related stuff this week. thanks again for this Takuya :)

@t-sasatani (Collaborator, Author):

Thanks for the comments! I'm stuck with other stuff at the moment but will make this ready for review and ping you, hopefully next week. Something I'd like to know from @sneakers-the-rat is whether the high-level structure seems mergeable with the pipeline refactor. I don't intend to make this fully compatible (we need this now, and it's an independent module), but the code structure is pretty arbitrary, so it would be good if there were something I could easily anchor to.

And yes, Phil also mentioned that dropping broken frames would be good, so that's planned (I might just add a drop option, since masking is already there). I was just too lazy to look into how processing pipelines handle this and track dropped buffers.

For the FFT and minimum projection, these are just modules, so I think they should simply be available whenever needed. The frequency mask might be used in actual preprocessing before CNMF-E pipelines, but I'm not sure.
https://github.com/Aharoni-Lab/Miniscope-v4/wiki/Removing-Horizontal-Noise-from-Recordings
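The linked wiki removes horizontal stripe noise by masking the corresponding frequency components. A hedged sketch of that idea (function names and parameters below are illustrative, not mio's actual API): horizontal stripes vary only along y, so their energy sits in the central column of a centered 2D FFT, which can be zeroed while keeping a small block around DC.

```python
import numpy as np

def horizontal_noise_mask(
    height: int, width: int, band: int = 2, keep_dc: int = 2
) -> np.ndarray:
    """Binary mask for a centered 2D FFT. Horizontal stripe noise lives
    in the central column of vertical frequencies, so zero a narrow band
    of columns around the center while preserving DC / low frequencies."""
    mask = np.ones((height, width), dtype=np.float32)
    cy, cx = height // 2, width // 2
    mask[:, cx - band:cx + band + 1] = 0.0          # suppress stripe band
    mask[cy - keep_dc:cy + keep_dc + 1,
         cx - keep_dc:cx + keep_dc + 1] = 1.0       # keep DC block
    return mask

def apply_freq_mask(img: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Filter an image by masking its centered FFT and inverting."""
    f = np.fft.fftshift(np.fft.fft2(img.astype(np.float32)))
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

# Horizontal stripes (rows alternating 0/200) get flattened to ~uniform 100.
img = np.zeros((16, 16), dtype=np.float32)
img[::2, :] = 200
out = apply_freq_mask(img, horizontal_noise_mask(16, 16))
```

The `band` and `keep_dc` widths are tuning knobs; too wide a band also removes real low-spatial-frequency signal, which connects to the concern above about losing biological signal.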

@MarcelMB (Contributor):

Adding a bug report here for the buffer patching: most likely multiple consecutive frames are corrupted, and the code currently always patches from the immediately preceding frame, so most of each frame ends up lost for the rest of the video.

files used to get this bug: invivo-20241203-1-003, invivo-20241203-3.avi

using commit:
9cd2bc3

(screenshot of the corrupted output)
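One way the cascade could be avoided (a sketch, not the PR's actual code): patch from the last known-clean frame instead of the immediately preceding, possibly already-patched, one. For simplicity this flags whole frames; the real code works per-buffer.

```python
import numpy as np

def patch_frames(
    frames: list[np.ndarray], is_broken: list[bool]
) -> list[np.ndarray]:
    """Patch broken frames from the last *clean* frame, so a run of
    consecutive corrupted frames does not compound the damage."""
    patched = []
    last_good = None
    for frame, broken in zip(frames, is_broken):
        if broken:
            if last_good is None:
                continue  # no clean reference yet: drop the frame
            patched.append(last_good.copy())
        else:
            last_good = frame
            patched.append(frame)
    return patched

frames = [np.full((2, 2), v, dtype=np.uint8) for v in (0, 9, 9, 3)]
patched = patch_frames(frames, [False, True, True, False])
```

Dropping (as discussed above) sidesteps the issue entirely; this only matters if patching stays available as an option.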

"""

@staticmethod
def denoise(
@raymondwjang (Member) commented Jan 13, 2025

Seems like there's an opportunity for improvement in terms of SOLID design, especially the single-responsibility principle. This denoise method, for example, appears to:

  1. check for module
  2. read video
  3. write video
  4. noise patch
  5. freq. filtering
  6. interactive display
  7. image stacking

This will make the code incredibly painful to debug, maintain, and extend. If we want to add a new processing step, for instance:
a. we wouldn't know where to add the new step
b. the whole processing function has to be modified
c. the test for the entire processing step has to be updated
d. if something goes wrong, we wouldn't know where, because we modified the entire processing step

And if we want to modify this function, everything downstream that uses it (possibly completely irrelevant to the changes) has to be modified as well. If we want to remove one of the processing steps from the saved video but keep seeing it in the interactive display, the entire function has to be completely overhauled.

The usual rule of thumb is that more than 2-3 if statements in a non-low-level logic function is a symptom that it can be improved for readability and maintainability. :)

I'd hazard a guess that a lot of these if blocks can be packaged into different classes and files with clearer names and more consistent formatting across methods, for better reusability.

Collaborator:
agreed. shouldn't be too tricky to split each of these into separate classes with an __init__ that takes config, a process() that applies the transform, and a finish() (or whatever we want to call it) that handles writing output if requested.
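A minimal sketch of that stage API (class and method names are placeholders, not the shipped mio interface):

```python
from typing import Protocol

import numpy as np

class FrameProcessor(Protocol):
    """Common API every processing stage would implement."""
    def process(self, frame: np.ndarray) -> np.ndarray: ...
    def finish(self) -> None: ...

class InvertStage:
    """Toy stage standing in for noise patching / frequency filtering."""
    def __init__(self, config: dict):
        self.enabled = config.get("enable", True)
        self.frames: list[np.ndarray] = []

    def process(self, frame: np.ndarray) -> np.ndarray:
        if not self.enabled:
            return frame  # disabled stages pass frames through untouched
        out = 255 - frame
        self.frames.append(out)
        return out

    def finish(self) -> None:
        # the real version would write self.frames to a video file here
        self.frames.clear()

processors: list[FrameProcessor] = [InvertStage({"enable": True})]
frame = np.zeros((2, 2), dtype=np.uint8)
for p in processors:
    frame = p.process(frame)
for p in processors:
    p.finish()
```

The video processor then only loops over stages, and adding a step means adding one class rather than editing one giant function.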

@t-sasatani (Collaborator, Author) commented Jan 15, 2025
Yeah, as you can see, it's literally just a raw script disguised as a class right now. I'll try to be more explicit about which parts aren't awaiting formatting review, or avoid uploading PRs in draft status, so we can work efficiently.

@sneakers-the-rat (Collaborator) left a review:
Alright, thanks for patience, finally got time to review this.

Three changes I think we should for sure make before merging this:

  • split out each of the processing methods into separate classes with a common API and then make the video processor class generic over those classes, details in comments.
  • unify the existing Frame and Frames classes with the new NamedFrame class, making separate classes for a single frame vs. collections of frames.
  • have tests for all of this

The rest are perf improvements and style comments

whether the high-level structure seems mergeable to the pipeline refactor. No intention to make this really compatible because we need this now and this is an independent module but the code structure is pretty arbitrary so it might be good if there is something I can easily anchor to.

If we split each of these processing stages into separate classes with a single API, then I can handle putting them into the pipeline system once it drops.

.gitignore Outdated
!user_dir/.gitkeep

~/.config/miniscope_io/logs/
Collaborator:
not too much harm in having extra entries in .gitignore, but this should hopefully be unnecessary (not in the repo directory, so shouldn't have an effect anyway)

@t-sasatani (Collaborator, Author) commented Jan 15, 2025
Oh, this is just an accident; I don't know how it got in.

.pdm-python
user_dir/*
Collaborator:
what are these directories? we should ideally keep everything we create underneath the single user_dir set up in the config

Collaborator Author:

You mean user_dir? I just wanted somewhere to keep data for convenience while working in the repo. I'll get rid of them before taking this out of draft.

Collaborator Author:

Oh, I forgot user_dir is used for config. I might keep a user_data dir just for dev convenience.

Collaborator:

up to you where you want to put it, no problem having an extra line in .gitignore.

if you wanted this to persist you would set user_dir in the global config for your machine (mio config global path)

mio/data/config/process/denoise_example.yml (outdated; comments resolved)
@@ -19,6 +19,107 @@
from mio.types import ConfigSource


class VideoWriter:
Collaborator:

sort of an odd object with a single staticmethod, especially since this one returns a cv2.VideoWriter while the VideoReader class wraps its reader. Fine for now because it's on me to get the pipeline system in place, where we can make this a predictably structured Sink class.

Collaborator Author:

Yeah, I just pulled this out of the stream daq without thinking much here.


if config.frequency_masking.enable:
freq_filtered_frame, frame_freq_domain = processor.apply_freq_mask(
img=patched_frame,
Collaborator:

if noise_patch.enable is False, patched_frame won't be defined here. We want to pass a single frame through several processing stages, and mutating that frame is a reasonable expectation within the processing pipeline; making a new copy for each processing stage would get pretty dang memory- and perf-intensive pretty fast.
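A sketch of one way to avoid the undefined-name problem (stand-in names, not the PR's code): thread a single `frame` variable through the stages, with disabled stages simply leaving it untouched.

```python
import numpy as np

# Hypothetical pass-through stand-ins for the real stages.
def patch_noise(frame: np.ndarray) -> np.ndarray:
    return frame  # placeholder for the real noise patching

def freq_filter(frame: np.ndarray) -> np.ndarray:
    return frame  # placeholder for the real frequency filtering

enable_patch, enable_mask = False, True
frame = np.zeros((4, 4), dtype=np.uint8)
# `frame` is always defined, even when an earlier stage is disabled.
if enable_patch:
    frame = patch_noise(frame)
if enable_mask:
    frame = freq_filter(frame)
```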

Comment on lines 302 to 306
minimum_projection = VideoProcessor.get_minimum_projection(output_frames)

subtract_minimum = [(frame - minimum_projection) for frame in output_frames]

subtract_minimum = VideoProcessor.normalize_video_stack(subtract_minimum)
Collaborator:
definitely a separate class as well
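Pulled out as its own stage, this hunk might look something like the sketch below (names are illustrative, not the refactored code; it folds the projection, subtraction, and normalization from the quoted lines into one class):

```python
import numpy as np

class MinimumProjection:
    """Sketch of the projection step as a standalone stage: subtract the
    per-pixel minimum over the stack, then normalize to 0-255."""

    @staticmethod
    def get_minimum_projection(frames: list[np.ndarray]) -> np.ndarray:
        return np.min(np.stack(frames), axis=0)

    @staticmethod
    def apply(frames: list[np.ndarray]) -> list[np.ndarray]:
        min_proj = MinimumProjection.get_minimum_projection(frames)
        subtracted = [f.astype(np.float64) - min_proj for f in frames]
        peak = max(float(s.max()) for s in subtracted) or 1.0  # avoid /0
        return [(s / peak * 255).astype(np.uint8) for s in subtracted]

stack = [
    np.full((2, 2), 10, dtype=np.uint8),
    np.full((2, 2), 60, dtype=np.uint8),
]
out = MinimumProjection.apply(stack)
```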

freq_filtered_frames.append(freq_filtered_frame)
output_frame = freq_filtered_frame
output_frames.append(output_frame)
finally:
Collaborator:

same thing here, if each of the processing classes had a finish() (or whatever name) method that handled its output routine then all we would do here is

for processor in processors:
    processor.finish()

Comment on lines 359 to 374
if config.interactive_display.enable:
videos = [
raw_video,
noise_patch,
patched_video,
freq_filtered_video,
freq_domain_video,
min_proj_frame,
freq_mask_frame,
]
VideoPlotter.show_video_with_controls(
videos,
start_frame=config.interactive_display.start_frame,
end_frame=config.interactive_display.end_frame,
)

Collaborator:

here's a good example of why we want all these things to be separable functions/classes that can work framewise. Ideally this would work on demand: we open the UI and compute results as requested in the context of a GUI (not something mio should try to provide), rather than precomputing everything and setting up a UI for it afterwards.

return min_projection

@staticmethod
def normalize_video_stack(image_list: list[np.ndarray]) -> list[np.ndarray]:
Collaborator:

This should be either a separate class like the other processor classes or a free function; it's strange to have it here in a different form than all the other video processing methods.
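The free-function option might look like this sketch (signature assumed, not mio's actual implementation): scale by the global min/max of the whole stack so relative brightness across frames is preserved.

```python
import numpy as np

def normalize_video_stack(image_list: list[np.ndarray]) -> list[np.ndarray]:
    """Normalize a stack of frames to 0-255 using the stack-wide
    min/max, preserving brightness relationships between frames."""
    stack = np.stack(image_list).astype(np.float64)
    lo, hi = float(stack.min()), float(stack.max())
    scale = (hi - lo) or 1.0  # guard against a constant stack
    norm = (stack - lo) / scale * 255.0
    return [f.astype(np.uint8) for f in norm]

frames = [
    np.array([[0, 50]], dtype=np.uint8),
    np.array([[100, 50]], dtype=np.uint8),
]
out = normalize_video_stack(frames)
```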

@raymondwjang (Member) commented Jan 14, 2025

Also, I'd like to add that (while I'm also a frequent violator of this 😭) each PR should ideally consist of ~10 commits, with fewer than 500 lines of code.

@t-sasatani: Here's something that might help that @sneakers-the-rat and I presented during the lab meeting that you couldn't attend (link)

@t-sasatani (Collaborator, Author):

Thanks. I'll look at these later, but could you hold off on formatting reviews while the PR is marked draft? I commented in this thread last week that I'll change it to ready-for-review and ping you. It's not that I'm bothered; I just prefer using drafts as drafts, so we can discuss experimental prototypes before thinking about formatting or structure.

@t-sasatani (Collaborator, Author):

And thanks for the info, @raymondwjang. I think I attended that meeting, but I sent a view request because I couldn't access the slides.

I can squash commits later if you prefer, but I suspect PRs for refactoring or adding new modules (like this one) are unlikely to fit in 500 lines of code. That rule makes sense for minor additions or fixes, but is it considered better for everything?

@raymondwjang (Member) commented Jan 14, 2025

Shared! Obviously there are reasonable exceptions to the rule, but simply seeing 1000+ lines on a PR is pretty threatening as a reviewer.

Even within enterprise-level projects, massive PRs are rare unless there's a sweeping feature change or a complete refactor, since they always introduce exponentially larger review overheads, bug risks, rollback costs, etc. Even with new features it's not common to see PRs approaching 1000 lines and above, and that's including tests and documentation. Also remember that those repos are generally hosted with an extensive suite of CI/CD tools that keep the code healthy (which we don't currently have).

At present this PR is 1000 lines, but with tests and documentation it might approach 2000. That can turn into hours of reviewing while still missing bugs and edge cases. I had a similar PR that grew uncontrollably and ended up having to chop it into 5 different PRs so that it was somewhat reviewable.

As @sneakers-the-rat can attest, I often fail to keep things nicely packaged for reviewers and future me as well. But from what I've learned, it's always recommended to keep things small, nimble, and modular. Do as I say, not as I do lmao

@sneakers-the-rat (Collaborator) commented Jan 14, 2025

I commented in this thread last week that I'll change it to ready for review and ping you. It's not like I'm bothered or anything, but I do prefer using the draft as draft so we can discuss experimental prototypes before thinking about formatting or structure.

no problem! i was just getting started working on mio again and wanted to catch up here. Draft status totally understood; just giving feedback that hopefully helps with ideas on different ways to finalize it :)

and Raymond, I super appreciate the advocacy for small PRs. There are a few gigantic refactors that need to get done, so I'll be guilty of a few very large PRs myself soon enough, but once we get to a roughly stable code structure we should be able to iterate much more cleanly :)

@t-sasatani (Collaborator, Author) commented Jan 14, 2025

OK, I'll try to keep these short, though I'll probably violate it sometimes. It might be easier for me to overlook review comments on draft PRs; sorry for that in advance.

from ..conftest import DATA_DIR, CONFIG_DIR

def test_interactive_display_config():
config = DenoiseConfig.from_id("denoise_test").interactive_display
Collaborator Author:

@sneakers-the-rat, it might be in the docs somewhere, but could you let me know how to make these configurations discoverable in the test dir? I'm putting the config in the same directory as the test config for the stream DAQ, but I'm getting this. Does it need registering or anything?

No config with id denoise_test found in /path/to/somewhere/Application Support/mio/config

Collaborator Author:

Just leaving a note here so I don't forget: I'm skipping these config tests for now.

Collaborator:

In general, I don't think we need these kinds of tests (if we wanted to ensure these values don't change, we could just copy the config into tests and assert that the copy equals the one in the package). But the fact that it isn't being found would be a bug in the config discovery methods, or in how we monkeypatch them to include the tests/data/config directory during testing. I'll check this out in the morning.

@t-sasatani t-sasatani marked this pull request as ready for review January 15, 2025 16:36
@t-sasatani (Collaborator, Author) commented Jan 15, 2025

I'm switching this to ready for review as I'm done for today.
I'm sorry this is becoming gigantic; it might be painful to rebase @MarcelMB's branch...

It's ready structure-wise, and I changed the "patch" frame function to "drop" frames and added a tracking/export thing.

I haven't gone through the review comments yet, more docs/tests need to be completed, and the interfaces (like the naming of the export paths) are unsettled, so I'll return to them later this week. In case anyone wants to take over or make a significant change in the meantime, feel free.

return mask


class ZStackHelper:
Collaborator:

i like this idea :)

self.height = height
self.width = width

def split_by_length(self, array: np.ndarray, segment_length: int) -> list[np.ndarray]:
Collaborator:

Something like this seems pretty general-purpose, but we don't have a good place to put general array-manipulation functions short of a utils junk drawer. I like this helper-class idea, bundling some operations together as staticmethods. I get that this comes from splitting off the previous implementation, so I'm fine with it, but in the future I might want to play with the helper-class idea, make it its own module, and build some generic "ArrayHelper"/"BufferHelper" kinds of classes.
