Reduce false positives in ContentDetector due to camera movement #153

Closed · scarwire opened this issue Feb 10, 2020 · 13 comments · Fixed by #198

scarwire (Contributor) commented Feb 10, 2020

Hi there! First of all, thanks for all your work on this clean, fast, and extremely promising module.

The current algorithm for detecting fast cuts with ContentDetector is already quite reliable, but looking through the raw data to find the correct threshold value can be tiresome, and lowering the threshold can sometimes produce false positives, especially in clips where there's no cut but there is fast camera movement. Camera movement can produce a series of frames with high content_val, and if any of the frames go over the threshold, it's interpreted as a cut. You can try with these examples (1, 2) that I've taken from an old film I'm studying. There are real cuts that have a content_val of only 9 or 10 (file 2, frame 1172, 00:46), but also false positives with content_val greater than 10 (file 1, frame 403, 00:16).

One possible way to deal with this would be to look for fast peaks in the amount of change -- peaks that can't last for more than a frame or two. Once we've calculated the content change for all the frames in the clip, we can then calculate the ratio of each frame's content_val to the average content_val of the surrounding frames. In pseudocode:

metathreshold = 3    # frame must change at least 3x as much as its neighbors
min_content_val = 5  # and still show at least this much absolute change

cut_list = []
# content_vals[i] is the content change between frame i and frame i - 1.
# Start at 2 and stop 2 early so the +/-2 window stays in bounds.
for i in range(2, len(content_vals) - 2):
    neighbor_avg = (content_vals[i - 2] + content_vals[i - 1]
                    + content_vals[i + 1] + content_vals[i + 2]) / 4
    if neighbor_avg > 0:
        content_val_ratio = content_vals[i] / neighbor_avg
        if content_val_ratio >= metathreshold and content_vals[i] >= min_content_val:
            cut_list.append(i)
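
Here content_vals stands for the per-frame change metric that ContentDetector already computes (frame-to-frame change in HSV space). Purely as an illustration -- this is not the module's actual implementation -- such a list could be built with OpenCV, assuming frames is a list of BGR images:

import cv2
import numpy as np

def hsv_delta(prev_bgr, curr_bgr):
    # Mean absolute HSV difference between two consecutive frames.
    prev = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    curr = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    return float(np.mean(np.abs(curr - prev)))

# Frame 0 has no predecessor, so give it a change score of zero.
content_vals = [0.0] + [hsv_delta(a, b) for a, b in zip(frames, frames[1:])]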

From the tests I've done, the metathreshold value of 3 (mark a frame as a cut if its content_val is more than three times the average content_val of its neighbors) and the baseline threshold of 5 (still, don't mark anything as a cut if its content_val is less than 5) seem like sane defaults. I think they're easier to pick than the current threshold value, but this is open to discussion and more testing.

I've been messing around with the code and have come up with a way of implementing this as a SceneDetector object using the model you've already created. However, the current API calculates HSV values one by one as each frame is passed to the detector, so I also added a second post-processing step in scene_manager.py that runs after all those values have been stored. This step calculates the running average of change in the surrounding frames and produces a list of cuts based on it. There are probably other ways of doing this, though.

Let me know if you think this is useful. If you're interested, I'll make a pull request.

wjs018 (Collaborator) commented Feb 24, 2020

I think that this kind of approach could be valuable as fast camera movement leading to anomalous scene detection is something I have come across many times before. The way the code is written now, there is a post_process function in the generic scene_detector class (link). Some detector types use this (threshold and motion) while others don't (content).

So, if I am understanding the process correctly, the detector method you are discussing would operate like this (a minimal skeleton follows the list):

  1. The process_frame function does not actually add anything to the cut list as it does its analysis of frames, but instead is just used to build the stats for the video. If a stats file was provided, this could basically be skipped over altogether.
  2. When the last frame has been input, the post_process function runs over the compiled stats for the video and, using the method you describe, chooses the frames to mark as scene boundaries, adding them to the cut list.
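
To sketch that two-phase flow (all names here are hypothetical, and the per-frame stats are simply kept in memory on the detector rather than going through a stats manager):

import numpy as np

class RollingAverageDetector:
    # Sketch only: phase 1 records a per-frame metric, phase 2 picks cuts.

    def __init__(self, metathreshold=3.0, min_delta_hsv=5.0, window_width=2):
        self.metathreshold = metathreshold
        self.min_delta_hsv = min_delta_hsv
        self.window_width = window_width
        self._last_hsv = None
        self._delta_hsv = []  # compiled stats for the whole video

    def process_frame(self, frame_num, frame_hsv):
        # Phase 1: only record the metric; never emit cuts here.
        if self._last_hsv is None:
            self._delta_hsv.append(0.0)
        else:
            diff = np.abs(frame_hsv.astype(np.int32) - self._last_hsv.astype(np.int32))
            self._delta_hsv.append(float(np.mean(diff)))
        self._last_hsv = frame_hsv
        return []

    def post_process(self, frame_num):
        # Phase 2: scan the compiled stats and pick out the sharp peaks.
        w = self.window_width
        cuts = []
        for i in range(w, len(self._delta_hsv) - w):
            neighbors = self._delta_hsv[i - w:i] + self._delta_hsv[i + 1:i + w + 1]
            avg = sum(neighbors) / len(neighbors)
            if (avg > 0 and self._delta_hsv[i] / avg >= self.metathreshold
                    and self._delta_hsv[i] >= self.min_delta_hsv):
                cuts.append(i)
        return cuts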

Without actually trying to implement this, I see one thing that would need to be considered: the stats for the whole video would need to be accessible to the post_process function. This probably means saving them as a class variable in the detector class (potentially a memory issue for very long videos). Alternatively, perhaps a stats_file could be part of the __init__ constructor for this detector class, and the post_process function could then parse this file.

Something else worth thinking about (perhaps @Breakthrough has thoughts on this) is whether this is the desired approach overall for PySceneDetect. What I mean is: would this kind of post-processing be better performed as part of a SceneDetector object, or would it be better to just make a function that directly analyzes a stats file after it has been generated by an existing detector? I would lean towards the second option since, under the hood, the proposed method still uses HSV values just like content_detector, but with a different interpretation of what constitutes a detected scene.

scarwire (Contributor, Author) commented Feb 26, 2020

Yeah, that's basically my idea. The only problem is that the current post_process function takes a single frame as an argument, and if it detects a cut, that cut is added to the existing list. I don't want to mess with that function, since it's used by the threshold and motion detectors. The new post-processing function would instead return a complete new cut list.

I've forked the repository and created a branch with a (still somewhat messy) implementation of this if anyone is interested.

wjs018 (Collaborator) commented Feb 26, 2020

Thanks for sharing the code! I think this is a good first pass at the issue. I will take a closer look soon to better integrate this into the rest of PySceneDetect (docs, cli, etc.). My first thought is that it would be possible to use the existing post_process function, but require the video_manager to be specified in the __init__ function when the detector is made. The post_process function could then refer to the video manager that was specified when the SmartContentDetector was initialized.

One other thing that should be called out in the final version of this is that the stats_manager is required for this detector when the SceneManager is constructed. Currently, it is listed as an optional argument, but this kind of post-processing would need it.

I will take a look at these when I get a chance.

wjs018 (Collaborator) commented Mar 2, 2020

@scarwire I worked some more on your branch and renamed the detector to AdaptiveContentDetector, since its sensitivity adapts to the frames surrounding the current one. Feel free to check it out here and let me know how it performs on your example cases, to make sure functionality is retained. I was testing it on some of my videos, but just want to make sure it works for more than just me.

A summary of the changes I made to your implementation to try to generalize it (a hypothetical usage sketch follows the list):

  • I made the video_manager a required argument of the AdaptiveContentDetector constructor
  • I brought all the analysis and cut detection logic into the post_process function
  • I implemented a check in the cli interface so that an error is thrown if a stats manager isn't used
  • I made sure that the scene detection respects the min_scene_len parameter
  • I added the min_delta_hsv parameter, which was previously hard coded as 5.0
  • I also added the window_width parameter, which specifies the number of frames before and after the current frame over which to average. This was previously hard coded as 2
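
Put together, using it from the Python API would look roughly like this (the constructor signature may still change; the parameter names are the ones listed above, and the import path on the branch may differ):

from scenedetect.video_manager import VideoManager
from scenedetect.scene_manager import SceneManager
from scenedetect.stats_manager import StatsManager
from scenedetect.detectors import AdaptiveContentDetector  # path on the branch may differ

video_manager = VideoManager(['testvideo.mp4'])
stats_manager = StatsManager()  # required: post_process reads the compiled stats
scene_manager = SceneManager(stats_manager)

scene_manager.add_detector(AdaptiveContentDetector(
    video_manager=video_manager,  # specified at construction, per the list above
    min_delta_hsv=5.0,            # ignore frames with less absolute change than this
    window_width=2,               # frames before/after the current one to average over
    min_scene_len=15))            # minimum scene length, in frames

video_manager.start()
scene_manager.detect_scenes(frame_source=video_manager)
scene_list = scene_manager.get_scene_list(video_manager.get_base_timecode())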

Breakthrough (Owner) commented Jun 29, 2020

Hi @scarwire;

Would you be able to provide updated links to the video files, or attach them to this issue?

I'd like to start expanding the test cases PySceneDetect has with more real-world media such as this to help future development, and prevent regressions when such features are implemented.

This is definitely something I would like to consider adding in a future version of PySceneDetect (post-v0.6 most likely), as automatic threshold determination is a highly sought-after feature. It sounds like this approach might drive some underlying API changes, which I am more than happy to support after the upcoming v0.6 release (the primary goals of which are suppression of camera flashes and quality-of-life improvements).

Thank you (and @wjs018!!) for your research, methods, and implementation, much appreciated. Looking forward to getting this cleaned up and merged!

scarwire (Contributor, Author) commented Jul 8, 2020

Hi there, and sorry for taking so long to get back to you.

Here are updated links for the two examples I used for testing. The new detector gets more accurate results with them, but it's still not perfect.

Thanks so much for taking up this idea and also thanks again to @wjs018 for refactoring and cleaning up my code!

enginefeeder101 commented
Hi there,

This is exactly what I needed. I had to index 30-year-old home videos, and this works like a charm. I updated @wjs018's fork against @Breakthrough's current master; it merged without conflicts, and the results are pretty good. Thank you!

Breakthrough (Owner) commented
Awesome - would you be willing to create a pull request with the detector, @wjs018? I'm not sure of the state of the code, but it looks like a very interesting approach!

wjs018 (Collaborator) commented Jan 12, 2021

Brought my branch up to date with master and made sure nothing broke. As mentioned in #198, I haven't really tested this other than to make sure that things ran smoothly and integrated nicely with other parts of the software (such as cli).

Breakthrough (Owner) commented Jan 16, 2021

Awesome, thanks so much for the PR - I'll try to get that merged as soon as possible. I want to release v0.5.5 before merging it in, however, as this will also drive some changes to v0.6 as you mentioned (re: post_process). I also want to try to reduce code duplication between this and detect-content somehow, but I can figure that out after it's merged.

I wanted to ask you all for your thoughts on:

  1. In terms of the command line, should this be a new command, or should it just be additional arguments to detect-content? I was thinking of adding it as an option that enables the post-processing stage. Another option would be to add this as a separate command that runs after detect-content to perform the post-processing.
    I ask this mostly because I was planning on adding flash suppression to detect-content (which would be compatible with both detectors), and it looks like most of the work for this detector happens in post_process. Thus I thought it perhaps makes sense to conditionally enable that behavior rather than using a separate command (activating it with a flag, which would also reset the threshold default).

  2. Just curious, why is min_delta_hsv so low as a default? Sorry, I haven't tested this on any material yet (planning on doing so as soon as possible), so it may not be an issue. At a glance, though, it seemed rather low compared to the current default, and some of the source material is greyscale (leading to lower-than-expected delta HSV values due to no change in hue/saturation).

  3. As @wjs018 wrote above:

What I mean is: would this kind of post-processing be better performed as part of a SceneDetector object, or would it be better to just make a function that directly analyzes a stats file after it has been generated by an existing detector? I would lean towards the second option since, under the hood, the proposed method still uses HSV values just like content_detector, but with a different interpretation of what constitutes a detected scene.

What are your thoughts on the best way to proceed with post_process()? I agree the current format isn't that useful - does it make sense to pass the existing cutting list to it, or nothing at all? If the API were modified to suit these needs, then I think the existing implementation could be merged with ContentDetector. In my head the post_process function should still go with the detector itself, but if you think it should be a separate object entirely, I'm open to that as well (e.g. some kind of filter object). In that case, though, I would probably prefer to add it as a separate command that gets run after detect-content.


Thank you so much for this everyone (esp. @wjs018 and @scarwire), looking forward to getting it merged in.

@Breakthrough Breakthrough changed the title Camera movement produces false positives in ContentDetector Reduce false positives in ContentDetector due to camera movement Jan 16, 2021
@Breakthrough Breakthrough added this to the v0.5.6 milestone Jan 17, 2021
@Breakthrough Breakthrough linked a pull request Jan 22, 2021 that will close this issue
Breakthrough (Owner) commented
TODO: Regarding the CLI, I think the new detector is different enough in functionality that it should be a separate command. I'm thinking detect-adaptive, to follow convention. After further review of the new detector, most of my questions above have been resolved, although I am definitely still looking for any feedback on the third point / any potential API implications this might have for the v0.6 refactor.

This has now been merged into the v0.5.6 branch, and will definitely be in the next release of PySceneDetect :)

Breakthrough (Owner) commented Feb 21, 2021
Breakthrough commented Feb 21, 2021

@scarwire could you let me know some more details about the video, or is there an IMDB page I could reference? I'd like to add the clips you shared as a test case, but need to work out the copyright details. Thanks!

Also, the defaults seem to be a bit too sensitive for colour video. I realize now that an option should be provided to use ONLY the change in luminance/intensity, for both ContentDetector and AdaptiveDetector, as the change in hue/saturation will always be negligible/zero for greyscale material (thus those frame scores are always 1/3 of what they would be for colour material).

My initial thought is to multiply your default threshold by 3 to compensate for this, and to allow an additional flag indicating whether the source material is greyscale (thus normalizing the approximate threshold difference between greyscale and colour source material). Does this sound reasonable?
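
To make the factor of 3 concrete with made-up numbers (assuming the frame score is the average of the per-channel H, S, and V deltas):

# Hypothetical scores for the same "size" of cut:
delta_h, delta_s, delta_v = 0.0, 0.0, 12.0        # greyscale: hue/sat never change
greyscale_score = (delta_h + delta_s + delta_v) / 3  # = 4.0

delta_h, delta_s, delta_v = 12.0, 12.0, 12.0      # comparable colour material
colour_score = (delta_h + delta_s + delta_v) / 3     # = 12.0, i.e. 3x higher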

Breakthrough added a commit that referenced this issue Feb 21, 2021
Refactor metric key implementation accordingly. #153
Breakthrough added a commit that referenced this issue Feb 21, 2021
Rename AdaptiveContentDetector to AdaptiveDetector. (#153)
Breakthrough (Owner) commented Mar 8, 2021
Breakthrough commented Mar 8, 2021

Woo, thanks again everyone! There are now some test cases in the repo for the detector, as well as better integration overall. This detector will definitely influence the API direction for v0.6, as there are a few circular dependencies I want to see if I can sort out better (specifically, I think the detection algorithms need some kind of factory to disallow construction without a stats manager). I've also removed the requirement for a statsfile by making the SceneManager implicitly create a StatsManager for any detector that requires one, even if the user doesn't provide one (it will simply be discarded when the program closes).

You can also use detect-adaptive with a statsfile created by using detect-content and still gain a speedup. Additionally, changing the window size or toggling luma-only mode will generate a new column in the statsfile.

Lastly, I've multiplied the default threshold by 3 and added a new -l flag to both the content and adaptive detectors, which only looks at luma information rather than all of HSL/HSV as was previously done. When using greyscale material, specify the -l flag so that the threshold value will be equivalent to the same material in RGB. I tested it on my end and got equivalent results to the original threshold when specifying this flag.
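
For example (file names are placeholders):

# Reuse the stats file written by an earlier detect-content run:
scenedetect -i my_video.mp4 -s my_video.stats.csv detect-adaptive list-scenes

# Greyscale material: compare luma only instead of all of HSV:
scenedetect -i old_film.mp4 detect-adaptive -l list-scenes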

I'm not sure if there's a reliable automatic way to detect greyscale inputs, so will leave it as an explicit flag for now. If anyone sees value in it, feel free to file a feature request.

This will now be included in the next release, PySceneDetect v0.5.6 :)
