Support multiple video backends (OpenCV sometimes drops frames resulting in incorrect timecodes) #213

Closed
elxy opened this issue Apr 9, 2021 · 7 comments

Comments

@elxy

elxy commented Apr 9, 2021

Description of Problem & Solution
I want to use FrameTimecode values to drive an ffmpeg process, but the timecodes PySceneDetect reports differ from ffmpeg's.
For the media below, the first 2 scenes detected by the command scenedetect -i Blossoms_at_the_Basin.mp4 detect-content list-scenes -n save-images are:

-----------------------------------------------------------------------
 | Scene # | Start Frame |  Start Time  |  End Frame  |   End Time   |
-----------------------------------------------------------------------
 |      1  |           0 | 00:00:00.000 |         462 | 00:00:19.269 |
 |      2  |         462 | 00:00:19.269 |         635 | 00:00:26.485 |

But the actual end frame number of scene 1 is 508 (counting from 0), not 462. See this screenshot:

scenedetect.jpg

I think the reason is that VideoCapture has dropped frames. I suggest using PyAV to read frames, because PyAV exposes the index and pts properties of each decoded frame.
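To make the size of the discrepancy concrete, here is a minimal sketch of the frame-number-to-timecode arithmetic. It assumes the clip runs at 24000/1001 fps (about 23.976), which is inferred from the table above (frame 462 maps to 00:00:19.269) and is not stated anywhere in this issue. The helper name frame_to_timecode is hypothetical and is not part of PySceneDetect.

```python
from fractions import Fraction

# Assumption: 24000/1001 fps, inferred from the scene list in this issue.
FPS = Fraction(24000, 1001)


def frame_to_timecode(frame_number: int, fps: Fraction = FPS) -> str:
    """Format a zero-based frame number as HH:MM:SS.mmm (hypothetical helper)."""
    # Exact rational arithmetic avoids float drift on long videos.
    total_ms = round(Fraction(frame_number) / fps * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


reported = frame_to_timecode(462)  # frame count as seen through VideoCapture
actual = frame_to_timecode(508)    # frame index observed in the issue
print(reported, actual)
```

Under that assumption, a count that is 46 frames short places the cut almost two seconds early, which is why a timecode derived from the frame count cannot be handed to ffmpeg as-is.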

Media Examples:

Blossoms_at_the_Basin.mp4 is the 4K format of https://www.youtube.com/watch?v=WzD_PREISiM

Proposed Implementation:

Here is a demo that reads frames with PyAV:

import sys

import av
import cv2
import numpy

from scenedetect.video_manager import compute_downscale_factor


class Video():
    def __init__(self, video):
        self.video = video
        self.container = av.open(video)

        self.stream = self.container.streams.video[0]
        self.width = self.stream.codec_context.width

        def _get_frame_rate(stream: av.video.stream.VideoStream):
            if stream.average_rate.denominator and stream.average_rate.numerator:
                return float(stream.average_rate)
            if stream.time_base.denominator and stream.time_base.numerator:
                return 1.0 / float(stream.time_base)
            else:
                raise ValueError("Unable to determine FPS")

        self.frame_rate = _get_frame_rate(self.stream)

    def frames(self):
        for frame in self.container.decode(video=0):
            yield frame.index, frame.to_ndarray(format='bgra')



def compute_delta_hsv(i1, i2):
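    # cv2.split may return a tuple in newer OpenCV releases; copy into lists
    # so that the individual channels can be reassigned below.
    i1_hsv = list(cv2.split(cv2.cvtColor(i1, cv2.COLOR_BGR2HSV)))
    i2_hsv = list(cv2.split(cv2.cvtColor(i2, cv2.COLOR_BGR2HSV)))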
    delta_hsv = [0, 0, 0, 0]
    for i in range(3):
        num_pixels = i1_hsv[i].shape[0] * i1_hsv[i].shape[1]
        i1_hsv[i] = i1_hsv[i].astype(numpy.int32)
        i2_hsv[i] = i2_hsv[i].astype(numpy.int32)
        delta_hsv[i] = numpy.sum(numpy.abs(i1_hsv[i] - i2_hsv[i])) / float(num_pixels)
    return sum(delta_hsv[0:3]) / 3.0


video = Video(sys.argv[1])
threshold = 30.0
factor = compute_downscale_factor(video.width)

last_frame = None
for index, frame in video.frames():
    frame = frame[::factor, ::factor, :3]
    if last_frame is None:
        last_frame = frame
        continue
    hsv = compute_delta_hsv(last_frame, frame)
    if hsv >= threshold:
        print(index)
    last_frame = frame
@Breakthrough
Owner

Breakthrough commented Apr 9, 2021

This seems like a good approach, and may solve some other issues (e.g. #93). I need to learn a bit about the overall API to make it compatible with the VideoManager object (e.g. getting the aspect ratio), but it definitely seems feasible. Another option is to pass an argument to the VideoManager constructor to choose between a cv2.VideoCapture and an av.video.stream.VideoStream.

Is VideoCapture not using ffmpeg on your system, or using a different version? I'm curious as to why this occurs.

Very interesting, and thank you for the code sample!

Edit: It may be worth supporting several backends for video input, such as decord, and selecting one via a command line parameter.

@elxy
Author

elxy commented Apr 9, 2021

Is VideoCapture not using ffmpeg on your system, or using a different version? I'm curious as to why this occurs.

VideoCapture also uses ffmpeg (libav) as its backend on my system, but I have no idea why VideoCapture loses frames.

I noticed PyAV because of the slow speed of VideoCapture.seek(). PyAV might help speed up VideoManager.seek().

There is one more suggestion I want to make. With PyAV, SceneDetector could detect more accurately/quickly by using the keyframe property. It may also be possible to control split accuracy without re-encoding (like this).

@Breakthrough
Owner

Breakthrough commented May 29, 2021

TODO: Add a command line argument to expose the requested video input library. The current plan will be to default to PyAV, if installed, otherwise fall back to OpenCV. Will create a separate issue for supporting any other requested IO backends.
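The fallback order described above could be sketched roughly as follows. This is only an illustration of "prefer PyAV if installed, otherwise OpenCV"; the backend names, the select_backend function, and its parameters are hypothetical and do not reflect how PySceneDetect actually implements or names its backends.

```python
import importlib.util
from typing import Callable, Optional

# Hypothetical backend names mapped to the Python module each one requires.
BACKEND_MODULES = {"pyav": "av", "opencv": "cv2"}
# Preference order when the user does not request a backend explicitly.
PREFERENCE = ("pyav", "opencv")


def is_installed(module_name: str) -> bool:
    """Return True if the module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def select_backend(
    requested: Optional[str] = None,
    installed: Callable[[str], bool] = is_installed,
) -> str:
    """Pick a backend name: honor an explicit request, else use PREFERENCE."""
    if requested is not None:
        if requested not in BACKEND_MODULES:
            raise ValueError(f"unknown backend: {requested!r}")
        if not installed(BACKEND_MODULES[requested]):
            raise RuntimeError(f"backend {requested!r} is not installed")
        return requested
    for name in PREFERENCE:
        if installed(BACKEND_MODULES[name]):
            return name
    raise RuntimeError("no supported video backend is installed")
```

Injecting the installed check as a parameter keeps the selection logic testable on machines that have neither library, and a command line option would simply pass its value as requested.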

It may also be possible to use PyAV directly for re-encoding videos, rather than invoking ffmpeg by command line. One major advantage of that approach would be that it could avoid passing timestamps to an external tool, ensuring everything lines up frame-by-frame.

This also may influence how FrameTimecodes work - in particular, different backends could theoretically use different objects that have different representations. For now though will probably use what you posted above as a basis to start the transition.
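As a rough illustration of what "different representations" might mean, the sketch below contrasts a timecode derived from a frame count with one derived from a presentation timestamp. Both class names are hypothetical and are not the FrameTimecode API. The only facts relied on are that PyAV exposes an integer pts and a rational time_base for decoded frames, and that seconds = pts * time_base.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class FrameNumberTimecode:
    """Hypothetical: position derived from a frame count and a fixed rate."""
    frame_number: int
    fps: Fraction

    @property
    def seconds(self) -> Fraction:
        # Only correct if no frames were dropped before this one.
        return Fraction(self.frame_number) / self.fps


@dataclass(frozen=True)
class PtsTimecode:
    """Hypothetical: position derived from a presentation timestamp."""
    pts: int
    time_base: Fraction

    @property
    def seconds(self) -> Fraction:
        # Independent of how many frames the decoder actually delivered.
        return self.pts * self.time_base


# Illustrative values only; the time base of 1/1000 is an assumption.
by_count = FrameNumberTimecode(frame_number=462, fps=Fraction(24000, 1001))
by_pts = PtsTimecode(pts=21188, time_base=Fraction(1, 1000))
print(float(by_count.seconds), float(by_pts.seconds))
```

Because both expose seconds as an exact Fraction, code that consumes timecodes would not need to know which backend produced them.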

@Breakthrough changed the title from "FrameTimecode is wrong due to dropping frame" to "Support multiple video backends (OpenCV sometimes drops frames resulting in incorrect timecodes)" on May 29, 2021
@Breakthrough
Owner

@elxy did you download the video using youtube-dl? I'll try that on my end when I get the new backend working, but was hoping you could share the exact video format you downloaded (or exact commands you used with youtube-dl).

I plan on starting this as the first major task for the v1.0 refactor as this should also resolve several other linked issues.

@elxy
Author

elxy commented Aug 22, 2021

@elxy did you download the video using youtube-dl? I'll try that on my end when I get the new backend working, but was hoping you could share the exact video format you downloaded (or exact commands you used with youtube-dl).

I plan on starting this as the first major task for the v1.0 refactor as this should also resolve several other linked issues.

I downloaded the 4K format of https://www.youtube.com/watch?v=WzD_PREISiM with youtube-dl. I just checked, and its 4K format code is 313 (webm, vp9).

@Breakthrough
Owner

Open items before v0.6 release discovered in #262:

  • Fix video length calculations (sometimes reports 0)
  • Ensure image sequences are rejected by the CLI when specifying PyAV backend
  • Create new issue to use multi-threaded decoding, may be addressed for v0.6.1

@Breakthrough
Owner

Complete in v0.6-beta3 including multithreaded decoding.
