Some H265 encoded videos return an error when seeking to particular points in time #179

Closed
ahmadsharif1 opened this issue Aug 13, 2024 · 1 comment

Comments

@ahmadsharif1
Contributor

🐛 Describe the bug

# First generate a test video:
conda install -c conda-forge x265

# Download and build ffmpeg
git clone https://git.ffmpeg.org/ffmpeg.git
cd ffmpeg
./configure --enable-nonfree --enable-gpl --prefix=$(readlink -f ../bin) --enable-libx265  --enable-rpath --extra-ldflags=-Wl,-rpath=$CONDA_PREFIX/lib --enable-filter=drawtext --enable-libfontconfig --enable-libfreetype --enable-libharfbuzz
make && make install

# Generate the test video (assumes the freshly built ffmpeg binary is the one on your PATH)
ffmpeg -f lavfi -i color=size=128x128:duration=1:rate=10:color=blue -vf "drawtext=fontsize=30:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2:text='Frame %{frame_num}'" -vcodec libx265 -pix_fmt yuv420p -g 2 -crf 10 h265_video.mp4 -y

# Now use torchcodec to seek into this file at timestamp 0.5 and write to a bmp file:
$ cat test.py

from torchcodec.decoders._simple_video_decoder import SimpleVideoDecoder
import sys
from PIL import Image

# `tensor` is a uint8 PyTorch tensor with shape (3, H, W), as returned by the decoder.
# The values in `tensor` are in the range [0, 255].
def save_tensor_as_bmp(tensor, filename):
    # Convert the tensor to a numpy array (the frame data is already uint8, so no scaling is needed)
    numpy_array = tensor.byte().cpu().numpy()

    # Reorder dimensions from (3, H, W) to (H, W, 3)
    numpy_array = numpy_array.transpose(1, 2, 0)

    # Create a PIL image from the numpy array
    image = Image.fromarray(numpy_array)

    # Save the image as a BMP file
    image.save(filename, format='BMP')



def main():
    video_path = sys.argv[1]
    ts = float(sys.argv[2])
    print(video_path)
    decoder = SimpleVideoDecoder(video_path)
    print(f"Getting frame at {ts=}")
    frame = decoder.get_frame_displayed_at(seconds=ts).data
    bmp_file = f"{video_path}.time{ts}.bmp"
    print(f"Saving to bmp file: {bmp_file}")
    save_tensor_as_bmp(frame, bmp_file)


if __name__ == "__main__":
    main()

# Run the test script like so:

python test.py h265_video.mp4 0.5

This actually fails right now (it throws an exception "no more frames to decode").

With #178 it will get "fixed" in the sense that we won't throw an exception, but we will return the wrong frame: if you run it, you will get a bmp file with "Frame 6" instead of "Frame 5". That is a bug because the frame showing "Frame 5" is the one displayed from timestamp=0.5 (inclusive) to timestamp=0.6 (exclusive).
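To make the expected result concrete, here is a minimal sketch of the display-interval arithmetic for a constant-frame-rate video like the test file (10 fps, first frame at time 0). The helper name `frame_index_at` is made up for illustration and is not part of torchcodec.

```python
import math
from fractions import Fraction


def frame_index_at(seconds, fps):
    """Index of the frame displayed at `seconds`.

    Assumes a constant frame rate and a first frame at time 0, so frame n
    is displayed over the half-open interval [n / fps, (n + 1) / fps).
    """
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    # Exact rational arithmetic avoids float rounding at interval boundaries.
    return math.floor(Fraction(str(seconds)) * fps)


print(frame_index_at(0.5, 10))   # 5
print(frame_index_at(0.59, 10))  # 5
print(frame_index_at(0.6, 10))   # 6
```

Under these assumptions, any timestamp in [0.5, 0.6) should yield the frame labelled "Frame 5".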

The underlying cause of this buggy behavior is an FFMPEG bug with H265 videos. When we call avformat_seek_file() with max_ts set to the int64 timebase value corresponding to time=0.5, it seeks past our frame to the next frame.
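For reference, the "int64 timebase value corresponding to time=0.5" is the timestamp in seconds divided by the stream's time base. A rough sketch of that conversion follows; the helper `seconds_to_pts`, the rounding direction, and the time base of 1/10240 are all assumptions made for illustration, not values read from the test file or from torchcodec.

```python
import math
from fractions import Fraction


def seconds_to_pts(seconds, time_base):
    """Convert seconds to an integer timestamp in `time_base` units.

    Rounds down, so the result never lies after the requested time.
    This rounding choice is an assumption of the sketch.
    """
    return math.floor(Fraction(str(seconds)) / time_base)


# Hypothetical time base; a real stream reports its own value.
time_base = Fraction(1, 10240)
max_ts = seconds_to_pts(0.5, time_base)
print(max_ts)  # 5120
```

A seek that honours max_ts should never land on a frame whose timestamp is greater than this value, which is the expectation that the H265 file violates.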

I have filed a bug upstream about this:

https://trac.ffmpeg.org/ticket/11137

Until that bug is resolved, what we can do is to use our own index to seek into the file as opposed to letting FFMPEG seek for us. I will do that in a subsequent PR.

Versions

This bug is for torchcodec v0.0.2

@ahmadsharif1
Contributor Author

This issue is fixed for the test cases I tried.

I still believe this is a bug in FFMPEG and I have filed a bug on their tracker. But for now we work around it by using our own keyframe index to always seek to the last keyframe before the user-requested timestamp. (Previously we were seeking to the user-requested timestamp itself.)
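The idea behind the workaround can be sketched as a binary search over a sorted list of keyframe timestamps. This is an illustration only, not torchcodec's actual implementation: the function name is hypothetical, the timestamps are made-up ticks of 1/10 s, and the sketch treats a keyframe exactly at the target as acceptable.

```python
from bisect import bisect_right


def last_keyframe_at_or_before(keyframe_pts, target_pts):
    """Return the largest keyframe timestamp that is <= target_pts.

    `keyframe_pts` must be sorted in ascending order.
    """
    i = bisect_right(keyframe_pts, target_pts)
    if i == 0:
        raise ValueError("no keyframe at or before the requested timestamp")
    return keyframe_pts[i - 1]


# Hypothetical index: a keyframe every 2 frames at 10 fps, as -g 2 requests.
keyframes = [0, 2, 4, 6, 8]
print(last_keyframe_at_or_before(keyframes, 5))  # 4
```

Seeking to that keyframe and then decoding forward until the requested timestamp is reached means the seek target is always a keyframe, so the result does not depend on how FFMPEG handles non-keyframe targets.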

FFMPEG seems to respect max_ts when seeking to keyframes. The documentation doesn't say so explicitly, and it is not clear on this point in general.

https://ffmpeg.org/doxygen/7.0/group__lavf__decoding.html#ga3b40fc8d2fda6992ae6ea2567d71ba30
