First generate a test video:

```shell
conda install -c conda-forge x265
# Download and build ffmpeg
git clone https://git.ffmpeg.org/ffmpeg.git
cd ffmpeg
./configure --enable-nonfree --enable-gpl --prefix=$(readlink -f ../bin) --enable-libx265 --enable-rpath --extra-ldflags=-Wl,-rpath=$CONDA_PREFIX/lib --enable-filter=drawtext --enable-libfontconfig --enable-libfreetype --enable-libharfbuzz
make -j"$(nproc)"
make install
# Generate a 1-second, 10 fps, 128x128 H.265 clip with a keyframe interval (-g) of 2,
# each frame labeled "Frame 0" ... "Frame 9" (assumes the ffmpeg installed in ../bin/bin is on PATH)
ffmpeg -f lavfi -i color=size=128x128:duration=1:rate=10:color=blue -vf "drawtext=fontsize=30:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2:text='Frame %{frame_num}'" -vcodec libx265 -pix_fmt yuv420p -g 2 -crf 10 h265_video.mp4 -y
```
Now use torchcodec to seek into this file at timestamp 0.5 and write the frame to a BMP file. Contents of `test.py`:

```python
import sys

from PIL import Image
from torchcodec.decoders._simple_video_decoder import SimpleVideoDecoder


def save_tensor_as_bmp(tensor, filename):
    """Save a (3, H, W) uint8 tensor with values in [0, 255] as a BMP file."""
    # torchcodec frames are already uint8 in [0, 255], so no rescaling is needed
    numpy_array = tensor.byte().cpu().numpy()
    # Reorder dimensions from (3, H, W) to (H, W, 3), the layout PIL expects
    numpy_array = numpy_array.transpose(1, 2, 0)
    # Create a PIL image from the numpy array and save it as a BMP file
    image = Image.fromarray(numpy_array)
    image.save(filename, format="BMP")


def main():
    video_path = sys.argv[1]
    ts = float(sys.argv[2])
    print(video_path)
    decoder = SimpleVideoDecoder(video_path)
    print(f"Getting frame at {ts=}")
    # Decode the frame that is displayed at `ts` seconds
    frame = decoder.get_frame_displayed_at(seconds=ts).data
    bmp_file = f"{video_path}.time{ts}.bmp"
    print(f"Saving to bmp file: {bmp_file}")
    save_tensor_as_bmp(frame, bmp_file)


if __name__ == "__main__":
    main()
```
Run the test script like so:

```shell
python test.py h265_video.mp4 0.5
```
This actually fails right now (it throws an exception "no more frames to decode").
With #178 it will be "fixed" in the sense that we at least won't throw an exception, but we will return the wrong frame: running the script will produce a BMP file showing "Frame 6" instead of "Frame 5". That is a bug, because "Frame 5" is the frame displayed from timestamp 0.5 (inclusive) to timestamp 0.6 (exclusive).
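For a constant-frame-rate clip like the 10 fps test video, the expected answer can be computed directly: frame n is displayed over [n / fps, (n + 1) / fps), so the frame shown at time t is floor(t * fps). The sketch below assumes a constant frame rate and a first frame at timestamp 0 (true for the generated clip, not for videos in general); `frame_index_displayed_at` is a hypothetical helper, not a torchcodec API. Exact fractions avoid floating-point surprises right at frame boundaries.

```python
import math
from fractions import Fraction


def frame_index_displayed_at(seconds: Fraction, fps: Fraction) -> int:
    """Index of the frame displayed at `seconds`, assuming constant fps and a start time of 0.

    Frame n covers the half-open interval [n / fps, (n + 1) / fps).
    """
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    # floor(t * fps): a frame's start is inclusive, its end is exclusive
    return math.floor(seconds * fps)


fps = Fraction(10)
print(frame_index_displayed_at(Fraction("0.5"), fps))   # 5
print(frame_index_displayed_at(Fraction("0.59"), fps))  # 5
print(frame_index_displayed_at(Fraction("0.6"), fps))   # 6
```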
The underlying cause of this behavior is an FFmpeg bug with H.265 videos. When we call avformat_seek_file() with max_ts set to the int64 value, in time-base units, that corresponds to 0.5 seconds, it seeks past our frame to the next one.
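To make the seek arguments concrete: FFmpeg timestamps are int64 counts of a time base, so 0.5 seconds becomes 0.5 / time_base. The sketch below does that conversion with exact fractions and checks the property a bounded seek should satisfy, namely that the frame it lands on starts no later than max_ts. The time base of 1/10240 is an illustrative assumption, not a value read from the test file, and the helper names are hypothetical; FFmpeg itself converts with av_rescale_q, whose rounding may differ from the floor used here.

```python
import math
from fractions import Fraction


def seconds_to_pts(seconds: Fraction, time_base: Fraction) -> int:
    """Convert seconds to an integer timestamp in units of `time_base`, rounding down."""
    return math.floor(seconds / time_base)


def seek_respects_max_ts(landed_pts: int, max_ts: int) -> bool:
    """A seek bounded by max_ts must not land on a frame that starts after max_ts."""
    return landed_pts <= max_ts


# Hypothetical time base; the real one should be read from the stream (e.g. with ffprobe).
time_base = Fraction(1, 10240)
max_ts = seconds_to_pts(Fraction("0.5"), time_base)
print(max_ts)  # 5120

frame5_pts = seconds_to_pts(Fraction("0.5"), time_base)  # "Frame 5" starts at 0.5 s
frame6_pts = seconds_to_pts(Fraction("0.6"), time_base)  # "Frame 6" starts at 0.6 s
print(seek_respects_max_ts(frame5_pts, max_ts))  # True
print(seek_respects_max_ts(frame6_pts, max_ts))  # False: the overshoot described above
```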
Until that bug is resolved, we can use our own index to seek into the file instead of letting FFmpeg seek for us. I will do that in a subsequent PR.
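Seeking with our own index still needs a second step after the seek: decode forward and keep the frame whose display interval contains the target. Below is a minimal sketch, assuming frames come out of the decoder in presentation order with a known pts and duration; `DecodedFrame` and `first_frame_containing` are hypothetical names, not torchcodec or FFmpeg APIs. It also shows why an overshooting seek cannot be recovered from: if the first decoded frame already starts after the target, decoding forward never reaches the right frame.

```python
from typing import Iterable, NamedTuple, Optional


class DecodedFrame(NamedTuple):
    """Hypothetical stand-in for a decoded frame (timestamps in time-base units)."""
    pts: int
    duration: int
    label: str


def first_frame_containing(frames: Iterable[DecodedFrame], target_pts: int) -> Optional[DecodedFrame]:
    """Decode forward and return the frame displayed at target_pts.

    Frame f covers [f.pts, f.pts + f.duration). Returns None if the seek
    overshot the target or the stream ends before reaching it.
    """
    for frame in frames:
        if frame.pts <= target_pts < frame.pts + frame.duration:
            return frame
        if frame.pts > target_pts:
            # Decoding started past the target: the seek went too far.
            return None
    return None


# The 10 fps test clip, assuming a time base of 1/10 so that pts == frame number.
clip = [DecodedFrame(pts=n, duration=1, label=f"Frame {n}") for n in range(10)]
print(first_frame_containing(clip[4:], target_pts=5).label)  # Frame 5 (seek landed on frame 4)
print(first_frame_containing(clip[6:], target_pts=5))        # None (seek overshot to frame 6)
```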
Versions
This bug is for torchcodec v0.0.2
I still believe this is a bug in FFmpeg, and I have filed a bug on their tracker. For now, though, we work around it by using our own keyframe index to always seek to the last keyframe before the user-requested timestamp. (Previously we were seeking to the user-requested timestamp itself.)
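The keyframe lookup in this workaround amounts to a binary search over a sorted list of keyframe timestamps. A minimal sketch, assuming the index is a sorted list of keyframe pts built ahead of time; `keyframe_to_seek_to` is a hypothetical name, and this version treats a keyframe exactly at the target as a valid seek point. The example index assumes that `-g 2` produced a keyframe every 2 frames in the test clip, which was not verified against the file. The returned pts is what would be passed to the seek instead of the user's timestamp, with decoding then continuing forward to the requested frame.

```python
import bisect
from typing import Sequence


def keyframe_to_seek_to(keyframe_pts: Sequence[int], target_pts: int) -> int:
    """Return the pts of the last keyframe at or before target_pts.

    `keyframe_pts` must be sorted in ascending order.
    """
    # bisect_right counts the keyframes whose pts is <= target_pts
    i = bisect.bisect_right(keyframe_pts, target_pts)
    if i == 0:
        raise ValueError("target precedes the first keyframe")
    return keyframe_pts[i - 1]


# Assumed index for the test clip, in frame units (pts == frame number).
keyframes = [0, 2, 4, 6, 8]
print(keyframe_to_seek_to(keyframes, 5))  # 4
print(keyframe_to_seek_to(keyframes, 6))  # 6
```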
FFmpeg seems to respect max_ts when seeking to keyframes. The documentation doesn't mention this, though it is unclear on this point in any case.
Upstream FFmpeg ticket: https://trac.ffmpeg.org/ticket/11137