[Bugfix] Fix base64 JPEG video frames returning empty metadata#37301
Isotr0py merged 4 commits into vllm-project:main
When passing base64-encoded JPEG frames via the `video/jpeg` media type, `VideoMediaIO.load_base64` returned an empty metadata dict. Downstream code tries to construct `VideoMetadata(**metadata)`, which fails with:

    VideoMetadata.__init__() missing 1 required positional argument: 'total_num_frames'

Populate the metadata dict with `total_num_frames`, `fps`, `duration`, `video_backend`, and `frames_indices`, the same fields that other video loaders return via `create_hf_metadata()`.

Fixes vllm-project#37274

Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Code Review
The pull request effectively addresses the bug where base64-encoded JPEG video frames were returning empty metadata, leading to a VideoMetadata initialization error. The fix correctly populates the metadata dictionary with essential fields such as total_num_frames, fps, duration, video_backend, frames_indices, and do_sample_frames, ensuring compatibility with downstream video processing components. This is a critical improvement for the robustness of video input handling.

    metadata = {
        "total_num_frames": total,
        "fps": fps,
        "duration": duration,
        "video_backend": "jpeg_sequence",
        "frames_indices": list(range(total)),
        "do_sample_frames": False,
    }
    return frames, metadata

The previous implementation returned an empty dictionary for metadata, which caused a runtime error when transformers.video_utils.VideoMetadata was initialized. This change correctly populates the metadata with all necessary fields, resolving the total_num_frames missing argument issue and ensuring proper video metadata propagation. This is a critical fix for the functionality of base64 JPEG video frame processing.
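The failure mode described above can be reproduced without vLLM or transformers. The sketch below uses `StandInVideoMetadata`, a hypothetical dataclass that only mimics the one property that matters here (a required `total_num_frames` field); it is not the real `transformers.video_utils.VideoMetadata`, and `try_build` is likewise an illustrative helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandInVideoMetadata:
    """Hypothetical stand-in, NOT the real transformers class."""

    total_num_frames: int  # required: no default value
    fps: Optional[float] = None
    duration: Optional[float] = None
    video_backend: Optional[str] = None
    frames_indices: Optional[list] = None


def try_build(metadata: dict) -> str:
    """Return 'ok', or the TypeError message raised when unpacking metadata."""
    try:
        StandInVideoMetadata(**metadata)
        return "ok"
    except TypeError as exc:
        return str(exc)


# Old behaviour: an empty dict leaves the required field unset.
empty_result = try_build({})

# New behaviour: the dict carries the frame count and related fields.
populated_result = try_build(
    {
        "total_num_frames": 4,
        "fps": 1,
        "duration": 4.0,
        "video_backend": "jpeg_sequence",
        "frames_indices": [0, 1, 2, 3],
    }
)

print(empty_result)
print(populated_result)
```

Unpacking `{}` leaves the required field without a value, so construction raises a `TypeError` that names `total_num_frames`; the populated dict constructs cleanly.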
Hi @universeplayer, the pre-commit checks have failed. Please run:

    uv pip install "pre-commit>=4.5.1"
    pre-commit install
    pre-commit run --all-files

Then, commit the changes and push to your branch.

    return np.stack(
        [np.asarray(load_frame(frame_data)) for frame_data in data.split(",")]
    ), {}

Can you add a regression test at tests/multimodal/media/test_video.py?
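A regression test along these lines might look like the sketch below. It is only an outline under stated assumptions: `load_base64_frames` is a hypothetical stand-in for the `video/jpeg` branch that keeps each frame as raw bytes instead of decoding JPEGs into arrays, and it does not call vLLM, so the real test in `tests/multimodal/media/test_video.py` would exercise `VideoMediaIO.load_base64` instead.

```python
import base64


def load_base64_frames(data: str, fps: float = 1):
    """Hypothetical stand-in for the video/jpeg branch.

    Frames are kept as raw bytes here; the real code decodes each
    JPEG into an array and stacks the results.
    """
    frames = [base64.b64decode(chunk) for chunk in data.split(",")]
    total = len(frames)
    metadata = {
        "total_num_frames": total,
        "fps": fps,
        "duration": total / fps,
        "video_backend": "jpeg_sequence",
        "frames_indices": list(range(total)),
        "do_sample_frames": False,
    }
    return frames, metadata


def test_base64_jpeg_frames_return_metadata():
    # Three fake "frames", comma-joined the way the loader splits them.
    payload = ",".join(
        base64.b64encode(f"frame-{i}".encode()).decode() for i in range(3)
    )
    frames, metadata = load_base64_frames(payload, fps=2)

    assert len(frames) == 3
    assert metadata != {}  # the original bug: metadata was empty
    assert metadata["total_num_frames"] == 3
    assert metadata["frames_indices"] == [0, 1, 2]
    assert metadata["duration"] == 1.5
    assert metadata["do_sample_frames"] is False


test_base64_jpeg_frames_return_metadata()
```

The key assertion is that the metadata is non-empty and that `total_num_frames` matches the number of comma-separated frames; the remaining checks pin down the fields listed in the fix.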
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
…project#37301) Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com> Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
…project#37301) Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com> Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
…project#37301) Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com> Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Vinay Damodaran <vrdn@hey.com>
…project#37301) Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com> Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: EricccYang <yangyang4991@gmail.com>
Problem

Sending base64-encoded JPEG frames as video input returns a 400 error.

`VideoMediaIO.load_base64` handles the `video/jpeg` media type by decoding individual JPEG frames and stacking them, but returns an empty metadata dict `{}`. Downstream code passes this to `transformers.video_utils.VideoMetadata(**metadata)`, which requires `total_num_frames` as a positional argument.

Root Cause

In `vllm/multimodal/media/video.py:83-85`, the base64 JPEG branch returns the stacked frames together with `{}`. All other video loading paths (via `load_bytes` → `video_loader.load_bytes`) return properly populated metadata through `create_hf_metadata()`, but the base64 JPEG path was missed.

Fix

Populate the metadata dict with the same fields other loaders return:

- `total_num_frames`: frame count from the stacked array
- `fps`: from request kwargs (default 1)
- `duration`: computed from `total_num_frames / fps`
- `video_backend`: `"jpeg_sequence"` to distinguish from other backends
- `frames_indices`: `[0, 1, ..., N-1]` since all frames are used directly
- `do_sample_frames`: `False` since frames are pre-extracted by the client

Test plan

… `VideoMetadata` expects

Fixes #37274