[perf] boosting video decoding performance by 1.3~2x with opencv as decoding backend #13565

Closed
WingEdge777 wants to merge 7 commits into sgl-project:main from WingEdge777:opt.video_cv2_decoding

@WingEdge777 (Contributor) commented on Nov 19, 2025

Motivation

We ran video decoding benchmarks, and the results show a consistently large performance gap between decord2 and OpenCV when frames are sampled sparsely or at random positions. In some cases OpenCV is nearly 2x faster than decord2.

Modifications

We should therefore offer OpenCV as at least an optional backend for video decoding. For now, an environment variable acts as the switch.
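As a rough illustration of an environment-variable switch, here is a minimal sketch. The variable name `VIDEO_DECODE_BACKEND`, the helper `pick_backend`, and the accepted values are all hypothetical and chosen for this example only; the name actually used in this PR may differ.

```python
import os

# Hypothetical variable name, for illustration only (not taken from the PR).
ENV_VAR = "VIDEO_DECODE_BACKEND"
SUPPORTED = ("decord", "opencv")


def pick_backend(default: str = "decord") -> str:
    """Return the decoding backend selected via the environment."""
    value = os.environ.get(ENV_VAR, default).strip().lower()
    if value not in SUPPORTED:
        raise ValueError(f"{ENV_VAR}={value!r} is not one of {SUPPORTED}")
    return value


# With the variable unset, the default backend is kept.
print(pick_backend())
```

Keeping the existing backend as the default means the OpenCV path is strictly opt-in, which fits the "optional backend" framing above.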

benchmark

test_data: here

benchmark code
import time
from decord import VideoReader, cpu, gpu
import cv2
import numpy as np
from tqdm import tqdm

from typing import List, Any

def list_to_markdown_table(data_list: List[List[Any]]) -> str:
    """
    Render benchmark results as a Markdown table.

    Args:
        data_list: [header, cv_rows, decord_rows], where header lists the sampling
            rates and each row is [label, time_1, time_2, ...]

    Returns:
        Markdown str
    """
    if not data_list or not data_list[0]:
        return "Warning: Input list is empty."

    header = data_list[0]
    data_rows = data_list[1:]
    
    num_columns = len(header) + 1

    header_line = "| sampling fps | " + " | ".join(str(h) for h in header) + " |"

    separator_line = "| " + " | ".join(["---"] * num_columns) + " |"

    data_lines = []
    for cv, de in zip(data_rows[0], data_rows[1]):
        for row in [cv, de]:
            data_line = "| " + " | ".join([str(item) for item in row]) + " |"
            data_lines.append(data_line)

    markdown_table = [header_line, separator_line] + data_lines
    return "\n".join(markdown_table)



def benchmark():
    try:
        from decord.bridge import decord_bridge

        ctx = gpu(0)
        _ = decord_bridge.get_ctx_device(ctx)
    except Exception:
        ctx = cpu(0)
    print("Using context:", ctx)

    test_files = ["test_2k.mp4", "test_1080p.mp4", "test_720p.mp4", "test_480p.mp4", "test_1080p_mobile.mp4", "test_720p_mobile.mp4", "test_480p_mobile.mp4"]

    sampling_rate = [1, 3, 5, 10, 120]
    cv_cost, decord_cost = [[]for i in range(len(test_files))], [[]for i in range(len(test_files))]
    for x, file in enumerate(tqdm(test_files)):
        cv_cost[x].append("cv_"+file)
        decord_cost[x].append("de_"+file)

        # use opencv get video meta
        vc = cv2.VideoCapture(file)
        total_frames = int(vc.get(cv2.CAP_PROP_FRAME_COUNT))
        video_fps = vc.get(cv2.CAP_PROP_FPS)
        width = int(vc.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(vc.get(cv2.CAP_PROP_FRAME_HEIGHT))
        # print(video_fps, width, height, file)
        vc.release()
        
        # run benchmark
        for fps in sampling_rate:
            if fps > video_fps:
                fps = video_fps
            nframes = int(total_frames / video_fps * fps)
            frame_idx = np.linspace(0, total_frames - 1, num=nframes, dtype=np.int64)
            frame_idx = np.unique(frame_idx)
            nframes = frame_idx.shape[0]

            cnt = 10
            st = time.time()
            for _ in range(cnt):
                vc = cv2.VideoCapture(file)
                total_frames = int(vc.get(cv2.CAP_PROP_FRAME_COUNT))
                video_np = np.empty((nframes, height, width, 3), dtype=np.uint8)
                mx_idx = min(total_frames, max(frame_idx) + 1)
                i = 0
                for idx in range(mx_idx):
                    ok = vc.grab()
                    if not ok:
                        break
                    if idx in frame_idx:
                        ret, frame = vc.retrieve()
                        if ret:
                            video_np[i] = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                            i += 1
                vc.release()  # release inside the loop so every opened capture is closed
            ed = time.time()
            t = f"{(ed-st) / cnt:.3f}"
            cv_cost[x].append(t)

            # print(video_np.shape)
            # print(i,"cv", cv_cost)


            st = time.time()
            for _ in range(cnt):
                vr = VideoReader(file, ctx=ctx)
                video_np = vr.get_batch(frame_idx).asnumpy()
            ed = time.time()
            t = f"{(ed-st) / cnt:.3f}"
            decord_cost[x].append(t)

            # print(video_np.shape)
            # print(i,"de", decord_cost)
    return [sampling_rate, cv_cost, decord_cost]


if __name__ == "__main__":
    output = list_to_markdown_table(benchmark())
    print(output)
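The OpenCV loop in the benchmark relies on a grab/retrieve split: `grab()` advances the stream by one frame, and `retrieve()` is called only for the frames that are actually wanted. The sketch below isolates that pattern. `decode_selected` and `FakeCapture` are hypothetical names introduced here; `FakeCapture` is only a stand-in so the example runs without a video file, and a real `cv2.VideoCapture` exposes the same `grab()` and `retrieve()` methods. The sketch also uses a `set` for membership tests, which is a swapped-in detail; the benchmark tests membership against a NumPy array.

```python
from typing import Any, Iterable, List, Tuple


def decode_selected(cap: Any, wanted: Iterable[int]) -> List[Tuple[int, Any]]:
    """Return (index, frame) pairs for the requested frame indices.

    `cap` is any object with grab() -> bool and retrieve() -> (bool, frame).
    """
    wanted_set = {int(i) for i in wanted}
    if not wanted_set:
        return []
    out = []
    for idx in range(max(wanted_set) + 1):
        if not cap.grab():  # advance one frame; stop at end of stream
            break
        if idx in wanted_set:  # only fetch the frames we need
            ok, frame = cap.retrieve()
            if ok:
                out.append((idx, frame))
    return out


class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture (illustration only)."""

    def __init__(self, n_frames: int) -> None:
        self.n_frames = n_frames
        self.pos = -1
        self.retrieved = 0  # counts how often retrieve() was called

    def grab(self) -> bool:
        if self.pos + 1 >= self.n_frames:
            return False
        self.pos += 1
        return True

    def retrieve(self) -> Tuple[bool, str]:
        self.retrieved += 1
        return True, f"frame-{self.pos}"


cap = FakeCapture(n_frames=100)
frames = decode_selected(cap, [0, 33, 66, 99])
print([i for i, _ in frames], cap.retrieved)
```

With a real capture, the returned frames would be BGR arrays that still need `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)`, as in the benchmark.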

benchmark result

Cases: sampling rates from 1 fps up to full-frame extraction (the 120 fps column is capped at each video's native frame rate).
Time unit: seconds, averaged over 10 runs
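To make the cases concrete, the sketch below mirrors how the benchmark code turns a sampling rate into frame indices. It is a pure-Python approximation of the `np.linspace(..., dtype=np.int64)` plus `np.unique` steps, so individual indices could differ from NumPy's in rounding edge cases. The function name `sample_indices` and the 300-frame, 30 fps video are assumptions made for the example.

```python
from typing import List


def sample_indices(total_frames: int, video_fps: float, target_fps: float) -> List[int]:
    """Evenly spaced frame indices for a target sampling rate."""
    fps = min(target_fps, video_fps)  # cannot sample faster than the video itself
    n = int(total_frames / video_fps * fps)
    if n <= 0:
        return []
    if n == 1:
        return [0]
    step = (total_frames - 1) / (n - 1)
    # Truncate like an integer cast, and pin the last index to the final frame.
    idx = [int(i * step) for i in range(n - 1)] + [total_frames - 1]
    return sorted(set(idx))  # drop duplicates, as np.unique does


# Hypothetical 10-second clip at 30 fps.
print(len(sample_indices(300, 30.0, 1)))    # 10 frames at 1 fps
print(len(sample_indices(300, 30.0, 120)))  # capped at 30 fps: all 300 frames
```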

result table

Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz + T4

| sampling fps | 1 | 3 | 5 | 10 | 120 |
| --- | --- | --- | --- | --- | --- |
| cv_test_2k.mp4 | 0.829 | 0.891 | 0.953 | 1.325 | 3.642 |
| de_test_2k.mp4 | 1.164 | 1.591 | 1.870 | 2.690 | 4.610 |
| cv_test_1080p.mp4 | 1.193 | 1.311 | 1.439 | 2.267 | 6.244 |
| de_test_1080p.mp4 | 1.671 | 2.155 | 2.555 | 3.278 | 9.514 |
| cv_test_720p.mp4 | 0.694 | 0.739 | 0.777 | 1.075 | 2.933 |
| de_test_720p.mp4 | 0.908 | 1.202 | 1.365 | 1.626 | 4.422 |
| cv_test_480p.mp4 | 0.408 | 0.424 | 0.440 | 0.513 | 1.273 |
| de_test_480p.mp4 | 0.584 | 0.716 | 0.791 | 0.950 | 1.448 |
| cv_test_1080p_mobile.mp4 | 0.801 | 0.952 | 1.213 | 2.199 | 6.224 |
| de_test_1080p_mobile.mp4 | 1.289 | 2.023 | 2.817 | 4.579 | 7.533 |
| cv_test_720p_mobile.mp4 | 0.425 | 0.492 | 0.650 | 1.099 | 2.891 |
| de_test_720p_mobile.mp4 | 0.636 | 0.966 | 1.308 | 1.868 | 3.567 |
| cv_test_480p_mobile.mp4 | 0.242 | 0.266 | 0.319 | 0.477 | 1.233 |
| de_test_480p_mobile.mp4 | 0.348 | 0.457 | 0.560 | 0.789 | 1.518 |

AMD EPYC 7W83 64-Core Processor + A10

| sampling fps | 1 | 3 | 5 | 10 | 120 |
| --- | --- | --- | --- | --- | --- |
| cv_test_2k.mp4 | 0.633 | 0.666 | 0.667 | 0.743 | 1.608 |
| de_test_2k.mp4 | 0.862 | 1.057 | 1.122 | 1.235 | 2.819 |
| cv_test_1080p.mp4 | 0.919 | 0.950 | 0.987 | 1.237 | 2.825 |
| de_test_1080p.mp4 | 1.289 | 1.527 | 1.755 | 2.132 | 5.056 |
| cv_test_720p.mp4 | 0.539 | 0.552 | 0.573 | 0.681 | 1.262 |
| de_test_720p.mp4 | 0.775 | 0.863 | 0.919 | 1.086 | 2.348 |
| cv_test_480p.mp4 | 0.327 | 0.331 | 0.343 | 0.357 | 0.628 |
| de_test_480p.mp4 | 0.459 | 0.533 | 0.579 | 0.589 | 1.032 |
| cv_test_1080p_mobile.mp4 | 0.586 | 0.683 | 0.701 | 1.086 | 2.633 |
| de_test_1080p_mobile.mp4 | 0.923 | 1.315 | 1.935 | 2.634 | 4.695 |
| cv_test_720p_mobile.mp4 | 0.332 | 0.356 | 0.381 | 0.477 | 1.584 |
| de_test_720p_mobile.mp4 | 0.491 | 0.605 | 0.772 | 1.037 | 2.099 |
| cv_test_480p_mobile.mp4 | 0.173 | 0.183 | 0.192 | 0.260 | 0.571 |
| de_test_480p_mobile.mp4 | 0.252 | 0.301 | 0.328 | 0.418 | 1.034 |

AMD EPYC 9K84 96-Core Processor + L20

| sampling fps | 1 | 3 | 5 | 10 | 120 |
| --- | --- | --- | --- | --- | --- |
| cv_test_2k.mp4 | 0.583 | 0.640 | 0.694 | 0.955 | 2.365 |
| de_test_2k.mp4 | 0.789 | 1.100 | 1.412 | 2.007 | 3.345 |
| cv_test_1080p.mp4 | 0.863 | 0.966 | 1.078 | 1.594 | 4.234 |
| de_test_1080p.mp4 | 1.131 | 1.485 | 1.824 | 2.620 | 5.582 |
| cv_test_720p.mp4 | 0.511 | 0.541 | 0.576 | 0.756 | 1.985 |
| de_test_720p.mp4 | 0.652 | 0.828 | 0.977 | 1.285 | 2.556 |
| cv_test_480p.mp4 | 0.293 | 0.300 | 0.315 | 0.378 | 0.830 |
| de_test_480p.mp4 | 0.398 | 0.493 | 0.564 | 0.714 | 1.307 |
| cv_test_1080p_mobile.mp4 | 0.592 | 0.716 | 0.891 | 1.501 | 4.067 |
| de_test_1080p_mobile.mp4 | 0.860 | 1.422 | 2.057 | 3.349 | 6.102 |
| cv_test_720p_mobile.mp4 | 0.333 | 0.364 | 0.427 | 0.683 | 1.933 |
| de_test_720p_mobile.mp4 | 0.444 | 0.688 | 0.942 | 1.357 | 2.735 |
| cv_test_480p_mobile.mp4 | 0.174 | 0.184 | 0.210 | 0.335 | 0.808 |
| de_test_480p_mobile.mp4 | 0.238 | 0.324 | 0.431 | 0.619 | 1.337 |

AMD EPYC 9K84 96-Core Processor + H20

| sampling fps | 1 | 3 | 5 | 10 | 120 |
| --- | --- | --- | --- | --- | --- |
| cv_test_2k.mp4 | 0.620 | 0.651 | 0.661 | 0.789 | 1.590 |
| de_test_2k.mp4 | 0.837 | 1.030 | 1.117 | 1.233 | 2.922 |
| cv_test_1080p.mp4 | 0.930 | 0.941 | 1.017 | 1.189 | 2.783 |
| de_test_1080p.mp4 | 1.209 | 1.424 | 1.675 | 2.097 | 5.129 |
| cv_test_720p.mp4 | 0.529 | 0.543 | 0.540 | 0.610 | 1.222 |
| de_test_720p.mp4 | 0.731 | 0.826 | 0.893 | 1.081 | 2.439 |
| cv_test_480p.mp4 | 0.328 | 0.334 | 0.343 | 0.350 | 0.600 |
| de_test_480p.mp4 | 0.449 | 0.532 | 0.575 | 0.580 | 0.986 |
| cv_test_1080p_mobile.mp4 | 0.598 | 0.665 | 0.703 | 1.145 | 2.804 |
| de_test_1080p_mobile.mp4 | 0.908 | 1.334 | 1.802 | 2.652 | 5.105 |
| cv_test_720p_mobile.mp4 | 0.308 | 0.339 | 0.353 | 0.461 | 1.256 |
| de_test_720p_mobile.mp4 | 0.463 | 0.599 | 0.753 | 0.995 | 2.026 |
| cv_test_480p_mobile.mp4 | 0.185 | 0.186 | 0.191 | 0.241 | 0.591 |
| de_test_480p_mobile.mp4 | 0.267 | 0.302 | 0.337 | 0.439 | 1.081 |
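For readers who want the speedup rather than raw timings, the ratio is simply decord time divided by OpenCV time. The sketch below does this for the `test_2k.mp4` rows of the AMD EPYC 9K84 + H20 results; the helper name `speedups` is an assumption for this example, and the same calculation applies to any other pair of rows.

```python
from typing import List

# test_2k.mp4 timings in seconds from the AMD EPYC 9K84 + H20 results.
rates = [1, 3, 5, 10, 120]
cv_times = [0.620, 0.651, 0.661, 0.789, 1.590]
de_times = [0.837, 1.030, 1.117, 1.233, 2.922]


def speedups(decord_times: List[float], opencv_times: List[float]) -> List[float]:
    """How many times faster OpenCV is than decord, per column."""
    return [d / c for d, c in zip(decord_times, opencv_times)]


ratios = speedups(de_times, cv_times)
for rate, ratio in zip(rates, ratios):
    print(f"{rate:>3} fps: {ratio:.2f}x")
```

For this pair of rows every ratio falls between 1.3x and 2x, consistent with the range quoted in the PR title.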


@WingEdge777 (Contributor, Author)

@yhyang201 @JustinTong0323 @mickqian @hnyls2002, could you please take a look at this? Thanks.

@yhyang201 (Collaborator)

> @yhyang201 @JustinTong0323 @mickqian @hnyls2002, could you please take a look at this, thanks.

We are currently reviewing it and setting up the video bench methodology. We will validate this PR and merge it as soon as possible!

@yhyang201 (Collaborator)

Great work, and thank you for your contribution! We will trigger the CI and proceed with the merge.

@WingEdge777 (Contributor, Author)

Thank you too, for taking the time to review!

@WingEdge777 force-pushed the opt.video_cv2_decoding branch from c3292f3 to 26a66b6 on February 8, 2026 14:02