nvdec hardware accelerated decoder #3703

Open
totaam opened this issue Dec 6, 2022 · 4 comments
Labels
encoding enhancement New feature or request

Comments

@totaam
Collaborator

totaam commented Dec 6, 2022

Split from #202.
There are other frameworks for hardware acceleration, but nvidia's offering is generally the most stable option.

https://docs.nvidia.com/video-technologies/video-codec-sdk/nvdec-video-decoder-api-prog-guide/

@totaam totaam added enhancement New feature or request encoding labels Dec 6, 2022
totaam added a commit that referenced this issue Dec 6, 2022
totaam added a commit that referenced this issue Dec 6, 2022
totaam added a commit that referenced this issue Jan 16, 2023
totaam added a commit that referenced this issue Jan 16, 2023
totaam added a commit that referenced this issue Jan 16, 2023
@totaam
Collaborator Author

totaam commented Jan 17, 2023

Now working for jpeg decoding without OpenGL. That is not very useful, since we have nvjpeg (#3504), which does the same job better in every way: it is more lightweight, decodes jpega, and has OpenGL acceleration (the decoded image stays on the GPU).
But this is a start, hopefully I can figure out how to enable h264, hevc, vp8, vp9, etc.
Still TODO:

  • fix no device errors when used as a video decoder for jpeg - allocate the CUDA device once in the decode thread?
  • OpenGL - CUDA buffer sharing (very similar to nvjpeg)
  • add scaling to libyuv's NV12 to RGB
  • jpega
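
The first TODO item could be approached roughly like this. This is only a sketch, assuming a single dedicated decode thread owns the device; `DeviceContext` is a hypothetical placeholder (not a real CUDA or xpra class) that just counts how often it is created:

```python
import queue
import threading


class DeviceContext:
    """Hypothetical stand-in for an expensive, thread-bound CUDA device context."""
    created = 0

    def __init__(self):
        DeviceContext.created += 1

    def decode(self, data):
        # Placeholder: a real decoder would submit `data` to the GPU here.
        return len(data)


class DecodeThread(threading.Thread):
    """Owns the device context: it is allocated once, inside run()."""

    def __init__(self):
        super().__init__(daemon=True)
        self.requests = queue.Queue()
        self.results = queue.Queue()

    def run(self):
        context = DeviceContext()       # allocated once, in the decode thread
        while True:
            data = self.requests.get()
            if data is None:            # None is the shutdown marker
                break
            self.results.put(context.decode(data))

    def decode(self, data):
        # Assumes a single caller at a time, so results come back in order.
        self.requests.put(data)
        return self.results.get(timeout=5)

    def stop(self):
        self.requests.put(None)
        self.join()
```

Every call to `decode()` then reuses the same context, whichever thread the request came from.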

totaam added a commit that referenced this issue Jan 18, 2023
totaam added a commit that referenced this issue Jan 19, 2023
paint NV12 in opengl backend (only 'Y' plane for now until I find the correct shader / incantation) from PBOs we create for both planes
@totaam
Collaborator Author

totaam commented Jan 19, 2023

It paints... but needs a new shader: this one only paints the Y plane for now. (better than a corrupted screen, or worse.. crashes)
Perhaps something like:
https://git.linuxtv.org/libcamera.git/tree/src/qcam/assets/shader/NV_2_planes_UV_f.glsl?id=9db6ce0ba499eba53db236558d783a4ff7aa3896
explained in Accelerating libcamera Qcam format conversion using OpenGL shaders

/* SPDX-License-Identifier: LGPL-2.1-or-later */
/*
 * Copyright (C) 2020, Linaro
 *
 * NV_2_planes_UV_f.glsl - Fragment shader code for NV12, NV16 and NV24 formats
 */

#ifdef GL_ES
precision mediump float;
#endif

varying vec2 textureOut;
uniform sampler2D tex_y;
uniform sampler2D tex_u;

void main(void)
{
	vec3 yuv;
	vec3 rgb;
	mat3 yuv2rgb_bt601_mat = mat3(
		vec3(1.164,  1.164, 1.164),
		vec3(0.000, -0.392, 2.017),
		vec3(1.596, -0.813, 0.000)
	);

	yuv.x = texture2D(tex_y, textureOut).r - 0.063;
	yuv.y = texture2D(tex_u, textureOut).r - 0.500;
	yuv.z = texture2D(tex_u, textureOut).g - 0.500;

	rgb = yuv2rgb_bt601_mat * yuv;
	gl_FragColor = vec4(rgb, 1.0);
}
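
For checking a shader's output against known values, the same conversion can be written as a slow CPU reference. This is a sketch, assuming tightly packed planes (no row padding), even dimensions and the BT.601 limited-range coefficients from the shader above; the function names are made up for illustration:

```python
def _to_byte(value):
    """Clamp a normalised channel value to [0, 1] and scale it to 0..255."""
    return int(round(min(1.0, max(0.0, value)) * 255))


def nv12_to_rgb(y_plane, uv_plane, width, height):
    """Convert NV12 planes (bytes) to packed RGB, one pixel at a time.

    y_plane holds width*height luma bytes; uv_plane holds interleaved
    U,V pairs at half resolution, so each of its rows is `width` bytes.
    """
    rgb = bytearray(width * height * 3)
    for row in range(height):
        for col in range(width):
            y = y_plane[row * width + col] / 255.0 - 0.063
            # one U,V pair is shared by a 2x2 block of luma samples
            uv = (row // 2) * width + (col // 2) * 2
            u = uv_plane[uv] / 255.0 - 0.5
            v = uv_plane[uv + 1] / 255.0 - 0.5
            # same matrix as the GLSL mat3, which is stored column-major
            r = 1.164 * y + 1.596 * v
            g = 1.164 * y - 0.392 * u - 0.813 * v
            b = 1.164 * y + 2.017 * u
            out = (row * width + col) * 3
            rgb[out] = _to_byte(r)
            rgb[out + 1] = _to_byte(g)
            rgb[out + 2] = _to_byte(b)
    return bytes(rgb)
```

The point is the indexing: the UV plane is sampled at `(row // 2, col // 2)`, and U and V come from the two interleaved bytes of the same pair, which corresponds to the `.r` and `.g` components of `tex_u` in the shader.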

For now I have used this Y to RGBA shader instead, completely ignoring the UV combined plane:

struct pixel_in {
    float2 texcoord1 : TEXCOORD0;
    float2 texcoord2 : TEXCOORD1;
    uniform samplerRECT texture1 : TEXUNIT0;
    uniform samplerRECT texture2 : TEXUNIT1;
    float4 color : COLOR0;
};

struct pixel_out {
    float4 color : COLOR0;
};

pixel_out
main (pixel_in IN)
{
    pixel_out OUT;

    float3 pre;

    pre.r = texRECT (IN.texture1, IN.texcoord1).x - (16.0 / 256.0);
    pre.g = texRECT (IN.texture2, IN.texcoord2).x - (128.0 / 256.0);
    pre.b = texRECT (IN.texture2, IN.texcoord2).y - (128.0 / 256.0);

    const float3 red   = float3 (1.0/219.0, 0.0, 1.371/219.0) * 255.0;
    const float3 green = float3 (1.0/219.0, -0.336/219.0, -0.698/219.0) * 255.0;
    const float3 blue  = float3 (1.0/219.0, 1.732/219.0, 0.0) * 255.0;

    OUT.color.r = pre.r;
    OUT.color.g = pre.r;
    OUT.color.b = pre.r;
    OUT.color.a = IN.color.a;
    
    return OUT;
}

More NV12 shader examples:


For jpeg mode, do we need to keep the decoder context? (initialization looks costly)

totaam added a commit that referenced this issue Jan 19, 2023
totaam added a commit that referenced this issue Jan 20, 2023
this is undocumented, but not entirely unexpected for NV12 pixel storage format,
this fixes colors bleeding, at least with the non-opengl case
@totaam
Collaborator Author

totaam commented Jan 21, 2023

Trying to get nvdec to work via gstreamer.

It finds the same valid encodings and chroma formats:

GST_DEBUG=nvdec*:4 gst-inspect-1.0 nvh264dec 
0:00:00.902346129 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1171:gst_nv_decoder_check_device_caps: mpegvideo bit-depth 8 with chroma format 0 [48 - 4080] x [16 - 4080]
0:00:00.908428330 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1171:gst_nv_decoder_check_device_caps: mpeg2video bit-depth 8 with chroma format 0 [48 - 4080] x [16 - 4080]
0:00:00.914007625 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1171:gst_nv_decoder_check_device_caps: mpeg4video bit-depth 8 with chroma format 0 [48 - 2032] x [16 - 2032]
0:00:00.920519713 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1094:gst_nv_decoder_check_device_caps: No codec map corresponding to codec 3
0:00:00.921501455 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1171:gst_nv_decoder_check_device_caps: h264 bit-depth 8 with chroma format 0 [48 - 4096] x [16 - 4096]
0:00:00.927115589 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1171:gst_nv_decoder_check_device_caps: jpeg bit-depth 8 with chroma format 0 [64 - 32768] x [64 - 16384]
0:00:00.929860858 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1171:gst_nv_decoder_check_device_caps: jpeg bit-depth 8 with chroma format 1 [64 - 32768] x [64 - 16384]
0:00:00.932570852 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1094:gst_nv_decoder_check_device_caps: No codec map corresponding to codec 6
0:00:00.932581141 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1094:gst_nv_decoder_check_device_caps: No codec map corresponding to codec 7
0:00:00.933641584 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1171:gst_nv_decoder_check_device_caps: h265 bit-depth 8 with chroma format 0 [144 - 8192] x [144 - 8192]
0:00:00.934768056 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1171:gst_nv_decoder_check_device_caps: h265 bit-depth 10 with chroma format 0 [144 - 8192] x [144 - 8192]
0:00:00.935946641 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1171:gst_nv_decoder_check_device_caps: h265 bit-depth 12 with chroma format 0 [144 - 8192] x [144 - 8192]
0:00:00.939812713 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1171:gst_nv_decoder_check_device_caps: vp8 bit-depth 8 with chroma format 0 [48 - 4096] x [16 - 4096]
0:00:00.945379570 655952 0x55f02fa992d0 INFO               nvdecoder gstnvdecoder.c:1171:gst_nv_decoder_check_device_caps: vp9 bit-depth 8 with chroma format 0 [128 - 8192] x [128 - 8192]
Factory Details:
  Rank                     primary (256)
  Long-name                NVDEC h264 Video Decoder
  Klass                    Codec/Decoder/Video/Hardware
  Description              NVDEC video decoder
  Author                   Ericsson AB, http://www.ericsson.com, Seungha Yang <[email protected]>

Plugin Details:
  Name                     nvcodec
  Description              GStreamer NVCODEC plugin
  Filename                 /usr/lib64/gstreamer-1.0/libgstnvcodec.so
  Version                  1.20.5
  License                  LGPL
  Source module            gst-plugins-bad
  Source release date      2022-12-19
  Binary package           Fedora GStreamer-plugins-bad package
  Origin URL               http://download.fedoraproject.org

GObject
 +----GInitiallyUnowned
       +----GstObject
             +----GstElement
                   +----GstVideoDecoder
                         +----GstNvDec
                               +----nvh264dec

Pad Templates:
  SINK template: 'sink'
    Availability: Always
    Capabilities:
      video/x-h264
          stream-format: byte-stream
              alignment: au
                profile: { (string)constrained-baseline, (string)baseline, (string)main, (string)high, (string)constrained-high, (string)progressive-high }
                  width: [ 48, 4096 ]
                 height: [ 16, 4096 ]
  
  SRC template: 'src'
    Availability: Always
    Capabilities:
      video/x-raw
                  width: [ 48, 4096 ]
                 height: [ 16, 4096 ]
              framerate: [ 0/1, 2147483647/1 ]
                 format: { (string)NV12 }
      video/x-raw(memory:GLMemory)
                  width: [ 48, 4096 ]
                 height: [ 16, 4096 ]
              framerate: [ 0/1, 2147483647/1 ]
                 format: { (string)NV12 }
      video/x-raw(memory:CUDAMemory)
                  width: [ 48, 4096 ]
                 height: [ 16, 4096 ]
              framerate: [ 0/1, 2147483647/1 ]
                 format: { (string)NV12 }

Element has no clocking capabilities.
Element has no URI handling capabilities.

Pads:
  SINK: 'sink'
    Pad Template: 'sink'
  SRC: 'src'
    Pad Template: 'src'

Element Properties:
  automatic-request-sync-point-flags: Flags to use when automatically requesting sync points
                        flags: readable, writable
                        Flags "GstVideoDecoderRequestSyncPointFlags" Default: 0x00000003, "corrupt-output+discard-input"
                           (0x00000001): discard-input    - GST_VIDEO_DECODER_REQUEST_SYNC_POINT_DISCARD_INPUT
                           (0x00000002): corrupt-output   - GST_VIDEO_DECODER_REQUEST_SYNC_POINT_CORRUPT_OUTPUT
  automatic-request-sync-points: Automatically request sync points when it would be useful
                        flags: readable, writable
                        Boolean. Default: false
  discard-corrupted-frames: Discard frames marked as corrupted instead of outputting them
                        flags: readable, writable
                        Boolean. Default: false
  max-display-delay   : Improves pipelining of decode with display, 0 means no delay (auto = -1)
                        flags: readable, writable
                        Integer. Range: -1 - 2147483647 Default: -1 
  max-errors          : Max consecutive decoder errors before returning flow error
                        flags: readable, writable
                        Integer. Range: -1 - 2147483647 Default: 10 
  min-force-key-unit-interval: Minimum interval between force-keyunit requests in nanoseconds
                        flags: readable, writable
                        Unsigned Integer64. Range: 0 - 18446744073709551615 Default: 0 
  name                : The name of the object
                        flags: readable, writable, 0x2000
                        String. Default: "nvh264dec0"
  parent              : The parent of the object
                        flags: readable, writable, 0x2000
                        Object of type "GstObject"
  qos                 : Handle Quality-of-Service events from downstream
                        flags: readable, writable
                        Boolean. Default: true
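
To compare these capabilities with what our own decoder reports, the `gst_nv_decoder_check_device_caps` lines can be turned into data. A sketch, assuming the log format stays exactly as shown above (the regular expression and function name are my own):

```python
import re

# Matches e.g. "... check_device_caps: h264 bit-depth 8 with chroma format 0 [48 - 4096] x [16 - 4096]"
CAPS_RE = re.compile(
    r"check_device_caps: (?P<codec>\w+) bit-depth (?P<depth>\d+) "
    r"with chroma format (?P<chroma>\d+) "
    r"\[(?P<min_w>\d+) - (?P<max_w>\d+)\] x \[(?P<min_h>\d+) - (?P<max_h>\d+)\]"
)


def parse_device_caps(log_text):
    """Return one dict per capability line; other lines are ignored."""
    caps = []
    for line in log_text.splitlines():
        match = CAPS_RE.search(line)
        if match:
            caps.append({
                key: (value if key == "codec" else int(value))
                for key, value in match.groupdict().items()
            })
    return caps
```

Lines such as "No codec map corresponding to codec 3" do not match and are skipped.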

And h264 works:

gst-launch-1.0 videotestsrc ! x264enc ! queue ! nvh264dec ! videoconvert ! autovideosink

Building nvdec element from source for debugging is easy:

tar -Jxvf ~/Downloads/gst-plugins-bad-1.20.5.tar.xz
cd gst-plugins-bad-1.20.5/
mkdir build
cd build/
meson ..
ninja
sudo ninja install

rm -fr ~/.cache/gstreamer-1.0
GST_PLUGIN_PATH=/usr/local/lib64/gstreamer-1.0/ GST_DEBUG=nvdec*:5 gst-inspect-1.0 nvh264dec

Hoping to dump the data structures and using the sample data from

TEST_COMPRESSED_DATA = {
"h264": {
#"YUV420P" : (24, 16, unhex("000000016764000aacb317cbc2000003000200000300651e244cd00000000168e970312c8b0000010605ffff56dc45e9bde6d948b7962cd820d923eeef78323634202d20636f726520313432202d20482e3236342f4d5045472d342041564320636f646563202d20436f70796c65667420323030332d32303134202d20687474703a2f2f7777772e766964656f6c616e2e6f72672f783236342e68746d6c202d206f7074696f6e733a2063616261633d31207265663d35206465626c6f636b3d313a303a3020616e616c7973653d3078333a3078313133206d653d756d68207375626d653d38207073793d31207073795f72643d312e30303a302e3030206d697865645f7265663d31206d655f72616e67653d3136206368726f6d615f6d653d31207472656c6c69733d31203878386463743d312063716d3d3020646561647a6f6e653d32312c313120666173745f70736b69703d31206368726f6d615f71705f6f66667365743d2d3220746872656164733d31206c6f6f6b61686561645f746872656164733d3120736c696365645f746872656164733d30206e723d3020646563696d6174653d3120696e7465726c616365643d3020626c757261795f636f6d7061743d3020636f6e73747261696e65645f696e7472613d3020626672616d65733d3020776569676874703d32206b6579696e743d393939393939206b6579696e745f6d696e3d353030303030207363656e656375743d343020696e7472615f726566726573683d302072633d637266206d62747265653d30206372663d33382e322071636f6d703d302e36302071706d696e3d302071706d61783d3639207170737465703d342069705f726174696f3d312e34302061713d313a312e3030008000000165888404bffe841fc0a667f891ea1728763fecb5e1")),
"YUV420P" : {
(24, 16) : (
unhex("000000016764000aacb317cbc2000003000200000300651e244cd00000000168e970312c8b0000010605ffff56dc45e9bde6d948b7962cd820d923eeef78323634202d20636f726520313432202d20482e3236342f4d5045472d342041564320636f646563202d20436f70796c65667420323030332d32303134202d20687474703a2f2f7777772e766964656f6c616e2e6f72672f783236342e68746d6c202d206f7074696f6e733a2063616261633d31207265663d35206465626c6f636b3d313a303a3020616e616c7973653d3078333a3078313133206d653d756d68207375626d653d38207073793d31207073795f72643d312e30303a302e3030206d697865645f7265663d31206d655f72616e67653d3136206368726f6d615f6d653d31207472656c6c69733d31203878386463743d312063716d3d3020646561647a6f6e653d32312c313120666173745f70736b69703d31206368726f6d615f71705f6f66667365743d2d3220746872656164733d31206c6f6f6b61686561645f746872656164733d3120736c696365645f746872656164733d30206e723d3020646563696d6174653d3120696e7465726c616365643d3020626c757261795f636f6d7061743d3020636f6e73747261696e65645f696e7472613d3020626672616d65733d3020776569676874703d32206b6579696e743d393939393939206b6579696e745f6d696e3d353030303030207363656e656375743d343020696e7472615f726566726573683d302072633d637266206d62747265653d30206372663d33382e322071636f6d703d302e36302071706d696e3d302071706d61783d3639207170737465703d342069705f726174696f3d312e34302061713d313a312e3030008000000165888404bffe841fc0a667f891ea1728763fecb5e1"),
),
(128, 128) : (
unhex("000000016764000bacb6102342000003000200000300651e2855c00000000168eac7cb0000010605ffff63dc45e9bde6d948b7962cd820d923eeef78323634202d20636f726520313634202d20482e3236342f4d5045472d342041564320636f646563202d20436f70796c65667420323030332d32303232202d20687474703a2f2f7777772e766964656f6c616e2e6f72672f783236342e68746d6c202d206f7074696f6e733a2063616261633d31207265663d32206465626c6f636b3d313a303a3020616e616c7973653d3078333a3078313133206d653d686578207375626d653d34207073793d31207073795f72643d312e30303a302e3030206d697865645f7265663d30206d655f72616e67653d3136206368726f6d615f6d653d31207472656c6c69733d31203878386463743d312063716d3d3020646561647a6f6e653d32312c313120666173745f70736b69703d31206368726f6d615f71705f6f66667365743d3020746872656164733d32206c6f6f6b61686561645f746872656164733d3220736c696365645f746872656164733d3120736c696365733d32206e723d3020646563696d6174653d3120696e7465726c616365643d3020626c757261795f636f6d7061743d3020636f6e73747261696e65645f696e7472613d3020626672616d65733d3020776569676874703d31206b6579696e743d696e66696e697465206b6579696e745f6d696e3d353336383730393133207363656e656375743d343020696e7472615f726566726573683d302072633d637266206d62747265653d30206372663d32352e352071636f6d703d302e36302071706d696e3d302071706d61783d3639207170737465703d342069705f726174696f3d312e34302061713d313a312e30300080000001658884046fdcfe0bb9ecd9a9eb03bd869ffa711c4a9def0e758b724399e770a4e6982d7f24eed50c22ea4cb1ecaee80075c100000165042221011bffdcfe0bb9ecd9a9eb03bd869ffa711c4a9def0e758b724399e770a4e6982d7f24eed50c22ea4cb1ecaee80075c1"),
unhex("00000001419a3b1052fffeb52a82160000014104268ec414bffeb52a8216"),
unhex("00000001419a4608297ffeb52a821700000141042691820a5ffeb52a8217"),
unhex("00000001419a66082b7ffeb52a821600000141042699820adffeb52a8216"),
unhex("00000001419a86082b7ffeb52a8217000001410426a1820adffeb52a8217"),
),
},

then saving each frame to a different file, I can replay using:

gst-launch-1.0 multifilesrc location="%02d.h264" index=0 \
    caps="video/x-h264,stream-format=byte-stream,alignment=nal,width=128,height=128,framerate=1/1" \
   ! avdec_h264 ! videorate ! autovideosink
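
Saving each frame to its own file can be done with a few lines of Python. A sketch, assuming `unhex` behaves like `binascii.unhexlify` and that the file names should match the `%02d.h264` pattern used by `multifilesrc` above; `save_frames` is a made-up helper name:

```python
import os
from binascii import unhexlify


def save_frames(frames, directory, pattern="%02d.h264"):
    """Write each compressed frame to its own numbered file, starting at 0."""
    paths = []
    for index, frame in enumerate(frames):
        path = os.path.join(directory, pattern % index)
        with open(path, "wb") as f:
            f.write(frame)
        paths.append(path)
    return paths


# Two of the short 128x128 frames from the test data above:
SAMPLE_FRAMES = (
    unhexlify("00000001419a3b1052fffeb52a82160000014104268ec414bffeb52a8216"),
    unhexlify("00000001419a4608297ffeb52a821700000141042691820a5ffeb52a8217"),
)
```

Calling `save_frames(SAMPLE_FRAMES, ".")` writes `00.h264` and `01.h264` to the current directory. On their own these two frames will not decode, since they need the first frame of the (128, 128) test data (the one with the SPS and PPS) before them.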

But not with nvh264dec or openh264:

WARNING: erroneous pipeline: could not link multifilesrc0 to nvh264dec0

openh264dec can work by adding an h264parse element before it.

This one works with all decoders:

gst-launch-1.0 videotestsrc pattern=white ! x264enc ! nvh264dec ! videoconvert ! autovideosink

But only thanks to caps negotiation. Saving the frames using:

gst-launch-1.0 videotestsrc pattern=white \
    ! video/x-raw,width=320,height=240 \
    ! x264enc \
    ! multifilesink location="frame%d.h264"

Then trying to replay them with multifilesrc as above hits the same issues again.

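What makes it work is specifying format="NV12" (or format="I420") in the raw video caps. Otherwise x264enc ends up producing a 10-bit high-4:4:4 stream, which nvh264dec does not accept: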

0:00:00.231785272 714276 0x55563ad7e120 WARN                GST_CAPS gstpad.c:5757:pre_eventfunc_check:<nvh264dec0:sink> caps video/x-h264, pixel-aspect-ratio=(fraction)1/1, width=(int)320, height=(int)240,
    framerate=(fraction)30/1, chroma-format=(string)4:4:4, bit-depth-luma=(uint)10,
    bit-depth-chroma=(uint)10, colorimetry=(string)bt601, parsed=(boolean)true,
    stream-format=(string)byte-stream, alignment=(string)au, profile=(string)high-4:4:4, level=(string)1.3 not accepted
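
The profile of a saved frame can be checked without gstreamer by reading the first bytes of its SPS. A sketch, assuming an Annex B byte-stream and ignoring emulation prevention bytes (which should not occur this early in the SPS); the profile labels are informal and the function name is made up:

```python
from binascii import unhexlify

# profile_idc values from the H.264 specification, with informal labels
PROFILE_NAMES = {
    66: "baseline",
    77: "main",
    88: "extended",
    100: "high",
    110: "high-10",
    122: "high-4:2:2",
    244: "high-4:4:4",
}


def sps_profile_and_level(data):
    """Return (profile_idc, level_idc) from the first SPS found, or None."""
    pos = data.find(b"\x00\x00\x01")
    while pos != -1:
        header = pos + 3                      # NAL header byte
        if header + 3 < len(data) and data[header] & 0x1F == 7:
            # SPS layout: profile_idc, constraint flags, level_idc
            return data[header + 1], data[header + 3]
        pos = data.find(b"\x00\x00\x01", header)
    return None


# Start of the 24x16 test frame quoted earlier in this thread:
profile, level = sps_profile_and_level(unhexlify("000000016764000aacb3"))
```

Here `profile` is 100 ("high"), which is in the nvh264dec sink template, whereas the rejected stream in the warning above would have `profile_idc` 244.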

totaam added a commit that referenced this issue Jan 21, 2023
@totaam
Collaborator Author

totaam commented Jan 21, 2023

Correction, this works for openh264 without h264parse (switching from alignment=nal to alignment=au):

gst-launch-1.0 multifilesrc location="frame%d.h264" index=0 \
    caps="video/x-h264,stream-format=byte-stream,alignment=au,width=320,height=240,framerate=1/1" \
    ! openh264dec ! videorate ! autovideosink

And nvdec loads from file:

  • vp8:
gst-launch-1.0 multifilesrc location="frame%d.vp8" index=0 \
    caps="video/x-vp8,stream-format=byte-stream,alignment=au,width=320,height=240" \
    ! nvvp8dec ! videoconvert ! autovideosink
  • h264:
gst-launch-1.0 multifilesrc location="frame%d.h264" index=0 \
    caps="video/x-h264,stream-format=byte-stream,alignment=au,width=320,height=240" \
    ! nvh264dec ! videoconvert ! autovideosink

Though that's only half the problem. The bigger issue is that for h264 we must use a parser to populate the decoder's data structures (things like h264.num_ref_idx_l1_active_minus1!).
Not sure why pfnDecodePicture (aka parser_decode_callback) is called by the gstreamer decoder and not in our codec. That's the callback that provides the CUVIDPICPARAMS.
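
To see what a parser has to deal with, the saved frames can be split into NAL units. This is only a sketch of the very first step (finding start codes and reading the NAL type); it does not decode the SPS, PPS or slice headers that the decoder structures are filled from. Function names are made up:

```python
from binascii import unhexlify

# nal_unit_type values from the H.264 specification
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "AUD",
}


def split_annexb(data):
    """Split an Annex B byte-stream into NAL units (start codes removed)."""
    nals = []
    start = data.find(b"\x00\x00\x01")
    while start != -1:
        payload = start + 3
        next_start = data.find(b"\x00\x00\x01", payload)
        end = len(data) if next_start == -1 else next_start
        # drop zero bytes that belong to a following 4-byte start code
        nal = data[payload:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        start = next_start
    return nals


def describe_nals(data):
    """Return a list of (nal_unit_type, name) for each NAL unit in data."""
    result = []
    for nal in split_annexb(data):
        nal_type = nal[0] & 0x1F          # low 5 bits of the NAL header
        result.append((nal_type, NAL_TYPE_NAMES.get(nal_type, "other")))
    return result
```

For the second 128x128 test frame, `describe_nals()` finds two non-IDR slices, which matches the `slices=2` option in the x264 settings embedded in that sample. Each file written per access unit (alignment=au) can therefore hold several NAL units.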
