
Conditions for Running brgemm_convolution_fwd_t and jit_avx512_common_convolution_fwd_t in oneDNN #1934

Open · deepeshfujitsu opened this issue May 28, 2024 · 3 comments

@deepeshfujitsu (Contributor)

Hi, I would like to understand the conditions under which brgemm_convolution_fwd_t and jit_avx512_common_convolution_fwd_t are executed. Specifically, are there any criteria that determine when each of these runs? Additionally, I noticed that TensorFlow uses the BRGEMM implementation, whereas PyTorch uses the JIT one. Could you explain why BRGEMM is not used in the case of PyTorch?

Thank you!

@dzarukin (Contributor)

Hi @deepeshfujitsu,

Specifically, are there any criteria that determine when each of these runs?

The first criterion is the order of implementations in this list. Each data type + prop_kind bucket has its own predefined order, and implementations are tried from top to bottom.
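To illustrate the first-match-wins idea, here is a minimal, hypothetical C++ sketch; the types and names below are invented for this sketch and are not oneDNN's actual API:

#include <functional>
#include <vector>

// Hypothetical sketch: implementations are listed in priority order for a
// given data type + prop_kind bucket, and the first eligible entry wins.
struct impl_entry_t {
    const char *name;
    std::function<bool()> is_eligible; // wraps the internal dispatch checks
};

const impl_entry_t *pick_impl(const std::vector<impl_entry_t> &impls) {
    for (const auto &e : impls)
        if (e.is_eligible()) return &e; // earlier entries win
    return nullptr; // nothing dispatched; fall back to the reference impl
}

int main() {
    std::vector<impl_entry_t> impls {
        {"brgconv:avx512_core", [] { return false; }}, // fails a check
        {"jit:avx512_core", [] { return true; }}, // eligible -> selected
    };
    const impl_entry_t *impl = pick_impl(impls); // impl->name == "jit:avx512_core"
    (void)impl;
}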

The next level of criteria is internal implementation knowledge, which can take many conditions into account, including but not limited to user primitive settings, blocking parameters for a specific system, feature support, etc. All (or most) of those should be visible under ONEDNN_VERBOSE=dispatch (at least for x64). When an implementation doesn't dispatch due to a specific condition, the source line and a short reason are printed as a verbose message in dispatch mode.
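For example, prefixing a reproducer with the variable prints those messages, e.g. with a benchdnn case:

ONEDNN_VERBOSE=dispatch ./benchdnn --conv --dt=f32 --dir=FWD_B mb4_ic64oc64_ih56oh56kh3sh1dh0ph1_iw56ow56kw3sw1dw0pw1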

Design decisions for frameworks belong to the framework maintainers, as oneDNN is not responsible for those.
@jgong5, @chunyuan-w, @Guobing-Chen, please help answer the question.

@deepeshfujitsu (Contributor, Author)

Hi @dzarukin,

Thank you for your reply and for suggesting the dispatch flag, which helped me debug the issue preventing the execution of the BRGEMM convolution.

I found the relevant function in jit_brgemm_conv_utils.cpp:

bool is_any_eligible(const jit_brgemm_conv_conf_t &jcp) {
    return (jcp.prop_kind == prop_kind::forward_inference || jcp.wei_plain
            || one_of(jcp.wei_dt, data_type::s8, data_type::f16,
                    data_type::f8_e5m2, data_type::f8_e4m3)
            || one_of(jcp.isa, avx2_vnni_2) || is_amx(jcp.isa));
}

In this function, I changed avx2_vnni_2 to avx512_core, and the BRGEMM convolution then dispatched and ran fine on avx512_core.
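For clarity, the locally modified check (an experiment on my side, not an upstream change) looked like this:

// Local experiment: replace avx2_vnni_2 with avx512_core in the ISA check
bool is_any_eligible(const jit_brgemm_conv_conf_t &jcp) {
    return (jcp.prop_kind == prop_kind::forward_inference || jcp.wei_plain
            || one_of(jcp.wei_dt, data_type::s8, data_type::f16,
                    data_type::f8_e5m2, data_type::f8_e4m3)
            || one_of(jcp.isa, avx512_core) || is_amx(jcp.isa));
}

With this change in place, I ran the following BenchDNN test: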

Command:
ONEDNN_VERBOSE=1 ./benchdnn --conv --dt=f32 --dir=FWD_B mb4_ic64oc64_ih56oh56kh3sh1dh0ph1_iw56ow56kw3sw1dw0pw1

Output without changing the function:

onednn_verbose,info,oneDNN v3.4.1 (commit f5ff0a6de16c130053bec1a1aec3a9b826c66f78)
onednn_verbose,info,cpu,runtime:OpenMP,nthr:16
onednn_verbose,info,cpu,isa:Intel AVX-512 with Intel DL Boost
onednn_verbose,info,gpu,runtime:none
onednn_verbose,info,graph,backend,0:dnnl_backend
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,backend,exec_time
onednn_verbose,primitive,exec,cpu,reorder,simple:any,undef,src_f32::blocked:a::f0 dst_f32::blocked:a::f0,,,64,0.373047
onednn_verbose,primitive,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd::f0 dst_f32::blocked:ABcd16b16a::f0,,,64x64x3x3,0.473145
onednn_verbose,primitive,exec,cpu,reorder,jit:blk,undef,src_f32::blocked:abcd::f0 dst_f32::blocked:aBcd16b::f0,,,4x64x56x56,0.0539551
onednn_verbose,primitive,exec,cpu,convolution,jit:avx512_core,forward_training,src_f32:a:blocked:aBcd16b::f0 wei_f32:a:blocked:ABcd16b16a::f0 bia_f32:a:blocked:a::f0 dst_f32:a:blocked:aBcd16b::f0,,alg:convolution_direct,mb4_ic64oc64_ih56oh56kh3sh1dh0ph1_iw56ow56kw3sw1dw0pw1,1.00098
onednn_verbose,primitive,exec,cpu,reorder,jit:blk,undef,src_f32::blocked:aBcd16b::f0 dst_f32::blocked:abcd::f0,,,4x64x56x56,0.059082
0:PASSED __REPRO: --conv mb4ic64ih56oc64oh56kh3ph1
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 0.11s; fill: 0.01s (10%); compute_ref: 0.08s (74%); compare: 0.00s (4%);

Output with the change from avx2_vnni_2 to avx512_core:

onednn_verbose,info,oneDNN v3.4.1 (commit f5ff0a6de16c130053bec1a1aec3a9b826c66f78)
onednn_verbose,info,cpu,runtime:OpenMP,nthr:16
onednn_verbose,info,cpu,isa:Intel AVX-512 with Intel DL Boost
onednn_verbose,info,gpu,runtime:none
onednn_verbose,info,graph,backend,0:dnnl_backend
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,backend,exec_time
onednn_verbose,primitive,exec,cpu,reorder,simple:any,undef,src_f32::blocked:a::f0 dst_f32::blocked:a::f0,,,64,0.36499
onednn_verbose,primitive,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd::f0 dst_f32::blocked:Acdb64a::f0,,,64x64x3x3,0.379883
onednn_verbose,primitive,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd::f0 dst_f32::blocked:acdb::f0,,,4x64x56x56,0.0629883
onednn_verbose,primitive,exec,cpu,convolution,brgconv:avx512_core,forward_training,src_f32:a:blocked:acdb::f0 wei_f32:a:blocked:Acdb64a::f0 bia_f32:a:blocked:a::f0 dst_f32:a:blocked:acdb::f0,,alg:convolution_direct,mb4_ic64oc64_ih56oh56kh3sh1dh0ph1_iw56ow56kw3sw1dw0pw1,0.996094
onednn_verbose,primitive,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:acdb::f0 dst_f32::blocked:abcd::f0,,,4x64x56x56,0.0778809
0:PASSED __REPRO: --conv mb4ic64ih56oc64oh56kh3ph1
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 0.10s; fill: 0.01s (6%); compute_ref: 0.08s (83%); compare: 0.00s (5%);

Could you please clarify why we are only checking for avx2_vnni_2 and not for other ISAs?

Thank you!

@dzarukin (Contributor)

@xuxinzen @tczeszun, could you please help answer this question?
