
Conditions for Running brgemm_convolution_fwd_t and jit_avx512_common_convolution_fwd_t in oneDNN #1934

Open · deepeshfujitsu opened this issue May 28, 2024 · 3 comments

@deepeshfujitsu (Contributor)

Hi, I would like to understand the conditions under which brgemm_convolution_fwd_t and jit_avx512_common_convolution_fwd_t are executed. Specifically, are there any criteria that determine when each of these runs? Additionally, I noticed that TensorFlow uses the BRGEMM implementation, whereas PyTorch uses the JIT one. Could you explain why BRGEMM is not used in the case of PyTorch?

Thank you!

@dzarukin (Contributor)

Hi @deepeshfujitsu,

Specifically, are there any criteria that determine when each of these runs?

The first criterion is the order of implementations in this list. Each data type + prop_kind bucket has its own predefined order, and implementations are tried from top to bottom.
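To illustrate the first-match-wins idea, here is a minimal, hypothetical C++ sketch; the types and names below are invented for this sketch and are not oneDNN's actual API:

#include <functional>
#include <vector>

// Hypothetical sketch: implementations are listed in priority order for a
// given data type + prop_kind bucket, and the first eligible entry wins.
struct impl_entry_t {
    const char *name;
    std::function<bool()> is_eligible; // wraps the internal dispatch checks
};

const impl_entry_t *pick_impl(const std::vector<impl_entry_t> &impls) {
    for (const auto &e : impls)
        if (e.is_eligible()) return &e; // earlier entries win
    return nullptr; // nothing dispatched; fall back to the reference impl
}

int main() {
    std::vector<impl_entry_t> impls {
        {"brgconv:avx512_core", [] { return false; }}, // fails a check
        {"jit:avx512_core", [] { return true; }}, // eligible -> selected
    };
    const impl_entry_t *impl = pick_impl(impls); // impl->name == "jit:avx512_core"
    (void)impl;
}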

The next level of criteria is internal implementation knowledge, which can take many conditions into account, including but not limited to user primitive settings, blocking parameters for a specific system, feature support, etc. All (or most) of those should be visible under ONEDNN_VERBOSE=dispatch (at least for x64). When an implementation doesn't dispatch due to a specific condition, the source line and a short reason are printed as a verbose message in dispatch mode.
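For example, prefixing a reproducer with the variable prints those messages, e.g. with a benchdnn case:

ONEDNN_VERBOSE=dispatch ./benchdnn --conv --dt=f32 --dir=FWD_B mb4_ic64oc64_ih56oh56kh3sh1dh0ph1_iw56ow56kw3sw1dw0pw1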

Design decisions for frameworks belong to the framework maintainers, as oneDNN is not responsible for those.
@jgong5, @chunyuan-w, @Guobing-Chen, please help answer the question.

@deepeshfujitsu (Contributor, Author)

Hi @dzarukin,

Thank you for your reply and for suggesting the dispatch flag, which helped me debug the issue preventing the execution of the BRGEMM convolution.

I found the relevant function in jit_brgemm_conv_utils.cpp:

bool is_any_eligible(const jit_brgemm_conv_conf_t &jcp) {
    return (jcp.prop_kind == prop_kind::forward_inference || jcp.wei_plain
            || one_of(jcp.wei_dt, data_type::s8, data_type::f16,
                    data_type::f8_e5m2, data_type::f8_e4m3)
            || one_of(jcp.isa, avx2_vnni_2) || is_amx(jcp.isa));
}

In this function, I changed avx2_vnni_2 to avx512_core, and the BRGEMM convolution then dispatched and ran fine on avx512_core.
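For clarity, the locally modified check (an experiment on my side, not an upstream change) looked like this:

// Local experiment: replace avx2_vnni_2 with avx512_core in the ISA check
bool is_any_eligible(const jit_brgemm_conv_conf_t &jcp) {
    return (jcp.prop_kind == prop_kind::forward_inference || jcp.wei_plain
            || one_of(jcp.wei_dt, data_type::s8, data_type::f16,
                    data_type::f8_e5m2, data_type::f8_e4m3)
            || one_of(jcp.isa, avx512_core) || is_amx(jcp.isa));
}

With this change in place, I ran the following BenchDNN test: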

Command:
ONEDNN_VERBOSE=1 ./benchdnn --conv --dt=f32 --dir=FWD_B mb4_ic64oc64_ih56oh56kh3sh1dh0ph1_iw56ow56kw3sw1dw0pw1

Output without changing the function:

onednn_verbose,info,oneDNN v3.4.1 (commit f5ff0a6de16c130053bec1a1aec3a9b826c66f78)
onednn_verbose,info,cpu,runtime:OpenMP,nthr:16
onednn_verbose,info,cpu,isa:Intel AVX-512 with Intel DL Boost
onednn_verbose,info,gpu,runtime:none
onednn_verbose,info,graph,backend,0:dnnl_backend
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,backend,exec_time
onednn_verbose,primitive,exec,cpu,reorder,simple:any,undef,src_f32::blocked:a::f0 dst_f32::blocked:a::f0,,,64,0.373047
onednn_verbose,primitive,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd::f0 dst_f32::blocked:ABcd16b16a::f0,,,64x64x3x3,0.473145
onednn_verbose,primitive,exec,cpu,reorder,jit:blk,undef,src_f32::blocked:abcd::f0 dst_f32::blocked:aBcd16b::f0,,,4x64x56x56,0.0539551
onednn_verbose,primitive,exec,cpu,convolution,jit:avx512_core,forward_training,src_f32:a:blocked:aBcd16b::f0 wei_f32:a:blocked:ABcd16b16a::f0 bia_f32:a:blocked:a::f0 dst_f32:a:blocked:aBcd16b::f0,,alg:convolution_direct,mb4_ic64oc64_ih56oh56kh3sh1dh0ph1_iw56ow56kw3sw1dw0pw1,1.00098
onednn_verbose,primitive,exec,cpu,reorder,jit:blk,undef,src_f32::blocked:aBcd16b::f0 dst_f32::blocked:abcd::f0,,,4x64x56x56,0.059082
0:PASSED __REPRO: --conv mb4ic64ih56oc64oh56kh3ph1
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 0.11s; fill: 0.01s (10%); compute_ref: 0.08s (74%); compare: 0.00s (4%);

Output with the change from avx2_vnni_2 to avx512_core:

onednn_verbose,info,oneDNN v3.4.1 (commit f5ff0a6de16c130053bec1a1aec3a9b826c66f78)
onednn_verbose,info,cpu,runtime:OpenMP,nthr:16
onednn_verbose,info,cpu,isa:Intel AVX-512 with Intel DL Boost
onednn_verbose,info,gpu,runtime:none
onednn_verbose,info,graph,backend,0:dnnl_backend
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,backend,exec_time
onednn_verbose,primitive,exec,cpu,reorder,simple:any,undef,src_f32::blocked:a::f0 dst_f32::blocked:a::f0,,,64,0.36499
onednn_verbose,primitive,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd::f0 dst_f32::blocked:Acdb64a::f0,,,64x64x3x3,0.379883
onednn_verbose,primitive,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:abcd::f0 dst_f32::blocked:acdb::f0,,,4x64x56x56,0.0629883
onednn_verbose,primitive,exec,cpu,convolution,brgconv:avx512_core,forward_training,src_f32:a:blocked:acdb::f0 wei_f32:a:blocked:Acdb64a::f0 bia_f32:a:blocked:a::f0 dst_f32:a:blocked:acdb::f0,,alg:convolution_direct,mb4_ic64oc64_ih56oh56kh3sh1dh0ph1_iw56ow56kw3sw1dw0pw1,0.996094
onednn_verbose,primitive,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:acdb::f0 dst_f32::blocked:abcd::f0,,,4x64x56x56,0.0778809
0:PASSED __REPRO: --conv mb4ic64ih56oc64oh56kh3ph1
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 0.10s; fill: 0.01s (6%); compute_ref: 0.08s (83%); compare: 0.00s (5%);

Could you please clarify why we are only checking for avx2_vnni_2 and not for other ISAs?

Thank you!

@dzarukin (Contributor)

@xuxinzen @tczeszun, could you please help answer this question?
