Skip to content

Conversation

@masahi
Copy link
Member

@masahi masahi commented Mar 24, 2022

The code path

if is_dotprod_available():
intrin = dot_int8_int8_int32_neon_82(int32_lanes=4)

is broken because (1) we are not passing dtype and (2) the intrin description has a bug in the number of arguments to sdot/udot. For example I get this error if I try to run it:

  File "/home/masa/projects/dev/tvm/src/target/llvm/codegen_llvm.cc", line 981
TVMError: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (f) is false: Cannot find intrinsic declaration, possible type mismatch: llvm.aarch64.neon.udot

The PR that added this intrin, #3978, didn't add a test so this code path was never tested. The PR added something under apps/topi_recipe/conv, https://github.com/apache/tvm/blob/main/apps/topi_recipe/conv/test_conv_int8_arm.py, but this script is broken and needs fixing too.

@tkonolige

@masahi masahi force-pushed the arm-nchwc-dot-fix branch from dfb4744 to 2425a5c Compare March 24, 2022 20:30
name="conv2d_nchw_spatial_pack.arm_cpu",
plevel=10,
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schedule_conv2d_nchw_spatial_pack has tophub entries, so they are always picked up based on "time cost" even if topi.arm_cpu.is_int8_hw_support path is taken and the NCHWc int8 schedule is surely faster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like another good argument for removing tophub. You can disable it for the test by putting it all in an empty ApplyHistoryBest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annoying thing is this happens on only specific input or filter sizes. It took me some time to figure out why the NCHWc schedule is not used when the input shape is (56, 56) and filter size is 3x3. If I use (128, 128) shape or 1x1 filter, I didn't see this issue.

@masahi masahi force-pushed the arm-nchwc-dot-fix branch from 2425a5c to b99839a Compare March 24, 2022 21:02
Copy link
Contributor

@tkonolige tkonolige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @masahi

@masahi masahi force-pushed the arm-nchwc-dot-fix branch from b99839a to 3d86a61 Compare March 25, 2022 04:37
@junrushao
Copy link
Member

@vinx13 would you mind taking a look? Thanks a lot!

Copy link

@u99127 u99127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to add this test - thank you for finding and fixing this. ]

@masahi masahi force-pushed the arm-nchwc-dot-fix branch from 4dd3cff to a4ac3e1 Compare March 25, 2022 21:28
@Mousius
Copy link
Member

Mousius commented Mar 28, 2022

LGTM, I'll wait for @vinx13 to take a look 😸

@vinx13 vinx13 merged commit 7896108 into apache:main Mar 28, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
* [ARM] Fix NCHWc int8 dot product schedule lowering

* fix arm task extraction test not running

* skip test on i386
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants