@masahi
Member

@masahi masahi commented Aug 16, 2023

#14275 was sent to and merged into unity for no good reason, so I'm sending it to main now.

Also, #14275 makes conv2d profiling 4x slower by expanding the search space over the output alignments. In most cases (profile_all_alignments = False, the default), we just want to pick the largest possible alignment. To prevent the conv2d profiling time from blowing up in the default path, this PR adds a fix on top of #14275. Due to this difference, expect merge conflict on the next unity + main merge.
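
For readers unfamiliar with the alignment search, here is a minimal sketch of the default-path behaviour described above. It is an illustration only, not the code in this PR: the function name `candidate_alignments`, the `supported` tuple, and the divisibility rule are assumptions; only the `profile_all_alignments` flag comes from the description.

```python
from typing import List, Sequence


def candidate_alignments(
    channels: int,
    supported: Sequence[int] = (8, 4, 2, 1),  # assumed set of alignment factors
    profile_all_alignments: bool = False,
) -> List[int]:
    """Return the alignments that a (hypothetical) profiler would try.

    An alignment is treated as valid when it divides the channel count.
    In the default path only the largest valid alignment is returned, so the
    profiler search space does not grow with the number of valid alignments.
    """
    valid = [a for a in sorted(supported, reverse=True) if channels % a == 0]
    if profile_all_alignments:
        return valid      # exhaustive search: every valid alignment is profiled
    return valid[:1]      # default: only the largest valid alignment


# Default path: a single candidate per workload.
print(candidate_alignments(64))
# Exhaustive path: one candidate per valid alignment.
print(candidate_alignments(12, profile_all_alignments=True))
```

With four alignment factors, the exhaustive path can profile up to four times as many kernels, which is consistent with the 4x slowdown mentioned above.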

@junrushao @spectrometerHBH @Hzfengsy @jwfromm

spectrometerHBH and others added 2 commits August 16, 2023 14:21
- allow Conv2d to use different alignment factors for input and epilogue, which can influence performance
- store the profiler cache on disk, reducing CUTLASS profiler overhead across different runs
- use the same set of default tile configurations as CUTLASS for sm80 https://github.com/NVIDIA/cutlass/blob/master/tools/library/scripts/generator.py#L1881
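
The on-disk profiler cache mentioned in the commit list can be pictured roughly as follows. This is a sketch under stated assumptions, not TVM's implementation: the class `DiskProfilerCache`, the JSON file format, and the string workload keys are all hypothetical.

```python
import json
import os
from typing import Callable, Dict


class DiskProfilerCache:
    """Hypothetical cache that persists profiling results between runs."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries: Dict[str, str] = {}
        # Reload results written by an earlier run, if any.
        if os.path.exists(path):
            with open(path) as f:
                self.entries = json.load(f)

    def get_or_profile(self, key: str, profile_fn: Callable[[], str]) -> str:
        """Return the cached kernel name for `key`, profiling only on a miss."""
        if key not in self.entries:
            self.entries[key] = profile_fn()
            self._save()
        return self.entries[key]

    def _save(self) -> None:
        # Write to a temporary file first so an interrupted run cannot leave
        # a half-written cache behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.entries, f)
        os.replace(tmp, self.path)
```

A second process that opens the same path gets the earlier results back without calling `profile_fn` again, which is where the saving across runs comes from.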
@tvm-bot
Collaborator

tvm-bot commented Aug 16, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: cutlass, cherry-pick. See #10317 for details.

Generated by tvm-bot

@Hzfengsy Hzfengsy merged commit 8afa6d2 into apache:main Aug 17, 2023
@Hzfengsy
Member

> Due to this difference, expect merge conflict on the next unity + main merge.

Could you please also patch it to unity branch to address the potential conflict?
