@spectrometerHBH
Contributor

@spectrometerHBH spectrometerHBH commented Mar 12, 2023

@tvm-bot
Collaborator

tvm-bot commented Mar 12, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@spectrometerHBH spectrometerHBH marked this pull request as ready for review March 12, 2023 03:34
Member

@junrushao junrushao left a comment

LGTM! Could you fix the linter?

@Hzfengsy
Member

Could you please send the PR directly to main, since it's not unity-related?

@junrushao junrushao merged commit 0ffd24c into apache:unity Mar 18, 2023
tqchen pushed a commit that referenced this pull request Mar 20, 2023
- allow Conv2d to use different alignment factors for the input and the epilogue, which can affect performance
- store the profiler cache on disk, reducing CUTLASS profiler overhead across runs
- use the same set of default tile configurations as CUTLASS for sm80: https://github.com/NVIDIA/cutlass/blob/master/tools/library/scripts/generator.py#L1881
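
To illustrate the second item, here is a minimal sketch of an on-disk profiler cache. It is not the code from this PR: the class name `ProfilerCache`, the JSON file format, and the `profile_fn` callback are all assumptions made for the example. The idea is only that a result keyed by workload is written to disk once and reused by later runs instead of re-profiling.

```python
import json
import os


class ProfilerCache:
    """Hypothetical disk-backed cache: workload key -> best kernel name."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        # Reload results saved by a previous run, if any.
        if os.path.exists(path):
            with open(path) as f:
                self.entries = json.load(f)

    def lookup(self, key, profile_fn):
        """Return the cached result for `key`, profiling only on a miss."""
        if key not in self.entries:
            self.entries[key] = profile_fn()
            self._save()
        return self.entries[key]

    def _save(self):
        # Write to a temporary file and rename it, so an interrupted run
        # does not leave a truncated cache file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.entries, f)
        os.replace(tmp, self.path)
```

A second `ProfilerCache` opened on the same path returns the stored result without calling `profile_fn` again, which is where the saving across runs would come from.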
tqchen pushed three more commits with the same message that referenced this pull request Apr 1, 2023
@masahi
Member

masahi commented Aug 10, 2023

Just noticed that this PR was sent to unity, but it should have been sent to main. @spectrometerHBH please follow up.

The search for the output alignments added in this PR does nothing to our CUTLASS BYOC (since the C alignment doesn't seem to be used in codegen), but increases the conv2d profiling time by 4x. I was going to fix that on main and found that this code only exists on unity.

UPDATE: Sorry, I did find that the C alignment is actually used here:

min(operation.C.alignment * DataTypeSize[operation.C.element], 128)

But we shouldn't search all candidate alignments when profile_all_alignments is False; otherwise conv2d profiling becomes super slow.
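
A rough sketch of the two points above, not the actual TVM code. The helper names, the candidate list `(8, 4, 2, 1)`, and the divisibility rule are assumptions for illustration; only the `min(..., 128)` expression and the `profile_all_alignments` flag come from the comment. It assumes sizes are in bits, and that with the flag off only the largest valid alignment should be profiled.

```python
# Element sizes in bits; a hypothetical stand-in for DataTypeSize.
DATA_TYPE_SIZE_BITS = {"int8": 8, "float16": 16, "float32": 32}


def epilogue_vector_bits(c_alignment, c_dtype):
    """Mirror of the quoted expression: access width is capped at 128 bits."""
    return min(c_alignment * DATA_TYPE_SIZE_BITS[c_dtype], 128)


def candidate_alignments(channels, profile_all_alignments, choices=(8, 4, 2, 1)):
    """Alignments to profile for a given channel count (assumed rule).

    An alignment is treated as valid when it divides `channels`. With
    profile_all_alignments=False only the largest valid one is returned,
    so the number of profiled kernels does not grow.
    """
    valid = [a for a in choices if channels % a == 0]
    return valid if profile_all_alignments else valid[:1]
```

For example, `candidate_alignments(12, False)` yields a single candidate, while `candidate_alignments(12, True)` yields every alignment that divides 12. The search space for the output alignment only grows when the flag asks for it.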

@junrushao
Member

Yeah, we should probably send this commit to main as well. The co-existence of the branches does cause some trouble in maintenance.
