[CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm #2540
Merged
7688ba0
[CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm
ClementLinCF 510dc83
Update rmsnorm host reference
ClementLinCF 2985332
Update tree reduction of rmsnorm for reference host
ClementLinCF 997996f
Fix cross warp for m > 1 cases
MHYangAMD 6a1ac38
Add RMSNorm model selectable option for host reference
ClementLinCF 58a6ee8
Fix save_unquant cases
MHYangAMD b796269
Update reference rmsnorm forward function to use enum for model sensi…
ClementLinCF ac2ba69
Update reference rmsnorm calculation for model sensitivity
ClementLinCF 3a141eb
Fix m warp for layernorm
MHYangAMD 0c803d1
Adjust parameter of reference for twoPass
ClementLinCF b2e7af5
Merge branch 'develop' into ck_tile/rmsnorm-smoke-test
ClementLinCF 1cb4149
Fix clang format
ClementLinCF 16fce0c
Merge branch 'develop' into ck_tile/rmsnorm-smoke-test
ClementLinCF 847cedd
Merge branch 'develop' into ck_tile/rmsnorm-smoke-test
ClementLinCF c441904
Merge branch 'develop' into ck_tile/rmsnorm-smoke-test
ClementLinCF 59a92fe
Merge branch 'develop' into ck_tile/rmsnorm-smoke-test
ClementLinCF 0dc5d41
Run clang-format-overwrite.sh to fix formating issue
ClementLinCF d71e744
Merge branch 'develop' into ck_tile/rmsnorm-smoke-test
ClementLinCF e358e8c
fix clang format
illsilin 1a9a3a4
Merge branch 'develop' into ck_tile/rmsnorm-smoke-test
ClementLinCF bcf095f
solve the merge conflict
427e94b
Merge branch 'develop' into ck_tile/rmsnorm-smoke-test
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,49 +1,85 @@ | ||
| #!/bin/sh | ||
| #!/bin/bash | ||
|
|
||
| EXE="$(find . -name tile_rmsnorm2d_fwd -type f | head -n 1)" | ||
|
|
||
| for fquant in "" "-fquant=1 -prec_o=int8" "-fquant=2 -prec_o=int8" "-fquant=1 -prec_o=fp8" "-fquant=2 -prec_o=fp8"\ | ||
| "-fquant=1 -prec_o=int8 -save_unquant=1" "-fquant=2 -prec_o=int8 -save_unquant=1" "-fquant=1 -prec_o=fp8 -save_unquant=1" "-fquant=2 -prec_o=fp8 -save_unquant=1"; do | ||
| for pr_i in "fp16" "bf16" ; do | ||
| for fadd in "0" "1"; do | ||
| # 0: for no specific RMSNorm; 1: for T-5 like RMSNorm | ||
| for s in "0" "1"; do | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=99 -n=13 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=17 -n=16 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=1 -n=100 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=4 -n=128 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=80 -n=127 | ||
| # $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=22 -n=255 -stride=256 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=7 -n=599 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=19 -n=512 | ||
| # $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=33 -n=313 -stride=1000 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=11 -n=510 | ||
| # $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=171 -n=676 -stride=818 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=91 -n=636 | ||
| # $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=12 -n=768 -stride=800 | ||
| # $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=100 -n=766 -stride=812 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=31 -n=1024 | ||
| # $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=64 -n=1000 -stride=1004 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=8 -n=1501 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=3 -n=1826 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=5 -n=2040 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=7 -n=2734 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=1 -n=3182 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=9 -n=4096 | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=3 -n=8192 | ||
| done | ||
| done | ||
| done | ||
| done | ||
| total=0 | ||
| valid=0 | ||
|
|
||
| # The following cases uses two pass pipeline which doesn't support quant epilogue. | ||
| for fquant in "" | ||
| for pr_i in "fp16" "bf16" ; do | ||
| for fadd in "0" "1"; do | ||
| # 0: for no specific RMSNorm; 1: for T-5 like RMSNorm | ||
| for s in "0" "1"; do | ||
| $EXE -prec_i=$pr_i -fadd=$fadd -s=$s $fquant -m=1 -n=10547 | ||
| #$EXE -prec_i=$pr_i -fadd=$fadd $fquant -m=3 -n=17134 | ||
| done | ||
| done | ||
| run_case() { | ||
| cmd="$EXE -prec_i=$1 -fadd=$2 -s=$3 $4 -m=$5 -n=$6 $7" | ||
| echo "[CMD] $cmd" | ||
| output=$($cmd 2>&1) | ||
| echo "$output" | ||
| if echo "$output" | grep -q "valid:y"; then | ||
| valid=$((valid + 1)) | ||
| fi | ||
| total=$((total + 1)) | ||
| } | ||
|
|
||
| fquant_list=( | ||
| "" | ||
| "-fquant=1 -prec_o=int8" | ||
| "-fquant=2 -prec_o=int8" | ||
| "-fquant=1 -prec_o=fp8" | ||
| "-fquant=2 -prec_o=fp8" | ||
| "-fquant=1 -prec_o=int8 -save_unquant=1" | ||
| "-fquant=2 -prec_o=int8 -save_unquant=1" | ||
| "-fquant=1 -prec_o=fp8 -save_unquant=1" | ||
| "-fquant=2 -prec_o=fp8 -save_unquant=1" | ||
| ) | ||
|
|
||
| m_n_list=( | ||
| "99 13" "17 16" "1 100" "4 128" "80 127" | ||
| "7 599" "19 512" "11 510" "91 636" | ||
| "31 1024" "8 1501" "3 1826" "5 2040" | ||
| "7 2734" "1 3182" "9 4096" "3 8192" | ||
| ) | ||
|
|
||
| ### Add special stride test ### | ||
| m_n_stride_list=( | ||
| "22 255 -x_stride=256 -xr_stride=256 -y_stride=256 -yr_stride=256" | ||
| "33 313 -x_stride=1000 -xr_stride=1000 -y_stride=1000 -yr_stride=1000" | ||
| "171 676 -x_stride=818 -xr_stride=818 -y_stride=818 -yr_stride=818" | ||
| "12 768 -x_stride=800 -xr_stride=800 -y_stride=800 -yr_stride=800" | ||
| "100 766 -x_stride=812 -xr_stride=812 -y_stride=812 -yr_stride=812" | ||
| "64 1000 -x_stride=1004 -xr_stride=1004 -y_stride=1004 -yr_stride=1004" | ||
| ) | ||
|
|
||
| for fquant in "${fquant_list[@]}"; do | ||
| for pr_i in "fp16" "bf16"; do | ||
| for fadd in "0" "1"; do | ||
| for s in "0" "1"; do | ||
| for pair in "${m_n_list[@]}"; do | ||
| m=$(echo $pair | cut -d ' ' -f1) | ||
| n=$(echo $pair | cut -d ' ' -f2) | ||
| run_case "$pr_i" "$fadd" "$s" "$fquant" "$m" "$n" "" | ||
| done | ||
|
|
||
| ### Running tests with stride ### | ||
| for triple in "${m_n_stride_list[@]}"; do | ||
| m=$(echo $triple | cut -d ' ' -f1) | ||
| n=$(echo $triple | cut -d ' ' -f2) | ||
| stride_args=$(echo $triple | cut -d ' ' -f3-) | ||
| run_case "$pr_i" "$fadd" "$s" "$fquant" "$m" "$n" "$stride_args" | ||
| done | ||
| done | ||
| done | ||
| done | ||
| done | ||
|
|
||
| # Special two-pass only | ||
| for pr_i in "fp16" "bf16"; do | ||
| for fadd in "0" "1"; do | ||
| for s in "0" "1"; do | ||
| run_case "$pr_i" "$fadd" "$s" "" "1" "10547" "" | ||
| done | ||
| done | ||
| done | ||
|
|
||
| # Summary | ||
| echo "==============================" | ||
| echo "Total cases: $total" | ||
| echo "Valid cases: $valid" | ||
| accuracy=$(awk "BEGIN {printf \"%.2f\", ($valid / $total) * 100}") | ||
| echo "Accuracy: $accuracy%" | ||
| echo "==============================" |
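Review note: the new script prints a summary but, as shown in the diff, never turns a failed case into a non-zero exit status, and it splits each list entry with three separate `echo | cut` calls. A minimal sketch of one way to handle both is below. The helper names `parse_case` and `summarize` and the `case_m`, `case_n`, `case_extra` variables are hypothetical and not part of the PR; the sketch only assumes the entry format used in `m_n_list` and `m_n_stride_list` ("m n [extra args]").

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): split an entry such as
# "22 255 -x_stride=256 -xr_stride=256" into m, n, and the remaining arguments.
# `read` assigns the first two words to case_m and case_n and leaves
# everything after them, unsplit, in case_extra.
parse_case() {
    local entry="$1"
    read -r case_m case_n case_extra <<< "$entry"
}

# Hypothetical helper (not in the PR): print the counters and return a
# non-zero status unless at least one case ran and every case was valid.
summarize() {
    local valid="$1" total="$2"
    echo "Total cases: $total"
    echo "Valid cases: $valid"
    [ "$total" -gt 0 ] && [ "$valid" -eq "$total" ]
}

# Example: parse one stride entry copied from m_n_stride_list.
parse_case "22 255 -x_stride=256 -xr_stride=256 -y_stride=256 -yr_stride=256"
echo "m=$case_m n=$case_n extra=$case_extra"
```

If something like this were adopted, the script could end with `summarize "$valid" "$total" || exit 1`, so a CI job would fail when any case does not report `valid:y`. Whether the smoke test is meant to gate CI this way is an assumption; the PR itself only prints the totals.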