
[hipblaslt] Fix hazard in TF32 cvt sequence#1092

Merged
b-shi merged 1 commit into `develop` from `users/brianshi/tf32_fix` on Aug 7, 2025

Conversation

@b-shi
Contributor

@b-shi b-shi commented Aug 6, 2025

The changes in #913 introduced an `s_mov_b64` instruction into the TF32 cvt sequence. That instruction requires wait states in all cases; previously, wait states were added only when SourceSwap = 1.
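To make the change in the condition concrete, here is a toy Python model of a code generator emitting the sequence. Everything in it is hypothetical: `emit_cvt_sequence`, `WAIT_STATES`, and the bracketed operand placeholders are illustrative only, not Tensile's actual code, and the real wait-state count comes from the gfx950 hazard rules. The only point it shows is that the wait used to be tied to SourceSwap and is now emitted unconditionally.

```python
from typing import List

# Hypothetical wait-state count; the real value depends on the ISA hazard rule.
WAIT_STATES = 1


def emit_cvt_sequence(source_swap: bool, fixed: bool) -> List[str]:
    """Toy emitter: s_mov_b64, an optional s_nop, then the consumer."""
    seq: List[str] = ["s_mov_b64 <sdst>, <ssrc>"]  # placeholder operands
    # Before the fix the wait depended on SourceSwap; after it, always emit it.
    needs_wait = fixed or source_swap
    if needs_wait:
        # On AMD GCN/CDNA ISAs, "s_nop N" provides N + 1 wait states.
        seq.append(f"s_nop {WAIT_STATES - 1}")
    seq.append("<instruction reading sdst>")  # placeholder consumer
    return seq


def has_wait(seq: List[str]) -> bool:
    return any(instr.startswith("s_nop") for instr in seq)


if __name__ == "__main__":
    for swap in (False, True):
        print(f"SourceSwap={int(swap)} before: {has_wait(emit_cvt_sequence(swap, False))}"
              f", after: {has_wait(emit_cvt_sequence(swap, True))}")
```

In this model the SourceSwap = 0 path is the one that was previously missing its wait state.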

Local testing (this case previously failed):

```
$ ./build/release/clients/staging/hipblaslt-bench --verify --print_kernel_info --a_type f32_r --b_type f32_r --c_type f32_r --d_type f32_r --compute_type xf32_r -m 512 -n 2048 -k 64 --transA T --transB N --activation_type none --rotating 512 --cold_iters 100 --iters 2000 --initialization trig_float --alpha 1.0000 --beta 0.0000
hipBLASLt version: 100100
hipBLASLt git version: 5596db44bc-dirty
Query device success: there are 8 devices. (Target device ID is 0)
Device ID 0 : AMD Radeon Graphics gfx950:sramecc+:xnack-
with 309.2 GB memory, max. SCLK 2400 MHz, max. MCLK 1900 MHz, compute capability 9.5
maxGridDimX 2147483647, sharedMemPerBlock 163.8 KB, maxThreadsPerBlock 1024, warpSize 64

Rotating buffer 512 MiB. Needed Size: 4 MiB. Needed block count: 111 (Capped to max iters: 2000)
Is supported 1 / Total solutions: 1
[0]:transA,transB,grouped_gemm,batch_count,m,n,k,alpha,lda,stride_a,beta,ldb,stride_b,ldc,stride_c,ldd,stride_d,a_type,b_type,c_type,d_type,compute_type,scaleA,scaleB,scaleC,scaleD,amaxD,activation_type,bias_vector,bias_type,aux_type,rotating_buffer,hipblaslt-Gflops,hipblaslt-GB/s,us,CPU-Gflops,CPU-us,norm_error,atol,rtol
    T,N,0,1,512,2048,64,1,64,32768,0,64,131072,512,1048576,512,1048576,f32_r,f32_r,f32_r,f32_r,xf32_r,0,0,0,0,0,none,0,f32_r,f32_r,512,22129.9,744.699,6.065,5.76784,23270,6.04562e-07,0.0001,1e-06
    --Solution index: 12891
    --Solution name:  Cijk_Alik_Bljk_S_MX_B_UserArgs_MT64x64x32_MI16x16x1_SN_LDSB1_AFC0_AFEM1_AFEM1_ASEM1_CLR0_CADS0_DTVA0_DTVB0_EPS0_FDSI0_GRPM1_GRVWA4_GRVWB4_GSU0_GSUAMB_GSUC0_GSUWGMRR0_GLS0_ISA950_IU1_K1_LBSPPA256_LBSPPB256_LBSPPM0_LPA8_LPB8_LPM0_LRVW4_LWPMn1_MIAV0_MIWT2_2_MO40_NTn1_NTA0_NTB0_NTC0_NTD0_NTM0_NEPBS0_NLCA1_NLCB1_ONLL1_PGR2_PLR0_PKA1_SIA3_SS1_SU0_SUM0_SUS128_SPO0_SRVW0_SSO0_SVW2_SK3_SKXCCM8_TLDS1_ULSGRO0_USL1_UIOFGRO0_USFGRO0_VSn1_VWA2_VWB2_WSGRA0_WSGRB0_WS64_WG32_8_1_WGM6_WGMXCC1_WGMXCCGn1
    --kernel name:    Cijk_Alik_Bljk_S_MX_B_UserArgs_MT64x64x32_MI16x16x1_SN_LDSB1_AFC0_AFEM1_AFEM1_ASEM1_CLR0_CADS0_DTVA0_DTVB0_EPS0_FDSI0_GRPM1_GRVWA4_GRVWB4_GSU0_GSUAMB_GLS0_ISA950_IU1_K1_LBSPPA256_LBSPPB256_LBSPPM0_LPA8_LPB8_LPM0_LRVW4_LWPMn1_MIAV0_MIWT2_2_MO40_NTn1_NTA0_NTB0_NTC0_NTD0_NTM0_NEPBS0_NLCA1_NLCB1_ONLL1_PGR2_PLR0_PKA1_SIA3_SS1_SPO0_SRVW0_SSO0_SVW2_SK3_SKXCCM8_TLDS1_ULSGRO0_USL1_UIOFGRO0_USFGRO0_VSn1_VWA2_VWB2_WSGRA0_WSGRB0_WS64_WG32_8_1
```
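The throughput columns in the benchmark output can be cross-checked by hand. The sketch below assumes the usual 2*m*n*k operation count for a GEMM and, for the bandwidth column, that A, B and D are each counted once as 4-byte f32 elements (C is not read because beta = 0) and that "GB/s" means 2^30 bytes per second. These formulas are inferred from the printed numbers, not taken from hipblaslt-bench's source; the helper names are hypothetical.

```python
# Cross-check hipblaslt-Gflops, hipblaslt-GB/s and CPU-Gflops from the run above.
M, N, K = 512, 2048, 64
BYTES_PER_ELEM = 4     # f32_r for A, B and D
GPU_TIME_US = 6.065    # "us" column
CPU_TIME_US = 23270.0  # "CPU-us" column


def gemm_gflops(m: int, n: int, k: int, time_us: float) -> float:
    """2*m*n*k flops; flops per microsecond divided by 1e3 gives GFLOP/s."""
    return 2 * m * n * k / (time_us * 1e3)


def gemm_gibps(m: int, n: int, k: int, elem_bytes: int, time_us: float) -> float:
    """Bytes of A (m x k), B (k x n) and D (m x n) per second, in units of 2**30."""
    elems = m * k + k * n + m * n
    return elems * elem_bytes / (time_us * 1e-6) / 2**30


if __name__ == "__main__":
    print(f"{gemm_gflops(M, N, K, GPU_TIME_US):.1f}")                  # 22129.9
    print(f"{gemm_gibps(M, N, K, BYTES_PER_ELEM, GPU_TIME_US):.3f}")   # 744.699
    print(f"{gemm_gflops(M, N, K, CPU_TIME_US):.5f}")                  # 5.76784
```

All three values match the corresponding columns of the printed row, which suggests the assumed formulas are at least consistent with what the benchmark reports.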

@b-shi b-shi requested a review from a team as a code owner August 6, 2025 23:05
@b-shi b-shi added the `gfx950` label (run CI on gfx950) Aug 6, 2025
@b-shi b-shi merged commit 6c59f7a into develop Aug 7, 2025
8 of 12 checks passed
@b-shi b-shi deleted the users/brianshi/tf32_fix branch August 7, 2025 17:42
assistant-librarian Bot pushed a commit to ROCm/hipBLASLt that referenced this pull request Aug 7, 2025
[hipblaslt] Fix hazard in TF32 cvt sequence

Due to changes in ROCm/rocm-libraries#913
`s_mov_b64` was introduced. This requires wait states for all cases
(previously was just added for SourceSwap = 1).
