Skip naive conv testing to speed up #3383

Open
RobQuistNL opened this issue Nov 14, 2024 · 5 comments

Comments

@RobQuistNL

RobQuistNL commented Nov 14, 2024

Hi,

When running various models with various inputs, a lot of the time in the initial runs seems to be spent benchmarking candidate kernels, including the naive ones (e.g. naive_conv_nonpacked_fwd_nchw_float_double_float).

The solution that usually comes out on top is not the naive one but one of the other kernels. Running with MIOPEN_DEBUG_CONV_DIRECT=0 significantly speeds up the initial runs of such a model at varying resolutions.
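For illustration, a minimal sketch of applying that workaround to a single process rather than the whole shell. Only the variable name and value come from this post; the helper names and the `run_vae.py` script are hypothetical.

```python
import os
import subprocess


def env_without_direct_conv(base_env=None):
    """Return a copy of the environment with MIOPEN_DEBUG_CONV_DIRECT=0 set.

    The original mapping is left untouched, so the setting only affects
    processes that are started with the returned environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MIOPEN_DEBUG_CONV_DIRECT"] = "0"
    return env


def run_with_direct_conv_disabled(argv):
    """Run a command (for example a model script) with the variable set."""
    return subprocess.run(argv, env=env_without_direct_conv(), check=True)


# Hypothetical usage:
#   run_with_direct_conv_disabled(["python3", "run_vae.py"])
```

Setting the variable per process keeps the workaround out of the global environment, which makes it easier to compare first-run times with and without it.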

Would it be an option to do this testing / benchmarking dynamically, without excluding the naive kernels completely? The naive kernel would be the least preferred: if another applicable kernel is found, it is a safe bet that the other implementation is faster, so benchmarking the naive kernel could be skipped altogether.

If that's not the desired behaviour, maybe this could be added behind a feature flag.
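The proposal can be sketched roughly as follows. This is not MIOpen code: `pick_fastest`, its `benchmark` callback and the `skip_naive_if_alternative` flag are hypothetical names used only to illustrate the idea of treating naive solvers as a last resort behind a switch. The solver names and timings are taken from the first ufdb record posted in this thread.

```python
NAIVE_PREFIX = "ConvDirectNaive"  # naming as it appears in the ufdb snippet


def pick_fastest(candidates, benchmark, skip_naive_if_alternative=True):
    """Benchmark applicable solvers, deferring naive ones to a last resort.

    candidates: iterable of solver names applicable to the problem.
    benchmark:  callable(name) -> measured time (lower is better).
    Returns (best_name, {name: time}) for the solvers that were timed.
    """
    candidates = list(candidates)
    optimized = [c for c in candidates if not c.startswith(NAIVE_PREFIX)]
    naive = [c for c in candidates if c.startswith(NAIVE_PREFIX)]

    if skip_naive_if_alternative and optimized:
        # Assumption from the post: any non-naive solver beats the naive one,
        # so the naive solver is never timed when an alternative exists.
        to_time = optimized
    else:
        to_time = optimized + naive

    if not to_time:
        raise ValueError("no applicable solver")

    timings = {name: benchmark(name) for name in to_time}
    best = min(timings, key=timings.get)
    return best, timings


# Three of the six solvers from the first ufdb record in this thread.
recorded = {
    "ConvBinWinogradRxSf3x2": 0.736573,
    "GemmFwd1x1_0_1": 0.266399,
    "ConvDirectNaiveConvFwd": 16.9739,
}
best, timed = pick_fastest(recorded, recorded.__getitem__)
print(best, sorted(timed))
```

With the flag off, the function falls back to timing everything, which corresponds to the current behaviour described above.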

I'm quite sure that people running this without knowing about it would see major speedups in initial runs (the test case here is various VAE models being run).

@RobQuistNL
Author

RobQuistNL commented Nov 14, 2024

Here's a snippet from the ufdb in question. I'm not 100% sure, but I think it shows that some of those ConvDirectNaive kernels take a lot of time:

HIP.3_2_0.ufdb.txt

1920-26-32-1x1-1280-26-32-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.25571,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.736573,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.289919,15564800,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.116864,19169280,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.266399,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:16.9739,0,miopenConvolutionFwdAlgoDirect
1280-52-64-3x3-1280-52-64-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:3.34123,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:6.71735,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:2.64133,63569920,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:2.03775,97648640,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.92911,76677120,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:240.954,0,miopenConvolutionFwdAlgoDirect
1920-52-64-3x3-640-52-64-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:2.55705,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:5.0626,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:2.24214,56197120,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:2.34646,73236480,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.36562,115015680,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:200.26,0,miopenConvolutionFwdAlgoDirect
1920-52-64-1x1-640-52-64-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:2.49522,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:1.22867,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.406174,36536320,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.242687,51118080,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.269183,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:35.5401,0,miopenConvolutionFwdAlgoDirect
1280-52-64-3x3-640-52-64-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.7151,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:3.38824,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:1.42153,40304640,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:1.02272,57344000,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.19766,76677120,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:129.973,0,miopenConvolutionFwdAlgoDirect
1280-52-64-1x1-640-52-64-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.67263,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.831037,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.327743,27197440,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.176575,42598400,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.192191,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:23.6178,0,miopenConvolutionFwdAlgoDirect
960-52-64-3x3-640-52-64-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.29372,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:2.53945,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.967996,32358400,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.697597,49397760,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:1.53158,57507840,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:96.8891,0,miopenConvolutionFwdAlgoDirect
960-52-64-1x1-640-52-64-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.26147,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.632862,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.283551,22528000,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.135807,38338560,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.219487,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:17.7857,0,miopenConvolutionFwdAlgoDirect
640-104-128-3x3-640-104-128-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:3.39595,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:6.30868,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:2.36338,75530240,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:1.94018,143687680,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.62095,153354240,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:279.665,0,miopenConvolutionFwdAlgoDirect
960-104-128-3x3-320-104-128-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:2.58604,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:4.89873,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:1.72639,73687040,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:2.27314,107765760,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:4.93534,230031360,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:256.787,0,miopenConvolutionFwdAlgoDirect
960-104-128-1x1-320-104-128-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:2.52303,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:1.25126,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.471806,68771840,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.334623,102236160,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.346175,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:43.6982,0,miopenConvolutionFwdAlgoDirect
640-104-128-3x3-320-104-128-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.74319,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:3.34469,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:1.4166,54804480,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:1.47539,88883200,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:1.90972,153354240,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:165.897,0,miopenConvolutionFwdAlgoDirect
640-104-128-1x1-320-104-128-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.70063,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.853245,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.39635,51527680,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.251679,85196800,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.289215,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:28.6757,0,miopenConvolutionFwdAlgoDirect
320-104-128-3x3-4-104-128-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:0.0983036,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.173375,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.315071,17275392,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.211327,17701376,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:0.734749,76677120,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:40.4575,0,miopenConvolutionFwdAlgoDirect
4-104-128-1x1-4-104-128-1-0x0-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:0.0151358,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.0191038,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.142335,426240,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.0238398,425984,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.0715196,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:0.149664,0,miopenConvolutionFwdAlgoDirect
4-104-128-3x3-512-104-128-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:0.0611198,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.0808954,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.287711,27549696,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.102943,27549696,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:0.111072,1916928,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:0.749117,0,miopenConvolutionFwdAlgoDirect
512-104-128-3x3-512-104-128-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:1.84121,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:3.48776,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:2.2487,63963136,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:1.93567,63963136,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.23257,245366784,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:101.823,0,miopenConvolutionFwdAlgoDirect
512-104-128-1x1-512-104-128-1-0x0-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:1.79679,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.900925,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.531326,55574528,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.365215,54525952,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.273055,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:18.3887,0,miopenConvolutionFwdAlgoDirect
512-208-256-3x3-512-208-256-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:7.17329,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:12.9223,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:8.36243,227540992,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:7.62704,227540992,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:8.17904,981467136,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:477.672,0,miopenConvolutionFwdAlgoDirect
512-416-512-3x3-512-416-512-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:28.9988,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:49.5818,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:33.8475,881852416,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:32.3395,881852416,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:41.5719,3925868544,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:2055.18,0,miopenConvolutionFwdAlgoDirect
512-416-512-3x3-256-416-512-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:14.8489,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:25.3045,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:17.5976,659030016,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:16.8068,659030016,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:21.3744,3925868544,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:1877.97,0,miopenConvolutionFwdAlgoDirect
256-416-512-3x3-256-416-512-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:7.64682,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:12.8598,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:10.3321,438566912,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:8.93561,438566912,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:10.1123,1962934272,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:873.538,0,miopenConvolutionFwdAlgoDirect
512-416-512-1x1-256-416-512-1-0x0-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:14.3863,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:6.59623,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:5.1019,654835712,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:4.32277,654311424,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:1.92758,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:286.65,0,miopenConvolutionFwdAlgoDirect
256-832-1024-3x3-256-832-1024-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:32.2882,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:50.3876,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:44.9249,1747189760,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:39.9932,1747189760,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:52.2714,7851737088,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:3816.86,0,miopenConvolutionFwdAlgoDirect
256-832-1024-3x3-128-832-1024-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:16.3588,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:25.5977,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:22.7511,1309802496,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:20.9185,1309802496,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:27.0331,7851737088,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:3346.55,0,miopenConvolutionFwdAlgoDirect
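
To make the comparison in these records easier to read, here is a small sketch that parses one record and reports the fastest solver and the naive solver's share of the summed times. The field layout (`key=solver:time,workspace,algorithm;...`) is inferred from the snippet itself, not from MIOpen documentation, and the recorded values may not equal the full benchmarking cost (compilation and repeated runs are not visible here). The function names are hypothetical.

```python
def parse_ufdb_line(line):
    """Split one 'key=solver:time,workspace,algo;...' record into (key, entries).

    Each entry is a tuple (solver, time, workspace, algorithm).
    """
    key, _, payload = line.strip().partition("=")
    entries = []
    for item in payload.split(";"):
        if not item:
            continue
        solver, _, fields = item.partition(":")
        time_str, workspace_str, algo = fields.split(",")
        entries.append((solver, float(time_str), int(workspace_str), algo))
    return key, entries


def naive_share(entries, naive_prefix="ConvDirectNaive"):
    """Fraction of the summed recorded times that belongs to naive solvers."""
    total = sum(t for _, t, _, _ in entries)
    naive = sum(t for s, t, _, _ in entries if s.startswith(naive_prefix))
    return naive / total if total else 0.0


# First record of the snippet above, copied verbatim.
SAMPLE = (
    "1920-26-32-1x1-1280-26-32-2-0x0-1x1-1x1-0-NCHW-FP16-F="
    "ConvBinWinogradRxSf2x3g1:1.25571,0,miopenConvolutionFwdAlgoWinograd;"
    "ConvBinWinogradRxSf3x2:0.736573,0,miopenConvolutionFwdAlgoWinograd;"
    "ConvHipImplicitGemmGroupFwdXdlops:0.289919,15564800,miopenConvolutionFwdAlgoImplicitGEMM;"
    "ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.116864,19169280,miopenConvolutionFwdAlgoImplicitGEMM;"
    "GemmFwd1x1_0_1:0.266399,0,miopenConvolutionFwdAlgoGEMM;"
    "ConvDirectNaiveConvFwd:16.9739,0,miopenConvolutionFwdAlgoDirect"
)

key, entries = parse_ufdb_line(SAMPLE)
fastest = min(entries, key=lambda e: e[1])
print(key)
print("fastest:", fastest[0], fastest[1])
print("naive share of summed times: {:.1%}".format(naive_share(entries)))
```

Running the same functions over every line of the file would show whether the naive solver dominates the recorded times across all problem sizes or only for the larger ones.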

@ppanchad-amd

Hi @RobQuistNL. An internal ticket has been created to assist with your issue. Thanks!

@huanrwan-amd

Hi @RobQuistNL, can you please provide more info on your hardware and software version (ROCm version and OS version)? Thanks.

@RobQuistNL
Author

Hey @huanrwan-amd,

The base image is rocm/pytorch:rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0, with pip3 install --pre --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2 on top, followed by:

git clone --recursive https://github.com/ROCm/flash-attention /tmp/flash-attention
cd /tmp/flash-attention; export GPU_ARCHS="gfx90a"; pip3 install .

@huanrwan-amd

Hi @RobQuistNL, thanks for the info. This issue looks more like a feature enhancement. I will contact the internal team first.
