@amd-eochoalo
Contributor

Signed-off-by: Erick Ochoa <[email protected]>
@amd-eochoalo amd-eochoalo marked this pull request as ready for review October 3, 2025 14:04
@amd-eochoalo amd-eochoalo merged commit 4a4f7fe into iree-org:main Oct 3, 2025
45 checks passed
newling added a commit that referenced this pull request Oct 6, 2025
Carries same 2 reverts as previous integrate
#22200
Updates IREE for LLVM tablegen change
llvm/llvm-project#161744

Signed-off-by: James Newling <[email protected]>
newling added a commit that referenced this pull request Oct 9, 2025
Carries same 2 reverts as previous integrates
#22200 and
#22214

Change in IREE for a change to some TOSA pass logic:
llvm/llvm-project#153771
Change in IREE for deprecated LLVM Triple API:
llvm/llvm-project#162186
Currently includes patch #22241
which is a pure IREE fix
Increases golden times for 2 models (<5%): `assert 10.864054075338773 <= 10.5`

Noticed this potential flake on Windows at some point: lit test ksplitmatmul_basic

Follow-up: understand #22241 (why can we not just assert it is not null?)

---------

Signed-off-by: James Newling <[email protected]>
weidel-p pushed a commit to weidel-p/iree that referenced this pull request Oct 21, 2025
Signed-off-by: James Newling <[email protected]>
Signed-off-by: Philipp <[email protected]>
@yash-amd

yash-amd commented Oct 31, 2025

Due to this PR, we are seeing a 9% performance degradation in decode time for the llama-8b-fp8 model.

Steps to reproduce:

The MLIR is here

Set up IREE and run the following steps on the sharkmi300x-4 machine.

Compile the MLIR with:

iree-compile output.mlir \
        --iree-hip-target=gfx942 -o output.vmfb \
        --iree-hal-target-device=hip --iree-opt-level=O3 \
        --iree-hal-indirect-command-buffers=true \
        --iree-stream-resource-memory-model=discrete \
        --iree-hip-enable-tensor-ukernels \
        --iree-hal-memoization=true \
        --iree-dispatch-creation-propagate-collapse-across-expands=true \
        --iree-hip-specialize-dispatches

Run the benchmark (performance test):

iree-benchmark-module --hip_use_streams=true \
        --module=output.vmfb \
        --parameters=model=/shark-dev/8b/fp8/attnf8/native_fp8_e4m3fnuz_llama3_8b.irpa \
        --device=hip \
        --function=decode_bs4 \
        --input=4x1xi64 \
        --input=4xi64 \
        --input=4xi64 \
        --input=4x65xi64 \
        --input=261x2097152xf8E4M3FNUZ \
        --benchmark_repetitions=3 \
        --benchmark_out_format=json \
        --benchmark_out=<BENCHMARK_DIR>/llama-8B-FP8_decode_bs4_isl_2048.json

The output time is written to the <BENCHMARK_DIR>/llama-8B-FP8_decode_bs4_isl_2048.json file.
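
To put a number on the regression, the JSON reports from a build before and after this PR can be compared. The following is a minimal sketch, not part of the original report: it assumes the file follows the usual Google Benchmark JSON layout (a top-level `benchmarks` list whose per-repetition rows have `run_type` set to `iteration` and carry a `real_time` field), and that both reports use the same `time_unit`. The helper names and the two file paths passed on the command line are hypothetical.

```python
import json
import sys
from statistics import mean


def load_report(path):
    """Read a benchmark JSON report from disk."""
    with open(path) as f:
        return json.load(f)


def mean_real_time(report, name_filter=""):
    """Average real_time over the per-repetition rows of a report.

    Aggregate rows (mean/median/stddev) are skipped so repetitions are
    not double counted. Rows without run_type are treated as iterations.
    """
    rows = [
        b
        for b in report.get("benchmarks", [])
        if b.get("run_type", "iteration") == "iteration"
        and name_filter in b.get("name", "")
    ]
    if not rows:
        raise ValueError("no matching benchmark iterations found")
    return mean(b["real_time"] for b in rows)


def regression_pct(baseline, candidate):
    """Percent change of candidate relative to baseline (positive = slower)."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Usage (hypothetical file names): python compare.py before.json after.json
    before = mean_real_time(load_report(sys.argv[1]), "decode_bs4")
    after = mean_real_time(load_report(sys.argv[2]), "decode_bs4")
    print(f"decode_bs4: {regression_pct(before, after):+.2f}% vs. baseline")
```

Filtering on `decode_bs4` keeps the comparison tied to the function benchmarked above; averaging the three repetitions smooths out run-to-run noise before the percentage is computed.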

mischirmer pushed a commit to mischirmer/iree that referenced this pull request Nov 24, 2025
Signed-off-by: James Newling <[email protected]>
mischirmer pushed a commit to mischirmer/iree that referenced this pull request Nov 24, 2025
Signed-off-by: James Newling <[email protected]>