Llama 3.1 8b fp16 takes ~20 minutes to compile #19049
Comments
Thanks for the details in this report. Narrowing down the commit range helps quite a bit. More tips: https://iree.dev/developers/debugging/compile-time-regressions/
I'm having similar problems with 70b currently, I'll upload a
watch it be one of the compile performance optimizations that made it worse :P
Looks like the majority of the time is spent in I'd be interested in trying to resolve this, but because this seems pretty blocking for llama dev, it might be better to have someone with more familiarity work on it.
heh, yeah, that'll do it
There may be some short-circuiting we can add to the analysis (avoid walking into linalg ops or something) but I'm not sure of the impact on the analysis results. We may need to internally parallelize that given that these funcs are so big. Unless there's a few obvious standouts in a perf dump/tracy with sampling enabled we may have to turn it off for this model until the larger changes can be made. I'm not sure if we've already started needing the results for good codegen, though, so it's a risk.
Here's the full commit range between the nightly releases you referenced: candidate-20241104.1068...candidate-20241105.1069
Odd - nothing stands out besides the integrate - most other changes were in codegen or runtime HAL API - I was expecting a flag flip.
This MLIR will repro the perf issue on TOM with
hah watch it be iree-org/llvm-project@3494ee9 - adding an assert to APInt will make LLVM even worse in debug builds, hooray~~~~ we should make sure asserts are disabled and try timing (and if that's the issue, push for it to be rolled back or put behind an opt-in-aggressive flag - APInt can be constructed a bajillion times/sec and that assert path is not cheap). I'd try going before/after the integrate and also comparing torch IR before/after (as there's both LLVM and torch in there).
That assert should really be behind LLVM_ENABLE_EXPENSIVE_CHECKS
yeah - definitely! it's really not good to have that in that value-type ctor
Apparently iree-org/llvm-project@3494ee9 could also be causing problems, so there might be more to do.
hah! nice find/fix ian!
What happened?
Llama 3.1 8b fp16 with decomposed (not sdpa) flash attention takes ~20 minutes to compile with iree-compiler==20241105.1069, but it took <1 minute to compile with iree-compiler==20241104.1068, and also with builds before commit f71dd12.
Steps to reproduce your issue
../iree-build-no-trace/tools/iree-compile 8b_f16_decomposed.mlir --iree-hip-target=gfx942 --iree-hal-target-backends=rocm -o=test_decomposed_tom.vmfb
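For anyone bisecting this, a minimal sketch of a wall-clock comparison between two builds could look like the following. It only wraps the repro command above in a timer. The two binary paths (`./build-before`, `./build-after`) and the output file names are hypothetical placeholders, not paths from this report, and the helper names `time_command` and `compare` are made up for illustration.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of argv strings) once.

    Returns (exit_code, wall_seconds). exit_code is None if the
    executable could not be found.
    """
    start = time.perf_counter()
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        code = result.returncode
    except FileNotFoundError:
        code = None
    return code, time.perf_counter() - start


def compare(commands):
    """Time each labeled command once; return {label: (exit_code, seconds)}."""
    return {label: time_command(cmd) for label, cmd in commands.items()}


if __name__ == "__main__":
    # Flags copied from the repro command above. The binary paths are
    # hypothetical: point them at builds before/after the suspect commit.
    flags = [
        "8b_f16_decomposed.mlir",
        "--iree-hip-target=gfx942",
        "--iree-hal-target-backends=rocm",
    ]
    results = compare({
        "before": ["./build-before/tools/iree-compile", *flags, "-o=before.vmfb"],
        "after": ["./build-after/tools/iree-compile", *flags, "-o=after.vmfb"],
    })
    for label, (code, seconds) in results.items():
        print(f"{label}: exit={code} wall={seconds:.1f}s")
```

A single run per build is enough here since the reported gap is <1 minute vs ~20 minutes; for smaller regressions, repeating each command a few times would give a more trustworthy number.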
What component(s) does this issue relate to?
No response
Version information
41ed8c0
Additional context
No response