-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix injection of GPU buffers that do not go by a Func name (i.e. alloc groups). #8333
Fix injection of GPU buffers that do not go by a Func name (i.e. alloc groups). #8333
Conversation
baed7a9
to
8095af6
Compare
Windows failure looks real? |
I don't know... Doesn't look like it's related at all?
|
No, totally unrelated, go ahead and land this |
Does this error look like it could be related to this change? https://buildbot.halide-lang.org/master/#/builders/102/builds/448/steps/12/logs/stdio |
I'll run this test on my MacBook and see if I can figure out what's up. I doubt that this is related to this PR tho. |
Could not reproduce with current top-of-tree of Halide with LLVM 18.1.7 (I do realize the buildbot was using LLVM 19, but I don't have that one installed right now...):
Will try again with a debug build. Update: same for debug build. |
Did you try it with |
Definitely repros locally for me with LLVM 19 |
I did now, and it still doesn't reproduce (I'm under LLVM 18 of course, still). |
Did you revert this PR locally and see if it changes? I'd be surprised if it is related actually. |
When --for some reason-- an allocation group for fused storage for multiple
Func
s that originally are intended to go inGPUShared
gets lifted out of the GPU-block loops, and sits inHeap
memory instead, the profiling injection logic assumed that this buffer came from a function with the same name. This buffer was incorrectly determined to be on the stack, as it ignored thecustom_new
andcustom_free
attributes of theAllocate
node.Consider this example (also included as a new test):
Produces the following Stmt right before the Profiling pass:
Notice how the
allocgroup__f1$0.0__f2$0.1.buffer
is outside of the outermost GPU-block loop. When this buffer didn't get lifted out of the kernel, Profiling wasn't an issue, as the profiler doesn't traverse the IR into GPU loops.The offending line was:
Halide/src/Profiling.cpp
Line 274 in 461c128
When instrumenting the allocate node. The node is incorrectly determined to be
on_stack=true
.This PR checks if there is a custom_new and overrides that it is on the stack to false.
@abadams I wonder if we can't simply rely on
Allocate::MemoryType
to determineon_stack
, or is that stillAuto
at that moment?