[CodePartition] Support arbitrary data channel#6
Conversation
Summary: For an example with IfOp o = forOp mulf that uses o Prior to the change: IfOp o1 = forOp mulf that uses o1 IfOp o2 = forOp mulf that uses o1 After: IfOp o1 = forOp use o1 IfOp o2 = forOp use o2 We should not replace use of o with output of the specialized forOp while handling the first taskId, that will make handling of the next taskId incorrect. Instead of we update the mapping that is private to each taskId.
e0e225e to
735779c
Compare
| Operation *dstOp; | ||
| Value srcOperand; | ||
| Operation *op; | ||
| unsigned operandIdx; |
There was a problem hiding this comment.
I wonder why we need to change from (srcOp, dstOp, srcOperand) to (op, operandIdx). Will it be better if we keep dstOp instead of op?
There was a problem hiding this comment.
This is because the source operations can be changing. For example it could be a loop, or an if. The consumers are relatively stable. op now means dstOp, and the class now provides setter functions to access srcOp,dstOp, srcOperand, dstOperand.
| for (Operation *userOp : result.getUsers()) { | ||
|
|
||
| SetVector<std::pair<Operation *, unsigned>> users; | ||
| getTransitiveUsers(result, users); |
There was a problem hiding this comment.
Is there a case where we need to get transitive users?
There was a problem hiding this comment.
Yes, like below which is from fusing one kernel into one gemm kernel:
for (...)
c = dot(a,b,c)
op1(c)
op2(c)
There will be two channels created. Both has the loop as the producer.
| // Group channels by producer op. | ||
| DenseMap<Operation *, SmallVector<Channel *>> producerChannels; | ||
| for (auto channel : channels) { | ||
| producerChannels[channel->getSrcOp()].push_back(channel); |
There was a problem hiding this comment.
Is this the only place we update producerChannels? the rest of the function seems to update consumerChannels which existed prior to the change (i.e to minimize number of sync points).
There was a problem hiding this comment.
Yes. The function computes producerChannels first followed by the computation of consumerChannels.
| if (isa<triton::LoadOp>(c->srcOp)) { | ||
| // After createAsyncCopy, c->srcOp/headProducer are no longer valid. | ||
| createAsyncCopy(bufferMap, c, c->srcOp, asyncTasksPC, bufferIdx, | ||
| bufferIdx); |
There was a problem hiding this comment.
Oh lowering for non-tma loads are moved out of this function.
There was a problem hiding this comment.
Yeah as a part of the barrier insertion, as they are coupled.
| } | ||
| if (it == mutuallyNonDominatingChannels.end()) | ||
| mutuallyNonDominatingChannels.insert(c); | ||
| } |
There was a problem hiding this comment.
Will this have any impact on the default mode where producer is a loadOp? I guess it can, if the load has multiple consumers. Will we create multiple async_copy?
There was a problem hiding this comment.
Yes it affects load with multiple consumers where we haven't seen so far. If the load has multiple consumers, we will create multiple async_copy without the current change. With only one buffer is created for channels that can be compressed (i.e, one master dominating channel).
Previously data channels are created for data flows originating from a load operation to its consumers only. This change adds fundamental support for channeling from any operation to its consumers. This will be useful for the cases where data partition isn't available, such as when BLOCK_M is smaller than 128. To support arbitrary data channel where SMEM isn't naturally, a `local_store` is created on the producer side to dump the register data into SMEM and a `local_load` use created on the consumer side to load data back into registers. The patch also extends the previous data channeling logics to handle one-producer multi-consumer mode, where only one buffer is needed. Some refactoring is also done to the existing code base. For example, async copy creating code is separated from async communication emission. The data structure of `Channel` is refactored mainly to get the case where producer operation can be deallocated/reallocated, e.g, a `forOp`. Although the current auto partition heuristics still only considers `DotLike` ops in addition to previous `LoadOp` and `ExperimentalDescriptorLoadOp`, the logics there can be extended to work on more anchor ops.
Getting a crash internally when running `09-persistent-matmul.py`
tutorial, and ASAN reports the following:
```
==7854==ERROR: AddressSanitizer: heap-use-after-free on address 0x7c884c02e800 at pc 0x557f344112d9 bp 0x7b35908a1840 sp 0x7b35908a1838
READ of size 8 at 0x7c884c02e800 thread T1128
#0 0x557f344112d8 in getNextOperandUsingThisValue third_party/llvm/llvm-project/mlir/include/mlir/IR/UseDefLists.h:43:58
#1 0x557f344112d8 in operator++ third_party/llvm/llvm-project/mlir/include/mlir/IR/UseDefLists.h:322:39
#2 0x557f344112d8 in mlir::ResultRange::UseIterator::operator++() third_party/llvm/llvm-project/mlir/lib/IR/OperationSupport.cpp:613:5
#3 0x557f2ab70625 in mlir::lowerTokenOperations(mlir::Operation*, int, int) third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization/WSLowerToken.cpp:269:27
#4 0x557f2ab70de8 in mlir::doTokenLowering(mlir::triton::FuncOp&, unsigned int) third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization/WSLowerToken.cpp:321:3
#5 0x557f2ab2d018 in mlir::NVGPUWarpSpecializationPass::runOnFuncOp(mlir::triton::FuncOp) third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:99:5
#6 0x557f2ab2c5d6 in operator() third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:55
#7 0x557f2ab2c5d6 in operator() third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:304:7
#8 0x557f2ab2c5d6 in void llvm::function_ref<void (mlir::Operation*)>::callback_fn<std::__u::enable_if<!llvm::is_one_of<mlir::triton::FuncOp, mlir::Operation*, mlir::Region*, mlir::Block*>::value && std::is_same<void, void>::value, void>::type mlir::detail::walk<(mlir::WalkOrder)1, mlir::ForwardIterator, mlir::NVGPUWarpSpecializationPass::runOnOperation()::'lambda'(mlir::triton::FuncOp), mlir::triton::FuncOp, void>(mlir::Operation*, mlir::NVGPUWarpSpecializationPass::runOnOperation()::'lambda'(mlir::triton::FuncOp)&&)::'lambda'(mlir::Operation*)>(long, mlir::Operation*) third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46:12
#9 0x557f2820ce45 in operator() third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69:12
#10 0x557f2820ce45 in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:152:5
#11 0x557f2820ce2c in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:147:9
#12 0x557f2ab2c0c9 in walk<(mlir::WalkOrder)1, mlir::ForwardIterator, (lambda at third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:26), mlir::triton::FuncOp, void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:306:10
#13 0x557f2ab2c0c9 in walk<(mlir::WalkOrder)1, mlir::ForwardIterator, (lambda at third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:26), void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Operation.h:798:12
#14 0x557f2ab2c0c9 in mlir::NVGPUWarpSpecializationPass::runOnOperation() third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:21
...
```
The problem seems to be that we are iterating through uses, and then
removing some of them inside the loop, which invalidates the iterator.
…leaveTMem.cpp (#7924) `TritonNvidiaGPU/interleave_tmem.mlir` fails under address sanitizer. The `ConstantIntOp` operations were created without attachment to any block in https://github.com/triton-lang/triton/pull/7622, which caused a memory leak. This change addresses the problem by adding an insertion point. <details open> <summary>Full log</summary> ================================================================= ==3831==ERROR: LeakSanitizer: detected memory leaks Direct leak of 576 byte(s) in 6 object(s) allocated from: #0 0x55c3eca39164 in malloc [third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp?l=67&ws=tap-presubmit-server/421956858&snapshot=2):3 #1 0x55c3f176afb3 in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::DictionaryAttr, mlir::OpaqueProperties, mlir::BlockRange, unsigned int) [third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:113](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=113&ws=tap-presubmit-server/421956858&snapshot=2):46 #2 0x55c3f176a90c in create [third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:74](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=74&ws=tap-presubmit-server/421956858&snapshot=2):10 #3 0x55c3f176a90c in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::NamedAttrList&&, mlir::OpaqueProperties, mlir::BlockRange, mlir::RegionRange) [third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:57](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=57&ws=tap-presubmit-server/421956858&snapshot=2):7 #4 0x55c3f176a61b in mlir::Operation::create(mlir::OperationState const&) [third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:35](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=35&ws=tap-presubmit-server/421956858&snapshot=2):7 #5 0x55c3f1678a78 in mlir::OpBuilder::create(mlir::OperationState const&) [third_party/llvm/llvm-project/mlir/lib/IR/Builders.cpp:453](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Builders.cpp?l=453&ws=tap-presubmit-server/421956858&snapshot=2):17 #6 0x55c3ecf3668f in mlir::arith::ConstantIntOp mlir::OpBuilder::create<mlir::arith::ConstantIntOp, int, int>(mlir::Location, int&&, int&&) [third_party/llvm/llvm-project/mlir/include/mlir/IR/Builders.h:507](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/include/mlir/IR/Builders.h?l=507&ws=tap-presubmit-server/421956858&snapshot=2):16 #7 0x55c3eefa690a in findBufferAccessMemdescSubview [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:75](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=75&ws=tap-presubmit-server/421956858&snapshot=2):33 #8 0x55c3eefa690a in mlir::triton::nvidia_gpu::(anonymous namespace)::findBufferAccess(mlir::Value) [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:151](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=151&ws=tap-presubmit-server/421956858&snapshot=2):12 #9 0x55c3eefa70e7 in mlir::triton::nvidia_gpu::(anonymous namespace)::findBufferAccess(mlir::Value) [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:156](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=156&ws=tap-presubmit-server/421956858&snapshot=2):34 #10 0x55c3eefa4c0c in tmemMayAlias [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:173](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=173&ws=tap-presubmit-server/421956858&snapshot=2):28 #11 0x55c3eefa4c0c in sinkOps [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:227](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=227&ws=tap-presubmit-server/421956858&snapshot=2):36 #12 0x55c3eefa4c0c in trySinkOp [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:253](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=253&ws=tap-presubmit-server/421956858&snapshot=2):10 #13 0x55c3eefa4c0c in mlir::triton::nvidia_gpu::TritonNvidiaGPUInterleaveTMemPass::runOnOperation() [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:275](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=275&ws=tap-presubmit-server/421956858&snapshot=2):14 #14 0x55c3f1560ad1 in operator() [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:553](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=553&ws=tap-presubmit-server/421956858&snapshot=2):17 #15 0x55c3f1560ad1 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) [third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=46&ws=tap-presubmit-server/421956858&snapshot=2):12 #16 0x55c3f1559920 in operator() [third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=69&ws=tap-presubmit-server/421956858&snapshot=2):12 #17 0x55c3f1559920 in executeAction<mlir::PassExecutionAction, mlir::Pass &> [third_party/llvm/llvm-project/mlir/include/mlir/IR/MLIRContext.h:280](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/include/mlir/IR/MLIRContext.h?l=280&ws=tap-presubmit-server/421956858&snapshot=2):7 #18 0x55c3f1559920 in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:547](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=547&ws=tap-presubmit-server/421956858&snapshot=2):21 #19 0x55c3f155d46f in runPipeline [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:619](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=619&ws=tap-presubmit-server/421956858&snapshot=2):16 #20 0x55c3f155d46f in mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:933](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=933&ws=tap-presubmit-server/421956858&snapshot=2):10 #21 0x55c3f155d15b in mlir::PassManager::run(mlir::Operation*) [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:913](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=913&ws=tap-presubmit-server/421956858&snapshot=2):60 #22 0x55c3ed0a8b20 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:477](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=477&ws=tap-presubmit-server/421956858&snapshot=2):17 #23 0x55c3ed0a8363 in processBuffer [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:553](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=553&ws=tap-presubmit-server/421956858&snapshot=2):12 #24 0x55c3ed0a8363 in operator() [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:642](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=642&ws=tap-presubmit-server/421956858&snapshot=2):12 #25 0x55c3ed0a8363 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::MemoryBufferRef const&, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::MemoryBufferRef const&, llvm::raw_ostream&) [third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=46&ws=tap-presubmit-server/421956858&snapshot=2):12 #26 0x55c3f17bd34f in operator() [third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=69&ws=tap-presubmit-server/421956858&snapshot=2):12 #27 0x55c3f17bd34f in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::MemoryBufferRef const&, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) [third_party/llvm/llvm-project/mlir/lib/Support/ToolUtilities.cpp:30](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Support/ToolUtilities.cpp?l=30&ws=tap-presubmit-server/421956858&snapshot=2):12 #28 0x55c3ed09d0c6 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:647](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=647&ws=tap-presubmit-server/421956858&snapshot=2):26 #29 0x55c3ed09d67f in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:693](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=693&ws=tap-presubmit-server/421956858&snapshot=2):14 #30 0x55c3ed09dc59 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:709](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=709&ws=tap-presubmit-server/421956858&snapshot=2):10 #31 0x55c3eca74a70 in main [third_party/triton/bin/triton-opt.cpp:14](https://cs.corp.google.com/piper///depot/google3/third_party/triton/bin/triton-opt.cpp?l=14&ws=tap-presubmit-server/421956858&snapshot=2):33 #32 0x7f1fd58613d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94) #33 0x55c3ec995aa9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120 </details> --------- Co-authored-by: Thomas Raoux <thomas.raoux@openai.com>
…mlir` test (#8117)
IIUC, the initialization order between static and non-static variables
is not guaranteed, so we can't use the previous non-static variable to
initialize a static one later on. Working around that by moving it into
a static function variable.
We discovered this when upgrading to a newer LLVM version, so it might
only be reproducible with new LLVM.
Here is the error:
```
==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268
READ of size 8 at 0x557bc517caa0 thread T0
#0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38
#1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5
#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27
#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9
#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10
#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10
#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5
#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17
#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5
#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20
```
(cherry picked from commit 4f5f43e)
…mlir` test (#8117)
IIUC, the initialization order between static and non-static variables
is not guaranteed, so we can't use the previous non-static variable to
initialize a static one later on. Working around that by moving it into
a static function variable.
We discovered this when upgrading to a newer LLVM version, so it might
only be reproducible with new LLVM.
Here is the error:
```
==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268
READ of size 8 at 0x557bc517caa0 thread T0
#0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38
facebookexperimental#1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5
facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27
facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9
facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10
facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10
facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5
facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17
facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5
facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20
```
…issue in `tensor_layout_print.mlir` test (#8117)' Summary: This is a cherry-pick of an upstream PR: triton-lang/triton#8117 Upstream commit message: ``` > Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117) > IIUC, the initialization order between static and non-static variables > is not guaranteed, so we can't use the previous non-static variable to > initialize a static one later on. Working around that by moving it into > a static function variable. > We discovered this when upgrading to a newer LLVM version, so it might > only be reproducible with new LLVM. > Here is the error: > ``` > ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268 > READ of size 8 at 0x557bc517caa0 thread T0 > #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38 > #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27 > facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9 > facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10 > facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10 > facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17 > facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5 > facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20 > ``` ``` ***Do not remove the following line from this commit*** Reactor Cherry-pick Revision: 4f5f43e --- This diff was generated by running: ``` buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3 ``` Differential Revision: D92893421
…issue in `tensor_layout_print.mlir` test (#8117)' (facebookexperimental#883) Summary: This is a cherry-pick of an upstream PR: triton-lang/triton#8117 Upstream commit message: ``` > Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117) > IIUC, the initialization order between static and non-static variables > is not guaranteed, so we can't use the previous non-static variable to > initialize a static one later on. Working around that by moving it into > a static function variable. > We discovered this when upgrading to a newer LLVM version, so it might > only be reproducible with new LLVM. > Here is the error: > ``` > ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268 > READ of size 8 at 0x557bc517caa0 thread T0 > #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38 > #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27 > facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9 > facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10 > facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10 > facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17 > facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5 > facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20 > ``` ``` ***Do not remove the following line from this commit*** Reactor Cherry-pick Revision: 4f5f43e --- This diff was generated by running: ``` buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3 ``` Reviewed By: stashuk-olek Differential Revision: D92893421
…issue in `tensor_layout_print.mlir` test (#8117)' (facebookexperimental#883) Summary: This is a cherry-pick of an upstream PR: triton-lang/triton#8117 Upstream commit message: ``` > Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117) > IIUC, the initialization order between static and non-static variables > is not guaranteed, so we can't use the previous non-static variable to > initialize a static one later on. Working around that by moving it into > a static function variable. > We discovered this when upgrading to a newer LLVM version, so it might > only be reproducible with new LLVM. > Here is the error: > ``` > ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268 > READ of size 8 at 0x557bc517caa0 thread T0 > #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38 > #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27 > facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9 > facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10 > facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10 > facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17 > facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5 > facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20 > ``` ``` ***Do not remove the following line from this commit*** Reactor Cherry-pick Revision: 4f5f43e --- This diff was generated by running: ``` buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3 ``` Reviewed By: stashuk-olek Differential Revision: D92893421
…issue in `tensor_layout_print.mlir` test (#8117)' (facebookexperimental#883) Summary: This is a cherry-pick of an upstream PR: triton-lang/triton#8117 Upstream commit message: ``` > Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117) > IIUC, the initialization order between static and non-static variables > is not guaranteed, so we can't use the previous non-static variable to > initialize a static one later on. Working around that by moving it into > a static function variable. > We discovered this when upgrading to a newer LLVM version, so it might > only be reproducible with new LLVM. > Here is the error: > ``` > ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268 > READ of size 8 at 0x557bc517caa0 thread T0 > #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38 > #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27 > facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9 > facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10 > facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10 > facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17 > facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5 > facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20 > ``` ``` ***Do not remove the following line from this commit*** Reactor Cherry-pick Revision: 4f5f43e --- This diff was generated by running: ``` buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3 ``` Reviewed By: stashuk-olek Differential Revision: D92893421
…issue in `tensor_layout_print.mlir` test (#8117)' (#883) Summary: Pull Request resolved: #883 This is a cherry-pick of an upstream PR: triton-lang/triton#8117 Upstream commit message: ``` > Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117) > IIUC, the initialization order between static and non-static variables > is not guaranteed, so we can't use the previous non-static variable to > initialize a static one later on. Working around that by moving it into > a static function variable. > We discovered this when upgrading to a newer LLVM version, so it might > only be reproducible with new LLVM. > Here is the error: > ``` > ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268 > READ of size 8 at 0x557bc517caa0 thread T0 > #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38 > #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5 > #2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27 > #3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9 > #4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10 > #5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10 > #6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5 > #7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17 > #8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5 > #9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20 > ``` ``` ***Do not remove the following line from this commit*** Reactor Cherry-pick Revision: 4f5f43e --- This diff was generated by running: ``` buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3 ``` Reviewed By: stashuk-olek Differential Revision: D92893421 fbshipit-source-id: 4d4bfcf7459e4ff08a3a06a4292e656df3249675
Previously data channels are created for data flows originating from a load operation to its consumers only. This change adds fundamental support for channeling from any operation to its consumers. This will be useful for the cases where data partition isn't available, such as when BLOCK_M is smaller than 128.
To support arbitrary data channel where SMEM isn't naturally, a
local_storeis created on the producer side to dump the register data into SMEM and alocal_loaduse created on the consumer side to load data back into registers.The patch also extends the previous data channeling logics to handle one-producer multi-consumer mode, where only one buffer is needed.
Some refactoring is also done to the existing code base. For example, async copy creating code is separated from async communication emission. The data structure of
Channelis refactored mainly to get the case where producer operation can be deallocated/reallocated, e.g, aforOp.Although the current auto partition heuristics still only considers
DotLikeops in addition to previousLoadOpandExperimentalDescriptorLoadOp, the logics there can be extended to work on more anchor ops.