Skip to content

fix an issue with two consumers when output of ForOp is used#5

Merged
manman-ren merged 2 commits into
wsfrom
mren/ws-fix
Dec 6, 2024
Merged

fix an issue with two consumers when output of ForOp is used#5
manman-ren merged 2 commits into
wsfrom
mren/ws-fix

Conversation

@manman-ren
Copy link
Copy Markdown
Contributor

Summary: For an example with
IfOp
o = forOp
mulf that uses o

Prior to the change:
IfOp
o1 = forOp
mulf that uses o1
IfOp
o2 = forOp
mulf that uses o1

After:
IfOp
o1 = forOp
use o1
IfOp
o2 = forOp
use o2

We should not replace use of o with output of the specialized forOp while handling the first taskId, that will make handling of the next taskId incorrect. Instead of we update the mapping that is private to each taskId.

Summary: For an example with
IfOp
  o = forOp
  mulf that uses o

Prior to the change:
IfOp
  o1 = forOp
  mulf that uses o1
IfOp
  o2 = forOp
  mulf that uses o1

After:
IfOp
  o1 = forOp
  use o1
IfOp
  o2 = forOp
  use o2

We should not replace use of o with output of the specialized forOp
while handling the first taskId, that will make handling of the next taskId incorrect.
Instead of we update the mapping that is private to each taskId.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 5, 2024
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@manman-ren manman-ren requested a review from htyu December 5, 2024 23:29
Copy link
Copy Markdown
Contributor

@htyu htyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@manman-ren manman-ren merged commit d08b5d0 into ws Dec 6, 2024
@manman-ren manman-ren deleted the mren/ws-fix branch December 9, 2024 19:51
htyu pushed a commit that referenced this pull request Dec 9, 2024
Summary: For an example with
IfOp
  o = forOp
  mulf that uses o

Prior to the change:
IfOp
  o1 = forOp
  mulf that uses o1
IfOp
  o2 = forOp
  mulf that uses o1

After:
IfOp
  o1 = forOp
  use o1
IfOp
  o2 = forOp
  use o2

We should not replace use of o with output of the specialized forOp
while handling the first taskId, that will make handling of the next
taskId incorrect. Instead of we update the mapping that is private to
each taskId.
htyu pushed a commit that referenced this pull request Dec 13, 2024
Summary: For an example with
IfOp
  o = forOp
  mulf that uses o

Prior to the change:
IfOp
  o1 = forOp
  mulf that uses o1
IfOp
  o2 = forOp
  mulf that uses o1

After:
IfOp
  o1 = forOp
  use o1
IfOp
  o2 = forOp
  use o2

We should not replace use of o with output of the specialized forOp
while handling the first taskId, that will make handling of the next
taskId incorrect. Instead of we update the mapping that is private to
each taskId.
dshi7 pushed a commit that referenced this pull request Aug 28, 2025
Getting a crash internally when running `09-persistent-matmul.py`
tutorial, and ASAN reports the following:

```
==7854==ERROR: AddressSanitizer: heap-use-after-free on address 0x7c884c02e800 at pc 0x557f344112d9 bp 0x7b35908a1840 sp 0x7b35908a1838
READ of size 8 at 0x7c884c02e800 thread T1128
    #0 0x557f344112d8 in getNextOperandUsingThisValue third_party/llvm/llvm-project/mlir/include/mlir/IR/UseDefLists.h:43:58
    #1 0x557f344112d8 in operator++ third_party/llvm/llvm-project/mlir/include/mlir/IR/UseDefLists.h:322:39
    #2 0x557f344112d8 in mlir::ResultRange::UseIterator::operator++() third_party/llvm/llvm-project/mlir/lib/IR/OperationSupport.cpp:613:5
    #3 0x557f2ab70625 in mlir::lowerTokenOperations(mlir::Operation*, int, int) third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization/WSLowerToken.cpp:269:27
    #4 0x557f2ab70de8 in mlir::doTokenLowering(mlir::triton::FuncOp&, unsigned int) third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization/WSLowerToken.cpp:321:3
    #5 0x557f2ab2d018 in mlir::NVGPUWarpSpecializationPass::runOnFuncOp(mlir::triton::FuncOp) third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:99:5
    #6 0x557f2ab2c5d6 in operator() third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:55
    #7 0x557f2ab2c5d6 in operator() third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:304:7
    #8 0x557f2ab2c5d6 in void llvm::function_ref<void (mlir::Operation*)>::callback_fn<std::__u::enable_if<!llvm::is_one_of<mlir::triton::FuncOp, mlir::Operation*, mlir::Region*, mlir::Block*>::value && std::is_same<void, void>::value, void>::type mlir::detail::walk<(mlir::WalkOrder)1, mlir::ForwardIterator, mlir::NVGPUWarpSpecializationPass::runOnOperation()::'lambda'(mlir::triton::FuncOp), mlir::triton::FuncOp, void>(mlir::Operation*, mlir::NVGPUWarpSpecializationPass::runOnOperation()::'lambda'(mlir::triton::FuncOp)&&)::'lambda'(mlir::Operation*)>(long, mlir::Operation*) third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46:12
    #9 0x557f2820ce45 in operator() third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69:12
    #10 0x557f2820ce45 in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:152:5
    #11 0x557f2820ce2c in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:147:9
    #12 0x557f2ab2c0c9 in walk<(mlir::WalkOrder)1, mlir::ForwardIterator, (lambda at third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:26), mlir::triton::FuncOp, void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:306:10
    #13 0x557f2ab2c0c9 in walk<(mlir::WalkOrder)1, mlir::ForwardIterator, (lambda at third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:26), void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Operation.h:798:12
    #14 0x557f2ab2c0c9 in mlir::NVGPUWarpSpecializationPass::runOnOperation() third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:21
...
```

The problem seems to be that we are iterating through uses, and then
removing some of them inside the loop, which invalidates the iterator.
dshi7 pushed a commit that referenced this pull request Aug 28, 2025
…leaveTMem.cpp (#7924)

`TritonNvidiaGPU/interleave_tmem.mlir` fails under address sanitizer. 

The `ConstantIntOp` operations were created without attachment to any
block in https://github.com/triton-lang/triton/pull/7622, which caused a
memory leak. This change addresses the problem by adding an insertion
point.

<details open>
  <summary>Full log</summary>

=================================================================
==3831==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 576 byte(s) in 6 object(s) allocated from:
#0 0x55c3eca39164 in malloc
[third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp?l=67&ws=tap-presubmit-server/421956858&snapshot=2):3
#1 0x55c3f176afb3 in mlir::Operation::create(mlir::Location,
mlir::OperationName, mlir::TypeRange, mlir::ValueRange,
mlir::DictionaryAttr, mlir::OpaqueProperties, mlir::BlockRange, unsigned
int)
[third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:113](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=113&ws=tap-presubmit-server/421956858&snapshot=2):46
#2 0x55c3f176a90c in create
[third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:74](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=74&ws=tap-presubmit-server/421956858&snapshot=2):10
#3 0x55c3f176a90c in mlir::Operation::create(mlir::Location,
mlir::OperationName, mlir::TypeRange, mlir::ValueRange,
mlir::NamedAttrList&&, mlir::OpaqueProperties, mlir::BlockRange,
mlir::RegionRange)
[third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:57](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=57&ws=tap-presubmit-server/421956858&snapshot=2):7
#4 0x55c3f176a61b in mlir::Operation::create(mlir::OperationState
const&)
[third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:35](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=35&ws=tap-presubmit-server/421956858&snapshot=2):7
#5 0x55c3f1678a78 in mlir::OpBuilder::create(mlir::OperationState
const&)
[third_party/llvm/llvm-project/mlir/lib/IR/Builders.cpp:453](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Builders.cpp?l=453&ws=tap-presubmit-server/421956858&snapshot=2):17
#6 0x55c3ecf3668f in mlir::arith::ConstantIntOp
mlir::OpBuilder::create<mlir::arith::ConstantIntOp, int,
int>(mlir::Location, int&&, int&&)
[third_party/llvm/llvm-project/mlir/include/mlir/IR/Builders.h:507](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/include/mlir/IR/Builders.h?l=507&ws=tap-presubmit-server/421956858&snapshot=2):16
#7 0x55c3eefa690a in findBufferAccessMemdescSubview
[third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:75](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=75&ws=tap-presubmit-server/421956858&snapshot=2):33
#8 0x55c3eefa690a in mlir::triton::nvidia_gpu::(anonymous
namespace)::findBufferAccess(mlir::Value)
[third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:151](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=151&ws=tap-presubmit-server/421956858&snapshot=2):12
#9 0x55c3eefa70e7 in mlir::triton::nvidia_gpu::(anonymous
namespace)::findBufferAccess(mlir::Value)
[third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:156](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=156&ws=tap-presubmit-server/421956858&snapshot=2):34
#10 0x55c3eefa4c0c in tmemMayAlias
[third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:173](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=173&ws=tap-presubmit-server/421956858&snapshot=2):28
#11 0x55c3eefa4c0c in sinkOps
[third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:227](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=227&ws=tap-presubmit-server/421956858&snapshot=2):36
#12 0x55c3eefa4c0c in trySinkOp
[third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:253](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=253&ws=tap-presubmit-server/421956858&snapshot=2):10
#13 0x55c3eefa4c0c in
mlir::triton::nvidia_gpu::TritonNvidiaGPUInterleaveTMemPass::runOnOperation()
[third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:275](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=275&ws=tap-presubmit-server/421956858&snapshot=2):14
#14 0x55c3f1560ad1 in operator()
[third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:553](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=553&ws=tap-presubmit-server/421956858&snapshot=2):17
#15 0x55c3f1560ad1 in void llvm::function_ref<void
()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*,
mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long)
[third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=46&ws=tap-presubmit-server/421956858&snapshot=2):12
#16 0x55c3f1559920 in operator()
[third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=69&ws=tap-presubmit-server/421956858&snapshot=2):12
#17 0x55c3f1559920 in executeAction<mlir::PassExecutionAction,
mlir::Pass &>
[third_party/llvm/llvm-project/mlir/include/mlir/IR/MLIRContext.h:280](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/include/mlir/IR/MLIRContext.h?l=280&ws=tap-presubmit-server/421956858&snapshot=2):7
#18 0x55c3f1559920 in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*,
mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)
[third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:547](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=547&ws=tap-presubmit-server/421956858&snapshot=2):21
#19 0x55c3f155d46f in runPipeline
[third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:619](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=619&ws=tap-presubmit-server/421956858&snapshot=2):16
#20 0x55c3f155d46f in mlir::PassManager::runPasses(mlir::Operation*,
mlir::AnalysisManager)
[third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:933](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=933&ws=tap-presubmit-server/421956858&snapshot=2):10
#21 0x55c3f155d15b in mlir::PassManager::run(mlir::Operation*)
[third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:913](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=913&ws=tap-presubmit-server/421956858&snapshot=2):60
#22 0x55c3ed0a8b20 in performActions(llvm::raw_ostream&,
std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*,
mlir::MlirOptMainConfig const&)
[third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:477](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=477&ws=tap-presubmit-server/421956858&snapshot=2):17
#23 0x55c3ed0a8363 in processBuffer
[third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:553](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=553&ws=tap-presubmit-server/421956858&snapshot=2):12
#24 0x55c3ed0a8363 in operator()
[third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:642](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=642&ws=tap-presubmit-server/421956858&snapshot=2):12
#25 0x55c3ed0a8363 in llvm::LogicalResult
llvm::function_ref<llvm::LogicalResult
(std::__u::unique_ptr<llvm::MemoryBuffer,
std::__u::default_delete<llvm::MemoryBuffer>>, llvm::MemoryBufferRef
const&,
llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&,
std::__u::unique_ptr<llvm::MemoryBuffer,
std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&,
mlir::MlirOptMainConfig const&)::$_0>(long,
std::__u::unique_ptr<llvm::MemoryBuffer,
std::__u::default_delete<llvm::MemoryBuffer>>, llvm::MemoryBufferRef
const&, llvm::raw_ostream&)
[third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=46&ws=tap-presubmit-server/421956858&snapshot=2):12
#26 0x55c3f17bd34f in operator()
[third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=69&ws=tap-presubmit-server/421956858&snapshot=2):12
#27 0x55c3f17bd34f in
mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer,
std::__u::default_delete<llvm::MemoryBuffer>>,
llvm::function_ref<llvm::LogicalResult
(std::__u::unique_ptr<llvm::MemoryBuffer,
std::__u::default_delete<llvm::MemoryBuffer>>, llvm::MemoryBufferRef
const&, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef,
llvm::StringRef)
[third_party/llvm/llvm-project/mlir/lib/Support/ToolUtilities.cpp:30](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Support/ToolUtilities.cpp?l=30&ws=tap-presubmit-server/421956858&snapshot=2):12
#28 0x55c3ed09d0c6 in mlir::MlirOptMain(llvm::raw_ostream&,
std::__u::unique_ptr<llvm::MemoryBuffer,
std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&,
mlir::MlirOptMainConfig const&)
[third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:647](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=647&ws=tap-presubmit-server/421956858&snapshot=2):26
#29 0x55c3ed09d67f in mlir::MlirOptMain(int, char**, llvm::StringRef,
llvm::StringRef, mlir::DialectRegistry&)
[third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:693](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=693&ws=tap-presubmit-server/421956858&snapshot=2):14
#30 0x55c3ed09dc59 in mlir::MlirOptMain(int, char**, llvm::StringRef,
mlir::DialectRegistry&)
[third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:709](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=709&ws=tap-presubmit-server/421956858&snapshot=2):10
#31 0x55c3eca74a70 in main
[third_party/triton/bin/triton-opt.cpp:14](https://cs.corp.google.com/piper///depot/google3/third_party/triton/bin/triton-opt.cpp?l=14&ws=tap-presubmit-server/421956858&snapshot=2):33
#32 0x7f1fd58613d3 in __libc_start_main
(/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId:
9a996398ce14a94560b0c642eb4f6e94)
#33 0x55c3ec995aa9 in _start
/usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120

</details>

---------

Co-authored-by: Thomas Raoux <thomas.raoux@openai.com>
agron911 pushed a commit that referenced this pull request Oct 30, 2025
…mlir` test (#8117)

IIUC, the initialization order between static and non-static variables
is not guaranteed, so we can't use the previous non-static variable to
initialize a static one later on. Working around that by moving it into
a static function variable.

We discovered this when upgrading to a newer LLVM version, so it might
only be reproducible with new LLVM.

Here is the error:

```
==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268

READ of size 8 at 0x557bc517caa0 thread T0

    #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38

    #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5

    #2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27

    #3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9

    #4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10

    #5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10

    #6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5

    #7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17

    #8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5

    #9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20
```

(cherry picked from commit 4f5f43e)
htyu pushed a commit to htyu/triton-1 that referenced this pull request Jan 6, 2026
…mlir` test (#8117)

IIUC, the initialization order between static and non-static variables
is not guaranteed, so we can't use the previous non-static variable to
initialize a static one later on. Working around that by moving it into
a static function variable.

We discovered this when upgrading to a newer LLVM version, so it might
only be reproducible with new LLVM.

Here is the error:

```
==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268

READ of size 8 at 0x557bc517caa0 thread T0

    #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38

    facebookexperimental#1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5

    facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27

    facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9

    facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10

    facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10

    facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5

    facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17

    facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5

    facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20
```
agron911 added a commit to agron911/triton that referenced this pull request Feb 10, 2026
…issue in `tensor_layout_print.mlir` test (#8117)'

Summary:
This is a cherry-pick of an upstream PR: triton-lang/triton#8117

Upstream commit message:
```
> Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117)

> IIUC, the initialization order between static and non-static variables
> is not guaranteed, so we can't use the previous non-static variable to
> initialize a static one later on. Working around that by moving it into
> a static function variable.

> We discovered this when upgrading to a newer LLVM version, so it might
> only be reproducible with new LLVM.

> Here is the error:

> ```
> ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268

> READ of size 8 at 0x557bc517caa0 thread T0

>     #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38

>     #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5

>     facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27

>     facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9

>     facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10

>     facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10

>     facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5

>     facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17

>     facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5

>     facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20
> ```
```

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 4f5f43e
---

This diff was generated by running:
```
buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3
```

Differential Revision: D92893421
agron911 added a commit to agron911/triton that referenced this pull request Feb 11, 2026
…issue in `tensor_layout_print.mlir` test (#8117)' (facebookexperimental#883)

Summary:

This is a cherry-pick of an upstream PR: triton-lang/triton#8117

Upstream commit message:
```
> Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117)

> IIUC, the initialization order between static and non-static variables
> is not guaranteed, so we can't use the previous non-static variable to
> initialize a static one later on. Working around that by moving it into
> a static function variable.

> We discovered this when upgrading to a newer LLVM version, so it might
> only be reproducible with new LLVM.

> Here is the error:

> ```
> ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268

> READ of size 8 at 0x557bc517caa0 thread T0

>     #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38

>     #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5

>     facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27

>     facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9

>     facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10

>     facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10

>     facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5

>     facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17

>     facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5

>     facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20
> ```
```

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 4f5f43e
---

This diff was generated by running:
```
buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3
```

Reviewed By: stashuk-olek

Differential Revision: D92893421
agron911 added a commit to agron911/triton that referenced this pull request Feb 11, 2026
…issue in `tensor_layout_print.mlir` test (#8117)' (facebookexperimental#883)

Summary:

This is a cherry-pick of an upstream PR: triton-lang/triton#8117

Upstream commit message:
```
> Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117)

> IIUC, the initialization order between static and non-static variables
> is not guaranteed, so we can't use the previous non-static variable to
> initialize a static one later on. Working around that by moving it into
> a static function variable.

> We discovered this when upgrading to a newer LLVM version, so it might
> only be reproducible with new LLVM.

> Here is the error:

> ```
> ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268

> READ of size 8 at 0x557bc517caa0 thread T0

>     #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38

>     #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5

>     facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27

>     facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9

>     facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10

>     facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10

>     facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5

>     facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17

>     facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5

>     facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20
> ```
```

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 4f5f43e
---

This diff was generated by running:
```
buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3
```

Reviewed By: stashuk-olek

Differential Revision: D92893421
agron911 added a commit to agron911/triton that referenced this pull request Feb 11, 2026
…issue in `tensor_layout_print.mlir` test (#8117)' (facebookexperimental#883)

Summary:

This is a cherry-pick of an upstream PR: triton-lang/triton#8117

Upstream commit message:
```
> Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117)

> IIUC, the initialization order between static and non-static variables
> is not guaranteed, so we can't use the previous non-static variable to
> initialize a static one later on. Working around that by moving it into
> a static function variable.

> We discovered this when upgrading to a newer LLVM version, so it might
> only be reproducible with new LLVM.

> Here is the error:

> ```
> ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268

> READ of size 8 at 0x557bc517caa0 thread T0

>     #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38

>     #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5

>     facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27

>     facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9

>     facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10

>     facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10

>     facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5

>     facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17

>     facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5

>     facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20
> ```
```

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 4f5f43e
---

This diff was generated by running:
```
buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3
```

Reviewed By: stashuk-olek

Differential Revision: D92893421
meta-codesync Bot pushed a commit that referenced this pull request Feb 11, 2026
…issue in `tensor_layout_print.mlir` test (#8117)' (#883)

Summary:
Pull Request resolved: #883

This is a cherry-pick of an upstream PR: triton-lang/triton#8117

Upstream commit message:
```
> Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117)

> IIUC, the initialization order between static and non-static variables
> is not guaranteed, so we can't use the previous non-static variable to
> initialize a static one later on. Working around that by moving it into
> a static function variable.

> We discovered this when upgrading to a newer LLVM version, so it might
> only be reproducible with new LLVM.

> Here is the error:

> ```
> ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268

> READ of size 8 at 0x557bc517caa0 thread T0

>     #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38

>     #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5

>     #2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27

>     #3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9

>     #4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10

>     #5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10

>     #6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5

>     #7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17

>     #8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5

>     #9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20
> ```
```

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 4f5f43e
 ---

This diff was generated by running:
```
buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3
```

Reviewed By: stashuk-olek

Differential Revision: D92893421

fbshipit-source-id: 4d4bfcf7459e4ff08a3a06a4292e656df3249675
htyu added a commit to htyu/triton-1 that referenced this pull request Apr 27, 2026
…-pipeline clustering

Summary:
The design doc assumed each warp group maps to a single hardware pipeline (Limitation facebookexperimental#5). This breaks for mixed groups like the epilogue (CUDA+MEM) or compute (CUDA+SFU). This change replaces that assumption with a latency-aware algorithm that uses two signals from the modulo schedule to decide which pipelines should share a warp group:

1. **Separation cost**: barrier overhead (∼30 cycles) relative to the cycle gap between cross-pipeline ops. Tightly-coupled ops (small gap) stay together; loosely-coupled ops (large gap) are separated.

2. **Multi-pipeline makespan**: list scheduling with per-pipeline resource tracking validates that a merged group can execute within II, correctly modeling that different pipelines overlap while data dependencies serialize.

The algorithm is greedy agglomerative clustering: start with one group per active pipeline, merge the highest-coupling pair if makespan allows, repeat. Worked examples show it reproduces hand-tuned results: GEMM stays at 2 groups (MEM↔TC coupling = 0.03, not worth merging), FA Forward produces 3 groups with CUDA+SFU merged (coupling = 0.23), and epilogue chains merge into a single multi-pipeline group.

The partitioning is moved from Pass B Step 1 into Pass A as Step 4.7, inside the iterative refinement loop, so it gets recomputed when DDG transformations change the schedule. The ScheduleGraph now carries warp group assignments (`modulo.warp_group`, per-node `wg` field), and Pass B Step 1 becomes a thin reader.

Reviewed By: wlei-llvm

Differential Revision: D102487494
meta-codesync Bot pushed a commit that referenced this pull request Apr 27, 2026
…-pipeline clustering (#1366)

Summary:
Pull Request resolved: #1366

The design doc assumed each warp group maps to a single hardware pipeline (Limitation #5). This breaks for mixed groups like the epilogue (CUDA+MEM) or compute (CUDA+SFU). This change replaces that assumption with a latency-aware algorithm that uses two signals from the modulo schedule to decide which pipelines should share a warp group:

1. **Separation cost**: barrier overhead (∼30 cycles) relative to the cycle gap between cross-pipeline ops. Tightly-coupled ops (small gap) stay together; loosely-coupled ops (large gap) are separated.

2. **Multi-pipeline makespan**: list scheduling with per-pipeline resource tracking validates that a merged group can execute within II, correctly modeling that different pipelines overlap while data dependencies serialize.

The algorithm is greedy agglomerative clustering: start with one group per active pipeline, merge the highest-coupling pair if makespan allows, repeat. Worked examples show it reproduces hand-tuned results: GEMM stays at 2 groups (MEM↔TC coupling = 0.03, not worth merging), FA Forward produces 3 groups with CUDA+SFU merged (coupling = 0.23), and epilogue chains merge into a single multi-pipeline group.

The partitioning is moved from Pass B Step 1 into Pass A as Step 4.7, inside the iterative refinement loop, so it gets recomputed when DDG transformations change the schedule. The ScheduleGraph now carries warp group assignments (`modulo.warp_group`, per-node `wg` field), and Pass B Step 1 becomes a thin reader.

Reviewed By: wlei-llvm

Differential Revision: D102487494

fbshipit-source-id: 4d6c49b13346be2d8d1c17314d2f3cfebc641d3b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants