Bump tj-actions/changed-files from 44 to 45#1
Closed
dependabot[bot] wants to merge 2 commits into
Closed
Conversation
… annotations This commit is a squash generated by: ``` git diff --stat b62b221a...06ccdadb -- . ':(exclude)python/gemmbench' ':(exclude)python/hstuBench' ':(exclude)third_party/proton' ```
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 44 to 45. - [Release notes](https://github.com/tj-actions/changed-files/releases) - [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md) - [Commits](tj-actions/changed-files@v44...v45) --- updated-dependencies: - dependency-name: tj-actions/changed-files dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
30a4248 to
55b6d80
Compare
714a49d to
56df264
Compare
bertmaher
pushed a commit
that referenced
this pull request
Nov 19, 2024
This will fix the following problem: ```bash python: /home/runner/work/triton/triton/llvm-project/llvm/include/llvm/ADT/ilist_iterator.h:168: llvm::ilist_iterator::reference llvm::ilist_iterator<llvm::ilist_detail::node_options<mlir::Operation, true, false, void, false, void>, false, false>::operator*() const [OptionsT = llvm::ilist_detail::node_options<mlir::Operation, true, false, void, false, void>, IsReverse = false, IsConst = false]: Assertion `!NodePtr->isKnownSentinel()' failed. Aborted (core dumped) ``` The problem was found when using PyTorch on Intel gpu: <details> <summary> Simplified reproducer #1:</summary> ```python from torch._inductor.async_compile import AsyncCompile async_compile = AsyncCompile() triton_per_fused_add_embedding_native_layer_norm_0 = async_compile.triton('triton_per_fused_add_embedding_native_layer_norm_0', ''' import triton import triton.language as tl from triton.compiler.compiler import AttrsDescriptor from torch._inductor.runtime import triton_helpers, triton_heuristics from torch._inductor.runtime.triton_helpers import libdevice, math as tl_math from torch._inductor.runtime.hints import AutotuneHint, ReductionHint, TileHint, DeviceProperties triton_helpers.set_driver_to_gpu() @triton_heuristics.persistent_reduction( size_hints=[512, 128], reduction_hint=ReductionHint.INNER, filename=__file__, triton_meta={'signature': {'in_ptr0': '*i64', 'in_ptr1': '*fp32', 'in_ptr2': '*fp32', 'in_ptr3': '*fp32', 'in_ptr4': '*fp32', 'in_ptr5': '*fp32', 'out_ptr2': '*fp32', 'xnumel': 'i32', 'rnumel': 'i32'}, 'device': DeviceProperties(type='xpu', index=0, cc={'driver_version': '1.3.30049', 'gpu_eu_count': 448, 'gpu_subslice_count': 56, 'has_atomic64': True, 'has_bfloat16_conversions': True, 'has_fp16': True, 'has_fp64': True, 'has_subgroup_2d_block_io': True, 'has_subgroup_matrix_multiply_accumulate': True, 'has_subgroup_matrix_multiply_accumulate_tensor_float32': False, 'max_compute_units': 448, 'max_num_sub_groups': 64, 'max_work_group_size': 1024, 'name': 'Intel(R) Data Center GPU Max 1100', 'platform_name': 'Intel(R) Level-Zero', 'sub_group_sizes': [16, 32], 'total_memory': 51539607552, 'type': 'gpu', 'vendor': 'Intel(R) Corporation', 'version': '1.3'}, major=None, regs_per_multiprocessor=None, max_threads_per_multi_processor=None, multi_processor_count=None, warp_size=32), 'constants': {}, 'configs': [AttrsDescriptor.from_dict({'arg_properties': {'tt.divisibility': (0, 1, 2, 3, 4, 5, 6, 7, 8), 'tt.equal_to': ()}, 'cls': 'AttrsDescriptor'})]}, inductor_meta={'autotune_hints': set(), 'kernel_name': 'triton_per_fused_add_embedding_native_layer_norm_0', 'mutated_arg_names': [], 'optimize_mem': True, 'no_x_dim': False, 'num_load': 5, 'num_reduction': 4, 'backend_hash': 'D82C2E8E2C9203D653D1A2B8A0511701E4F7567A195A5128E03B9AA7218348AA', 'are_deterministic_algorithms_enabled': True, 'assert_indirect_indexing': True, 'autotune_local_cache': True, 'autotune_pointwise': True, 'autotune_remote_cache': None, 'force_disable_caches': False, 'dynamic_scale_rblock': True, 'max_autotune': False, 'max_autotune_pointwise': False, 'min_split_scan_rblock': 256, 'spill_threshold': 16, 'store_cubin': False} ) @triton.jit def triton_per_fused_add_embedding_native_layer_norm_0(in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, in_ptr5, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr): xnumel = 512 rnumel = 128 RBLOCK: tl.constexpr = 128 xoffset = tl.program_id(0) * XBLOCK xindex = xoffset + tl.arange(0, XBLOCK)[:, None] xmask = xindex < xnumel rindex = tl.arange(0, RBLOCK)[None, :] roffset = 0 rmask = tl.full([XBLOCK, RBLOCK], True, tl.int1) x0 = xindex r1 = rindex tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') tmp7 = tl.load(in_ptr2 + (r1 + (128*x0)), xmask, other=0.0) tmp9 = tl.load(in_ptr3 + (r1 + (128*x0)), xmask, other=0.0) tmp34 = tl.load(in_ptr4 + (r1), None, eviction_policy='evict_last') tmp36 = tl.load(in_ptr5 + (r1), None, eviction_policy='evict_last') tmp1 = tl.full([XBLOCK, RBLOCK], 30000, tl.int32) tmp2 = tmp0 + tmp1 tmp3 = tmp0 < 0 tmp4 = tl.where(tmp3, tmp2, tmp0) tl.device_assert(((0 <= tmp4) & (tmp4 < 30000)) | ~(xmask), "index out of bounds: 0 <= tmp4 < 30000") ''', device_str='xpu') ``` </details>
Contributor
Author
|
Looks like tj-actions/changed-files is up-to-date now, so this is no longer needed. |
dshi7
pushed a commit
that referenced
this pull request
Aug 28, 2025
Getting a crash internally when running `09-persistent-matmul.py`
tutorial, and ASAN reports the following:
```
==7854==ERROR: AddressSanitizer: heap-use-after-free on address 0x7c884c02e800 at pc 0x557f344112d9 bp 0x7b35908a1840 sp 0x7b35908a1838
READ of size 8 at 0x7c884c02e800 thread T1128
#0 0x557f344112d8 in getNextOperandUsingThisValue third_party/llvm/llvm-project/mlir/include/mlir/IR/UseDefLists.h:43:58
#1 0x557f344112d8 in operator++ third_party/llvm/llvm-project/mlir/include/mlir/IR/UseDefLists.h:322:39
#2 0x557f344112d8 in mlir::ResultRange::UseIterator::operator++() third_party/llvm/llvm-project/mlir/lib/IR/OperationSupport.cpp:613:5
#3 0x557f2ab70625 in mlir::lowerTokenOperations(mlir::Operation*, int, int) third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization/WSLowerToken.cpp:269:27
#4 0x557f2ab70de8 in mlir::doTokenLowering(mlir::triton::FuncOp&, unsigned int) third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization/WSLowerToken.cpp:321:3
#5 0x557f2ab2d018 in mlir::NVGPUWarpSpecializationPass::runOnFuncOp(mlir::triton::FuncOp) third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:99:5
#6 0x557f2ab2c5d6 in operator() third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:55
#7 0x557f2ab2c5d6 in operator() third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:304:7
#8 0x557f2ab2c5d6 in void llvm::function_ref<void (mlir::Operation*)>::callback_fn<std::__u::enable_if<!llvm::is_one_of<mlir::triton::FuncOp, mlir::Operation*, mlir::Region*, mlir::Block*>::value && std::is_same<void, void>::value, void>::type mlir::detail::walk<(mlir::WalkOrder)1, mlir::ForwardIterator, mlir::NVGPUWarpSpecializationPass::runOnOperation()::'lambda'(mlir::triton::FuncOp), mlir::triton::FuncOp, void>(mlir::Operation*, mlir::NVGPUWarpSpecializationPass::runOnOperation()::'lambda'(mlir::triton::FuncOp)&&)::'lambda'(mlir::Operation*)>(long, mlir::Operation*) third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46:12
#9 0x557f2820ce45 in operator() third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69:12
#10 0x557f2820ce45 in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:152:5
#11 0x557f2820ce2c in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:147:9
#12 0x557f2ab2c0c9 in walk<(mlir::WalkOrder)1, mlir::ForwardIterator, (lambda at third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:26), mlir::triton::FuncOp, void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:306:10
#13 0x557f2ab2c0c9 in walk<(mlir::WalkOrder)1, mlir::ForwardIterator, (lambda at third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:26), void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Operation.h:798:12
#14 0x557f2ab2c0c9 in mlir::NVGPUWarpSpecializationPass::runOnOperation() third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:21
...
```
The problem seems to be that we are iterating through uses, and then
removing some of them inside the loop, which invalidates the iterator.
dshi7
pushed a commit
that referenced
this pull request
Aug 28, 2025
…leaveTMem.cpp (#7924) `TritonNvidiaGPU/interleave_tmem.mlir` fails under address sanitizer. The `ConstantIntOp` operations were created without attachment to any block in https://github.com/triton-lang/triton/pull/7622, which caused a memory leak. This change addresses the problem by adding an insertion point. <details open> <summary>Full log</summary> ================================================================= ==3831==ERROR: LeakSanitizer: detected memory leaks Direct leak of 576 byte(s) in 6 object(s) allocated from: #0 0x55c3eca39164 in malloc [third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp?l=67&ws=tap-presubmit-server/421956858&snapshot=2):3 #1 0x55c3f176afb3 in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::DictionaryAttr, mlir::OpaqueProperties, mlir::BlockRange, unsigned int) [third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:113](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=113&ws=tap-presubmit-server/421956858&snapshot=2):46 #2 0x55c3f176a90c in create [third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:74](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=74&ws=tap-presubmit-server/421956858&snapshot=2):10 #3 0x55c3f176a90c in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::NamedAttrList&&, mlir::OpaqueProperties, mlir::BlockRange, mlir::RegionRange) [third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:57](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=57&ws=tap-presubmit-server/421956858&snapshot=2):7 #4 0x55c3f176a61b in mlir::Operation::create(mlir::OperationState const&) [third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:35](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=35&ws=tap-presubmit-server/421956858&snapshot=2):7 #5 0x55c3f1678a78 in mlir::OpBuilder::create(mlir::OperationState const&) [third_party/llvm/llvm-project/mlir/lib/IR/Builders.cpp:453](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Builders.cpp?l=453&ws=tap-presubmit-server/421956858&snapshot=2):17 #6 0x55c3ecf3668f in mlir::arith::ConstantIntOp mlir::OpBuilder::create<mlir::arith::ConstantIntOp, int, int>(mlir::Location, int&&, int&&) [third_party/llvm/llvm-project/mlir/include/mlir/IR/Builders.h:507](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/include/mlir/IR/Builders.h?l=507&ws=tap-presubmit-server/421956858&snapshot=2):16 #7 0x55c3eefa690a in findBufferAccessMemdescSubview [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:75](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=75&ws=tap-presubmit-server/421956858&snapshot=2):33 #8 0x55c3eefa690a in mlir::triton::nvidia_gpu::(anonymous namespace)::findBufferAccess(mlir::Value) [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:151](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=151&ws=tap-presubmit-server/421956858&snapshot=2):12 #9 0x55c3eefa70e7 in mlir::triton::nvidia_gpu::(anonymous namespace)::findBufferAccess(mlir::Value) [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:156](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=156&ws=tap-presubmit-server/421956858&snapshot=2):34 #10 0x55c3eefa4c0c in tmemMayAlias [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:173](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=173&ws=tap-presubmit-server/421956858&snapshot=2):28 #11 0x55c3eefa4c0c in sinkOps [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:227](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=227&ws=tap-presubmit-server/421956858&snapshot=2):36 #12 0x55c3eefa4c0c in trySinkOp [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:253](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=253&ws=tap-presubmit-server/421956858&snapshot=2):10 #13 0x55c3eefa4c0c in mlir::triton::nvidia_gpu::TritonNvidiaGPUInterleaveTMemPass::runOnOperation() [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:275](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=275&ws=tap-presubmit-server/421956858&snapshot=2):14 #14 0x55c3f1560ad1 in operator() [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:553](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=553&ws=tap-presubmit-server/421956858&snapshot=2):17 #15 0x55c3f1560ad1 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) [third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=46&ws=tap-presubmit-server/421956858&snapshot=2):12 #16 0x55c3f1559920 in operator() [third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=69&ws=tap-presubmit-server/421956858&snapshot=2):12 #17 0x55c3f1559920 in executeAction<mlir::PassExecutionAction, mlir::Pass &> [third_party/llvm/llvm-project/mlir/include/mlir/IR/MLIRContext.h:280](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/include/mlir/IR/MLIRContext.h?l=280&ws=tap-presubmit-server/421956858&snapshot=2):7 #18 0x55c3f1559920 in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:547](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=547&ws=tap-presubmit-server/421956858&snapshot=2):21 #19 0x55c3f155d46f in runPipeline [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:619](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=619&ws=tap-presubmit-server/421956858&snapshot=2):16 #20 0x55c3f155d46f in mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:933](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=933&ws=tap-presubmit-server/421956858&snapshot=2):10 #21 0x55c3f155d15b in mlir::PassManager::run(mlir::Operation*) [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:913](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=913&ws=tap-presubmit-server/421956858&snapshot=2):60 #22 0x55c3ed0a8b20 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:477](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=477&ws=tap-presubmit-server/421956858&snapshot=2):17 #23 0x55c3ed0a8363 in processBuffer [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:553](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=553&ws=tap-presubmit-server/421956858&snapshot=2):12 #24 0x55c3ed0a8363 in operator() [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:642](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=642&ws=tap-presubmit-server/421956858&snapshot=2):12 #25 0x55c3ed0a8363 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::MemoryBufferRef const&, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::MemoryBufferRef const&, llvm::raw_ostream&) [third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=46&ws=tap-presubmit-server/421956858&snapshot=2):12 #26 0x55c3f17bd34f in operator() [third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=69&ws=tap-presubmit-server/421956858&snapshot=2):12 #27 0x55c3f17bd34f in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::MemoryBufferRef const&, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) [third_party/llvm/llvm-project/mlir/lib/Support/ToolUtilities.cpp:30](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Support/ToolUtilities.cpp?l=30&ws=tap-presubmit-server/421956858&snapshot=2):12 #28 0x55c3ed09d0c6 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:647](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=647&ws=tap-presubmit-server/421956858&snapshot=2):26 #29 0x55c3ed09d67f in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:693](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=693&ws=tap-presubmit-server/421956858&snapshot=2):14 #30 0x55c3ed09dc59 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:709](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=709&ws=tap-presubmit-server/421956858&snapshot=2):10 #31 0x55c3eca74a70 in main [third_party/triton/bin/triton-opt.cpp:14](https://cs.corp.google.com/piper///depot/google3/third_party/triton/bin/triton-opt.cpp?l=14&ws=tap-presubmit-server/421956858&snapshot=2):33 #32 0x7f1fd58613d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94) #33 0x55c3ec995aa9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120 </details> --------- Co-authored-by: Thomas Raoux <thomas.raoux@openai.com>
dshi7
added a commit
that referenced
this pull request
Sep 5, 2025
The LIT was appended by #7297 for while-loop Both TLX-3.5 and OAI-main have no changes in both LIT and output MLIRs but TLX-3.5 failed as below. `%1` (generated by tt.make_tensor_descriptor) has swizzledByteWidth 32 instead of 128. I haven't found out which recent change introduces it (must coming from TLX-3.5 rebase) and tweak ref from 128 to 32 as a quick side step. TODO. revert this change and put up real fix Command Output (stderr): -- RUN: at line 1: /data/users/daohang/fbtriton/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt /data/users/daohang/fbtriton/test/TritonNvidiaGPU/optimize_descriptor_encoding.mlir -split-input-file --triton-nvidia-optimize-descriptor-encoding | FileCheck /data/users/daohang/fbtriton/test/TritonNvidiaGPU/optimize_descriptor_encoding.mlir + /data/users/daohang/fbtriton/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt /data/users/daohang/fbtriton/test/TritonNvidiaGPU/optimize_descriptor_encoding.mlir -split-input-file --triton-nvidia-optimize-descriptor-encoding + FileCheck /data/users/daohang/fbtriton/test/TritonNvidiaGPU/optimize_descriptor_encoding.mlir /data/users/daohang/fbtriton/test/TritonNvidiaGPU/optimize_descriptor_encoding.mlir:106:10: error: 'scf.while' op along control flow edge from Region #0 to Region #1: source type #0 '!tt.tensordesc<tensor<1x32xi8, #ttg.nvmma_shared<{swizzlingByteWidth = 128, transposed = false, elementBitWidth = 8}>>>' should match input type #0 '!tt.tensordesc<tensor<1x32xi8, #ttg.nvmma_shared<{swizzlingByteWidth = 32, transposed = false, elementBitWidth = 8}>>>' %2 = scf.while (%arg4 = %1) : (!tt.tensordesc<tensor<1x32xi8>>) -> (!tt.tensordesc<tensor<1x32xi8>>) { ^ /data/users/daohang/fbtriton/test/TritonNvidiaGPU/optimize_descriptor_encoding.mlir:106:10: note: see current operation: %5 = "scf.while"(%4) ({ ^bb0(%arg6: !tt.tensordesc<tensor<1x32xi8, #ttg.nvmma_shared<{swizzlingByteWidth = 128, transposed = false, elementBitWidth = 8}>>>): "scf.condition"(%arg4, %arg6) : (i1, !tt.tensordesc<tensor<1x32xi8, #ttg.nvmma_shared<{swizzlingByteWidth = 128, transposed = false, elementBitWidth = 8}>>>) -> () }, { ^bb0(%arg5: !tt.tensordesc<tensor<1x32xi8, #ttg.nvmma_shared<{swizzlingByteWidth = 32, transposed = false, elementBitWidth = 8}>>>): %8 = "tt.descriptor_gather"(%arg5, %arg3, %1) : (!tt.tensordesc<tensor<1x32xi8, #ttg.nvmma_shared<{swizzlingByteWidth = 32, transposed = false, elementBitWidth = 8}>>>, tensor<32xi32, #ttg.blocked<{sizePerThread = [1], threadsPerWarp = [32], warpsPerCTA = [4], order = [0]}>>, i32) -> tensor<32x32xi8, #ttg.blocked<{sizePerThread = [1, 1], threadsPerWarp = [1, 32], warpsPerCTA = [1, 4], order = [1, 0]}>> "scf.yield"(%arg5) : (!tt.tensordesc<tensor<1x32xi8, #ttg.nvmma_shared<{swizzlingByteWidth = 32, transposed = false, elementBitWidth = 8}>>>) -> () }) : (!tt.tensordesc<tensor<1x32xi8, #ttg.nvmma_shared<{swizzlingByteWidth = 128, transposed = false, elementBitWidth = 8}>>>) -> !tt.tensordesc<tensor<1x32xi8, #ttg.nvmma_shared<{swizzlingByteWidth = 128, transposed = false, elementBitWidth = 8}>>> /data/users/daohang/fbtriton/test/TritonNvidiaGPU/optimize_descriptor_encoding.mlir:95:15: error: CHECK-DAG: expected string not found in input // CHECK-DAG: #[[BLOCKED:.*]] = #ttg.blocked<{sizePerThread = [1], threadsPerWarp = [32], warpsPerCTA = [4], order = [0]}> ^ <stdin>:58:100: note: scanning from here %1 = ttg.local_alloc %0 : (tensor<64x64xf16, #blocked>) -> !ttg.memdesc<64x64xf16, #shared, #smem> ^ <stdin>:65:7: note: possible intended match here ^ Input file: <stdin> Check file: /data/users/daohang/fbtriton/test/TritonNvidiaGPU/optimize_descriptor_encoding.mlir
agron911
pushed a commit
that referenced
this pull request
Oct 30, 2025
…mlir` test (#8117)
IIUC, the initialization order between static and non-static variables
is not guaranteed, so we can't use the previous non-static variable to
initialize a static one later on. Working around that by moving it into
a static function variable.
We discovered this when upgrading to a newer LLVM version, so it might
only be reproducible with new LLVM.
Here is the error:
```
==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268
READ of size 8 at 0x557bc517caa0 thread T0
#0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38
#1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5
#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27
#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9
#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10
#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10
#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5
#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17
#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5
#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20
```
(cherry picked from commit 4f5f43e)
pchen7e2
added a commit
to pchen7e2/fbtriton
that referenced
this pull request
Dec 22, 2025
htyu
pushed a commit
to htyu/triton-1
that referenced
this pull request
Jan 6, 2026
…mlir` test (#8117)
IIUC, the initialization order between static and non-static variables
is not guaranteed, so we can't use the previous non-static variable to
initialize a static one later on. Working around that by moving it into
a static function variable.
We discovered this when upgrading to a newer LLVM version, so it might
only be reproducible with new LLVM.
Here is the error:
```
==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268
READ of size 8 at 0x557bc517caa0 thread T0
#0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38
facebookexperimental#1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5
facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27
facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9
facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10
facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10
facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5
facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17
facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5
facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20
```
agron911
referenced
this pull request
in agron911/triton
Feb 10, 2026
…issue in `tensor_layout_print.mlir` test (#8117)' Summary: This is a cherry-pick of an upstream PR: triton-lang/triton#8117 Upstream commit message: ``` > Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117) > IIUC, the initialization order between static and non-static variables > is not guaranteed, so we can't use the previous non-static variable to > initialize a static one later on. Working around that by moving it into > a static function variable. > We discovered this when upgrading to a newer LLVM version, so it might > only be reproducible with new LLVM. > Here is the error: > ``` > ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268 > READ of size 8 at 0x557bc517caa0 thread T0 > #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38 > #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27 > facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9 > facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10 > facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10 > facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17 > facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5 > facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20 > ``` ``` ***Do not remove the following line from this commit*** Reactor Cherry-pick Revision: 4f5f43e --- This diff was generated by running: ``` buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3 ``` Differential Revision: D92893421
agron911
referenced
this pull request
in agron911/triton
Feb 11, 2026
…issue in `tensor_layout_print.mlir` test (#8117)' (facebookexperimental#883) Summary: This is a cherry-pick of an upstream PR: triton-lang/triton#8117 Upstream commit message: ``` > Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117) > IIUC, the initialization order between static and non-static variables > is not guaranteed, so we can't use the previous non-static variable to > initialize a static one later on. Working around that by moving it into > a static function variable. > We discovered this when upgrading to a newer LLVM version, so it might > only be reproducible with new LLVM. > Here is the error: > ``` > ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268 > READ of size 8 at 0x557bc517caa0 thread T0 > #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38 > #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27 > facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9 > facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10 > facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10 > facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17 > facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5 > facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20 > ``` ``` ***Do not remove the following line from this commit*** Reactor Cherry-pick Revision: 4f5f43e --- This diff was generated by running: ``` buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3 ``` Reviewed By: stashuk-olek Differential Revision: D92893421
agron911
referenced
this pull request
in agron911/triton
Feb 11, 2026
…issue in `tensor_layout_print.mlir` test (#8117)' (facebookexperimental#883) Summary: This is a cherry-pick of an upstream PR: triton-lang/triton#8117 Upstream commit message: ``` > Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117) > IIUC, the initialization order between static and non-static variables > is not guaranteed, so we can't use the previous non-static variable to > initialize a static one later on. Working around that by moving it into > a static function variable. > We discovered this when upgrading to a newer LLVM version, so it might > only be reproducible with new LLVM. > Here is the error: > ``` > ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268 > READ of size 8 at 0x557bc517caa0 thread T0 > #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38 > #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27 > facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9 > facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10 > facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10 > facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17 > facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5 > facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20 > ``` ``` ***Do not remove the following line from this commit*** Reactor Cherry-pick Revision: 4f5f43e --- This diff was generated by running: ``` buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3 ``` Reviewed By: stashuk-olek Differential Revision: D92893421
agron911
referenced
this pull request
in agron911/triton
Feb 11, 2026
…issue in `tensor_layout_print.mlir` test (#8117)' (facebookexperimental#883) Summary: This is a cherry-pick of an upstream PR: triton-lang/triton#8117 Upstream commit message: ``` > Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117) > IIUC, the initialization order between static and non-static variables > is not guaranteed, so we can't use the previous non-static variable to > initialize a static one later on. Working around that by moving it into > a static function variable. > We discovered this when upgrading to a newer LLVM version, so it might > only be reproducible with new LLVM. > Here is the error: > ``` > ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268 > READ of size 8 at 0x557bc517caa0 thread T0 > #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38 > #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27 > facebookexperimental#3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9 > facebookexperimental#4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10 > facebookexperimental#5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10 > facebookexperimental#6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5 > facebookexperimental#7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17 > facebookexperimental#8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5 > facebookexperimental#9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20 > ``` ``` ***Do not remove the following line from this commit*** Reactor Cherry-pick Revision: 4f5f43e --- This diff was generated by running: ``` buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3 ``` Reviewed By: stashuk-olek Differential Revision: D92893421
meta-codesync Bot
pushed a commit
that referenced
this pull request
Feb 11, 2026
…issue in `tensor_layout_print.mlir` test (#8117)' (#883) Summary: Pull Request resolved: #883 This is a cherry-pick of an upstream PR: triton-lang/triton#8117 Upstream commit message: ``` > Fix ASAN `initialization-order-fiasco` issue in `tensor_layout_print.mlir` test (#8117) > IIUC, the initialization order between static and non-static variables > is not guaranteed, so we can't use the previous non-static variable to > initialize a static one later on. Working around that by moving it into > a static function variable. > We discovered this when upgrading to a newer LLVM version, so it might > only be reproducible with new LLVM. > Here is the error: > ``` > ==3551==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x557bc517caa0 at pc 0x557bc3f2fbb2 bp 0x7ffda74ef270 sp 0x7ffda74ef268 > READ of size 8 at 0x557bc517caa0 thread T0 > #0 0x557bc3f2fbb1 in getName llvm/include/llvm/Support/CommandLine.h:194:38 > #1 0x557bc3f2fbb1 in operator() llvm/lib/Support/CommandLine.cpp:347:5 > #2 0x557bc3f2fbb1 in __invoke<(lambda at llvm/lib/Support/CommandLine.cpp:347:5) &, llvm::cl::OptionCategory *> libcxx/include/__type_traits/invoke.h:87:27 > #3 0x557bc3f2fbb1 in __count_if<std::__u::_ClassicAlgPolicy, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, std::__u::__identity, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:30:9 > #4 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSetIterator<llvm::cl::OptionCategory *>, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> libcxx/include/__algorithm/count_if.h:41:10 > #5 0x557bc3f2fbb1 in count_if<llvm::SmallPtrSet<llvm::cl::OptionCategory *, 16U> &, (lambda at llvm/lib/Support/CommandLine.cpp:347:5)> llvm/include/llvm/ADT/STLExtras.h:1981:10 > #6 0x557bc3f2fbb1 in registerCategory llvm/lib/Support/CommandLine.cpp:347:5 > #7 0x557bc3f2fbb1 in llvm::cl::OptionCategory::registerCategory() llvm/lib/Support/CommandLine.cpp:484:17 > #8 0x557bc4504950 in OptionCategory llvm/include/llvm/Support/CommandLine.h:191:5 > #9 0x557bc4504950 in __cxx_global_var_init llvm/lib/CodeGen/GlobalISel/Combiner.cpp:37:20 > ``` ``` ***Do not remove the following line from this commit*** Reactor Cherry-pick Revision: 4f5f43e --- This diff was generated by running: ``` buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 3 ``` Reviewed By: stashuk-olek Differential Revision: D92893421 fbshipit-source-id: 4d4bfcf7459e4ff08a3a06a4292e656df3249675
plotfi
pushed a commit
that referenced
this pull request
Mar 17, 2026
…734) ## Summary Fix three bugs causing **non-deterministic SIGSEGV on RTX 5070 Ti / 5080 / 5090 GPUs** (SM 12.0) when using `torch.compile` or any Triton-compiled kernel. This is the #1 blocker for RTX 50-series adoption in ML training. Every Blackwell GPU owner hitting this: pytorch/pytorch#176426 ## Root Cause `sm_arch_from_capability(120)` returns `"sm_120a"` — but **consumer Blackwell has no "a" variant**. The "a" suffix is only valid for datacenter GPUs: - `sm_90a` — Hopper (H100, H200) - `sm_100a` — Blackwell datacenter (B100, B200) There is no `sm_120a`. Consumer Blackwell is just `sm_120`. Passing the invalid `sm_120a` to LLVM and ptxas causes instruction selection for **tensor memory features (tcgen05) that do not exist on consumer hardware**. The generated machine code contains instructions for hardware that isn't there → SIGSEGV at runtime. The crash manifests as `ip 0000000000000000` (null jump target) because tensor memory register loads produce undefined values on hardware that lacks tensor memory, and subsequent indirect branches through those registers jump to address 0. The non-deterministic nature is explained by residual register state — whether the uninitialized register happens to hold a valid address or null determines if the kernel crashes or silently produces wrong results. ## The Fix Three changes to `third_party/nvidia/backend/compiler.py`: ### 1. `sm_arch_from_capability` — stop generating `sm_120a` ```python # Before (broken): adds "a" to everything >= 90, including sm_120 suffix = "a" if capability >= 90 else "" # After (fixed): "a" only for architectures that actually have it suffix = "a" if 90 <= capability < 120 else "" ``` This resolves the existing `TODO: Handle non-"a" sms` comment. ### 2. PTX `.target` regex — handle the "a" suffix ```python # Before: doesn't match the "a", so .target sm_120a passes through uncorrected re.sub(r'\.target sm_\d+', ...) # After: correctly matches and replaces sm_XXXa targets re.sub(r'\.target sm_\d+a?', ...) ``` ### 3. `make_ttgir` pipeline — route sm_120 away from tensor memory passes Consumer Blackwell uses MMAv2 (confirmed by `AccelerateMatmul.cpp` line 43-47 which already correctly excludes MMAv5 for sm_120). It has **no tensor memory**. The datacenter Blackwell pipeline runs `add_hoist_tmem_alloc`, `add_promote_lhs_to_tmem`, and `add_warp_specialize` (the Blackwell variant) — none of which are tested on sm_120 (the test suite excludes it via `is_blackwell()` checking major in [10, 11]). ```python # Before: sm_120 falls into datacenter Blackwell path if capability // 10 in [8, 9]: # Ampere/Hopper elif capability // 10 >= 10: # ALL Blackwell (including consumer) # After: sm_120 uses the Hopper pipeline (matches its MMAv2 feature set) if capability // 10 in [8, 9] or capability >= 120: # Ampere/Hopper/consumer Blackwell elif 100 <= capability < 120: # Datacenter Blackwell only ``` ## Hardware Testing Tested on **RTX 5070 Ti** (SM 12.0, compute capability 12.0) with PyTorch 2.9.1+cu128 / Triton 3.5.1 / CUDA 12.8 / Driver 595.71: | Test | Before fix | After fix | |------|-----------|-----------| | `torch.compile` training (100 steps) | Segfaults within ~100 steps | **5 × 100 steps, 0 crashes** | | Compiled MLP (200 steps) | Segfaults non-deterministically | **200 steps, correct results** | | Triton elementwise kernel | Sometimes works | **Always correct** | | Triton matmul kernel (fp16) | Segfaults | **Correct results, matches torch.mm** | **700+ compiled training steps with zero segfaults** on hardware that previously couldn't survive 100. ## Reproduction ```python import torch, torch.nn as nn model = nn.Linear(768, 768).cuda().bfloat16() model = torch.compile(model, dynamic=False) opt = torch.optim.Adam(model.parameters()) for i in range(100): x = torch.randn(16, 768, device="cuda", dtype=torch.bfloat16) loss = model(x).sum() loss.backward() opt.step() opt.zero_grad() print(f"step {i}") # Before fix: segfaults non-deterministically on RTX 5070 Ti/5080/5090 # After fix: completes every time ``` ## Test plan - [x] `test_sm_arch_from_capability` — verifies correct arch strings for all GPU generations - [x] `test_compile_only_sm120` — verifies sm_120 PTX contains `.target sm_120` (no "a"), no `tcgen05` instructions, and produces valid cubin - [x] Existing `test_compile_only_sm100` — still passes (sm_100a preserved) - [x] Hardware validation on RTX 5070 Ti (700+ training steps) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bumps tj-actions/changed-files from 44 to 45.
Release notes
Sourced from tj-actions/changed-files's releases.
... (truncated)
Changelog
Sourced from tj-actions/changed-files's changelog.
... (truncated)
Commits
4edd678chore(deps): update dependency eslint-plugin-jest to v28.9.0f082558chore(deps): update dependency@types/nodeto v22.9.092c02a0chore(deps): lock file maintenanceb702211chore(deps): update dependency@types/nodeto v22.8.7435fd74chore(deps): update dependency@types/nodeto v22.8.60626fa3chore(deps): update dependency@types/nodeto v22.8.58817a79chore(deps): update dependency@types/lodashto v4.17.135417491chore(deps): update dependency@types/nodeto v22.8.484ef162chore(deps): update dependency@types/nodeto v22.8.2b672a51chore(deps): lock file maintenanceDependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting
@dependabot rebase.Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
@dependabot rebasewill rebase this PR@dependabot recreatewill recreate this PR, overwriting any edits that have been made to it@dependabot mergewill merge this PR after your CI passes on it@dependabot squash and mergewill squash and merge this PR after your CI passes on it@dependabot cancel mergewill cancel a previously requested merge and block automerging@dependabot reopenwill reopen this PR if it is closed@dependabot closewill close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually@dependabot show <dependency name> ignore conditionswill show all of the ignore conditions of the specified dependency@dependabot ignore this major versionwill close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this minor versionwill close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this dependencywill close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)