python3Packages.triton: 3.1.0 -> 3.3.1 (#419179)
Last time I checked, […]
@GaetanLepage (1) Bumping the hash of triton-llvm (second commit) and (2) rebasing the patch […]
When running […]
@GaetanLepage Sorry about that; just removing the offending patch gets it to build. I forgot about overwriting […]. I can confirm […]
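For quickly testing a build with one patch dropped, an overlay like the following can help. This is only a sketch, not the change made in this PR: the overlay file name and the patch name are illustrative, and the exact override hook (`overrideScope` on `python3Packages`) may differ between nixpkgs versions.

```shell
# Sketch: write a local overlay that filters one patch out of
# python3Packages.triton's patch list. The file name and the
# patch name below are illustrative, not taken from this PR.
cat > drop-patch-overlay.nix <<'EOF'
final: prev: {
  python3Packages = prev.python3Packages.overrideScope (pyFinal: pyPrev: {
    triton = pyPrev.triton.overrideAttrs (old: {
      # Keep every patch except the one under suspicion.
      patches = builtins.filter
        (p: baseNameOf (toString p) != "0001-example-offending.patch")
        (old.patches or [ ]);
    });
  });
}
EOF
# Then rebuild with the overlay applied, e.g. (command is an assumption):
# nix-build '<nixpkgs>' --arg overlays '[ (import ./drop-patch-overlay.nix) ]' \
#   -A python3Packages.triton
```

If the build succeeds with the patch filtered out, that narrows the failure down to the patch rather than the version bump itself.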
@GaetanLepage It seems torch is causing most/all of the other build failures. For me:

```
nvcc error : '"$CICC_PATH/cicc"' died due to signal 9 (Kill signal)
[3863/7609] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseSemiStructuredLinear.cu.o
/build/pytorch/aten/src/ATen/core/IListRef_inl.h: In static member function ‘static c10::detail::IListRefConstRef<at::OptionalTensorRef> c10::detail::IListRefTagImpl<c10::IListRefTag::Boxed, at::OptionalTensorRef>::iterator_get(const c10::List<std::optional<at::Tensor> >::const_iterator&)’:
/build/pytorch/aten/src/ATen/core/IListRef_inl.h:171:13: warning: possibly dangling reference to a temporary [-Wdangling-reference]
  171 |     const auto& ivalue = (*it).get();
      |                 ^~~~~~
/build/pytorch/aten/src/ATen/core/IListRef_inl.h:171:33: note: the temporary was destroyed at the end of the full expression ‘(& it)->c10::impl::ListIterator<std::optional<at::Tensor>, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue> > >::operator*().c10::impl::ListElementReference<std::optional<at::Tensor>, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue> > >::get()’
  171 |     const auto& ivalue = (*it).get();
      |                          ~~~~~~~~~~~^~
[3867/7609] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/third_party/flash-attention/csrc/flash_attn/src/flash_bwd_hdim256_fp16_sm80.cu.o
ninja: build stopped: subcommand failed.
ERROR Backend subprocess exited when trying to invoke build_wheel
```

It seems to be just running out of memory. I think you know the torch build better than me, but my understanding is that triton is just used for […]. Do you have any idea why torch is failing to build with the new triton? I could try to lower the number of cores and see if the build succeeds, but with 8 cores it took 3 hours to get to 3867/7609, so if I halve the number of cores it would take around 12 hours of building, which is a bit annoying on my laptop.
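Since the `nvcc` jobs are being killed with signal 9 (the OOM killer), one pragmatic workaround is to cap parallelism based on available RAM rather than core count. A minimal sketch, assuming roughly 4 GiB of RAM per compiler job (a rule of thumb, not a figure from this thread); the `torchWithCuda` attribute in the commented-out command is likewise an assumption:

```shell
# Sketch (not from the PR): pick a --cores value so that concurrent
# nvcc jobs fit in RAM instead of getting OOM-killed (signal 9).
# Assumption: ~4 GiB of RAM per compiler job.
mem_kib=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
mem_gib=$(( mem_kib / 1048576 ))
cores=$(( mem_gib / 4 ))
if [ "$cores" -lt 1 ]; then cores=1; fi
echo "would build with --cores $cores --max-jobs 1"
# Hypothetical invocation; the attribute name is an assumption:
# nix-build '<nixpkgs>' -A python3Packages.torchWithCuda --cores "$cores" --max-jobs 1
```

Lowering `--cores` trades memory pressure for wall-clock time, which is exactly the 3-hour-versus-12-hour trade-off mentioned above.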
Indeed, in this run, […] Thanks again for your work @stephen-huan!
Thanks for the detailed review; I made some silly time-wasting mistakes, my bad :)
Just out of curiosity, what was the cause? Out of memory like mine, or something else?
Can't remember; I think I had too many builds running on the same machine.
This comment was marked as outdated.
GaetanLepage left a comment:
Well done @stephen-huan and thank you for your patience!
Things done
Modifications to #414862 to get it to work.
- `0004-nvidia-allow-static-ptxas-path.patch`
- `0001-setup.py-introduce-TRITON_OFFLINE_BUILD.patch`, which is unused and has been upstream for a while now (setup.py: introduce TRITON_OFFLINE_BUILD, triton-lang/triton#4414)
- `0001-_build-allow-extra-cc-flags.patch` and `0002-nvidia-amd-driver-short-circuit-before-ldconfig.patch`, for ease of future maintenance

cc @GaetanLepage
- Is sandboxing enabled in `nix.conf`? (See Nix manual)
  - `sandbox = relaxed`
  - `sandbox = true`
- `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed, also see nixpkgs-review usage
- Tested basic functionality of all binary files (usually in `./result/bin/`)

Add a 👍 reaction to pull requests you find important.