[Question] triton jit of LigerRopeFunction runs every step when variable length sequences are used #146
Comments
triton-lang/triton#3166 seems related to this. We probably need help from the Triton folks @ptillet @Jokeren. Can you also cross-post in a Triton issue?
cc @yundai424 @lancerts if you have insights
My hunch is that it has something to do with the fact that we treat sequence length as a constexpr in the RoPE kernel 🤔 If it has to be known at compile time, then it makes sense to me that each sequence length gets a new function signature / cache entry.
See the discussion thread at https://discord.com/channels/1189498204333543425/1275130785933951039/1278457607291670641. tl;dr: seq_len should not be a constexpr.
The analysis makes sense to me. Sequence length shouldn't be a constexpr if you want to avoid JIT-compiling the kernel multiple times.
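For reference, here is a minimal sketch of the difference (toy copy kernels with made-up names, not the actual Liger RoPE implementation). When SEQ_LEN is a tl.constexpr, Triton bakes the value into the compiled binary, so every distinct sequence length produces a new cache entry and another JIT pass; when seq_len is an ordinary runtime argument, one compiled kernel is reused across lengths:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def copy_kernel_constexpr(x_ptr, y_ptr, SEQ_LEN: tl.constexpr, BLOCK_SIZE: tl.constexpr):
    # SEQ_LEN is baked into the binary: each new value -> a new compile + cache entry.
    pid = tl.program_id(0)
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offs < SEQ_LEN
    tl.store(y_ptr + offs, tl.load(x_ptr + offs, mask=mask), mask=mask)


@triton.jit
def copy_kernel_runtime(x_ptr, y_ptr, seq_len, BLOCK_SIZE: tl.constexpr):
    # seq_len is a plain runtime argument: one compile covers all lengths.
    pid = tl.program_id(0)
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offs < seq_len
    tl.store(y_ptr + offs, tl.load(x_ptr + offs, mask=mask), mask=mask)


def run(seq_len: int):
    x = torch.randn(seq_len, device="cuda")
    y = torch.empty_like(x)
    grid = (triton.cdiv(seq_len, 128),)
    # With dynamic padding seq_len changes every step, so this variant
    # re-JITs on (almost) every call...
    copy_kernel_constexpr[grid](x, y, SEQ_LEN=seq_len, BLOCK_SIZE=128)
    # ...while this one compiles once and reuses the cached binary.
    copy_kernel_runtime[grid](x, y, seq_len, BLOCK_SIZE=128)


if torch.cuda.is_available():
    for seq_len in (1000, 1001, 1002):  # dynamic padding -> a new length each step
        run(seq_len)
```

(As far as I understand, Triton can still specialize on plain integer arguments in limited ways, e.g. divisibility by 16, but that produces at most a few compiled variants rather than one per distinct sequence length.)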
Let's close the ticket once @tyler-romero verifies the fix.
Thanks folks!
@tyler-romero hey, could you please let me know which tool you used for profiling?
@shreyassks I just used PyTorch's built-in profiler: https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html#using-tracing-functionality |
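In case it helps anyone else, here is a minimal, self-contained sketch of that recipe (placeholder model and data, not my actual training setup), including the trace export:

```python
import torch
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof:
    for _ in range(5):  # a few steps is enough to see whether the JIT repeats
        x = torch.randn(8, 1024, device=device)
        loss = model(x).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Summary table plus a Chrome trace you can open in chrome://tracing or Perfetto
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
prof.export_chrome_trace("trace.json")
```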
🐛 Describe the bug
When training a model using LigerKernels, I don't get the speedup I expect. Profiling with and without LigerKernels shows that the bottleneck seems to be the triton JIT of LigerRopeFunction. This seems to happen at every step, instead of only once. Other LigerKernels do not seem to take as long to JIT (or the JIT just isn't happening after the first step).

Side-by-side comparison of rope with and without Liger: (screenshot)

LigerRopeFunction in the context of an entire forward pass: (screenshot)
I am training with dynamic padding (so different sequence lengths at every forward pass). It seems like this is currently the only LigerKernel that depends on seq_len in a non-batch dimension (LigerCrossEntropyFunction depends on seq_len, but it is pushed into the batch dimension). Does this intuition about why LigerRopeFunction is JIT-compiled every forward pass but LigerCrossEntropyFunction is not make sense?

Reproduce
No response
Versions