Apply temp. patch to Triton code to resolve conflicting cache dirs in TP case by tdoublep · Pull Request #34 · IBM/vllm

tdoublep · 2024-05-28T12:35:08Z

We are seeing Mixtral pods with TP>1 failing with errors like:

FileNotFoundError: [Errno 2] No such file or directory: '/home/vllm/.triton/cache/c926ad2ef143810ed738a313c473c7b2/fused_moe_kernel.cubin.tmp.pid_72_945989'

The Triton cache directories seem to conflict when multiple processes are in use. This has already been fixed upstream in Triton, but the fix is not included in Triton v2.3.0, the version vLLM currently uses.

This change applies the same fix that landed on the Triton main branch to the Triton installation inside our container.
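For readers unfamiliar with this failure mode, here is a minimal sketch of the general pattern involved: write a cache entry to a uniquely named temporary file, then atomically move it into place. It is an illustration only, not Triton's actual code. The function name `atomic_put` and its parameters are hypothetical, and the temporary file naming merely imitates the shape of the path in the error message above.

```python
import os
import uuid


def atomic_put(cache_dir: str, filename: str, data: bytes) -> str:
    """Write `data` to cache_dir/filename via a uniquely named temp file.

    Hypothetical helper for illustration; not part of Triton or vLLM.
    """
    os.makedirs(cache_dir, exist_ok=True)
    final_path = os.path.join(cache_dir, filename)

    # A UUID makes it very unlikely that two writers pick the same temp name,
    # even if they happen to report the same process ID.
    temp_path = f"{final_path}.tmp.pid_{os.getpid()}_{uuid.uuid4()}"

    with open(temp_path, "wb") as f:
        f.write(data)

    # os.replace is atomic when source and destination are on the same
    # filesystem, so readers see either the old file or the complete new one.
    os.replace(temp_path, final_path)
    return final_path
```

If two writers choose the same temporary name, the first `os.replace` consumes the file and the second one fails with `FileNotFoundError`, which matches the symptom reported here. Whether that is the full explanation in this case is an assumption.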

njhill

Very nice, thank you @tdoublep!

…ch instead. (#35)

I tested the previous fix for the Triton cache collision issue (see: #34) and it didn't work. I now see errors like:

```
FileNotFoundError: [Errno 2] No such file or directory: '/home/vllm/.triton/cache/1feb415f3280ca46eea8c4407a58c23e/fused_moe_kernel.json.tmp.pid_72_c0a0033e-6147-4520-ae3a-3847d02598f8'
```

The temporary file name now contains a `uuid` instead of a random integer, but the problem remains.

This PR implements a different workaround, proposed by @cyang49, that tells Triton to use a custom cache manager which assigns each process its own cache directory, based on the process ID. This time I have tested it and it seems to work.

---------

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: Joe Runde <joe@joerun.de>
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
Co-authored-by: Joe Runde <joseph.runde@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
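The per-process directory idea from the commit message above can be sketched as follows. This is only an illustration of the directory-selection step under stated assumptions. The function name `per_process_cache_dir`, its parameters, and the `<base>_<pid>` naming scheme are hypothetical, and the hook that makes Triton use such a directory (the custom cache manager itself) is not shown.

```python
import os
from typing import Optional


def per_process_cache_dir(base_dir: str, pid: Optional[int] = None) -> str:
    """Return (and create) a cache directory that is private to one process.

    Hypothetical helper for illustration; the naming scheme is an assumption.
    """
    # Default to the current process; an explicit pid is accepted for testing.
    pid = os.getpid() if pid is None else pid

    # Appending the process ID means workers with different PIDs never write
    # temporary files into the same directory.
    path = f"{base_dir}_{pid}"
    os.makedirs(path, exist_ok=True)
    return path
```

The trade-off of this design is that each process compiles and caches its kernels separately, so there is no cache sharing between workers. In exchange, concurrent writers with different process IDs cannot interfere with each other's files.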

Upstream changes broke the logic that disables building the C extensions when building for sendnn. This caused an error when trying to `pip install vllm` inside the [dev container build](https://v3.travis.ibm.com/github/ai-foundation/aiu-inference-dev/builds/25651578). This tiny fix resolves that.

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

tdoublep added 2 commits May 28, 2024 12:30

Apply temp. patch to Triton code to resolve conflicting cache dirs in…

e679a9d

… TP case. Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

Remove dev stuff

607f46a

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

tdoublep requested review from joerunde and njhill May 28, 2024 12:35

njhill approved these changes May 28, 2024

njhill merged commit 4af59d3 into main May 28, 2024

njhill deleted the tpa-triton-cachefix branch May 28, 2024 15:32

tdoublep mentioned this pull request May 29, 2024

Revert previous attempt at Triton patch; use CustomCacheManger approach instead. #35

Merged

tdoublep mentioned this pull request Jul 5, 2024

[Bug]: fused_moe_kernel compile bug vllm-project/vllm#6103

Closed

njhill merged 2 commits into main from tpa-triton-cachefix
