[Bugfix] Add custom Triton cache manager to resolve MoE MP issue #6140
simon-mo merged 10 commits into vllm-project:main
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Thanks. IMHO, this issue should be addressed by bundling the custom cache manager code inside vLLM.
Thanks @tdoublep, I had mentioned this to @youkaichao previously but kept forgetting to open a PR. It's not immediately obvious why this seems to only affect the … I agree that it would be better for this to be incorporated into the library if possible. I wonder if we could open a PR or issue against Triton for this (if one doesn't already exist).
else:
    raise RuntimeError("Could not create or locate cache dir")
print(f"Triton cache dir: {self.cache_dir=}")
This should probably be a debug log instead; it produces a lot of output.
I have just removed it for now.
@njhill @jeejeelee I have re-implemented it as part of the vLLM library. One thing I'm not sure about is whether setting the environment variable from the fused_moe code is sufficient, or whether there are other parts of the code where this fix would be needed. Maybe it's OK for now.
Even setting aside #5036, the fix is still needed for prefix_prefill and triton_flash_attention.
def maybe_set_triton_cache_manager(module: str) -> None:
    cache_manager = os.environ.get("TRITON_CACHE_MANAGER", None)
    if cache_manager != module:
        os.environ["TRITON_CACHE_MANAGER"] = module
If the user has manually set this environment variable, should we still overwrite it? Additionally, I suggest adding a log message for clarity.
I have changed it so that we only set it if the user has not already set it. I also added a log message.
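A minimal sketch of the revised behaviour agreed in this thread (only set the variable when the user has not, and log it), assuming the manager is passed as a `module.path:ClassName` string, which is the form I understand Triton to split on `:` when loading a custom cache manager. This is an illustration, not the merged vLLM code; the manager names used in testing it are hypothetical.

```python
import logging
import os

logger = logging.getLogger(__name__)


def maybe_set_triton_cache_manager(manager: str) -> None:
    """Set TRITON_CACHE_MANAGER only if the user has not already set it.

    `manager` is assumed to be a "module.path:ClassName" string.
    """
    current = os.environ.get("TRITON_CACHE_MANAGER", None)
    if current is None:
        # Make the automatic override visible, as requested in review.
        logger.info("Setting Triton cache manager to: %s", manager)
        os.environ["TRITON_CACHE_MANAGER"] = manager
    else:
        # Respect an explicit user choice instead of silently overwriting it.
        logger.debug("Keeping user-provided Triton cache manager: %s", current)
```

Calling it a second time with a different value leaves the first value in place, so anything the user exported before starting vLLM always wins.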
@jeejeelee OK, in that case I guess it makes sense to call …
@njhill @jeejeelee I've moved the call to …
CI test failures look like network blips (…).
os.environ["TRITON_CACHE_MANAGER"] = manager
class CustomCacheManager(FileCacheManager):
Can you document why we need this?
CI failure looks unrelated: …
I'm wondering about this too. cc Anyscale folks @cadedaniel @Yard1 for visibility.
"""Re-implements Triton's cache manager, ensuring that a
unique cache directory is created for each process. This is
needed to avoid collisions when running with tp>1 and
using multi-processing as the distributed backend.
If Triton 3.0.0 solves this problem, it would be good to note here that this custom cache manager can be removed once we upgrade Triton.
The fix is not yet in v3.0.0, but I expect it will be in whichever version comes after that (see my summary here). I will add a comment to that effect.
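To illustrate the idea in the `CustomCacheManager` docstring without depending on Triton internals, here is a standalone sketch that derives a per-process cache directory by suffixing the shared base directory with the process id. `per_process_cache_dir` and its parameters are hypothetical; the real class subclasses Triton's `FileCacheManager`, whose constructor is not reproduced here.

```python
import os
from typing import Optional


def per_process_cache_dir(base_dir: str, key: str,
                          pid: Optional[int] = None) -> str:
    """Hypothetical helper: create and return a cache dir unique to one process.

    Suffixing the shared base directory with the process id means that
    tensor-parallel workers started via multiprocessing never share, and
    therefore never race on, the same directory.
    """
    if not base_dir:
        # Mirrors the RuntimeError raised in the reviewed snippet.
        raise RuntimeError("Could not create or locate cache dir")
    if pid is None:
        pid = os.getpid()
    cache_dir = os.path.join(f"{base_dir}_{pid}", key)
    os.makedirs(cache_dir, exist_ok=True)
    return cache_dir
```

For example, with the same kernel `key`, pids 1001 and 1002 map to `<base>_1001/<key>` and `<base>_1002/<key>`, so two workers compiling the same kernel write to disjoint paths. The trade-off is that compiled kernels are no longer shared between processes.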
All comments have been addressed. Is there anything else you would like to see? @comaniac @njhill @jeejeelee @simon-mo I think it would be good to get this one in since there are quite a few people struggling with this issue.
Merging to unblock the release.
…m-project#6140) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
Fixes #6103
We have been using this fix via our fork (see here) for a while and it seems stable.
Note: this will only resolve the problem if you are using vLLM from the Docker image. A better approach might be to bundle the custom cache manager code inside the vLLM package; that way it would also ship via pip install, and users could still set the environment variable to enable it.

Update: I've now implemented it by including the custom cache manager inside vLLM and setting the necessary environment variable via code.
cc @jeejeelee
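The argument for bundling the manager inside the vLLM package rests on the class being importable by name in every worker process. A sketch of resolving a `module.path:ClassName` string with `importlib`, roughly the lookup a `TRITON_CACHE_MANAGER` value goes through; this is my assumption about Triton's loader rather than a copy of it, and `resolve_manager_spec` is a hypothetical name.

```python
import importlib
from typing import Any


def resolve_manager_spec(spec: str) -> Any:
    """Hypothetical helper: turn a "module.path:ClassName" string into the object."""
    module_path, sep, attr = spec.partition(":")
    if not sep or not module_path or not attr:
        raise ValueError(f"expected 'module.path:ClassName', got {spec!r}")
    # Fails with ModuleNotFoundError if the module is not installed here.
    module = importlib.import_module(module_path)
    return getattr(module, attr)
```

If the module existed only inside a Docker image, `importlib.import_module` would raise `ModuleNotFoundError` in a pip-installed environment. Shipping the class inside the installed package avoids that, so the environment-variable approach works regardless of how vLLM was installed.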