[core][distributed] simplify code to support pipeline parallel#6406
youkaichao merged 11 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Full CI run is still required to merge this PR so once the PR is ready to go, please make sure to run it. If you need all test signals in between PR commits, you can trigger full CI as well. To run full CI, you can do one of these:
Perhaps it would be better to move the
vllm/model_executor/models/llama.py (outdated)

```diff
-        except KeyError:
-            pass
+        if name not in params_dict:
```
This could silence some hard-to-track-down bugs when loading more complex state dicts (in the quantized case).
Could we check whether the name corresponds to a layer that is not on this device because of PP?
This is quite difficult, as load_weights does not have layer information.

Fixed in 347399e
I noticed a CUDA out-of-memory error on the basic correctness tests here. Is it reproducible locally? I don't think this change should cause that, so it's possibly a flaky test?
/ready
@andoorve finally figured it out: the lru cache stores a reference to the model, which then defeats the gc system :(
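A minimal, self-contained sketch of how an lru cache can keep an object alive and defeat garbage collection (the names here are illustrative, not vLLM's actual code):

```python
import functools
import gc
import weakref


class Model:
    """Stand-in for a large model object."""


@functools.lru_cache
def cached_lookup(model):
    # The cache key tuple holds a strong reference to `model`.
    return id(model)


m = Model()
ref = weakref.ref(m)
cached_lookup(m)

del m
gc.collect()
alive_while_cached = ref() is not None
print(alive_while_cached)  # True: the cache still references the model

cached_lookup.cache_clear()
gc.collect()
alive_after_clear = ref() is not None
print(alive_after_clear)   # False: the model can now be collected
```

Clearing (or avoiding) the cache releases the reference, which is why holding a model behind an `lru_cache` blocks memory from being freed.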
Merging first to unblock support for the following models.
```python
def is_pp_missing_parameter(name: str, model: torch.nn.Module) -> bool:
    """Check if a parameter is missing in a pipeline parallel model."""
    for missing_layer_name in get_pp_missing_layer_names(model):
        if name.startswith(missing_layer_name):
            return True
    return False
```
:-) "xx.11".startswith("xx.1")
Sorry for the bug, and thanks for pointing it out so quickly! Please take a look at #6446.
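A small illustration of the prefix pitfall flagged above, plus a boundary-aware check (the `matches_layer` helper is hypothetical, not code from this PR):

```python
def matches_layer(param_name: str, layer_name: str) -> bool:
    # Hypothetical fix: match only an exact name or a dotted boundary,
    # so "xx.1" matches "xx.1.weight" but not "xx.11.weight".
    return param_name == layer_name or param_name.startswith(layer_name + ".")


print("xx.11".startswith("xx.1"))             # True: the false positive
print(matches_layer("xx.11.weight", "xx.1"))  # False
print(matches_layer("xx.1.weight", "xx.1"))   # True
```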
To minimize the lines of code a model needs to change to support pipeline parallel:

In the model weight loading part, just add:

(the benefit is that we don't need to introduce the extra indentation of the try/except code)

In the layer construction part, just add:

Hopefully, with these changes, the code for a model to support PP will be simpler, and review will also be easier.
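A sketch of the flat guard described above (assumed shape only, not the exact vLLM code): skipping parameters owned by other pipeline stages with one `if`/`continue` instead of wrapping the loading loop in try/except.

```python
def load_weights(params_dict, weights, pp_missing_names):
    """Load weights, skipping parameters owned by other pipeline stages."""
    loaded = []
    for name, weight in weights:
        # One flat check instead of a try/except KeyError block,
        # so the loading loop needs no extra indentation.
        if any(name == m or name.startswith(m + ".") for m in pp_missing_names):
            continue
        params_dict[name] = weight
        loaded.append(name)
    return loaded


params = {}
names = load_weights(
    params,
    [("layers.0.w", 1.0), ("layers.1.w", 2.0)],
    pp_missing_names=["layers.1"],
)
print(names)  # ['layers.0.w']
```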