[Misc] Refactor Attention kv transfer methods into decorator#27816
NickLucche merged 7 commits into vllm-project:main
Conversation
Code Review
This pull request refactors the KV cache transfer logic into a decorator, which is a great improvement for code clarity and maintainability in layer.py. The implementation is clean and correctly captures the on-entry/on-exit pattern.
However, I've found a critical issue in the new decorator. It attempts to access the layer_name parameter from keyword arguments, but this parameter is passed positionally in all call sites. This will cause a KeyError at runtime. I've provided a suggestion to fix this by using the inspect module to robustly retrieve the argument.
Once this is addressed, the PR will be in excellent shape.
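The fix the review suggests can be sketched as follows. This is a minimal illustration, not vLLM's actual code: the decorator name `capture_layer_name` and the toy `forward` signature are hypothetical, but the `inspect.Signature.bind` technique shown is the standard way to retrieve a named argument regardless of whether the caller passed it positionally or as a keyword (which is what `kwargs["layer_name"]` gets wrong).

```python
# Hedged sketch: retrieve "layer_name" robustly inside a decorator.
# kwargs["layer_name"] raises KeyError when the argument is passed
# positionally; binding against the wrapped function's signature does not.
import functools
import inspect


def capture_layer_name(func):
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # bind() maps both positional and keyword args onto parameter names.
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        layer_name = bound.arguments["layer_name"]
        # ...here the real decorator would wait for / save the KV layer...
        return func(*args, **kwargs)

    return wrapper


@capture_layer_name
def forward(query, key, value, layer_name="attn.0"):
    return layer_name


# Both call styles now work:
assert forward(1, 2, 3, "attn.5") == "attn.5"            # positional
assert forward(1, 2, 3, layer_name="attn.7") == "attn.7"  # keyword
```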
@codex review
markmc left a comment
Love the idea in general, suggestion inline
Force-pushed 8abc2cd to e3914fa
Thanks for reviewing @markmc!
ProExpertProg left a comment
Just a few nits and Qs
Force-pushed e3914fa to 5546494
Force-pushed 5f24f38 to f3b6244
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed 9a18fa5 to 900be1f
@hmellor can you tell why the docs build really dislikes this PR? :)
vllm/attention/layer.py
Outdated
    Returns:
        A tuple containing:
        - attn_metadata: Attention metadata for this specific layer, or None if
          no metadata available
@NickLucche https://app.readthedocs.org/projects/vllm/builds/30282990/#293811637--1144
Suggested:
    no metadata available
or
    no metadata available
(not sure which will render better)
@NickLucche any reason for not checking …
Thanks @ptovam for spotting this! Rebase cruft on me
Signed-off-by: NickLucche <nlucches@redhat.com>
Reduces code duplication between maybe_transfer_kv_layer and the functions it decorates.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Force-pushed 9d06f87 to 7ba7765
Signed-off-by: NickLucche <nlucches@redhat.com>
…oject#27816)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: George D. Torres <gdavtor@gmail.com>

…oject#27816)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Small quality of life improvement: remove some of the kv transfer-specific code in
layer.py and refactor it into a decorator. I believe the on-entry/on-exit pattern (wait_read/wait_write) here is very suitable for that. The result is simply that there's less non-attention-related code in the file. Behavior should be unchanged.
Also, I found that after grouping some common boilerplate code shared by
maybe_save_kv_layer_to_connector and wait_for_kv_layer_from_connector, there was too little left to justify a separate function for each, hence I ended up inlining both connector method calls. cc @ApostaC who wrote the initial connector code