[NIXL][Bugfix] metrics & testing minor bug #36051
NickLucche merged 5 commits into vllm-project:main from
Signed-off-by: Andy Lo <andy@mistral.ai>
Code Review
This pull request addresses two minor bugs. The first is a test fix in test_prefill_tp_size_greater_than_decode_tp_size which removes a hardcoded variable, allowing the test to correctly use its parameterization. The second is a metrics fix in _nixl_handshake that adjusts the timing logic to correctly include the duration of the add_remote_agent call in the log. Both changes are correct and effectively resolve the described issues.
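To illustrate the timing point, here is a minimal sketch of measuring a call so that its duration is included in the reported value. It is not the vLLM code: `handshake_with_timing`, its `clock` parameter, and the stand-in `add_remote_agent` callable are hypothetical names used only for illustration. The key detail is that the end timestamp is read after the call returns.

```python
import time


def handshake_with_timing(add_remote_agent, metadata, clock=time.perf_counter):
    """Hypothetical helper: call add_remote_agent and report how long it took.

    Returns (agent_name, elapsed_seconds).
    """
    start = clock()
    # The call being measured. Reading the end timestamp before this line
    # would exclude the call's duration from the reported value.
    agent_name = add_remote_agent(metadata)
    elapsed = clock() - start  # read AFTER the call, so its time is counted
    return agent_name, elapsed


if __name__ == "__main__":
    # Stand-in for the real call; sleeps briefly to simulate work.
    def fake_add_remote_agent(metadata):
        time.sleep(0.01)
        return f"agent-{metadata}"

    name, took = handshake_with_timing(fake_add_remote_agent, "remote-0")
    print(f"add agent took: {took:.4f}s for {name}")
```

Passing the clock as a parameter is a design choice made for this sketch so that the measurement can be checked deterministically with a fake clock.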
Hi @andylolu2, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Thanks @andylolu2, can you quickly check pre-commit here?
I think main was broken for a bit, will rebase
9766451 to 191da6f
This pull request has merge conflicts that must be resolved before it can be
remote_agents = worker._nixl_handshake(
    host="localhost",
    port=1234,
    remote_tp_size=2,
The test actually doesn't work with local_tp_size == remote_tp_size
Purpose
Minor bug fixes:
- The `NIXL handshake: add agent took` log was not counting the time spent on `add_remote_agent`, which is what it's supposed to log.
- `test_prefill_tp_size_greater_than_decode_tp_size` was not testing the `local_tp_size` case because of an override.
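The second item describes a common test pitfall: a hardcoded assignment inside the test body shadows the parameterized value, so every case runs with the same input. The sketch below reproduces that pattern in plain Python. The function names and the parameter values are hypothetical and are not taken from the vLLM test.

```python
def run_case_buggy(local_tp_size):
    """Hypothetical test body with a leftover hardcoded override."""
    local_tp_size = 1  # bug: shadows the parameterized value
    return local_tp_size


def run_case_fixed(local_tp_size):
    """Same body with the override removed; the parameter is used as given."""
    return local_tp_size


def exercised_values(case, params):
    """Return the set of values a test body actually ran with."""
    return {case(p) for p in params}


if __name__ == "__main__":
    params = [2, 4]  # illustrative parameterization only
    print("buggy:", exercised_values(run_case_buggy, params))
    print("fixed:", exercised_values(run_case_fixed, params))
```

With the override in place, the parameter list has no effect on what is tested; removing it lets each parameterized case exercise its own value.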