Skip to content

[NIXL][Bugfix] metrics & testing minor bug#36051

Merged
NickLucche merged 5 commits intovllm-project:mainfrom
andylolu2:andy/nixl-fixes
Mar 18, 2026
Merged

[NIXL][Bugfix] metrics & testing minor bug#36051
NickLucche merged 5 commits intovllm-project:mainfrom
andylolu2:andy/nixl-fixes

Conversation

@andylolu2
Copy link
Copy Markdown
Contributor

@andylolu2 andylolu2 commented Mar 4, 2026

Purpose

Minor bug fixes:

  • NIXL handshake: add agent took log was not counting the time spent on add_remote_agent, which is what it's supposed to log.
  • The test_prefill_tp_size_greater_than_decode_tp_size was not testing the local_tp_size case because of an override.

Signed-off-by: Andy Lo <andy@mistral.ai>
@andylolu2 andylolu2 marked this pull request as ready for review March 4, 2026 22:34
@mergify mergify bot added v1 bug Something isn't working kv-connector labels Mar 4, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses two minor bugs. The first is a test fix in test_prefill_tp_size_greater_than_decode_tp_size which removes a hardcoded variable, allowing the test to correctly use its parameterization. The second is a metrics fix in _nixl_handshake that adjusts the timing logic to correctly include the duration of the add_remote_agent call in the log. Both changes are correct and effectively resolve the described issues.

@mergify
Copy link
Copy Markdown

mergify bot commented Mar 4, 2026

Hi @andylolu2, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@NickLucche
Copy link
Copy Markdown
Collaborator

Thanks @andylolu2 , can you quickly check pre-commit here?

@andylolu2
Copy link
Copy Markdown
Contributor Author

Thanks @andylolu2 , can you quickly check pre-commit here?

I think main was broken for a bit, will rebase

@markmc markmc added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 6, 2026
@mergify mergify bot added llama Related to Llama models multi-modality Related to multi-modality (#4194) performance Performance-related issues qwen Related to Qwen models gpt-oss Related to GPT-OSS models nvidia labels Mar 18, 2026
@mergify mergify bot added rocm Related to AMD ROCm structured-output labels Mar 18, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Mar 18, 2026
@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Mar 18, 2026
@mergify
Copy link
Copy Markdown

mergify bot commented Mar 18, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @andylolu2.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Andy Lo <andy@mistral.ai>
remote_agents = worker._nixl_handshake(
host="localhost",
port=1234,
remote_tp_size=2,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test actually doesn't work with local_tp_size == remote_tp_size

@NickLucche NickLucche merged commit 98b09dd into vllm-project:main Mar 18, 2026
58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ci/build deepseek Related to DeepSeek models documentation Improvements or additions to documentation frontend gpt-oss Related to GPT-OSS models kv-connector llama Related to Llama models multi-modality Related to multi-modality (#4194) nvidia performance Performance-related issues qwen Related to Qwen models ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm speculative-decoding structured-output tool-calling v1

Projects

Status: Done
Status: Done
Status: Done
Status: Done
Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants