Add Kimi-K2.5 vLLM recipes and fix NIXL side channel host #11

Merged: nlevin-ui merged 2 commits into sa-submission-q2-2026 from nlevin/kimi-k2-nixl-fix on Apr 6, 2026

Conversation

@nlevin-ui
Collaborator

Summary

  • Add kimi-k2.5 1k1k and 8k1k disaggregated GB200 recipes (synced from #7, "Add Kimi-K2.5-nvfp4 GB200-disagg 1k1k and 8k1k for vllm")
  • Fix vLLM NIXL handshake failures: set VLLM_NIXL_SIDE_CHANNEL_HOST to the
    node's routable IP in get_process_environment(). The variable was previously
    unset, so workers advertised 0.0.0.0/localhost and cross-node KV transfers failed
  • Update test_vllm_get_process_environment to cover the NIXL host env var
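
For readers who have not seen the diff, the NIXL fix can be sketched as follows. This is a minimal illustration, not the code from this PR: `resolve_routable_ip` and `build_nixl_environment` are hypothetical names standing in for whatever get_process_environment() actually does, and the UDP-probe technique for finding the node's IP is an assumption. Only the environment variable name VLLM_NIXL_SIDE_CHANNEL_HOST comes from the PR.

```python
import ipaddress
import socket
from typing import Mapping, Optional

NIXL_HOST_VAR = "VLLM_NIXL_SIDE_CHANNEL_HOST"


def resolve_routable_ip(probe_addr: str = "192.0.2.1", probe_port: int = 80) -> str:
    """Return the local IP the kernel would pick to reach probe_addr.

    Hypothetical helper. connect() on a UDP socket sends no packets; it only
    selects a route, so the probe address (a documentation-range IP) does not
    need to be reachable.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((probe_addr, probe_port))
        return sock.getsockname()[0]


def build_nixl_environment(
    base_env: Mapping[str, str], node_ip: Optional[str] = None
) -> dict:
    """Return a copy of base_env with the NIXL side channel host set.

    Hypothetical stand-in for get_process_environment(). node_ip must be an
    IP literal; when omitted it is discovered with resolve_routable_ip().
    """
    host = node_ip if node_ip is not None else resolve_routable_ip()
    addr = ipaddress.ip_address(host)  # raises ValueError for non-IP strings
    if addr.is_unspecified or addr.is_loopback:
        # 0.0.0.0 and 127.0.0.1 are exactly the values that broke
        # cross-node handshakes, so refuse to advertise them.
        raise ValueError(f"{host} is not routable from other nodes")
    env = dict(base_env)
    env[NIXL_HOST_VAR] = host
    return env
```

Passing `node_ip` explicitly, for example from the scheduler's node list, keeps the function deterministic in unit tests; the probe is only a fallback.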

Test plan

  • Verified on SA GB200 cluster: all kimi-k2.5 1k1k and 8k1k recipes ran
    successfully with NIXL KV transfers working
  • Unit tests passing (make check)

nlevin-ui and others added 2 commits April 6, 2026 23:03
- Add kimi-k2.5 1k1k and 8k1k disagg GB200 recipes (from #7)
- Fix vLLM NIXL handshake failures: set VLLM_NIXL_SIDE_CHANNEL_HOST to
  node's routable IP in get_process_environment() instead of leaving it
  as 0.0.0.0/localhost which caused transfer handshake failures
- Update test_vllm_get_process_environment to cover NIXL host env var

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@nlevin-ui nlevin-ui force-pushed the nlevin/kimi-k2-nixl-fix branch from 277056e to 4efc6e2 Compare April 6, 2026 23:04
Collaborator

@kyleliang-nv kyleliang-nv left a comment

lgtm.

@nlevin-ui nlevin-ui merged commit 8294e64 into sa-submission-q2-2026 Apr 6, 2026
5 checks passed
richardhuo-nv pushed a commit to richardhuo-nv/srt-slurm-trtllm that referenced this pull request Apr 9, 2026
* Add Kimi-K2.5 vLLM recipes and fix NIXL side channel host

- Add kimi-k2.5 1k1k and 8k1k disagg GB200 recipes (from NVIDIA#7)
- Fix vLLM NIXL handshake failures: set VLLM_NIXL_SIDE_CHANNEL_HOST to
  node's routable IP in get_process_environment() instead of leaving it
  as 0.0.0.0/localhost which caused transfer handshake failures
- Update test_vllm_get_process_environment to cover NIXL host env var

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: run checks on PRs targeting sa-submission-q2-2026

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

2 participants