feat: enable DP for GAIE #7741

Merged: atchernych merged 7 commits into main from dp-support on Apr 2, 2026

@atchernych atchernych commented Mar 31, 2026

Overview:

feat: enable DP (data parallelism) for GAIE

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Enhanced request routing with data parallelism (DP-rank) metadata propagation, improving request scheduling and batching efficiency across distributed inference workers.
  • Documentation

    • Updated Inference Gateway documentation to clarify support for data parallelism in GAIE integration and testing configurations.

Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
@atchernych atchernych requested review from a team as code owners March 31, 2026 22:40
@github-actions github-actions bot added the feat and documentation labels Mar 31, 2026
github-actions bot commented Mar 31, 2026


coderabbitai bot commented Mar 31, 2026

Walkthrough

The changes introduce dynamic batching data-parallel rank (DpRank) propagation throughout the inference gateway pipeline. DpRank is extracted from routing results, propagated through decode and prefill scorers, set in request headers, and handled in C FFI bindings to support downstream scheduling logic.
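
The header step of this walkthrough can be illustrated with a minimal Go sketch. It is not the code from this PR. The header name `x-dynamo-dp-rank` comes from the change summary; the `routingResult` type, the `setDpRankHeaders` helper, and the string value given to `prefillDpRankHeader` are assumptions made up for illustration, since this page does not show the actual definitions.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// Header name as quoted in the change summary.
	decodeDpRankHeader = "x-dynamo-dp-rank"
	// Hypothetical value: the PR refers to a PrefillDpRankHeader constant,
	// but its string value is not shown on this page.
	prefillDpRankHeader = "x-example-prefill-dp-rank"
)

// routingResult is a hypothetical, simplified stand-in for the result
// returned by the router; only the two rank fields matter here.
type routingResult struct {
	PrefillDpRank uint32
	DecodeDpRank  uint32
}

// setDpRankHeaders writes both DP ranks into the outgoing request headers
// as base-10 strings so that downstream workers can read them.
func setDpRankHeaders(headers map[string]string, r routingResult) {
	headers[decodeDpRankHeader] = strconv.FormatUint(uint64(r.DecodeDpRank), 10)
	headers[prefillDpRankHeader] = strconv.FormatUint(uint64(r.PrefillDpRank), 10)
}

func main() {
	headers := map[string]string{}
	setDpRankHeaders(headers, routingResult{PrefillDpRank: 1, DecodeDpRank: 3})
	fmt.Println(headers[decodeDpRankHeader], headers[prefillDpRankHeader])
}
```

A plain `map[string]string` is used here only to keep the sketch self-contained; the real scorers operate on the gateway's own request type.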

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| C FFI and Go Routing Result Extension: lib/bindings/c/src/lib.rs, deploy/inference-gateway/epp/pkg/plugins/dynamo_kv_scorer/plugin.go | Extended CRoutingResult struct with prefill_dp_rank and decode_dp_rank fields (both u32). Updated CallRoutePrefillRequest and CallRouteDecodeRequest to read and populate DpRank from C routing results. Changed RouterHandles::query_prefill_worker return type to include u32 for prefill rank alongside worker ID. |
| Decode and Prefill Scorer Integration: deploy/inference-gateway/epp/pkg/plugins/disagg/decode_scorer.go, deploy/inference-gateway/epp/pkg/plugins/disagg/prefill_scorer.go | Added DpRank field to DecodeRoutingState with propagation via Clone(). Extended Score() methods to extract DpRank from routing results, format and log the values, and set x-dynamo-dp-rank and PrefillDpRankHeader in request headers. Updated PreRequest() to pass state.DpRank to dynscorer.CallAddRequest instead of hardcoded 0. |
| Documentation: docs/kubernetes/inference-gateway.md | Restructured content with new "Features" section. Reorganized KV-routing and EPP configuration references. Added feature bullet for GAIE integration with Data Parallelism support. Clarified that disaggregated and aggregated setups are only tested with kGateway-based Inference Gateway. Repositioned LoRA configuration reference within feature list. |
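
To make the scorer-side change concrete, here is a minimal Go sketch of carrying `DpRank` on the routing state, copying it in `Clone()`, and passing it on in place of a hardcoded 0. Only the names `DecodeRoutingState`, `DpRank`, and `Clone()` come from the summary above. The `WorkerID` field, the `addRequest` function (a stand-in for `dynscorer.CallAddRequest`, whose real signature is not shown here), and `preRequest` are hypothetical.

```go
package main

import "fmt"

// DecodeRoutingState is a simplified stand-in for the state named in the
// summary. WorkerID is a hypothetical field added for illustration.
type DecodeRoutingState struct {
	WorkerID string
	DpRank   uint32
}

// Clone returns an independent copy of the state, including DpRank,
// so later changes to the original do not affect the copy.
func (s *DecodeRoutingState) Clone() *DecodeRoutingState {
	if s == nil {
		return nil
	}
	c := *s
	return &c
}

// addRequest is a hypothetical stand-in for dynscorer.CallAddRequest.
// It only reports what it was called with.
func addRequest(workerID string, dpRank uint32) string {
	return fmt.Sprintf("worker=%s dp_rank=%d", workerID, dpRank)
}

// preRequest forwards the rank stored on the state rather than a
// hardcoded 0, mirroring the change described for PreRequest().
func preRequest(state *DecodeRoutingState) string {
	return addRequest(state.WorkerID, state.DpRank)
}

func main() {
	state := &DecodeRoutingState{WorkerID: "worker-a", DpRank: 2}
	copied := state.Clone()
	state.DpRank = 5 // the clone keeps the rank it was created with
	fmt.Println(preRequest(copied))
}
```

Copying the struct by value in `Clone()` is enough in this sketch because both fields are value types; a state holding slices or maps would need a deeper copy.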

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The pull request description is largely incomplete. While it has the correct structure with Overview, Details, and Related Issues sections, most sections are empty or contain only placeholder text. | Fill in the Details section with a clear description of changes made in this PR. Provide specific guidance under 'Where should the reviewer start?' by calling out key files. Replace the placeholder 'closes GitHub issue: #xxx' with the actual issue number. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: enable DP for GAIE' clearly describes the main change (enabling Data Parallelism support for GAIE), which aligns with the core purpose of all modifications across the codebase. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/kubernetes/inference-gateway.md`:
- Line 23: Typo: replace the incorrect phrase "kGateways Inference Gateway" with
the correct wording "kGateway Inference Gateway" in the compatibility bullet;
update the string in docs/kubernetes/inference-gateway.md where the phrase
appears (search for "kGateways Inference Gateway" or the exact sentence
containing it) so the document reads "Currently, these setups are only tested
with the kGateway Inference Gateway."

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3a91cbc8-a691-4782-a8f9-5053745c1b27

📥 Commits

Reviewing files that changed from the base of the PR and between cbbde3d and 66712d0.

📒 Files selected for processing (5)
  • deploy/inference-gateway/epp/pkg/plugins/disagg/decode_scorer.go
  • deploy/inference-gateway/epp/pkg/plugins/disagg/prefill_scorer.go
  • deploy/inference-gateway/epp/pkg/plugins/dynamo_kv_scorer/plugin.go
  • docs/kubernetes/inference-gateway.md
  • lib/bindings/c/src/lib.rs

Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
@github-actions github-actions bot added the frontend label (`python -m dynamo.frontend` and `dynamo-run in=http|text|grpc`) Apr 1, 2026
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
@atchernych atchernych enabled auto-merge (squash) April 1, 2026 22:46
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
@atchernych atchernych requested review from a team as code owners April 2, 2026 22:06
@pull-request-size pull-request-size bot added size/L and removed size/M labels Apr 2, 2026
@github-actions github-actions bot added the router label (Relates to routing, KV-aware routing, etc.) Apr 2, 2026
@atchernych atchernych merged commit 2b90650 into main Apr 2, 2026
93 checks passed
@atchernych atchernych deleted the dp-support branch April 2, 2026 22:45
Labels

  • documentation: Improvements or additions to documentation
  • feat
  • frontend: `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc`
  • router: Relates to routing, KV-aware routing, etc.
  • size/L

3 participants