Skip to content

Conversation

@biswapanda
Copy link
Contributor

@biswapanda biswapanda commented Aug 18, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Namespace-aware configuration via environment variable, enabling dynamic defaults across services.
    • Frontend now supports a --namespace flag (falls back to environment) with clearer startup logging.
    • Per-namespace model discovery/filtering in dynamic HTTP mode to scope visible models.
  • Refactor

    • Simplified vLLM configuration; removed the --endpoint CLI option in favor of environment-driven defaults.
  • Documentation

    • Updated vLLM docs to reflect removal of the --endpoint option.
  • Chores

    • Faster vLLM installation on AMD64 using a precompiled wheel.
  • Tests

    • Added integration tests covering namespace filtering and discovery behavior.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Aug 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@biswapanda biswapanda requested a review from PeaBrane as a code owner August 18, 2025 06:04
@biswapanda biswapanda requested a review from a team as a code owner August 18, 2025 06:04
@github-actions github-actions bot added the feat label Aug 18, 2025
@biswapanda biswapanda marked this pull request as draft August 18, 2025 06:04
@biswapanda biswapanda marked this pull request as ready for review August 18, 2025 06:05
@biswapanda
Copy link
Contributor Author

/ok to test c7b3957

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 18, 2025

Walkthrough

Introduces environment-driven namespaces across components. Adds DYN_NAMESPACE defaults, exposes namespace utilities, removes explicit namespace flags in deploys, updates vLLM arg parsing to fixed endpoint mapping, adds frontend namespace-scoped discovery, implements Rust-side namespace module and watcher filtering, and updates tests. Adds AMD64 precompiled vLLM wheel install path.

Changes

Cohort / File(s) Change Summary
Namespace defaults (Python mocker)
components/backends/mocker/src/dynamo/mocker/main.py
New DYN_NAMESPACE env default ("global"); DEFAULT_ENDPOINT now dyn://{DYN_NAMESPACE}.backend.generate.
SGLang common utils/API
components/backends/sglang/src/dynamo/sglang/common/__init__.py, components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py
Added get_dynamo_namespace() (reads DYN_NAMESPACE); re-exported via all.
SGLang workers adopt dynamic namespace
components/backends/sglang/src/dynamo/sglang/worker/main.py, components/backends/sglang/src/dynamo/sglang/decode_worker/main.py
Replace hard-coded "dynamo" with get_dynamo_namespace() for runtime.namespace lookups.
SGLang namespace clearing CLI
components/backends/sglang/src/dynamo/sglang/utils/clear_namespace.py
--namespace now optional; falls back to DYN_NAMESPACE; asserts presence before clearing.
SGLang deploy manifests
components/backends/sglang/deploy/agg.yaml, .../agg_router.yaml, .../disagg*.yaml
Removed explicit --namespace from clear_namespace commands; rely on env fallback.
Frontend namespace-scoped discovery
components/frontend/src/dynamo/frontend/main.py
Added --namespace flag; default from DYN_NAMESPACE; computes endpoint_id as .frontend.http or global; added logging and constants.
vLLM args/docs
components/backends/vllm/src/dynamo/vllm/args.py, components/backends/vllm/README.md, components/backends/vllm/deploy/README.md
Removed --endpoint CLI and dyn:// parsing; namespace from DYN_NAMESPACE, component from role, endpoint="generate"; docs updated accordingly.
Rust namespace module + watcher
lib/llm/src/lib.rs, lib/llm/src/namespace.rs, lib/llm/src/discovery/watcher.rs
New namespace module (GLOBAL_NAMESPACE, is_global_namespace); ModelWatcher::watch now accepts target_namespace and filters Put events by namespace.
Rust HTTP entry wiring
lib/llm/src/entrypoint/input/http.rs, lib/llm/src/entrypoint/input/common.rs
Added WatcherConfig and per-namespace discovery; pass target_namespace to watcher; endpoint enabling updated; call sites adjusted with None/default.
Rust tests
lib/llm/tests/namespace.rs, lib/llm/tests/http_namespace_integration.rs
New tests for namespace helpers, filtering, endpoint parsing, and HTTP discovery scoping.
Container: vLLM install (AMD64)
container/deps/vllm/install_vllm.sh
AMD64 path installs PyTorch 2.7.1+cu128 and downloads precompiled vLLM wheel; cleans up after install; ARM64 unchanged.

Sequence Diagram(s)

sequenceDiagram
  participant Env as Environment
  participant FE as Frontend
  participant RT as Runtime
  participant ETCD as Discovery (KV)

  Env->>FE: DYN_NAMESPACE (optional)
  FE->>FE: Parse args (--namespace or env or global)
  FE->>RT: Build endpoint_id "<ns>.frontend.http"
  RT->>ETCD: Watch models under namespace scope
  ETCD-->>RT: Model events (scoped)
  RT-->>FE: Available models/endpoints
  FE->>Client: Serve HTTP using discovered models
Loading
sequenceDiagram
  participant Watcher as ModelWatcher
  participant ETCD as KV Stream
  participant MM as ModelManager

  Watcher->>Watcher: target_namespace = None or Some(ns)
  ETCD-->>Watcher: Put/Delete events
  Watcher->>Watcher: Parse ModelEntry
  alt Put event
    opt target_namespace=Some(ns) and not global
      Watcher->>Watcher: if entry.ns != ns skip
    end
    Watcher->>MM: Add/Update model (with namespace)
  else Delete event
    Watcher->>MM: Remove model
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

In burrows of bytes, I twitch my nose,
A namespace shifts where the river flows.
Global or scoped, I hop with grace,
Models appear in the right-place space.
Wheels roll fast on AMD lanes,
And watchers prune unneeded chains.
Thump-thump—clean trails, no stains. 🐇✨

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🔭 Outside diff range comments (2)
components/frontend/src/dynamo/frontend/main.py (1)

175-181: Bug: trimmed metrics prefix is computed but not used.

You compute prefix = flags.metrics_prefix.strip() but set the env var to the untrimmed value. Use prefix.

Apply this diff:

     if flags.metrics_prefix is not None:
         prefix = flags.metrics_prefix.strip()
         if prefix:
-            os.environ["DYN_METRICS_PREFIX"] = flags.metrics_prefix
+            os.environ["DYN_METRICS_PREFIX"] = prefix
components/backends/sglang/deploy/disagg-multinode.yaml (1)

63-69: Prefill and decode are in different namespaces; this likely breaks disaggregation coordination.

prefill is configured with dynamoNamespace: sglang-disagg while Frontend and decode use sglang-disagg-multinode. Disaggregation workers typically must share the same namespace for discovery and routing.

Proposed fix (update prefill to match the others):

# In prefill service
dynamoNamespace: sglang-disagg-multinode

If this divergence is intentional, add a brief comment explaining the cross-namespace design to avoid regressions later.

♻️ Duplicate comments (1)
components/backends/sglang/deploy/agg.yaml (1)

27-27: Same verification as disagg: ensure DYN_NAMESPACE is injected.

The observation and verification steps in components/backends/sglang/deploy/disagg.yaml apply here as well.

Run the same script from the disagg.yaml comment to confirm env injection and fallback behavior.

🧹 Nitpick comments (19)
container/deps/vllm/install_vllm.sh (1)

151-156: Add integrity checks and robust cleanup for precompiled wheel; quote paths.

  • No checksum verification for the downloaded wheel; add an optional SHA256 gate to prevent tampering/partial downloads.
  • Use trap to ensure the wheel is cleaned up even on installation failures, instead of best-effort cleanup at the end.
  • Quote paths to avoid word-splitting issues.

If you can provide a trusted sha256 for each wheel in S3, the script can enforce it.

Apply this diff:

-    VLLM_TEMP_DIR=/tmp/vllm/wheel/$VLLM_REF
-    mkdir -p $VLLM_TEMP_DIR
-    REMOTE_WHEEL_URL=https://vllm-wheels.s3.us-west-2.amazonaws.com/${VLLM_REF}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
-    export VLLM_PRECOMPILED_WHEEL_LOCATION=$VLLM_TEMP_DIR/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
-    rm -rf $VLLM_PRECOMPILED_WHEEL_LOCATION || true
-    curl -fS --retry 3 -L "$REMOTE_WHEEL_URL" -o "$VLLM_PRECOMPILED_WHEEL_LOCATION"
+    VLLM_TEMP_DIR="/tmp/vllm/wheel/$VLLM_REF"
+    mkdir -p "$VLLM_TEMP_DIR"
+    REMOTE_WHEEL_URL="https://vllm-wheels.s3.us-west-2.amazonaws.com/${VLLM_REF}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl"
+    export VLLM_PRECOMPILED_WHEEL_LOCATION="$VLLM_TEMP_DIR/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl"
+    rm -f "$VLLM_PRECOMPILED_WHEEL_LOCATION" || true
+    curl -fS --retry 3 -L "$REMOTE_WHEEL_URL" -o "$VLLM_PRECOMPILED_WHEEL_LOCATION"
+    # Basic integrity checks: non-empty file and optional SHA256 verification if provided
+    test -s "$VLLM_PRECOMPILED_WHEEL_LOCATION"
+    if [ -n "${VLLM_WHEEL_SHA256:-}" ]; then
+        echo "$VLLM_WHEEL_SHA256  $VLLM_PRECOMPILED_WHEEL_LOCATION" | sha256sum -c -
+    fi
+    # Ensure cleanup on success/failure
+    trap 'rm -f "$VLLM_PRECOMPILED_WHEEL_LOCATION"' EXIT
@@
-    rm -rf $VLLM_PRECOMPILED_WHEEL_LOCATION
+    # Cleanup handled by trap above

Verification to consider:

  • Ensure you have or can generate per-ref SHA256 sums for the S3 wheels and wire them via VLLM_WHEEL_SHA256.

Also applies to: 162-162

components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py (1)

106-107: Normalize DYN_NAMESPACE; treat blank/whitespace as global and add type hints

Current implementation returns an empty string if DYN_NAMESPACE is set but blank, which may propagate invalid/empty namespaces into runtime.namespace(...) or dyn:// endpoint construction elsewhere. Align with Rust is_global_namespace and avoid surprises by normalizing.

Apply this diff:

-def get_dynamo_namespace():
-    return os.environ.get("DYN_NAMESPACE", "global")
+def get_dynamo_namespace() -> str:
+    """
+    Resolve the Dynamo namespace from environment.
+    - Uses the DYN_NAMESPACE env var if set and non-empty after trimming.
+    - Falls back to 'global' otherwise.
+    """
+    ns = os.environ.get("DYN_NAMESPACE", "").strip()
+    return ns or "global"
components/backends/sglang/src/dynamo/sglang/utils/clear_namespace.py (1)

43-50: Make CLI fallback robust and user-friendly; accept blank as 'global'

  • The current flow asserts a non-empty namespace. Elsewhere (Rust) empty string maps to global, and other Python call sites may pass blanks inadvertently. Normalize env/CLI input and avoid a hard failure when omitted or blank.
  • Also add a helpful help= message to --namespace.

Apply this diff:

-    parser.add_argument("--namespace", type=str, required=False, default=None)
+    parser.add_argument(
+        "--namespace",
+        type=str,
+        required=False,
+        default=None,
+        help="Dynamo namespace (overrides $DYN_NAMESPACE). Defaults to 'global' if neither is set."
+    )
@@
-    if not args.namespace:
-        args.namespace = os.environ.get("DYN_NAMESPACE")
-    assert (
-        args.namespace
-    ), "Missing namespace, either pass --namespace or set DYN_NAMESPACE"
+    # Normalize: CLI wins; else env; treat blank/whitespace as global.
+    if not args.namespace:
+        args.namespace = os.environ.get("DYN_NAMESPACE", "")
+    args.namespace = (args.namespace or "").strip() or "global"
lib/llm/src/namespace.rs (1)

7-9: Trim whitespace to avoid treating " " as a distinct namespace

Normalize the input to reduce surprises from env/config with stray whitespace. This keeps behavior consistent with Python-side normalization we’re adopting.

Apply this diff:

-pub fn is_global_namespace(namespace: &str) -> bool {
-    namespace == GLOBAL_NAMESPACE || namespace.is_empty()
-}
+pub fn is_global_namespace(namespace: &str) -> bool {
+    let ns = namespace.trim();
+    ns.is_empty() || ns == GLOBAL_NAMESPACE
+}
components/backends/mocker/src/dynamo/mocker/main.py (1)

18-20: Normalize DYN_NAMESPACE to avoid dyn://.backend.generate when env is blank

If DYN_NAMESPACE is set to an empty or whitespace-only string, DEFAULT_ENDPOINT becomes invalid (dyn://.backend.generate). Normalize to default to global.

Apply this diff:

-DYN_NAMESPACE = os.environ.get("DYN_NAMESPACE", "global")
-DEFAULT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.backend.generate"
+DYN_NAMESPACE = (os.environ.get("DYN_NAMESPACE", "").strip() or "global")
+DEFAULT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.backend.generate"

Optional (for better testability): compute the default inside cmd_line_args to respect env changes at runtime rather than module-import time.

# Inside cmd_line_args(), before adding --endpoint:
ns = (os.environ.get("DYN_NAMESPACE", "").strip() or "global")
default_endpoint = f"dyn://{ns}.backend.generate"
# Then use:
parser.add_argument("--endpoint", type=str, default=default_endpoint, help=f"Dynamo endpoint string (default: {default_endpoint})")
components/backends/vllm/src/dynamo/vllm/args.py (3)

108-110: Normalize DYN_NAMESPACE and keep endpoint derivation deterministic

  • Treat blank/whitespace env values as global to avoid empty namespace propagation into ETCD contexts and discovery.
  • Component/endpoint mapping looks good.

Apply this diff:

-    config.namespace = os.environ.get("DYN_NAMESPACE", "global")
+    config.namespace = os.environ.get("DYN_NAMESPACE", "").strip() or "global"
     config.component = "prefill" if args.is_prefill_worker else "backend"
     config.endpoint = "generate"

64-67: Help text is outdated; there is no "provided namespace" CLI option anymore

The message suggests a CLI-provided namespace, but we now derive namespace from DYN_NAMESPACE (default: global). Update the help to avoid confusion.

Apply this diff:

-    parser.add_argument(
-        "--is-prefill-worker",
-        action="store_true",
-        help="Enable prefill functionality for this worker. Uses the provided namespace to construct dyn://namespace.prefill.generate",
-    )
+    parser.add_argument(
+        "--is-prefill-worker",
+        action="store_true",
+        help="Enable prefill functionality for this worker. Namespace is read from $DYN_NAMESPACE (default: global). Endpoint will be dyn://{namespace}.prefill.generate",
+    )

185-190: Fix assert message to reference side_channel_port (not kv_port)

The assertion checks side_channel_port but the message mentions kv_port. Minor clarity fix.

Apply this diff:

-    assert (
-        config.side_channel_port is not None
-    ), "Must set the kv_port, use configure_ports_with_etcd"
+    assert (
+        config.side_channel_port is not None
+    ), "Must set side_channel_port; call configure_ports_with_etcd first"
components/frontend/src/dynamo/frontend/main.py (3)

41-45: Keep namespace constants aligned across languages.

GLOBAL_NAMESPACE = "global" duplicates the Rust-side constant exposed via lib/llm/src/namespace.rs. This is fine for now, but it’s easy to drift. Consider centralizing the Python definition (e.g., a small shared Python module) or adding a brief comment referencing the Rust constant to reduce future mismatches.


125-130: Validate namespace input and document env-var fallback in help.

  • Add validation to prevent accidental invalid values (whitespace, dots) that would form malformed endpoint_ids.
  • Update the help string to explicitly mention the DYN_NAMESPACE env var fallback.

Apply this diff to improve the CLI arg:

     parser.add_argument(
         "--namespace",
-        type=str,
+        type=validate_namespace,
         default=None,
-        help="Dynamo namespace for model discovery scoping. If specified, models will only be discovered from this namespace. If not specified, discovers models from all namespaces (global discovery).",
+        help="Dynamo namespace for model discovery scoping. If specified, models are discovered only from this namespace. If not specified, falls back to DYN_NAMESPACE env var; if unset or blank, uses 'global' (global discovery). Allowed characters: [A-Za-z0-9_-].",
     )

Add this helper outside the selected lines (e.g., near other validators):

def validate_namespace(value: str) -> str:
    value = (value or "").strip()
    if not value:
        raise argparse.ArgumentTypeError("namespace must be non-empty if provided")
    # Disallow dots/whitespace; allow hyphens/underscores to align with examples like 'sglang-disagg'
    if not re.match(r"^[A-Za-z0-9][A-Za-z0-9_-]*$", value):
        raise argparse.ArgumentTypeError(
            f"namespace contains invalid characters: {value!r}. Allowed pattern: ^[A-Za-z0-9][A-Za-z0-9_-]*$"
        )
    return value

165-167: Treat blank env var as unset and strip whitespace.

If DYN_NAMESPACE is set to a blank or whitespace-only value, current code will pass that through and later default to GLOBAL_NAMESPACE, but you end up carrying around a misleading flags.namespace value. Strip and ignore blank here for clarity.

Apply this diff:

-    if not flags.namespace:
-        flags.namespace = os.environ.get(DYNAMO_NAMESPACE_ENV_VAR)
+    if not flags.namespace:
+        env_ns = os.environ.get(DYNAMO_NAMESPACE_ENV_VAR)
+        if env_ns is not None:
+            env_ns = env_ns.strip()
+            if env_ns:
+                flags.namespace = env_ns
components/backends/sglang/src/dynamo/sglang/decode_worker/main.py (1)

74-76: Make namespace resolution robust and observable.

Treat blank env values as unset, and log the resolved namespace for easier debugging.

Apply this diff:

-    component = runtime.namespace(get_dynamo_namespace()).component("decode")
-    await component.create_service()
+    ns = (get_dynamo_namespace() or "").strip() or "global"
+    logging.info(f"Using namespace: '{ns}' for decode worker")
+    component = runtime.namespace(ns).component("decode")
+    await component.create_service()

Optionally, harden the source utility outside this file by trimming and defaulting there:

# components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py
def get_dynamo_namespace():
    return (os.environ.get("DYN_NAMESPACE") or "").strip() or "global"
lib/llm/tests/http_namespace_integration.rs (3)

34-101: Filtering logic mirrors intended isolation semantics; minor duplication nit.

The “global includes all; specific ns filters strictly to that ns” behavior is explicit. Consider extracting the filtering predicate into a small helper to reduce repetition across tests.

Example refactor (outside this hunk):

fn visible<'a>(models: &'a [ModelEntry], ns: &str) -> Vec<&'a ModelEntry> {
    let is_global = is_global_namespace(ns);
    models.iter().filter(|m| is_global || m.endpoint.namespace == ns).collect()
}

104-153: Good coverage of endpoint parsing and namespace classification; add one invalid parse case.

Parsing only valid triples misses negative cases (e.g., missing segments). Adding one assertion for an invalid endpoint ID will harden the contract and protect against regressions.

Suggested addition (outside this hunk):

#[test]
fn test_endpoint_id_invalid_formats() {
    let invalids = ["no-dots", "too.many.dots.here", ".leading.dot", "trailing.dot.", "ns..name"];
    for s in invalids {
        assert!(s.parse::<Endpoint>().is_err(), "Expected parse error for '{s}'");
    }
}

155-222: Scoping scenarios reflect strict namespace isolation; consider asserting absence of “global” under scoped frontend.

You already verify the count is 2; adding an assertion that the “global” entry is not present can catch accidental inclusion if logic changes.

For example:

assert!(visible_models.iter().all(|m| m.endpoint.namespace != GLOBAL_NAMESPACE));
lib/llm/src/entrypoint/input/common.rs (1)

78-83: Confirm intended scope: dynamic-input path now watches all namespaces.

Unlike the HTTP path (which filters by namespace), this path explicitly opts out of filtering. Please confirm this is intentional for the dynamic input flow; if not, consider passing the local model’s namespace similar to HTTP.

components/backends/sglang/src/dynamo/sglang/worker/main.py (1)

336-339: Sanitize namespace from env to avoid accidental whitespace/empty values.

If DYN_NAMESPACE is set to a blank/whitespace value, you’ll end up registering under that literal namespace (not “global”), which is likely unintended and inconsistent with Rust-side is_global_namespace semantics. Strip and default here.

Apply this diff:

-    dynamo_namespace = get_dynamo_namespace()
+    # Normalize: treat empty/whitespace as "global"
+    dynamo_namespace = (get_dynamo_namespace() or "global").strip() or "global"
lib/llm/src/discovery/watcher.rs (1)

91-94: Avoid relying on Option::is_none_or for broader MSRV compatibility.

Option::is_none_or may not be available on older toolchains. Use map_or(true, ...) to keep behavior identical without bumping MSRV.

Apply this diff:

-        let global_namespace = target_namespace.is_none_or(is_global_namespace);
+        let global_namespace = target_namespace.map_or(true, is_global_namespace);
lib/llm/src/entrypoint/input/http.rs (1)

176-186: WatcherConfig packaging is clean; consider clarifying doc re: “empty namespace”.

Docs mention “empty or equals GLOBAL_NAMESPACE”, but in this call site you only ever pass None (for global) or Some(non-global). Consider updating the comment to “If target_namespace is None or equals GLOBAL_NAMESPACE/empty on the watcher side, discover all namespaces” to avoid confusion.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d177cdf and c7b3957.

📒 Files selected for processing (23)
  • components/backends/mocker/src/dynamo/mocker/main.py (2 hunks)
  • components/backends/sglang/deploy/agg.yaml (1 hunks)
  • components/backends/sglang/deploy/agg_router.yaml (1 hunks)
  • components/backends/sglang/deploy/disagg-multinode.yaml (1 hunks)
  • components/backends/sglang/deploy/disagg.yaml (1 hunks)
  • components/backends/sglang/deploy/disagg_planner.yaml (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/common/__init__.py (2 hunks)
  • components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py (2 hunks)
  • components/backends/sglang/src/dynamo/sglang/decode_worker/main.py (2 hunks)
  • components/backends/sglang/src/dynamo/sglang/utils/clear_namespace.py (2 hunks)
  • components/backends/sglang/src/dynamo/sglang/worker/main.py (3 hunks)
  • components/backends/vllm/README.md (0 hunks)
  • components/backends/vllm/deploy/README.md (0 hunks)
  • components/backends/vllm/src/dynamo/vllm/args.py (1 hunks)
  • components/frontend/src/dynamo/frontend/main.py (6 hunks)
  • container/deps/vllm/install_vllm.sh (1 hunks)
  • lib/llm/src/discovery/watcher.rs (4 hunks)
  • lib/llm/src/entrypoint/input/common.rs (1 hunks)
  • lib/llm/src/entrypoint/input/http.rs (5 hunks)
  • lib/llm/src/lib.rs (1 hunks)
  • lib/llm/src/namespace.rs (1 hunks)
  • lib/llm/tests/http_namespace_integration.rs (1 hunks)
  • lib/llm/tests/namespace.rs (1 hunks)
💤 Files with no reviewable changes (2)
  • components/backends/vllm/README.md
  • components/backends/vllm/deploy/README.md
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-28T17:00:07.968Z
Learnt from: biswapanda
PR: ai-dynamo/dynamo#2137
File: components/backends/sglang/deploy/agg_router.yaml:0-0
Timestamp: 2025-07-28T17:00:07.968Z
Learning: In components/backends/sglang/deploy/agg_router.yaml, the clear_namespace command is intentionally designed to block the router from starting if it fails (using &&). This is a deliberate design decision where namespace clearing is a critical prerequisite and the router should not start with an uncleared namespace.

Applied to files:

  • components/backends/sglang/deploy/disagg_planner.yaml
  • components/backends/sglang/deploy/disagg-multinode.yaml
  • components/backends/sglang/deploy/agg_router.yaml
  • components/backends/sglang/deploy/agg.yaml
  • components/backends/sglang/deploy/disagg.yaml
📚 Learning: 2025-07-21T00:10:56.947Z
Learnt from: zaristei
PR: ai-dynamo/dynamo#2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.

Applied to files:

  • container/deps/vllm/install_vllm.sh
📚 Learning: 2025-07-22T10:22:28.972Z
Learnt from: ptarasiewiczNV
PR: ai-dynamo/dynamo#2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.

Applied to files:

  • container/deps/vllm/install_vllm.sh
🧬 Code Graph Analysis (10)
lib/llm/src/lib.rs (5)
lib/runtime/src/distributed.rs (1)
  • namespace (185-187)
lib/runtime/src/component.rs (2)
  • namespace (218-220)
  • namespace (602-609)
lib/bindings/python/rust/lib.rs (1)
  • namespace (286-291)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • namespace (37-41)
lib/llm/src/kv_router/publisher.rs (1)
  • namespace (988-989)
components/backends/sglang/src/dynamo/sglang/common/__init__.py (1)
components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py (1)
  • get_dynamo_namespace (106-107)
lib/llm/tests/http_namespace_integration.rs (1)
lib/llm/src/namespace.rs (1)
  • is_global_namespace (7-9)
components/backends/sglang/src/dynamo/sglang/utils/clear_namespace.py (1)
components/backends/sglang/src/dynamo/sglang/utils/sgl_http_server.py (1)
  • parse_args (179-194)
lib/llm/src/discovery/watcher.rs (1)
lib/llm/src/namespace.rs (1)
  • is_global_namespace (7-9)
lib/llm/tests/namespace.rs (2)
lib/llm/src/namespace.rs (1)
  • is_global_namespace (7-9)
lib/llm/src/discovery/watcher.rs (2)
  • serde_json (98-98)
  • serde_json (433-433)
components/backends/sglang/src/dynamo/sglang/decode_worker/main.py (1)
components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py (1)
  • get_dynamo_namespace (106-107)
lib/llm/src/entrypoint/input/http.rs (2)
lib/llm/src/namespace.rs (1)
  • is_global_namespace (7-9)
lib/llm/src/discovery/watcher.rs (1)
  • new (60-74)
components/frontend/src/dynamo/frontend/main.py (3)
lib/bindings/python/rust/lib.rs (2)
  • namespace (286-291)
  • get (363-379)
lib/bindings/python/src/dynamo/_core.pyi (2)
  • namespace (37-41)
  • get (127-140)
lib/runtime/src/component/client.rs (1)
  • is_static (129-131)
components/backends/sglang/src/dynamo/sglang/worker/main.py (3)
components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py (1)
  • get_dynamo_namespace (106-107)
lib/bindings/python/rust/lib.rs (2)
  • component (543-549)
  • namespace (286-291)
lib/bindings/python/src/dynamo/_core.pyi (2)
  • component (187-191)
  • namespace (37-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (29)
container/deps/vllm/install_vllm.sh (1)

153-155: Confirm Python ABI compatibility with the precompiled wheel tag.

The wheel name is tagged cp38-abi3. That requires the runtime to be CPython >= 3.8 and compatible with the stable ABI. If our base images run Python < 3.8 or PyPy, installation will fail at runtime. Please confirm our build images use CPython 3.8+.

If helpful, I can add a guard to the script to assert Python version >= 3.8 before proceeding with the precompiled path and fall back to source build otherwise. Do you want me to push that change?

components/backends/sglang/src/dynamo/sglang/common/sgl_utils.py (1)

7-7: Import looks fine

Adding os is necessary for environment-driven namespace. No issues here.

lib/llm/src/namespace.rs (1)

5-5: Constant is clear and consistent

GLOBAL_NAMESPACE = "global" provides a single source of truth in Rust code.

components/frontend/src/dynamo/frontend/main.py (3)

22-22: LGTM: logging import is appropriate.

Importing logging is necessary for the new startup messages.


206-215: LGTM: dynamic endpoint_id construction is consistent with namespacing.

  • Scoped discovery when a namespace is provided: ".frontend.http"
  • Defaults to ".frontend.http" when absent

229-241: LGTM: helpful startup logs.

The messages clearly indicate static vs dynamic and the scope used, which will aid troubleshooting.

lib/llm/src/lib.rs (1)

30-30: LGTM: public namespace module export.

Exposing pub mod namespace; is straightforward and aligns with tests and watcher usage.

components/backends/sglang/deploy/disagg.yaml (1)

27-27: DYN_NAMESPACE injection and fallback verified

  • The operator’s component_common.go (lines 63–68) injects DYN_NAMESPACE from dynamoNamespace.
  • clear_namespace (clear_namespace.py, lines 43–50) falls back to os.environ["DYN_NAMESPACE"] when --namespace is omitted.
  • The frontend CLI (main.py, namespace flag) similarly defaults to the DYN_NAMESPACE env var if --namespace isn’t provided.

All tests in component_planner_test.go and graph_test.go cover the operator injection. No further changes needed.

components/backends/sglang/src/dynamo/sglang/decode_worker/main.py (1)

20-20: LGTM: dynamic namespace import.

Importing get_dynamo_namespace aligns this worker with the new environment-driven namespace model.

components/backends/sglang/deploy/disagg-multinode.yaml (2)

29-29: Namespace flag removal looks good; keep the startup gate via &&.

Dropping the explicit --namespace aligns with env-driven discovery. The && ensures the frontend won’t start if namespace clearing fails, which matches the intended behavior.

Please confirm that the operator injects DYN_NAMESPACE for this pod (derived from dynamoNamespace: sglang-disagg-multinode). If not injected, clear_namespace may default to the global namespace and clear the wrong scope.


20-22: Verify componentType usage for Frontend (“main” vs “frontend”).

Other manifests use componentType: frontend. Using main here may alter service naming or discovery labels depending on the CRD’s conventions.

If the CRD expects frontend for the HTTP entrypoint, consider aligning it for consistency. Otherwise, please confirm main is the intended role for this graph and won’t affect discovery or service exposure.

components/backends/sglang/deploy/agg_router.yaml (1)

27-27: LGTM: env-driven clear_namespace and startup gating preserved.

Removing --namespace is fine if DYN_NAMESPACE is provided by the operator from dynamoNamespace. The && ensures failure to clear blocks startup, which matches prior design.

Please confirm DYN_NAMESPACE injection for this Frontend pod; otherwise, clear_namespace would fall back to the global namespace.

components/backends/sglang/deploy/disagg_planner.yaml (2)

34-34: LGTM: clear_namespace now relies on env; startup still gated.

The change aligns with the env-driven namespace approach, and the && keeps the critical “do not start without clearing” behavior.


12-16: Environment variable name may be inconsistent with code (DYNAMO_NAMESPACE vs DYN_NAMESPACE).

Runtime code and helpers read DYN_NAMESPACE (per this PR). This manifest sets DYNAMO_NAMESPACE, which may be ignored unless separately consumed.

  • If the operator injects DYN_NAMESPACE from dynamoNamespace, you can drop DYNAMO_NAMESPACE here to reduce confusion.
  • If you intend to override via env, add DYN_NAMESPACE: "dynamo" explicitly or rename DYNAMO_NAMESPACE to DYN_NAMESPACE.

I can generate a search script to find remaining uses of DYNAMO_NAMESPACE across the repo if helpful.

components/backends/sglang/src/dynamo/sglang/common/__init__.py (2)

16-23: Expose get_dynamo_namespace from common: good addition.

Importing get_dynamo_namespace here provides a stable, central import path for callers (from dynamo.sglang.common import get_dynamo_namespace).


25-40: all updated correctly to export get_dynamo_namespace.

Public API surface now includes the namespace helper alongside existing utilities and types. No issues spotted.

lib/llm/tests/http_namespace_integration.rs (2)

15-32: Helper to construct ModelEntry is clear and reusable.

The helper keeps tests concise and standardizes Endpoint construction across scenarios.


224-275: Boundary tests are solid; case sensitivity check is valuable.

Treating empty as global and enforcing lower-case “global” matches the current is_global_namespace contract.

lib/llm/src/entrypoint/input/common.rs (1)

80-81: Updated watcher call aligns with new API (OK).

Passing None for the optional namespace parameter correctly preserves the previous behavior (no filtering). Looks good.

lib/llm/tests/namespace.rs (5)

11-47: Namespace helpers are well-covered.

Good coverage for GLOBAL_NAMESPACE, empty string, whitespace, and case sensitivity. This aligns with the current is_global_namespace implementation.


69-101: ModelEntry construction tests are clear and correct.

These validate Endpoint fields and ModelType round-trip well. Solid.


102-165: Filtering logic mirrors runtime watcher behavior (OK).

Using is_global_namespace to decide whether to include all is consistent with the watcher’s semantics. Nice.


167-194: Serialization round-trip test is valuable.

Ensures etcd payload compatibility. No issues.


195-213: Endpoint parsing tests are helpful.

Covering both dot and slash separators is great for robustness. Assuming Endpoint::from handles both forms, this should pass.

components/backends/sglang/src/dynamo/sglang/worker/main.py (2)

36-40: Importing get_dynamo_namespace (OK).

This keeps the resolution centralized. Good.


346-352: Decode client discovery uses the same namespace (OK).

Consistent with worker component creation; this avoids cross-namespace leakage.

lib/llm/src/discovery/watcher.rs (1)

141-158: Structured logs with namespace context (nice).

Including namespace in add/update logs improves observability across namespaces.

lib/llm/src/entrypoint/input/http.rs (2)

41-48: Namespace scoping logic is correct and non-intrusive.

Using is_global_namespace to decide between Some(namespace) vs None aligns with the watcher’s behavior. Passing as_deref() later is a clean handoff.


188-227: run_watcher task orchestration looks good.

  • Using a channel to toggle HTTP endpoints on model type updates is solid.
  • The watcher task passes target_namespace.as_deref() correctly.

No issues.

Comment on lines +108 to +110
config.namespace = os.environ.get("DYN_NAMESPACE", "global")
config.component = "prefill" if args.is_prefill_worker else "backend"
config.endpoint = "generate"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Repository-wide sanity check for lingering defaults and env var name drift

Given the migration to DYN_NAMESPACE and deprecation of hard-coded dyn://dynamo.* endpoints, verify there are no remaining references to:

  • dyn://dynamo.*
  • DYNAMO_NAMESPACE (old env var)
  • DEFAULT_ENDPOINT values that may capture stale env at import-time

Run this script:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for hard-coded dyn://dynamo.* endpoints..."
rg -nP -C2 'dyn://dynamo\.' -g '!**/node_modules/**'

echo
echo "Searching for old env var name DYNAMO_NAMESPACE..."
rg -nP -C3 '\bDYNAMO_NAMESPACE\b' -g '!**/node_modules/**'

echo
echo "Searching for module-level DEFAULT_ENDPOINT definitions (may capture env too early)..."
rg -nP -C2 '^\s*DEFAULT_ENDPOINT\s*=\s*f?[\'"]dyn://' --type=python

echo
echo "Review env namespace retrievals; ensure blank/whitespace are normalized to 'global'..."
rg -nP -C2 'os\.environ\.get\(\s*["\']DYN_NAMESPACE["\']\s*(?:,\s*["\'][^"\']*["\']\s*)?\)' --type=python

Length of output: 10567


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Searching for module-level DEFAULT_ENDPOINT definitions (may capture env too early)..."
rg -nP -C2 --type=python '^\s*DEFAULT_ENDPOINT\s*=\s*f?["\'].*dyn://' -g '!**/node_modules/**'

echo
echo "Searching for os.environ.get usages of DYN_NAMESPACE..."
rg -nP -C2 --type=python "os\.environ\.get\(\s*['\"]DYN_NAMESPACE['\"]" -g '!**/node_modules/**'

Length of output: 319


Remove hard-coded dyn://dynamo. endpoints & deprecated DYNAMO_NAMESPACE*
We ran a repo-wide search and found numerous lingering defaults. Please update or remove all hard-coded dyn://dynamo.* URLs and switch fully to the new environment variable, and deprecate the old DYNAMO_NAMESPACE. Also ensure that DYN_NAMESPACE values that are empty or whitespace normalize back to "global".

Locations requiring attention:

  • Examples & docs (remove/update default endpoints):
    • lib/bindings/python/examples/hello_world/* – DEFAULT_ENDPOINT = "dyn://dynamo.backend.generate"
    • docs/guides/dynamo_run.md – multiple in= / out=dyn://dynamo.* usages
    • examples/multimodal_v1/… – DEFAULT_ENDPOINT, DEFAULT_DOWNSTREAM_ENDPOINT, worker default endpoints
  • Backends:
    • components/backends/llama_cpp/src/dynamo/llama_cpp/main.py – DEFAULT_ENDPOINT = "dyn://dynamo.backend.generate"
    • components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py – several DEFAULT_ENDPOINT = "dyn://dynamo.*"
  • Deployment configs & defaults (remove old env var):
    • components/backends/vllm/deploy/disagg_planner.yaml – DYNAMO_NAMESPACE
    • components/backends/sglang/deploy/disagg_planner.yaml – DYNAMO_NAMESPACE
    • components/planner/src/dynamo/planner/defaults.py – namespace = os.environ.get("DYNAMO_NAMESPACE", …)

Suggested fixes:

  • Replace all hard-coded dyn://dynamo.* with a parameterized endpoint (e.g. derived from a new DYN_ENDPOINT or passed via CLI).
  • Remove references to DYNAMO_NAMESPACE; use only os.environ.get("DYN_NAMESPACE", "global") with .strip() and a fallback when the result is empty.

fi
else
echo "Installing vllm for AMD64 architecture"
uv pip install torch==2.7.1+cu128 torchaudio==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu128
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Respect TORCH_BACKEND when installing PyTorch (avoid hardcoded cu128).

The AMD64 path hardcodes the cu128 channel, ignoring the --torch-backend argument and TORCH_BACKEND env var. This will silently install CUDA 12.8 wheels even if a different backend (e.g., cpu) is requested, leading to mismatches. Make the index URL and torch spec conditional on TORCH_BACKEND.

Apply this diff:

-    uv pip install torch==2.7.1+cu128 torchaudio==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu128
+    # Respect --torch-backend for PyTorch channel and version tag
+    if [ "$TORCH_BACKEND" = "cpu" ]; then
+        PYTORCH_INDEX_URL="https://download.pytorch.org/whl/cpu"
+        TORCH_SPEC="torch==2.7.1"
+    else
+        PYTORCH_INDEX_URL="https://download.pytorch.org/whl/${TORCH_BACKEND}"
+        TORCH_SPEC="torch==2.7.1+${TORCH_BACKEND}"
+    fi
+    uv pip install "$TORCH_SPEC" torchaudio==2.7.1 torchvision==0.22.1 --index-url "$PYTORCH_INDEX_URL"

Note: Given past learnings, defaulting --torch-backend=auto during vLLM install also works well via uv; consider using auto as the default unless you have a specific need to pin.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
uv pip install torch==2.7.1+cu128 torchaudio==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu128
# Respect --torch-backend for PyTorch channel and version tag
if [ "$TORCH_BACKEND" = "cpu" ]; then
PYTORCH_INDEX_URL="https://download.pytorch.org/whl/cpu"
TORCH_SPEC="torch==2.7.1"
else
PYTORCH_INDEX_URL="https://download.pytorch.org/whl/${TORCH_BACKEND}"
TORCH_SPEC="torch==2.7.1+${TORCH_BACKEND}"
fi
uv pip install "$TORCH_SPEC" torchaudio==2.7.1 torchvision==0.22.1 --index-url "$PYTORCH_INDEX_URL"
🤖 Prompt for AI Agents
In container/deps/vllm/install_vllm.sh around line 150, the pip install line
hardcodes the cu128 CUDA wheel; change it to read the --torch-backend argument
or the TORCH_BACKEND env var (default to "auto"), and set the torch package spec
and --index-url conditionally based on that value (e.g., use +cu128 and the
cu128 index for cu128, +cpu and the cpu index for cpu, and for "auto" choose a
safe default that doesn't force cu128). Replace the literal pip args with
variables computed from TORCH_BACKEND so the uv pip install command uses the
correct wheel tag and index URL instead of the hardcoded cu128.

Comment on lines +113 to +125
// Filter by namespace if target_namespace is specified
if let Some(target_ns) = target_namespace {
if !global_namespace && model_entry.endpoint.namespace != target_ns {
tracing::debug!(
model_namespace = model_entry.endpoint.namespace,
target_namespace = target_ns,
model_name = model_entry.name,
"Skipping model from different namespace"
);
continue;
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Asymmetric filtering causes noisy errors on Delete events.

You skip saving entries for non-target namespaces on Put events, but still process Delete events for those keys. This leads to “Missing ModelEntry for {key}” errors on deletes for namespaces you never tracked. Prefer to silently ignore deletes for unknown keys (or filter deletes similarly).

Two possible fixes:

  • Minimal: Treat missing entry on delete as a no-op with a debug log.
  • Broader: Only process deletes for keys you have tracked (requires checking manager state before calling handle_delete).

Minimal fix (change outside the edited hunk; proposed patch):

@@
     async fn handle_delete(&self, kv: &KeyValue) -> anyhow::Result<Option<String>> {
         let key = kv.key_str()?;
-        let model_entry = match self.manager.remove_model_entry(key) {
-            Some(entry) => entry,
-            None => {
-                anyhow::bail!("Missing ModelEntry for {key}");
-            }
-        };
+        let Some(model_entry) = self.manager.remove_model_entry(key) else {
+            // When watching a specific namespace we may never have tracked this key.
+            tracing::debug!("Delete for unknown model key {key}, ignoring");
+            return Ok(None);
+        };

Also applies to: 170-181

🤖 Prompt for AI Agents
In lib/llm/src/discovery/watcher.rs around lines 113-125 (and similarly
170-181), Delete events for keys in namespaces you never tracked produce noisy
"Missing ModelEntry" errors because Put events skipped non-target namespaces but
Delete events are still processed; update the Delete handling to treat a missing
entry as a no-op: before erroring when a ModelEntry lookup fails on Delete,
check if the key is unknown and simply emit a debug log and return early (or
guard deletes by first checking manager state for the tracked key) so deletes
for untracked namespaces are ignored silently.

@biswapanda biswapanda closed this Aug 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants