(feat) rhoai-lls does not require kubernetes to run #22

gallettilance · 2025-09-09T12:23:39Z

What does this PR do?

Test Plan

Summary by CodeRabbit

New Features
- Added three build modes: Full (default), Standalone (no Kubernetes), and Unified.
- New entrypoint chooses the appropriate configuration at startup and launches the service.
- Standalone images/configs enable local runs with simple Docker/Podman commands on port 8321.
Documentation
- Added Build Modes guide with examples and updated build/run instructions and image tagging (llama-stack-rh).
- Added Docker build example.
Chores
- Container build now produces mode-aware images and includes additional runtime dependencies.

coderabbitai · 2025-09-09T12:23:48Z

Walkthrough

Adds build-mode selection (full, standalone, unified) to the build script and templates; introduces a standalone build spec and runtime config; Containerfile generation now emits a mode banner and uses an entrypoint script that chooses config and filters providers at runtime; README updated with build/run examples.

Changes

Cohort / File(s)	Summary
Documentation & Instructions `README.md`	Adds Build Modes section; updates Containerfile generation guidance to `./distribution/build.py [--standalone] [--unified]`; changes example image tag to `llama-stack-rh`; adds Docker build/run examples for standalone and full modes.
Build Orchestration `distribution/build.py`, `distribution/build.yaml`, `distribution/build-standalone.yaml`	`build.py`: adds CLI flags and env fallbacks for modes, modifies get_dependencies and generate_containerfile signatures, writes mode banner to generated Containerfile, prints mode-specific guidance. `build.yaml`: provider entries normalized to strings. Adds `build-standalone.yaml` with standalone distribution spec.
Container Image Definitions `distribution/Containerfile`, `distribution/Containerfile.in`	Adds mode banner metadata; copies dual configs (`run-full.yaml` & `run-standalone.yaml`); installs additional pip deps; switches ENTRYPOINT from direct Python module to `/opt/app-root/entrypoint.sh`.
Runtime Entrypoint & Config `distribution/entrypoint.sh`, `distribution/run-standalone.yaml`	Adds `entrypoint.sh` that selects `run-standalone.yaml` or `run-full.yaml` based on `STANDALONE`, filters/excludes TrustyAI providers in standalone, writes providers into `${HOME}/.llama/providers.d`, disables `external_providers_dir` for standalone, and `exec`s the server. Adds `run-standalone.yaml` with providers, models, sqlite stores, server port 8321, and tool group mappings.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Dev as Developer
  participant Build as distribution/build.py
  participant Tpl as distribution/Containerfile.in
  participant CF as distribution/Containerfile
  participant Image as Container Image
  participant Entryp as /opt/app-root/entrypoint.sh
  participant Server as Llama Server

  Dev->>Build: ./distribution/build.py [--standalone|--unified]
  Build->>Build: parse args & env (STANDALONE/UNIFIED)
  alt standalone
    Build->>Build: load distribution/build-standalone.yaml
  else full
    Build->>Build: load distribution/build.yaml
  end
  Build->>Tpl: render template with deps + mode banner
  Tpl-->>CF: write generated Containerfile
  Build->>Dev: print image build/run guidance (docker/podman)

  Dev->>Image: docker build -f distribution/Containerfile -t llama-stack-rh .
  Dev->>Image: docker run -e STANDALONE=true -e VLLM_URL=... -e INFERENCE_MODEL=... -p 8321:8321 llama-stack-rh

  Image->>Entryp: ENTRYPOINT /opt/app-root/entrypoint.sh
  Entryp->>Entryp: if STANDALONE=true -> use run-standalone.yaml, filter TrustyAI, disable external_providers_dir
  Entryp->>Entryp: else -> use run-full.yaml, copy all providers
  Entryp->>Server: exec python -m llama_stack.core.server.server "$CONFIG_FILE"
  Server-->>Dev: APIs available on :8321

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks (1 passed, 2 warnings)

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The pull request description currently contains only placeholder template sections with no actual summary of changes or test plan details, making it uninformative and unrelated to the modifications in build scripts, documentation, and container configuration.	Please update the description to include a concise summary of the implemented standalone and unified build modes, document the changes to the Containerfile and entrypoint script, and provide clear test instructions with example commands and expected outcomes.
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title indicates the system no longer requires Kubernetes to run, which aligns with the introduction of a standalone mode implemented across the build script, container configuration, and entrypoint changes. It refers to a real aspect of the change by highlighting non-Kubernetes operation. However, it omits mention of the new unified build mode and uses terminology that may differ from the repository name. Nonetheless, it gives a clear indication of a key new capability and is sufficiently related to the changeset to pass this check.

Poem

In a burrow of builds I hop and mend,
Flags at my whiskers: standalone or blend.
An entrypoint nibble, providers pruned,
Containers built and ports attuned.
Pack the image — off it goes, my friend. 🐇

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

distribution/build.py (3)

66-71: Avoid shell=True and pass argv list to subprocess.run.

Prevents quoting/escaping issues and command injection risks; also removes reliance on the shell.

-    cmd = f"llama stack build --config {config_file} --print-deps-only"
-    try:
-        result = subprocess.run(
-            cmd, shell=True, capture_output=True, text=True, check=True
-        )
+    cmd = ["llama", "stack", "build", "--config", config_file, "--print-deps-only"]
+    try:
+        result = subprocess.run(
+            cmd, capture_output=True, text=True, check=True
+        )

77-111: Do not sort/deduplicate or re-order pip args; it breaks flag–value pairing (e.g., --index-url URL, -r requirements.txt) and may fail builds. Keep uv and original token order.

The current logic splits, sorts, and rejoins tokens, which can separate flags from their required values and alter install order. Also, replacing “uv ” with “RUN ” drops uv entirely.

-        # Categorize and sort different types of pip install commands
-        standard_deps = []
-        torch_deps = []
-        no_deps = []
-        no_cache = []
+        # Preserve original order and semantics; just convert to Docker RUN lines
+        converted_lines = []
@@
-        for line in result.stdout.splitlines():
-            if line.strip().startswith("uv pip"):
-                # Split the line into command and packages
-                parts = line.replace("uv ", "RUN ", 1).split(" ", 3)
-                if len(parts) >= 4:  # We have packages to sort
-                    cmd_parts = parts[:3]  # "RUN pip install"
-                    packages = sorted(
-                        set(parts[3].split())
-                    )  # Sort the package names and remove duplicates
-
-                    # Determine command type and format accordingly
-                    if "--index-url" in line:
-                        full_cmd = " ".join(cmd_parts + [" ".join(packages)])
-                        torch_deps.append(full_cmd)
-                    elif "--no-deps" in line:
-                        full_cmd = " ".join(cmd_parts + [" ".join(packages)])
-                        no_deps.append(full_cmd)
-                    elif "--no-cache" in line:
-                        full_cmd = " ".join(cmd_parts + [" ".join(packages)])
-                        no_cache.append(full_cmd)
-                    else:
-                        formatted_packages = " \\\n    ".join(packages)
-                        full_cmd = f"{' '.join(cmd_parts)} \\\n    {formatted_packages}"
-                        standard_deps.append(full_cmd)
-                else:
-                    standard_deps.append(" ".join(parts))
+        for raw in result.stdout.splitlines():
+            line = raw.strip()
+            if not line:
+                continue
+            if line.startswith("uv pip"):
+                # Keep uv and entire arg string intact
+                converted_lines.append(f"RUN {line}")
+            else:
+                # Pass-through anything unexpected (future-proofing)
+                converted_lines.append(f"RUN {line}")
@@
-        # Combine all dependencies in specific order
-        all_deps = []
-        all_deps.extend(sorted(standard_deps))  # Regular pip installs first
-        all_deps.extend(sorted(torch_deps))  # PyTorch specific installs
-        all_deps.extend(sorted(no_deps))  # No-deps installs
-        all_deps.extend(sorted(no_cache))  # No-cache installs
-
-        return "\n".join(all_deps)
+        return "\n".join(converted_lines)

33-41: Version check: avoid shell=True and parse the version robustly.

Some CLIs print labels (e.g., “llama-stack 0.2.18”); extract the semver to avoid false mismatches.

-        result = subprocess.run(
-            ["llama stack --version"],
-            shell=True,
-            capture_output=True,
-            text=True,
-            check=True,
-        )
-        installed_version = result.stdout.strip()
+        result = subprocess.run(
+            ["llama", "stack", "--version"],
+            capture_output=True,
+            text=True,
+            check=True,
+        )
+        installed_text = result.stdout.strip()
+        import re
+        m = re.search(r"\b(\d+\.\d+\.\d+)\b", installed_text)
+        installed_version = m.group(1) if m else installed_text

🧹 Nitpick comments (2)

distribution/build.py (2)

49-56: Improve mismatch guidance and make pre-commit reference optional.

Avoid pointing to .pre-commit-config.yaml unless it’s guaranteed to exist; suggest both files to update.

-            print(
-                "  If you just bumped the llama-stack version in BASE_REQUIREMENTS, you must update the version from .pre-commit-config.yaml"
-            )
+            print("  If you bumped llama-stack in BASE_REQUIREMENTS, update any pinned versions in tooling (e.g., pre-commit hooks) to match.")

8-9: Document flag/env precedence and conflict behavior in usage.

Clarify that UNIFIED takes precedence over STANDALONE, and note example where both are set.

-# Usage: ./distribution/build.py [--standalone] [--unified]
-# Or set STANDALONE=true or UNIFIED=true environment variables
+# Usage: ./distribution/build.py [--standalone] [--unified]
+# Or set STANDALONE=true or UNIFIED=true (UNIFIED takes precedence if both are set)
@@
-        epilog="""
+        epilog="""
 Examples:
   %(prog)s                    # Build full version (default)
   %(prog)s --standalone       # Build standalone version (no Kubernetes deps)
   %(prog)s --unified          # Build unified version (supports both modes)
   STANDALONE=true %(prog)s    # Build standalone via environment variable
-  UNIFIED=true %(prog)s       # Build unified via environment variable
+  UNIFIED=true %(prog)s       # Build unified via environment variable
+  STANDALONE=true UNIFIED=true %(prog)s  # Unified takes precedence
 """

Also applies to: 154-170

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf91259 and ac1e39e.

📒 Files selected for processing (8)

README.md (2 hunks)
distribution/Containerfile (3 hunks)
distribution/Containerfile.in (1 hunks)
distribution/build-standalone.yaml (1 hunks)
distribution/build.py (5 hunks)
distribution/build.yaml (1 hunks)
distribution/entrypoint.sh (1 hunks)
distribution/run-standalone.yaml (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (7)

README.md
distribution/run-standalone.yaml
distribution/entrypoint.sh
distribution/Containerfile
distribution/Containerfile.in
distribution/build.yaml
distribution/build-standalone.yaml

🔇 Additional comments (2)

distribution/build.py (2)

129-145: Template formatting: verify braces in Containerfile.in.

str.format will treat single braces as placeholders. If the template has shell braces (e.g., ${VAR} or {something} in comments), ensure they’re doubled {{ }} or switch to string.Template to avoid accidental KeyError.

Would you like me to switch this to string.Template to avoid brace escaping?

196-210: Runtime hints: add explicit unified-mode env for clarity.

If the entrypoint supports LLAMA_STACK_UNIFIED or similar, include it; otherwise, consider adding a short note that STANDALONE=false (default) selects full mode.

If helpful, I can align these run examples with entrypoint.sh semantics once confirmed.

coderabbitai · 2025-09-09T12:42:28Z

distribution/build.py

+    standalone = args.standalone or os.getenv("STANDALONE", "false").lower() in ("true", "1", "yes")
+    unified = args.unified or os.getenv("UNIFIED", "false").lower() in ("true", "1", "yes")
+
+    if unified:
+        mode = "unified"
+        print("Building unified version (supports both full and standalone modes)...")
+    else:
+        mode = "standalone" if standalone else "full"
+        print(f"Building {mode} version...")


⚠️ Potential issue

UNIFIED overrides STANDALONE but dependencies still built with standalone=True when both are set. Fix precedence and call get_dependencies with the effective mode.

Currently, UNIFIED messaging is shown while using standalone deps if both envs/flags are true.

- standalone = args.standalone or os.getenv("STANDALONE", "false").lower() in ("true", "1", "yes") - unified = args.unified or os.getenv("UNIFIED", "false").lower() in ("true", "1", "yes") + standalone = args.standalone or os.getenv("STANDALONE", "false").lower() in ("true", "1", "yes") + unified = args.unified or os.getenv("UNIFIED", "false").lower() in ("true", "1", "yes") + # Precedence: UNIFIED implies using full dependency set + effective_standalone = False if unified else standalone @@ - if unified: + if unified: mode = "unified" print("Building unified version (supports both full and standalone modes)...") else: - mode = "standalone" if standalone else "full" + mode = "standalone" if standalone else "full" print(f"Building {mode} version...") @@ - dependencies = get_dependencies(standalone) + dependencies = get_dependencies(effective_standalone) @@ - generate_containerfile(dependencies, standalone, unified) + generate_containerfile(dependencies, standalone=effective_standalone, unified=unified)

Also applies to: 189-194

cdoern

I might have missed some context here, but IMO changes like this should go into the providers rather than our distro repo so we can use LLS as similarly to upstream as possible. I think trusty has some work here actually: trustyai-explainability/llama-stack-provider-lmeval#58

nathan-weinberg · 2025-09-09T14:16:41Z

README.md

+## Build Modes
+
+The build script supports three modes:
+
+### 1. Full Mode (Default)
+Includes all features including TrustyAI providers that require Kubernetes/OpenShift:
+```bash
+./distribution/build.py
+```
+
+### 2. Standalone Mode
+Builds a version without Kubernetes dependencies, using Llama Guard for safety:
+```bash
+./distribution/build.py --standalone
+```
+
+### 3. Unified Mode (Recommended)
+Builds a single container that supports both modes via environment variables:
+```bash
+./distribution/build.py --unified
+```


Why would we add this level of complexity rather than just always doing a Unified mode?

nathan-weinberg · 2025-09-09T14:17:48Z

README.md

 ### Using Podman build image for x86_64

 ```bash
-podman build --platform linux/amd64 -f distribution/Containerfile -t rh .
+podman build --platform linux/amd64 -f distribution/Containerfile -t llama-stack-rh .
+```
+
+### Using Docker
+
+```bash
+docker build -f distribution/Containerfile -t llama-stack-rh .
+```


It would make more sense to unified these sections - we don't need separate ones for basically the same command. I've noticed the Docker command has no arg for the platform arch?

nathan-weinberg · 2025-09-09T14:19:57Z

distribution/build.yaml

-    - provider_type: remote::vllm
-    - provider_type: inline::sentence-transformers
+    - "remote::vllm"
+    - "inline::sentence-transformers"


@leseb @cdoern is this valid?

this used to be but not anymore

nathan-weinberg · 2025-09-09T14:23:29Z

distribution/entrypoint.sh

+    # Filter out TrustyAI providers from providers.d directory
+    echo "Filtering out TrustyAI providers for standalone mode..."
+    mkdir -p ${HOME}/.llama/providers.d
+
+    # Copy only non-TrustyAI providers
+    find /opt/app-root/.llama/providers.d -name "*.yaml" ! -name "*trustyai*" -exec cp {} ${HOME}/.llama/providers.d/ \; 2>/dev/null || true
+
+    # Remove the external_providers_dir from the config to prevent loading TrustyAI providers
+    echo "Disabling external providers directory for standalone mode..."
+    sed -i 's|external_providers_dir:.*|# external_providers_dir: disabled for standalone mode|' "$CONFIG_FILE"
+
+    echo "✓ Standalone configuration ready"
+    echo "✓ TrustyAI providers excluded"
+else


This script seems like it will break as soon as we have a non-TrustyAI provider that requires Kubernetes?

nathan-weinberg · 2025-09-09T14:24:37Z

distribution/Containerfile

@@ -1,5 +1,5 @@
 # WARNING: This file is auto-generated. Do not modify it manually.
-# Generated by: distribution/build.py
+# Generated by: distribution/build.py --unified


How do we plan to stop this changing every time someone builds a different Container image locally?

nathan-weinberg · 2025-09-11T15:23:15Z

Note that as of #16 the container can be run outside of a Kubernetes env, thanks to a fix made in the TrustyAI LM Eval provider

Available here if you want to try: https://quay.io/repository/opendatahub/llama-stack

leseb

sooooo many changes just to exclude trustyai from the distro? Did I get it right?

nathan-weinberg · 2025-09-15T12:40:58Z

We haven't heard on the PR or Slack so I'm going to close this for now, but feel free to reopen it you feel it is still relevant

gallettilance requested review from Elbehery, VaishnaviHire, cdoern, derekhiggins, leseb, nathan-weinberg, rhdedgar, rhuss and skamenan7 as code owners September 9, 2025 12:23

gallettilance mentioned this pull request Sep 9, 2025

LCORE-651: using rhoai lls image lightspeed-core/lightspeed-stack#517

Merged

(feat) rhoai-lls does not require kubernetes to run

ac1e39e

gallettilance force-pushed the standalone branch from cf91259 to ac1e39e Compare September 9, 2025 12:34

coderabbitai bot reviewed Sep 9, 2025

View reviewed changes

cdoern reviewed Sep 9, 2025

View reviewed changes

nathan-weinberg requested changes Sep 9, 2025

View reviewed changes

leseb requested changes Sep 12, 2025

View reviewed changes

nathan-weinberg closed this Sep 15, 2025

(feat) rhoai-lls does not require kubernetes to run #22

(feat) rhoai-lls does not require kubernetes to run #22

Uh oh!

Conversation

gallettilance commented Sep 9, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Test Plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks (1 passed, 2 warnings)

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

cdoern left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nathan-weinberg Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

nathan-weinberg Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

nathan-weinberg Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

leseb Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

leseb Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

nathan-weinberg Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

nathan-weinberg Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

nathan-weinberg commented Sep 11, 2025

Uh oh!

leseb left a comment

Choose a reason for hiding this comment

Uh oh!

nathan-weinberg commented Sep 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

gallettilance commented Sep 9, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 9, 2025 •

edited

Loading

cdoern left a comment •

edited

Loading