
gallettilance commented Sep 9, 2025

What does this PR do?

Test Plan

Summary by CodeRabbit

  • New Features

    • Added three build modes: Full (default), Standalone (no Kubernetes), and Unified.
    • New entrypoint chooses the appropriate configuration at startup and launches the service.
    • Standalone images/configs enable local runs with simple Docker/Podman commands on port 8321.
  • Documentation

    • Added Build Modes guide with examples and updated build/run instructions and image tagging (llama-stack-rh).
    • Added Docker build example.
  • Chores

    • Container build now produces mode-aware images and includes additional runtime dependencies.

coderabbitai bot commented Sep 9, 2025

Walkthrough

Adds build-mode selection (full, standalone, unified) to the build script and templates; introduces a standalone build spec and runtime config; Containerfile generation now emits a mode banner and uses an entrypoint script that chooses config and filters providers at runtime; README updated with build/run examples.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Documentation & Instructions | README.md | Adds Build Modes section; updates Containerfile generation guidance to ./distribution/build.py [--standalone] [--unified]; changes example image tag to llama-stack-rh; adds Docker build/run examples for standalone and full modes. |
| Build Orchestration | distribution/build.py, distribution/build.yaml, distribution/build-standalone.yaml | build.py: adds CLI flags and env fallbacks for modes, modifies get_dependencies and generate_containerfile signatures, writes mode banner to generated Containerfile, prints mode-specific guidance. build.yaml: provider entries normalized to strings. Adds build-standalone.yaml with standalone distribution spec. |
| Container Image Definitions | distribution/Containerfile, distribution/Containerfile.in | Adds mode banner metadata; copies dual configs (run-full.yaml & run-standalone.yaml); installs additional pip deps; switches ENTRYPOINT from direct Python module to /opt/app-root/entrypoint.sh. |
| Runtime Entrypoint & Config | distribution/entrypoint.sh, distribution/run-standalone.yaml | Adds entrypoint.sh that selects run-standalone.yaml or run-full.yaml based on STANDALONE, filters/excludes TrustyAI providers in standalone, writes providers into ${HOME}/.llama/providers.d, disables external_providers_dir for standalone, and execs the server. Adds run-standalone.yaml with providers, models, sqlite stores, server port 8321, and tool group mappings. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor Dev as Developer
  participant Build as distribution/build.py
  participant Tpl as distribution/Containerfile.in
  participant CF as distribution/Containerfile
  participant Image as Container Image
  participant Entryp as /opt/app-root/entrypoint.sh
  participant Server as Llama Server

  Dev->>Build: ./distribution/build.py [--standalone|--unified]
  Build->>Build: parse args & env (STANDALONE/UNIFIED)
  alt standalone
    Build->>Build: load distribution/build-standalone.yaml
  else full
    Build->>Build: load distribution/build.yaml
  end
  Build->>Tpl: render template with deps + mode banner
  Tpl-->>CF: write generated Containerfile
  Build->>Dev: print image build/run guidance (docker/podman)

  Dev->>Image: docker build -f distribution/Containerfile -t llama-stack-rh .
  Dev->>Image: docker run -e STANDALONE=true -e VLLM_URL=... -e INFERENCE_MODEL=... -p 8321:8321 llama-stack-rh

  Image->>Entryp: ENTRYPOINT /opt/app-root/entrypoint.sh
  Entryp->>Entryp: if STANDALONE=true -> use run-standalone.yaml, filter TrustyAI, disable external_providers_dir
  Entryp->>Entryp: else -> use run-full.yaml, copy all providers
  Entryp->>Server: exec python -m llama_stack.core.server.server "$CONFIG_FILE"
  Server-->>Dev: APIs available on :8321
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks (1 passed, 2 warnings)

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description Check | ⚠️ Warning | The pull request description currently contains only placeholder template sections with no actual summary of changes or test plan details, making it uninformative and unrelated to the modifications in build scripts, documentation, and container configuration. | Please update the description to include a concise summary of the implemented standalone and unified build modes, document the changes to the Containerfile and entrypoint script, and provide clear test instructions with example commands and expected outcomes. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title indicates the system no longer requires Kubernetes to run, which aligns with the introduction of a standalone mode implemented across the build script, container configuration, and entrypoint changes. It refers to a real aspect of the change by highlighting non-Kubernetes operation. However, it omits mention of the new unified build mode and uses terminology that may differ from the repository name. Nonetheless, it gives a clear indication of a key new capability and is sufficiently related to the changeset to pass this check. |

Poem

In a burrow of builds I hop and mend,
Flags at my whiskers: standalone or blend.
An entrypoint nibble, providers pruned,
Containers built and ports attuned.
Pack the image — off it goes, my friend. 🐇


coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
distribution/build.py (3)

66-71: Avoid shell=True and pass argv list to subprocess.run.

Prevents quoting/escaping issues and command injection risks; also removes reliance on the shell.

```diff
-    cmd = f"llama stack build --config {config_file} --print-deps-only"
-    try:
-        result = subprocess.run(
-            cmd, shell=True, capture_output=True, text=True, check=True
-        )
+    cmd = ["llama", "stack", "build", "--config", config_file, "--print-deps-only"]
+    try:
+        result = subprocess.run(
+            cmd, capture_output=True, text=True, check=True
+        )
```
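A minimal sketch of why the argv form is safer, assuming nothing about the real `llama` CLI: `sys.executable` stands in for it so the example runs anywhere, and the config path is hypothetical. With a list, the path reaches the child as a single literal argument even though it contains a space and a `;`. With `shell=True` and the f-string, the shell would split the path and run `echo injected` as a second command.

```python
import subprocess
import sys

# Hypothetical config path containing a space and a shell metacharacter.
config_file = "my configs/build.yaml; echo injected"

# argv list: every element is handed to the child verbatim; no shell parses it.
# sys.executable stands in for the `llama` CLI so the sketch is runnable.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", "--config", config_file],
    capture_output=True,
    text=True,
    check=True,
)

print(result.stdout.strip())
# ['--config', 'my configs/build.yaml; echo injected']
```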

77-111: Do not sort/deduplicate or re-order pip args; it breaks flag–value pairing (e.g., --index-url URL, -r requirements.txt) and may fail builds. Keep uv and original token order.

The current logic splits, sorts, and rejoins tokens, which can separate flags from their required values and alter install order. Also, replacing “uv ” with “RUN ” drops uv entirely.

```diff
-        # Categorize and sort different types of pip install commands
-        standard_deps = []
-        torch_deps = []
-        no_deps = []
-        no_cache = []
+        # Preserve original order and semantics; just convert to Docker RUN lines
+        converted_lines = []
@@
-        for line in result.stdout.splitlines():
-            if line.strip().startswith("uv pip"):
-                # Split the line into command and packages
-                parts = line.replace("uv ", "RUN ", 1).split(" ", 3)
-                if len(parts) >= 4:  # We have packages to sort
-                    cmd_parts = parts[:3]  # "RUN pip install"
-                    packages = sorted(
-                        set(parts[3].split())
-                    )  # Sort the package names and remove duplicates
-
-                    # Determine command type and format accordingly
-                    if "--index-url" in line:
-                        full_cmd = " ".join(cmd_parts + [" ".join(packages)])
-                        torch_deps.append(full_cmd)
-                    elif "--no-deps" in line:
-                        full_cmd = " ".join(cmd_parts + [" ".join(packages)])
-                        no_deps.append(full_cmd)
-                    elif "--no-cache" in line:
-                        full_cmd = " ".join(cmd_parts + [" ".join(packages)])
-                        no_cache.append(full_cmd)
-                    else:
-                        formatted_packages = " \\\n    ".join(packages)
-                        full_cmd = f"{' '.join(cmd_parts)} \\\n    {formatted_packages}"
-                        standard_deps.append(full_cmd)
-                else:
-                    standard_deps.append(" ".join(parts))
+        for raw in result.stdout.splitlines():
+            line = raw.strip()
+            if not line:
+                continue
+            if line.startswith("uv pip"):
+                # Keep uv and entire arg string intact
+                converted_lines.append(f"RUN {line}")
+            else:
+                # Pass-through anything unexpected (future-proofing)
+                converted_lines.append(f"RUN {line}")
@@
-        # Combine all dependencies in specific order
-        all_deps = []
-        all_deps.extend(sorted(standard_deps))  # Regular pip installs first
-        all_deps.extend(sorted(torch_deps))  # PyTorch specific installs
-        all_deps.extend(sorted(no_deps))  # No-deps installs
-        all_deps.extend(sorted(no_cache))  # No-cache installs
-
-        return "\n".join(all_deps)
+        return "\n".join(converted_lines)
```
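To make the flag/value problem concrete, the sketch below re-implements a simplified copy of the pre-fix conversion (only the branch that joins the sorted tokens on one line; the default branch adds line continuations but orders tokens the same way) and compares it with the order-preserving version from the suggestion. The input line and index URL are hypothetical.

```python
def legacy_convert(line: str) -> str:
    """Simplified pre-fix logic: swap "uv " for "RUN ", then sort/dedupe the args."""
    parts = line.replace("uv ", "RUN ", 1).split(" ", 3)
    packages = sorted(set(parts[3].split()))
    return " ".join(parts[:3] + [" ".join(packages)])


def preserving_convert(line: str) -> str:
    """Order-preserving conversion: keep uv and every token in its original place."""
    return f"RUN {line.strip()}"


# Hypothetical line as it might appear in `--print-deps-only` output.
line = "uv pip install --extra-index-url https://example.invalid/simple --no-cache somepkg"

print(legacy_convert(line))
# RUN pip install --extra-index-url --no-cache https://example.invalid/simple somepkg
print(preserving_convert(line))
# RUN uv pip install --extra-index-url https://example.invalid/simple --no-cache somepkg
```

In the legacy output, `--extra-index-url` now takes `--no-cache` as its value, and the `RUN` line invokes `pip` instead of `uv pip`.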

33-41: Version check: avoid shell=True and parse the version robustly.

Some CLIs print labels (e.g., “llama-stack 0.2.18”); extract the semver to avoid false mismatches.

```diff
-        result = subprocess.run(
-            ["llama stack --version"],
-            shell=True,
-            capture_output=True,
-            text=True,
-            check=True,
-        )
-        installed_version = result.stdout.strip()
+        result = subprocess.run(
+            ["llama", "stack", "--version"],
+            capture_output=True,
+            text=True,
+            check=True,
+        )
+        installed_text = result.stdout.strip()
+        import re
+        m = re.search(r"\b(\d+\.\d+\.\d+)\b", installed_text)
+        installed_version = m.group(1) if m else installed_text
```
🧹 Nitpick comments (2)
distribution/build.py (2)

49-56: Improve mismatch guidance and make pre-commit reference optional.

Avoid pointing to .pre-commit-config.yaml unless it’s guaranteed to exist; suggest both files to update.

```diff
-            print(
-                "  If you just bumped the llama-stack version in BASE_REQUIREMENTS, you must update the version from .pre-commit-config.yaml"
-            )
+            print("  If you bumped llama-stack in BASE_REQUIREMENTS, update any pinned versions in tooling (e.g., pre-commit hooks) to match.")
```

8-9: Document flag/env precedence and conflict behavior in usage.

Clarify that UNIFIED takes precedence over STANDALONE, and add an example where both are set.

```diff
-# Usage: ./distribution/build.py [--standalone] [--unified]
-# Or set STANDALONE=true or UNIFIED=true environment variables
+# Usage: ./distribution/build.py [--standalone] [--unified]
+# Or set STANDALONE=true or UNIFIED=true (UNIFIED takes precedence if both are set)
@@
-        epilog="""
+        epilog="""
 Examples:
   %(prog)s                    # Build full version (default)
   %(prog)s --standalone       # Build standalone version (no Kubernetes deps)
   %(prog)s --unified          # Build unified version (supports both modes)
   STANDALONE=true %(prog)s    # Build standalone via environment variable
-  UNIFIED=true %(prog)s       # Build unified via environment variable
+  UNIFIED=true %(prog)s       # Build unified via environment variable
+  STANDALONE=true UNIFIED=true %(prog)s  # Unified takes precedence
 """
```

Also applies to: 154-170

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf91259 and ac1e39e.

📒 Files selected for processing (8)
  • README.md (2 hunks)
  • distribution/Containerfile (3 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/build-standalone.yaml (1 hunks)
  • distribution/build.py (5 hunks)
  • distribution/build.yaml (1 hunks)
  • distribution/entrypoint.sh (1 hunks)
  • distribution/run-standalone.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • README.md
  • distribution/run-standalone.yaml
  • distribution/entrypoint.sh
  • distribution/Containerfile
  • distribution/Containerfile.in
  • distribution/build.yaml
  • distribution/build-standalone.yaml
🔇 Additional comments (2)
distribution/build.py (2)

129-145: Template formatting: verify braces in Containerfile.in.

str.format will treat single braces as placeholders. If the template has shell braces (e.g., ${VAR} or {something} in comments), ensure they’re doubled {{ }} or switch to string.Template to avoid accidental KeyError.

Would you like me to switch this to string.Template to avoid brace escaping?
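The brace concern can be checked directly. The sketch below uses hypothetical template fragments (the real `Containerfile.in` is not shown in this review) to show that `str.format` raises `KeyError` on an unescaped `${HOME}`, that doubled braces fix it, and that `string.Template` is not a drop-in escape hatch: it also treats `${...}` as a placeholder, so `substitute()` would fail on `${HOME}` as well, and only `safe_substitute()` leaves unknown names untouched.

```python
import string

# Hypothetical template fragments: one shell-style ${HOME} plus a dependencies slot.
UNESCAPED = "ENV APP_HOME=${HOME}/app\n{dependencies}\n"
ESCAPED = "ENV APP_HOME=${{HOME}}/app\n{dependencies}\n"
DOLLAR_STYLE = "ENV APP_HOME=${HOME}/app\n${dependencies}\n"

deps = "RUN uv pip install somepkg"

# 1) str.format treats {HOME} as a field name and raises KeyError('HOME').
try:
    UNESCAPED.format(dependencies=deps)
    unescaped_error = None
except KeyError as exc:
    unescaped_error = exc.args[0]

# 2) Doubled braces are emitted as literal braces.
escaped = ESCAPED.format(dependencies=deps)

# 3) string.Template also reads ${HOME} as a placeholder: substitute() would raise
#    KeyError here, while safe_substitute() leaves unknown names as-is.
dollar = string.Template(DOLLAR_STYLE).safe_substitute(dependencies=deps)

print(unescaped_error)  # HOME
print(escaped == dollar)  # True
```

`safe_substitute()` avoids the crash but also silently leaves a misspelled placeholder unfilled, so doubling braces may be the more predictable option for a Containerfile template.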


196-210: Runtime hints: add explicit unified-mode env for clarity.

If the entrypoint supports LLAMA_STACK_UNIFIED or similar, include it; otherwise, consider adding a short note that STANDALONE=false (default) selects full mode.

If helpful, I can align these run examples with entrypoint.sh semantics once confirmed.

Comment on lines +173 to +181
```python
    standalone = args.standalone or os.getenv("STANDALONE", "false").lower() in ("true", "1", "yes")
    unified = args.unified or os.getenv("UNIFIED", "false").lower() in ("true", "1", "yes")

    if unified:
        mode = "unified"
        print("Building unified version (supports both full and standalone modes)...")
    else:
        mode = "standalone" if standalone else "full"
        print(f"Building {mode} version...")
```

⚠️ Potential issue

UNIFIED overrides STANDALONE but dependencies still built with standalone=True when both are set. Fix precedence and call get_dependencies with the effective mode.

Currently, UNIFIED messaging is shown while using standalone deps if both envs/flags are true.

```diff
-    standalone = args.standalone or os.getenv("STANDALONE", "false").lower() in ("true", "1", "yes")
-    unified = args.unified or os.getenv("UNIFIED", "false").lower() in ("true", "1", "yes")
+    standalone = args.standalone or os.getenv("STANDALONE", "false").lower() in ("true", "1", "yes")
+    unified = args.unified or os.getenv("UNIFIED", "false").lower() in ("true", "1", "yes")
+    # Precedence: UNIFIED implies using full dependency set
+    effective_standalone = False if unified else standalone
@@
-    if unified:
+    if unified:
         mode = "unified"
         print("Building unified version (supports both full and standalone modes)...")
     else:
-        mode = "standalone" if standalone else "full"
+        mode = "standalone" if standalone else "full"
         print(f"Building {mode} version...")
@@
-    dependencies = get_dependencies(standalone)
+    dependencies = get_dependencies(effective_standalone)
@@
-    generate_containerfile(dependencies, standalone, unified)
+    generate_containerfile(dependencies, standalone=effective_standalone, unified=unified)
```

Also applies to: 189-194
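One way to make the precedence impossible to get wrong is to resolve it in a single place. The helper below is hypothetical (it is not in `build.py`); it reuses the truthy strings from the quoted code and returns both the mode label and the effective standalone flag, so the printed message and the dependency set are derived from the same decision.

```python
import os
from typing import Mapping, Tuple

TRUTHY = ("true", "1", "yes")


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Same truthiness rule as the quoted build.py code."""
    return env.get(name, "false").lower() in TRUTHY


def resolve_mode(
    flag_standalone: bool,
    flag_unified: bool,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, bool]:
    """Return (mode, effective_standalone); UNIFIED wins and uses the full dependency set."""
    standalone = flag_standalone or env_flag(env, "STANDALONE")
    unified = flag_unified or env_flag(env, "UNIFIED")
    if unified:
        return "unified", False
    if standalone:
        return "standalone", True
    return "full", False


# Possible use inside main(), assuming the argparse flags from the PR:
#   mode, effective_standalone = resolve_mode(args.standalone, args.unified)
#   dependencies = get_dependencies(effective_standalone)
```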

Collaborator

cdoern left a comment

I might have missed some context here, but IMO changes like this should go into the providers rather than our distro repo so we can use LLS as similarly to upstream as possible. I think trusty has some work here actually: trustyai-explainability/llama-stack-provider-lmeval#58

Comment on lines +11 to +31
## Build Modes

The build script supports three modes:

### 1. Full Mode (Default)
Includes all features including TrustyAI providers that require Kubernetes/OpenShift:
```bash
./distribution/build.py
```

### 2. Standalone Mode
Builds a version without Kubernetes dependencies, using Llama Guard for safety:
```bash
./distribution/build.py --standalone
```

### 3. Unified Mode (Recommended)
Builds a single container that supports both modes via environment variables:
```bash
./distribution/build.py --unified
```
Collaborator

Why would we add this level of complexity rather than just always doing a Unified mode?

Comment on lines 57 to +67
### Using Podman build image for x86_64

```bash
podman build --platform linux/amd64 -f distribution/Containerfile -t rh .
podman build --platform linux/amd64 -f distribution/Containerfile -t llama-stack-rh .
```

### Using Docker

```bash
docker build -f distribution/Containerfile -t llama-stack-rh .
```
Collaborator

It would make more sense to unify these sections; we don't need separate ones for basically the same command. Also, the Docker command has no argument for the platform architecture; is that intentional?

Comment on lines -6 to +7
- provider_type: remote::vllm
- provider_type: inline::sentence-transformers
- "remote::vllm"
- "inline::sentence-transformers"
Collaborator

@leseb @cdoern is this valid?

Collaborator

nope

Collaborator

this used to be but not anymore
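Per the replies above, the bare-string form is no longer valid. If such entries need to be migrated back, a small helper along these lines could do it. It is a hypothetical sketch that assumes the `providers:` section has already been loaded from YAML into a dict mapping API names (for example `inference`) to lists of entries; it does not validate the provider types themselves.

```python
from typing import Any, Dict, List


def normalize_provider_entries(
    providers: Dict[str, List[Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Convert bare "remote::vllm"-style strings back into {"provider_type": ...} mappings."""
    normalized: Dict[str, List[Dict[str, Any]]] = {}
    for api, entries in providers.items():
        converted = []
        for entry in entries:
            if isinstance(entry, str):
                converted.append({"provider_type": entry})
            elif isinstance(entry, dict) and "provider_type" in entry:
                converted.append(entry)  # already in mapping form; extra keys kept as-is
            else:
                raise ValueError(f"unrecognized provider entry under {api!r}: {entry!r}")
        normalized[api] = converted
    return normalized
```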

Comment on lines +17 to +30
```bash
# Filter out TrustyAI providers from providers.d directory
echo "Filtering out TrustyAI providers for standalone mode..."
mkdir -p ${HOME}/.llama/providers.d

# Copy only non-TrustyAI providers
find /opt/app-root/.llama/providers.d -name "*.yaml" ! -name "*trustyai*" -exec cp {} ${HOME}/.llama/providers.d/ \; 2>/dev/null || true

# Remove the external_providers_dir from the config to prevent loading TrustyAI providers
echo "Disabling external providers directory for standalone mode..."
sed -i 's|external_providers_dir:.*|# external_providers_dir: disabled for standalone mode|' "$CONFIG_FILE"

echo "✓ Standalone configuration ready"
echo "✓ TrustyAI providers excluded"
else
```
Collaborator

This script seems like it will break as soon as we have a non-TrustyAI provider that requires Kubernetes?
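The concern can be illustrated with a Python mirror of the `find` expression (`-name "*.yaml" ! -name "*trustyai*"`, matched case-sensitively like `find -name`). All filenames here are hypothetical. A provider that needs Kubernetes but lacks "trustyai" in its name passes the filter. An explicit list of providers that require Kubernetes, shown only as one possible alternative and not something the PR implements, makes that requirement a declared property instead of a naming convention.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Hypothetical contents of a providers.d directory.
FILES = ["vllm.yaml", "trustyai_lmeval.yaml", "future_k8s_provider.yaml", "README.md"]

# Hypothetical, explicitly maintained set of provider files that need Kubernetes.
REQUIRES_KUBERNETES = {"trustyai_lmeval.yaml", "future_k8s_provider.yaml"}


def name_based_filter(filenames: Iterable[str]) -> List[str]:
    """Mirror of: find ... -name "*.yaml" ! -name "*trustyai*"."""
    return [f for f in filenames if fnmatchcase(f, "*.yaml") and not fnmatchcase(f, "*trustyai*")]


def declared_filter(filenames: Iterable[str]) -> List[str]:
    """Keep YAML files unless they are explicitly marked as needing Kubernetes."""
    return [f for f in filenames if fnmatchcase(f, "*.yaml") and f not in REQUIRES_KUBERNETES]


print(name_based_filter(FILES))  # ['vllm.yaml', 'future_k8s_provider.yaml']
print(declared_filter(FILES))    # ['vllm.yaml']
```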

@@ -1,5 +1,5 @@
# WARNING: This file is auto-generated. Do not modify it manually.
# Generated by: distribution/build.py
# Generated by: distribution/build.py --unified
Collaborator

How do we plan to stop this changing every time someone builds a different Container image locally?

@nathan-weinberg
Collaborator

Note that as of #16 the container can be run outside of a Kubernetes env, thanks to a fix made in the TrustyAI LM Eval provider

Available here if you want to try: https://quay.io/repository/opendatahub/llama-stack

Collaborator

leseb left a comment

sooooo many changes just to exclude trustyai from the distro? Did I get it right?

@nathan-weinberg
Collaborator

We haven't heard back on the PR or Slack, so I'm going to close this for now, but feel free to reopen it if you feel it is still relevant.
