(feat) rhoai-lls does not require kubernetes to run #22
Conversation
Walkthrough

Adds build-mode selection (full, standalone, unified) to the build script and templates; introduces a standalone build spec and runtime config; Containerfile generation now emits a mode banner and uses an entrypoint script that chooses the config and filters providers at runtime; README updated with build/run examples.
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor Dev as Developer
participant Build as distribution/build.py
participant Tpl as distribution/Containerfile.in
participant CF as distribution/Containerfile
participant Image as Container Image
participant Entryp as /opt/app-root/entrypoint.sh
participant Server as Llama Server
Dev->>Build: ./distribution/build.py [--standalone|--unified]
Build->>Build: parse args & env (STANDALONE/UNIFIED)
alt standalone
Build->>Build: load distribution/build-standalone.yaml
else full
Build->>Build: load distribution/build.yaml
end
Build->>Tpl: render template with deps + mode banner
Tpl-->>CF: write generated Containerfile
Build->>Dev: print image build/run guidance (docker/podman)
Dev->>Image: docker build -f distribution/Containerfile -t llama-stack-rh .
Dev->>Image: docker run -e STANDALONE=true -e VLLM_URL=... -e INFERENCE_MODEL=... -p 8321:8321 llama-stack-rh
Image->>Entryp: ENTRYPOINT /opt/app-root/entrypoint.sh
Entryp->>Entryp: if STANDALONE=true -> use run-standalone.yaml, filter TrustyAI, disable external_providers_dir
Entryp->>Entryp: else -> use run-full.yaml, copy all providers
Entryp->>Server: exec python -m llama_stack.core.server.server "$CONFIG_FILE"
Server-->>Dev: APIs available on :8321
```
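To make the Containerfile-generation step concrete, here is a minimal sketch of how the mode banner might be stamped onto the rendered template; the function name and the {dependencies} placeholder are assumptions for illustration, not the actual build.py code:

```python
def render_containerfile(template_text: str, dependencies: str, mode: str) -> str:
    """Render Containerfile.in and prefix the result with a build-mode banner."""
    # The flag-less form matches the default (full) build; --standalone/--unified
    # mirror the header shown later in the review ("... build.py --unified").
    flag = "" if mode == "full" else f" --{mode}"
    banner = (
        "# WARNING: This file is auto-generated. Do not modify it manually.\n"
        f"# Generated by: distribution/build.py{flag}\n"
    )
    # Assumes the template exposes a {dependencies} placeholder for the RUN lines.
    return banner + template_text.format(dependencies=dependencies)
```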
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: 1 passed, 2 warnings
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
distribution/build.py (3)
66-71: Avoid shell=True and pass an argv list to subprocess.run.
Prevents quoting/escaping issues and command-injection risks; also removes the reliance on a shell.
```diff
-    cmd = f"llama stack build --config {config_file} --print-deps-only"
-    try:
-        result = subprocess.run(
-            cmd, shell=True, capture_output=True, text=True, check=True
-        )
+    cmd = ["llama", "stack", "build", "--config", config_file, "--print-deps-only"]
+    try:
+        result = subprocess.run(
+            cmd, capture_output=True, text=True, check=True
+        )
```
77-111: Do not sort/deduplicate or re-order pip args; it breaks flag–value pairing (e.g., --index-url URL, -r requirements.txt) and may fail builds. Keep uv and the original token order.
The current logic splits, sorts, and rejoins tokens, which can separate flags from their required values and alter install order. Also, replacing "uv " with "RUN " drops uv entirely.
```diff
-    # Categorize and sort different types of pip install commands
-    standard_deps = []
-    torch_deps = []
-    no_deps = []
-    no_cache = []
+    # Preserve original order and semantics; just convert to Docker RUN lines
+    converted_lines = []
@@
-    for line in result.stdout.splitlines():
-        if line.strip().startswith("uv pip"):
-            # Split the line into command and packages
-            parts = line.replace("uv ", "RUN ", 1).split(" ", 3)
-            if len(parts) >= 4:  # We have packages to sort
-                cmd_parts = parts[:3]  # "RUN pip install"
-                packages = sorted(
-                    set(parts[3].split())
-                )  # Sort the package names and remove duplicates
-
-                # Determine command type and format accordingly
-                if "--index-url" in line:
-                    full_cmd = " ".join(cmd_parts + [" ".join(packages)])
-                    torch_deps.append(full_cmd)
-                elif "--no-deps" in line:
-                    full_cmd = " ".join(cmd_parts + [" ".join(packages)])
-                    no_deps.append(full_cmd)
-                elif "--no-cache" in line:
-                    full_cmd = " ".join(cmd_parts + [" ".join(packages)])
-                    no_cache.append(full_cmd)
-                else:
-                    formatted_packages = " \\\n    ".join(packages)
-                    full_cmd = f"{' '.join(cmd_parts)} \\\n    {formatted_packages}"
-                    standard_deps.append(full_cmd)
-            else:
-                standard_deps.append(" ".join(parts))
+    for raw in result.stdout.splitlines():
+        line = raw.strip()
+        if not line:
+            continue
+        if line.startswith("uv pip"):
+            # Keep uv and entire arg string intact
+            converted_lines.append(f"RUN {line}")
+        else:
+            # Pass-through anything unexpected (future-proofing)
+            converted_lines.append(f"RUN {line}")
@@
-    # Combine all dependencies in specific order
-    all_deps = []
-    all_deps.extend(sorted(standard_deps))  # Regular pip installs first
-    all_deps.extend(sorted(torch_deps))  # PyTorch specific installs
-    all_deps.extend(sorted(no_deps))  # No-deps installs
-    all_deps.extend(sorted(no_cache))  # No-cache installs
-
-    return "\n".join(all_deps)
+    return "\n".join(converted_lines)
```
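To make the failure mode concrete, a small illustration of how the current splitting/sorting can separate a flag from its value (the input line is hypothetical, not taken from the repo):

```python
# A dependency line of the kind `llama stack build --print-deps-only` might emit;
# the package names here are illustrative only.
line = "uv pip install -r requirements.txt --no-cache aiohttp"

# Mirror the current build.py logic: swap "uv " for "RUN ", split off the args,
# then sort and deduplicate the tokens.
parts = line.replace("uv ", "RUN ", 1).split(" ", 3)
packages = sorted(set(parts[3].split()))

print(" ".join(parts[:3] + [" ".join(packages)]))
# RUN pip install --no-cache -r aiohttp requirements.txt
#   `-r` is now followed by "aiohttp" instead of "requirements.txt",
#   so pip would try to read aiohttp as a requirements file.
```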
33-41: Version check: avoid shell=True and parse the version robustly.
Some CLIs print labels (e.g., “llama-stack 0.2.18”); extract the semver to avoid false mismatches.
```diff
-    result = subprocess.run(
-        ["llama stack --version"],
-        shell=True,
-        capture_output=True,
-        text=True,
-        check=True,
-    )
-    installed_version = result.stdout.strip()
+    result = subprocess.run(
+        ["llama", "stack", "--version"],
+        capture_output=True,
+        text=True,
+        check=True,
+    )
+    installed_text = result.stdout.strip()
+    import re
+    m = re.search(r"\b(\d+\.\d+\.\d+)\b", installed_text)
+    installed_version = m.group(1) if m else installed_text
```
🧹 Nitpick comments (2)
distribution/build.py (2)
49-56: Improve the mismatch guidance and make the pre-commit reference optional.
Avoid pointing to .pre-commit-config.yaml unless it’s guaranteed to exist; suggest both files to update.
```diff
-    print(
-        " If you just bumped the llama-stack version in BASE_REQUIREMENTS, you must update the version from .pre-commit-config.yaml"
-    )
+    print(" If you bumped llama-stack in BASE_REQUIREMENTS, update any pinned versions in tooling (e.g., pre-commit hooks) to match.")
```
8-9: Document flag/env precedence and conflict behavior in the usage text.
Clarify that UNIFIED takes precedence over STANDALONE, and add an example where both are set.
```diff
-# Usage: ./distribution/build.py [--standalone] [--unified]
-# Or set STANDALONE=true or UNIFIED=true environment variables
+# Usage: ./distribution/build.py [--standalone] [--unified]
+# Or set STANDALONE=true or UNIFIED=true (UNIFIED takes precedence if both are set)
@@
-        epilog="""
+        epilog="""
 Examples:
   %(prog)s                      # Build full version (default)
   %(prog)s --standalone         # Build standalone version (no Kubernetes deps)
   %(prog)s --unified            # Build unified version (supports both modes)
   STANDALONE=true %(prog)s      # Build standalone via environment variable
-  UNIFIED=true %(prog)s         # Build unified via environment variable
+  UNIFIED=true %(prog)s         # Build unified via environment variable
+  STANDALONE=true UNIFIED=true %(prog)s  # Unified takes precedence
 """
```

Also applies to: 154-170
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- README.md (2 hunks)
- distribution/Containerfile (3 hunks)
- distribution/Containerfile.in (1 hunk)
- distribution/build-standalone.yaml (1 hunk)
- distribution/build.py (5 hunks)
- distribution/build.yaml (1 hunk)
- distribution/entrypoint.sh (1 hunk)
- distribution/run-standalone.yaml (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (7)
- README.md
- distribution/run-standalone.yaml
- distribution/entrypoint.sh
- distribution/Containerfile
- distribution/Containerfile.in
- distribution/build.yaml
- distribution/build-standalone.yaml
🔇 Additional comments (2)
distribution/build.py (2)
129-145: Template formatting: verify braces in Containerfile.in.
str.format treats single braces as placeholders. If the template contains shell braces (e.g., ${VAR} or {something} in comments), ensure they’re doubled as {{ }} or switch to string.Template to avoid an accidental KeyError.
Would you like me to switch this to string.Template to avoid brace escaping?
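For illustration, a minimal sketch of the difference (the template snippet is made up, not taken from Containerfile.in):

```python
from string import Template

# str.format treats every single-brace group as a placeholder, so a literal
# shell-style ${HOME} raises KeyError unless it is doubled to ${{HOME}}.
fmt_template = "ENV MODE={mode}\nRUN echo ${HOME}"
try:
    fmt_template.format(mode="standalone")
except KeyError as exc:
    print(f"str.format failed on unexpected placeholder: {exc}")

# string.Template only substitutes $identifier / ${identifier}, and
# safe_substitute leaves unknown names such as ${HOME} untouched.
tpl = Template("ENV MODE=${mode}\nRUN echo ${HOME}")
print(tpl.safe_substitute(mode="standalone"))
# ENV MODE=standalone
# RUN echo ${HOME}
```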
196-210: Runtime hints: add an explicit unified-mode env var for clarity.
If the entrypoint supports LLAMA_STACK_UNIFIED or similar, include it; otherwise, consider adding a short note that STANDALONE=false (the default) selects full mode.
If helpful, I can align these run examples with entrypoint.sh semantics once confirmed.
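If it helps, a rough sketch of what such run hints could look like; the helper name and wording are mine, and only the env vars, port, and image tag are taken from the run example above:

```python
def print_run_hints(image_tag: str = "llama-stack-rh") -> None:
    """Print docker run examples covering both runtime modes."""
    print("Full mode (default, STANDALONE=false):")
    print(f"  docker run -e VLLM_URL=... -e INFERENCE_MODEL=... -p 8321:8321 {image_tag}")
    print("Standalone mode (skips Kubernetes-dependent providers):")
    print(
        f"  docker run -e STANDALONE=true -e VLLM_URL=... "
        f"-e INFERENCE_MODEL=... -p 8321:8321 {image_tag}"
    )
```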
```python
standalone = args.standalone or os.getenv("STANDALONE", "false").lower() in ("true", "1", "yes")
unified = args.unified or os.getenv("UNIFIED", "false").lower() in ("true", "1", "yes")

if unified:
    mode = "unified"
    print("Building unified version (supports both full and standalone modes)...")
else:
    mode = "standalone" if standalone else "full"
    print(f"Building {mode} version...")
```
UNIFIED overrides STANDALONE but dependencies still built with standalone=True when both are set. Fix precedence and call get_dependencies with the effective mode.
Currently, UNIFIED messaging is shown while using standalone deps if both envs/flags are true.
```diff
-standalone = args.standalone or os.getenv("STANDALONE", "false").lower() in ("true", "1", "yes")
-unified = args.unified or os.getenv("UNIFIED", "false").lower() in ("true", "1", "yes")
+standalone = args.standalone or os.getenv("STANDALONE", "false").lower() in ("true", "1", "yes")
+unified = args.unified or os.getenv("UNIFIED", "false").lower() in ("true", "1", "yes")
+# Precedence: UNIFIED implies using full dependency set
+effective_standalone = False if unified else standalone
@@
-if unified:
+if unified:
     mode = "unified"
     print("Building unified version (supports both full and standalone modes)...")
 else:
-    mode = "standalone" if standalone else "full"
+    mode = "standalone" if standalone else "full"
     print(f"Building {mode} version...")
@@
-dependencies = get_dependencies(standalone)
+dependencies = get_dependencies(effective_standalone)
@@
-generate_containerfile(dependencies, standalone, unified)
+generate_containerfile(dependencies, standalone=effective_standalone, unified=unified)
```

Also applies to: 189-194
I might have missed some context here, but IMO changes like this should go into the providers rather than our distro repo, so we can keep LLS as close to upstream as possible. I think trusty has some work here actually: trustyai-explainability/llama-stack-provider-lmeval#58
````markdown
## Build Modes

The build script supports three modes:

### 1. Full Mode (Default)
Includes all features including TrustyAI providers that require Kubernetes/OpenShift:
```bash
./distribution/build.py
```

### 2. Standalone Mode
Builds a version without Kubernetes dependencies, using Llama Guard for safety:
```bash
./distribution/build.py --standalone
```

### 3. Unified Mode (Recommended)
Builds a single container that supports both modes via environment variables:
```bash
./distribution/build.py --unified
```
````
Why would we add this level of complexity rather than just always doing a Unified mode?
````diff
 ### Using Podman build image for x86_64
 
 ```bash
-podman build --platform linux/amd64 -f distribution/Containerfile -t rh .
+podman build --platform linux/amd64 -f distribution/Containerfile -t llama-stack-rh .
 ```
 
 ### Using Docker
 
 ```bash
 docker build -f distribution/Containerfile -t llama-stack-rh .
 ```
````
It would make more sense to unify these sections - we don't need separate ones for basically the same command. I've noticed the Docker command has no arg for the platform arch?
```yaml
- provider_type: remote::vllm
- provider_type: inline::sentence-transformers
- "remote::vllm"
- "inline::sentence-transformers"
```
nope
this used to be the case but not anymore
```bash
  # Filter out TrustyAI providers from providers.d directory
  echo "Filtering out TrustyAI providers for standalone mode..."
  mkdir -p ${HOME}/.llama/providers.d

  # Copy only non-TrustyAI providers
  find /opt/app-root/.llama/providers.d -name "*.yaml" ! -name "*trustyai*" -exec cp {} ${HOME}/.llama/providers.d/ \; 2>/dev/null || true

  # Remove the external_providers_dir from the config to prevent loading TrustyAI providers
  echo "Disabling external providers directory for standalone mode..."
  sed -i 's|external_providers_dir:.*|# external_providers_dir: disabled for standalone mode|' "$CONFIG_FILE"

  echo "✓ Standalone configuration ready"
  echo "✓ TrustyAI providers excluded"
else
```
This script seems like it will break as soon as we have a non-TrustyAI provider that requires Kubernetes?
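One possible direction, sketched here purely as an illustration (the allowlist contents and helper name are assumptions, not part of the PR), is to copy from an explicit allowlist of standalone-safe provider specs instead of excluding by the TrustyAI name:

```python
from pathlib import Path
import shutil

# Hypothetical allowlist of provider spec files known to work without Kubernetes.
STANDALONE_SAFE_PROVIDERS = {"remote-vllm.yaml", "inline-sentence-transformers.yaml"}

def copy_standalone_providers(src: Path, dst: Path) -> None:
    """Copy only provider specs explicitly marked as standalone-safe."""
    dst.mkdir(parents=True, exist_ok=True)
    for spec in src.glob("*.yaml"):
        if spec.name in STANDALONE_SAFE_PROVIDERS:
            shutil.copy(spec, dst / spec.name)

# Paths mirror the ones used in entrypoint.sh above.
copy_standalone_providers(
    Path("/opt/app-root/.llama/providers.d"),
    Path.home() / ".llama" / "providers.d",
)
```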
```diff
@@ -1,5 +1,5 @@
 # WARNING: This file is auto-generated. Do not modify it manually.
-# Generated by: distribution/build.py
+# Generated by: distribution/build.py --unified
```
How do we plan to stop this changing every time someone builds a different Container image locally?
Note that as of #16 the container can be run outside of a Kubernetes env, thanks to a fix made in the TrustyAI LM Eval provider. Available here if you want to try: https://quay.io/repository/opendatahub/llama-stack
leseb left a comment
sooooo many changes just to exclude trustyai from the distro? Did I get it right?
We haven't heard back on the PR or Slack, so I'm going to close this for now, but feel free to reopen it if you feel it is still relevant.