Add support for llama-stack #1413
Conversation
Reviewer's Guide

This PR adds optional llama-stack API support via a new `Stack` class.

Sequence diagram for serve command with llama-stack:

```mermaid
sequenceDiagram
actor User
participant CLI
participant Stack
participant Podman
User->>CLI: ramalama serve <MODEL> --api llama-stack --port <host_port>
CLI->>CLI: Parse arguments
CLI->>CLI: args.port = compute_serving_port(args)
CLI->>Stack: new Stack(args)
Stack-->>CLI: stack instance
CLI->>Stack: serve()
Stack->>Stack: generate() YAML for Podman Kube
Stack->>Podman: podman kube play --replace <generated_yaml_file>
Podman-->>Stack: Deployment status
Stack-->>CLI: Status
CLI->>User: Output (Llama Stack & OpenAI API URLs)
```
Hey @rhatdan - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
- Invalid `raise ()` syntax (link)
- Undefined `self.engine` in `Stack.Serve` (link)

- Replace the bare `raise ()` in serve_cli with a proper exception type and descriptive message when `--container` is missing.
- The new `container_connection` function in engine.py isn't referenced anywhere; consider integrating it where needed or removing it to avoid dead code.
- Manually concatenating YAML strings in `Stack.generate` is brittle; consider using a YAML library or templating for more robust and maintainable serialization.
Here's what I looked at during the review
- 🔴 General issues: 2 blocking issues, 7 other issues
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
```python
raise ValueError("no container manager (Podman, Docker) found")

conman_args = [conman, "port", name, port]
output = run_cmd(conman_args, debug=args.debug).stdout.decode("utf-8").strip()
```
issue: Parsing docker port output by splitting on '>' is brittle
This breaks if Docker/Podman returns multiple mappings or uses a different format. Use a regex or docker port --format structured output for more reliable parsing.
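A minimal sketch of the regex route, assuming the usual `8080/tcp -> 0.0.0.0:8080` line format (the helper name is hypothetical):

```python
import re

def parse_host_port(output: str) -> str:
    # `podman port`/`docker port` print one mapping per line, e.g.
    # "8080/tcp -> 0.0.0.0:8080"; take the host port from the first match,
    # which also copes with multiple mappings being returned.
    for line in output.splitlines():
        match = re.search(r"->\s*.*:(\d+)\s*$", line)
        if match:
            return match.group(1)
    raise ValueError(f"could not parse port mapping from: {output!r}")
```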
ramalama/engine.py (Outdated)
```diff
 def add_port_option(self):
-    if hasattr(self.args, "port"):
+    if hasattr(self.args, "port") and self.args.port != "":
         self.exec_args += ["-p", f"{self.args.port}:{self.args.port}"]
```
issue (bug_risk): Port handling may error if args.port is non-string or None
Avoid .count() on None or ints by casting args.port to str or checking truthiness (e.g., if self.args.port:) first.
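A sketch of that guard, reusing the names from the snippet above; the truthiness check covers `None`, `0`, and `""` before any string formatting happens:

```python
def add_port_option(self):
    # Bail out early on missing/empty/None ports, then normalize to str
    port = getattr(self.args, "port", None)
    if not port:
        return
    port = str(port)
    self.exec_args += ["-p", f"{port}:{port}"]
```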
ramalama/cli.py (Outdated)
```python
if args.api == "llama-stack":
    if not args.container:
        raise ()
```
issue (bug_risk): Invalid `raise ()` syntax
Replace `raise ()` with a specific exception, e.g. `raise ValueError('args.container is required')`.
ramalama/stack.py (Outdated)
```python
        volumes:{volumes}"""
        return self.stack_yaml

    def Serve(self):
```
issue (bug_risk): Undefined `self.engine` in `Stack.Serve`
`self.engine` isn't initialized, so calling `self.engine.run()` will raise an `AttributeError`. Inject or instantiate an engine, or remove this call.
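One possible shape of the fix, assuming `ramalama/engine.py` exposes an `Engine` class constructed from the parsed args (the actual constructor may differ):

```python
from ramalama.engine import Engine  # assumed import; check the real API

class Stack:
    def __init__(self, args):
        self.args = args
        # Instantiate the engine up front so Serve() can call self.engine.run()
        self.engine = Engine(args)
```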
```python
        self.model_port = str(int(self.args.port) + 1)
        self.stack_image = tagged_image("quay.io/ramalama/llama-stack")

    def generate(self):
```
issue: Generated YAML retains unintended indentation
Use `textwrap.dedent` or construct the YAML string without indentation to avoid extra leading spaces from the triple-quoted string.
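For illustration, a standalone sketch of the `textwrap.dedent` approach; the YAML fields here are placeholders:

```python
import textwrap

def render_metadata(name: str) -> str:
    # dedent strips the common leading whitespace that source indentation
    # adds inside the triple-quoted f-string
    return textwrap.dedent(f"""\
        apiVersion: v1
        kind: Deployment
        metadata:
          name: {name}
        """)

print(render_metadata("llama-stack-demo"))
```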
```python
              seLinuxOptions:
                type: spc_t"""

        self.stack_yaml = f"""
```
issue (complexity): Consider building the YAML manifest using Python dicts and yaml.safe_dump() instead of string concatenation for clarity and maintainability.
Consider replacing all the raw‐string/YAML concatenation with a Python dict + `yaml.safe_dump()`. This keeps your intent explicit, eliminates all the nesting/quotes, and is much easier to maintain.
Example:
```python
import yaml

from ramalama.common import tagged_image
from ramalama.model_factory import ModelFactory, New


class Stack:
    def __init__(self, args):
        self.args = args
        self.host = "127.0.0.1"
        mf = ModelFactory(args.MODEL, args)
        self.model = mf.prune_model_input()
        nw = New(args.MODEL, args)
        self.model_type = nw.type
        self.model_path = nw.get_model_path(args)
        self.model_port = str(int(args.port) + 1)
        self.stack_image = tagged_image("quay.io/ramalama/llama-stack")

    def generate(self):
        # build lists/dicts instead of strings
        volume_mounts = [
            {'mountPath': '/mnt/models/model.file', 'name': 'model'},
            {'mountPath': '/dev/dri', 'name': 'dri'},
        ]
        if self.model_type == "OCI":
            volume_mounts[0] = {
                'mountPath': '/mnt/models',
                'subPath': '/models',
                'name': 'model',
            }
        volumes = [
            {'hostPath': {'path': self.model_path}, 'name': 'model'},
            {'hostPath': {'path': '/dev/dri'}, 'name': 'dri'},
        ]
        llama_cmd = [
            "llama-server", "--port", self.model_port,
            "--model", "/mnt/models/model.file",
            "--alias", self.model, "--ctx-size", self.args.context,
            "--temp", self.args.temp, "--jinja", "--cache-reuse", "256",
            "-v", "--threads", self.args.threads,
            "--host", self.host,
        ]
        security_ctx = {
            'allowPrivilegeEscalation': False,
            'capabilities': {
                'drop': [
                    "CAP_CHOWN", "CAP_FOWNER", "CAP_FSETID",
                    "CAP_KILL", "CAP_NET_BIND_SERVICE",
                    "CAP_SETFCAP", "CAP_SETGID",
                    "CAP_SETPCAP", "CAP_SETUID",
                    "CAP_SYS_CHROOT",
                ],
                'add': ["CAP_DAC_OVERRIDE"],
            },
            'seLinuxOptions': {'type': 'spc_t'},
        }
        deployment = {
            'apiVersion': 'v1',
            'kind': 'Deployment',
            'metadata': {
                'name': self.args.name,
                'labels': {'app': self.args.name},
            },
            'spec': {
                'replicas': 1,
                'selector': {'matchLabels': {'app': self.args.name}},
                'template': {
                    'metadata': {'labels': {'app': self.args.name}},
                    'spec': {
                        'containers': [
                            {
                                'name': 'model-server',
                                'image': self.args.image,
                                'command': ["/usr/libexec/ramalama/ramalama-serve-core"],
                                'args': llama_cmd,
                                'securityContext': security_ctx,
                                'volumeMounts': volume_mounts,
                            },
                            {
                                'name': 'llama-stack',
                                'image': self.stack_image,
                                'args': ['/bin/sh', '-c', 'llama stack run --image-type venv /etc/ramalama/ramalama-run.yaml'],
                                'env': [
                                    {'name': 'RAMALAMA_URL', 'value': f"http://127.0.0.1:{self.model_port}"},
                                    {'name': 'INFERENCE_MODEL', 'value': self.model},
                                ],
                                'securityContext': security_ctx,
                                'ports': [{'containerPort': 8321, 'hostPort': self.args.port}],
                            },
                        ],
                        'volumes': volumes,
                    },
                },
            },
        }
        return yaml.safe_dump(deployment, sort_keys=False)
```

Benefits:
- No more fragile indentation/f-strings
- Clear data model of your Kubernetes manifest
- Easy to tweak/add/remove fields
- Full fidelity: `yaml.safe_dump()` produces valid YAML under the hood
```python
except Exception:
    raise e
```
suggestion (code-quality): Explicitly raise from a previous error (raise-from-previous-error)
```diff
-except Exception:
-    raise e
+except Exception as exc:
+    raise e from exc
```
Force-pushed: cd15235 → 609b47e
Force-pushed: e635313 → 517a7c9
@sourcery-ai Please review my PR

Sure! I'm generating a new review now.
Hey @rhatdan - I've reviewed your changes and they look great!
Blocking issues:
- file object opened without corresponding close (link)
- Using 'yaml_file.name' without '.flush()' or '.close()' may cause an error because the file may not exist when 'yaml_file.name' is used. Use '.flush()' or close the file before using 'yaml_file.name'. (link)
- Using 'yaml_file.name' without '.flush()' or '.close()' may cause an error because the file may not exist when 'yaml_file.name' is used. Use '.flush()' or close the file before using 'yaml_file.name'. (link)
- file object opened without corresponding close (link)
- Using 'yaml_file.name' without '.flush()' or '.close()' may cause an error because the file may not exist when 'yaml_file.name' is used. Use '.flush()' or close the file before using 'yaml_file.name'. (link)
- Using 'yaml_file.name' without '.flush()' or '.close()' may cause an error because the file may not exist when 'yaml_file.name' is used. Use '.flush()' or close the file before using 'yaml_file.name'. (link)
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🔴 Security: 6 blocking issues
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
```python
cleanlabel = label.replace("=", ": ", 1)
self.labels = f"{self.labels}\n {cleanlabel}"
```
issue (bug_risk): Label formatting replaces '=' with ': ', which may not be YAML-safe for all label values.
If label values include special characters or newlines, the output may be invalid YAML. Consider using a YAML library or properly escaping values.
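A sketch of the YAML-library route, assuming labels arrive as `key=value` strings; `yaml.safe_dump` quotes values containing colons, quotes, or newlines:

```python
import yaml

def labels_to_yaml(raw_labels):
    # Split each "key=value" once; safe_dump handles escaping and quoting
    labels = dict(label.split("=", 1) for label in raw_labels)
    return yaml.safe_dump({"labels": labels}, default_flow_style=False)

print(labels_to_yaml(['app=demo', 'note=has: colon and "quotes"']))
```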
```python
if self.args.dryrun:
    print(yaml)
    return
yaml_file = tempfile.NamedTemporaryFile(prefix='RamaLama_', delete=not self.args.debug)
```
security (open-never-closed): file object opened without corresponding close
Source: opengrep
```python
    print(yaml)
    return
yaml_file = tempfile.NamedTemporaryFile(prefix='RamaLama_', delete=not self.args.debug)
with open(yaml_file.name, 'w') as c:
```
security (tempfile-without-flush): Using 'yaml_file.name' without '.flush()' or '.close()' may cause an error because the file may not exist when 'yaml_file.name' is used. Use '.flush()' or close the file before using 'yaml_file.name'.
Source: opengrep
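A minimal sketch of the fix, flushing before handing the path to `podman kube play` and letting the context manager close (and delete) the file afterwards; `yaml_content` stands in for the generated manifest:

```python
import subprocess
import tempfile

yaml_content = "..."  # placeholder for the generated manifest

with tempfile.NamedTemporaryFile(prefix="RamaLama_", mode="w", suffix=".yaml") as yaml_file:
    yaml_file.write(yaml_content)
    yaml_file.flush()  # ensure the data is on disk before podman reads the path
    subprocess.run(["podman", "kube", "play", "--replace", yaml_file.name], check=True)
```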
ramalama/stack.py (Outdated)
| "kube", | ||
| "play", | ||
| "--replace", | ||
| yaml_file.name, |
security (tempfile-without-flush): Using 'yaml_file.name' without '.flush()' or '.close()' may cause an error because the file may not exist when 'yaml_file.name' is used. Use '.flush()' or close the file before using 'yaml_file.name'.
Source: opengrep
```python
        exec_cmd(exec_args, debug=self.args.debug)

    def stop(self):
        yaml_file = tempfile.NamedTemporaryFile(prefix='RamaLama_', delete=not self.args.debug)
```
security (open-never-closed): file object opened without corresponding close
Source: opengrep
```python
data = {
    "stream": True,
    "messages": self.conversation_history,
    "model": self.models[0],
}
```
issue (code-quality): Inline variable that is immediately returned (inline-immediately-returned-variable)
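In sketch form, the inlined version of that payload (the method name is hypothetical):

```python
def build_request_body(self):
    # Return the payload directly instead of binding it to a temporary name
    return {
        "stream": True,
        "messages": self.conversation_history,
        "model": self.models[0],
    }
```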
ramalama/engine.py (Outdated)
```python
value = getattr(args, arg)
if value:
```
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)
```diff
-value = getattr(args, arg)
-if value:
+if value := getattr(args, arg):
```
ramalama/model.py (Outdated)
```python
# otherwise compute a random serving port in the range
target_port = get_available_port_if_any(debug)
if args.port != "" and args.port != str(DEFAULT_PORT):
```
suggestion (code-quality): Replace multiple comparisons of same variable with in operator (merge-comparisons)
```diff
-if args.port != "" and args.port != str(DEFAULT_PORT):
+if args.port not in ["", str(DEFAULT_PORT)]:
```
```python
except Exception:
    raise e
```
suggestion (code-quality): Explicitly raise from a previous error (raise-from-previous-error)
```diff
-except Exception:
-    raise e
+except Exception as exc:
+    raise e from exc
```
ramalama/stack.py (Outdated)
```python
if hasattr(args, "name") and args.name:
    self.name = args.name
else:
    self.name = genname()
```
suggestion (code-quality): Replace if statement with if expression (assign-if-exp)
```diff
-if hasattr(args, "name") and args.name:
-    self.name = args.name
-else:
-    self.name = genname()
+self.name = args.name if hasattr(args, "name") and args.name else genname()
```
Hey @rhatdan, I've posted a new review for you!
Force-pushed: f0a4c73 → 62610b7
Force-pushed: 33c8488 → 48fa9ed
Looks good so far! Seems a test is failing, saying no connection. I wonder if that has to do with the endpoints, or maybe the test needs fixing.
Pull Request Overview
Adds a --api flag to enable a unified API layer, implements a new llama-stack mode with Kubernetes pod generation, and refactors port computation and label handling.
- Introduce `--api` option and default in configuration
- Implement `Stack` class to generate/deploy llama-stack Kubernetes YAML
- Refactor `compute_serving_port` to accept `args` and print REST API endpoints
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/unit/test_model.py | Updated tests for new compute_serving_port(args, …) signature |
| test/unit/test_engine.py | Removed outdated stop_container import and test |
| test/unit/test_config.py | Added default and overridden api value to config tests |
| test/system/040-serve.bats | Removed assertions on old port output in serve tests |
| ramalama/stack.py | New Stack class to support llama-stack API mode |
| ramalama/shortnames.py | Flush file after writing shortnames |
| ramalama/rag.py | Flush file after writing containerfile in RAG build |
| ramalama/oci.py | Flush file after writing containerfile in OCI build |
| ramalama/model_factory.py | Add New and Serve helpers for model instantiation |
| ramalama/model.py | Refactor compute_serving_port to use args and handle api |
| ramalama/engine.py | Generalize label handling and add container helpers |
| ramalama/config.py | Set default api to none |
| ramalama/cli.py | Register --api option and route llama-stack serve |
| libexec/ramalama/ramalama-client-core | Revamped interactive shell for API-driven client |
| docs/ramalama.conf.5.md | Document api setting in manpage |
| docs/ramalama.conf | Document api setting in sample config |
| docs/ramalama-serve.1.md | Document --api in serve CLI manpage |
| docs/ramalama-run.1.md | Document --api in run CLI manpage |
Comments suppressed due to low confidence (2)
test/unit/test_model.py:118
- The `mock_compute_ports` mock is defined but never applied to `compute_serving_port`; you may need to patch `get_available_port_if_any` or remove the unused mock: `mock_compute_ports = Mock(return_value=expectedRandomizedResult)`

test/system/040-serve.bats:13
- Existing tests for port output were removed but should be updated to assert the new REST API endpoint messages printed by `compute_serving_port`: `assert "$output" =~ "serving on port .*"`
```python
return res(response, self.parsed_args.color)

print(f"\rError: could not connect to: {self.url}", file=sys.stderr)
self.kills(self.parsed_args)
```
Copilot AI · May 30, 2025
Method kills is defined without parameters but is called with self.parsed_args, leading to a TypeError. Call self.kills() instead.
```diff
-self.kills(self.parsed_args)
+self.kills()
```
```python
              seLinuxOptions:
                type: spc_t"""

        self.stack_yaml = f"""
```
Copilot AI · May 30, 2025
[nitpick] Building large YAML via f-strings with embedded indentation is error-prone. Consider using a YAML templating library or a structured builder for readability and reliability.
@copilot could you rewrite this section to use a YAML templating library?
Force-pushed: 103ac92 → fd311d1
@ericcurtin @bmahabirbu @nathan-weinberg @cdoern This is ready to go in. PTAL
Force-pushed: 2ee5ae9 → b3c822f
Add new option --api which allows users to specify the API Server, either llama-stack or none. With none, we just generate a service with the serve command. With `--api llama-stack`, RamaLama will generate an API Server listening on port 8321 and an OpenAI server listening on port 8080. Signed-off-by: Daniel J Walsh <[email protected]>
Summary by Sourcery
Add support for a unified API layer with a new `--api` option, implement llama-stack mode via a `Stack` class that generates and deploys a Kubernetes pod stack, refactor engine command helpers and label handling, update `compute_serving_port` and `model_factory` helpers, and refresh documentation and tests accordingly.
New Features:
Enhancements:
Documentation:
Tests: