-
Couldn't load subscription status.
- Fork 270
[WIP] Add support for converting to OCI artifacts #2046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR extends RamaLama to support OCI artifacts as a new target type by implementing artifact-specific logic across transport and CLI modules, updating configuration and documentation, and adding comprehensive system tests. File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Summary of ChangesHello @rhatdan, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Refactor repetitive run_cmd command construction in artifact/image operations (push, remove, exists) into shared helper functions to reduce duplication and improve maintainability.
- Standardize the inspect and get_inspect return types (JSON string vs Python data structures) so that printing always produces a consistent and predictable output format.
- Revisit how the CLI --type default is set from CONFIG.convert_type—parsing it at definition time won’t pick up runtime config or environment overrides properly, so consider applying defaults after loading config.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Refactor repetitive run_cmd command construction in artifact/image operations (push, remove, exists) into shared helper functions to reduce duplication and improve maintainability.
- Standardize the inspect and get_inspect return types (JSON string vs Python data structures) so that printing always produces a consistent and predictable output format.
- Revisit how the CLI --type default is set from CONFIG.convert_type—parsing it at definition time won’t pick up runtime config or environment overrides properly, so consider applying defaults after loading config.
## Individual Comments
### Comment 1
<location> `ramalama/transports/oci.py:67-68` </location>
<code_context>
# Keep this in sync with docs/ramalama-oci.5.md !
is_car = args.type == "car"
+ is_raw = args.type == "raw"
+ if not is_car and not is_raw:
+ raise ValueError(f"argument --type: invalid choice: '{args.type}' (choose from artifact, car, raw)")
has_gguf = getattr(args, 'gguf', None) is not None
content = ""
</code_context>
<issue_to_address>
**issue (bug_risk):** The validation for args.type excludes 'artifact', which is now a valid option.
Please update the validation to accept 'artifact' as a valid type to prevent errors for users and ensure consistency with the CLI and config.
</issue_to_address>
### Comment 2
<location> `ramalama/transports/oci.py:175-176` </location>
<code_context>
+ "rm",
+ self.model,
+ ]
+ if ignore:
+ rm_cmd.extend("--ignore")
+ rm_cmd.append(self.model)
+
</code_context>
<issue_to_address>
**issue (bug_risk):** The use of extend with a string will add each character as a separate argument.
Use append for '--ignore' to avoid splitting the string into individual characters in the command arguments.
</issue_to_address>
### Comment 3
<location> `ramalama/transports/oci.py:177` </location>
<code_context>
+ ]
+ if ignore:
+ rm_cmd.extend("--ignore")
+ rm_cmd.append(self.model)
+
+ run_cmd(
</code_context>
<issue_to_address>
**issue (bug_risk):** The model is appended twice to rm_cmd if ignore is True.
Appending the model twice may lead to duplicate arguments and unintended command behavior.
</issue_to_address>
### Comment 4
<location> `ramalama/transports/oci.py:303-300` </location>
<code_context>
- perror(f"Failed to push OCI {target} : {e}")
- raise e
+ try:
+ if args.type != "artifact":
+ perror(f"Pushing artifact {self.model} ...")
+ conman_args.insert(1, "artifact")
+ run_cmd(conman_args)
+ except subprocess.CalledProcessError:
+ perror(f"Failed to push OCI {target} : {e}")
</code_context>
<issue_to_address>
**suggestion:** The error handling in push retries pushing as artifact if the initial push fails and type is not 'artifact'.
Log the original error before retrying, and clarify the fallback logic to ensure maintainers understand when and why the retry occurs.
</issue_to_address>
### Comment 5
<location> `ramalama/oci_tools.py:35` </location>
<code_context>
+ if output == "":
+ return []
+
+ artifacts = json.loads("[" + output[:-1] + "]")
+ models = []
+ for artifact in artifacts:
</code_context>
<issue_to_address>
**issue (bug_risk):** Parsing output with output[:-1] assumes a trailing comma, which may be fragile.
This approach may fail if the output format changes. Use a parsing method that does not rely on a trailing comma to avoid potential errors.
</issue_to_address>
### Comment 6
<location> `ramalama/oci_tools.py:166-167` </location>
<code_context>
+ # drop the id from the model
+ del model["id"]
models += list_manifests(args)
+ models += list_artifacts(args)
for model in models:
# Convert to ISO 8601 format
</code_context>
<issue_to_address>
**suggestion:** The models list is extended with both manifests and artifacts, which may result in duplicate entries if the same model exists in both forms.
Deduplicate models by name or ID to prevent listing the same model multiple times.
Suggested implementation:
```python
# Combine manifests and artifacts, then deduplicate by 'name'
manifest_models = list_manifests(args)
artifact_models = list_artifacts(args)
combined_models = manifest_models + artifact_models
# Deduplicate by 'name'
seen_names = set()
deduped_models = []
for model in combined_models:
model_name = model.get("name")
if model_name and model_name not in seen_names:
deduped_models.append(model)
seen_names.add(model_name)
models += deduped_models
for model in models:
# Convert to ISO 8601 format
```
- If your models use a different unique key (e.g., "id" instead of "name"), replace `"name"` with the appropriate key in the deduplication logic.
- If `models` is not empty before this block, you may want to deduplicate the entire list (including existing entries).
</issue_to_address>
### Comment 7
<location> `ramalama/cli.py:1361-1363` </location>
<code_context>
m = New(model, args)
- m.remove(args)
+ # Don't ignore missing so that we attempt OCI as well.
+ newargs=args
+ newargs.ignore=False
+ m.remove(newargs)
+ continue
except KeyError as e:
</code_context>
<issue_to_address>
**issue (bug_risk):** Reassigning args to newargs and setting ignore=False may have unintended side effects if args is used elsewhere.
To prevent unintended mutations, use a copy of args or a more explicit method when modifying its properties.
</issue_to_address>
### Comment 8
<location> `test/system/056-artifact.bats:377-386` </location>
<code_context>
+@test "ramalama artifact - concurrent operations" {
</code_context>
<issue_to_address>
**suggestion (testing):** Concurrency test for artifact creation and removal.
Consider adding assertions to verify data integrity and detect race conditions, ensuring artifacts remain intact after concurrent creation.
</issue_to_address>
### Comment 9
<location> `ramalama/oci_tools.py:20` </location>
<code_context>
+ return input
+
+
+def list_artifacts(args: EngineArgType):
+ if args.engine == "docker":
+ return []
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring repeated command execution and JSON parsing logic into reusable helper functions for clarity and maintainability.
```suggestion
You can collapse most of the “run‐cmd → decode → strip → wrap JSON → json.loads → filter/inspect → map” logic into a couple of small helpers. For example:
```python
def run_list_json(cmd: list[str]) -> list[dict]:
out = run_cmd(cmd).stdout.decode().strip()
if not out:
return []
# remove trailing comma and wrap in [ … ]
return json.loads(f"[{out.rstrip(',')}]")
```
```python
def inspect_and_map(
items: list[dict], # e.g. output of run_list_json(...)
base_inspect_cmd: list[str], # e.g. [engine, "artifact", "inspect"]
keep_fn: Callable[[dict], bool], # predicate on the inspect JSON
map_fn: Callable[[dict, dict], dict], # builds final record from (list_item, inspect_json)
) -> list[dict]:
results = []
for item in items:
infos = run_list_json(base_inspect_cmd + [item["ID"]])
if not infos or not keep_fn(infos[0]):
continue
results.append(map_fn(item, infos[0]))
return results
```
Then `list_artifacts` becomes:
```python
def list_artifacts(args: EngineArgType):
if args.engine == "docker":
return []
list_cmd = [
args.engine, "artifact", "ls",
"--format", '{"name":"oci://{{ .Repository }}:{{ .Tag }}","size":"{{ .Size }}","ID":"{{ .Digest }}"},',
]
raw = run_list_json(list_cmd)
return inspect_and_map(
raw,
[args.engine, "artifact", "inspect"],
lambda info: info.get("Manifest", {}).get("artifactType") == annotations.ArtifactTypeModelManifest,
lambda item, info: {
"name": item["name"],
"modified": datetime.now().isoformat(),
"size": convert_from_human_readable_size(item["size"]),
}
)
```
You can apply the same pattern in your image/list_manifests paths, so you remove the per‐block run_cmd/json.loads boilerplate entirely.
</issue_to_address>
### Comment 10
<location> `ramalama/oci_tools.py:13-17` </location>
<code_context>
def convert_from_human_readable_size(input):
sizes = [("KB", 1024), ("MB", 1024**2), ("GB", 1024**3), ("TB", 1024**4), ("B", 1)]
for unit, size in sizes:
if input.endswith(unit) or input.endswith(unit.lower()):
return float(input[: -len(unit)]) * size
return input
</code_context>
<issue_to_address>
**suggestion (code-quality):** Use the built-in function `next` instead of a for-loop ([`use-next`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-next/))
```suggestion
return next(
(
float(input[: -len(unit)]) * size
for unit, size in sizes
if input.endswith(unit) or input.endswith(unit.lower())
),
input,
)
```
</issue_to_address>
### Comment 11
<location> `ramalama/oci_tools.py:35` </location>
<code_context>
def list_artifacts(args: EngineArgType):
if args.engine == "docker":
return []
conman_args = [
args.engine,
"artifact",
"ls",
"--format",
('{"name":"oci://{{ .Repository }}:{{ .Tag }}", "size":"{{ .Size }}", "ID":"{{ .Digest }}"},'),
]
output = run_cmd(conman_args).stdout.decode("utf-8").strip()
if output == "":
return []
artifacts = json.loads("[" + output[:-1] + "]")
models = []
for artifact in artifacts:
conman_args = [
args.engine,
"artifact",
"inspect",
artifact["ID"],
]
output = run_cmd(conman_args).stdout.decode("utf-8").strip()
if output == "":
continue
inspect = json.loads(output)
if "Manifest" not in inspect:
continue
if "artifactType" not in inspect["Manifest"]:
continue
if inspect["Manifest"]['artifactType'] != annotations.ArtifactTypeModelManifest:
continue
models += [
{
"name": artifact["name"],
"modified": str(datetime.now()),
"size": convert_from_human_readable_size(artifact["size"]),
}
]
return models
</code_context>
<issue_to_address>
**suggestion (code-quality):** Use f-string instead of string concatenation [×2] ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
```suggestion
artifacts = json.loads(f"[{output[:-1]}]")
```
</issue_to_address>
### Comment 12
<location> `ramalama/transports/base.py:678` </location>
<code_context>
def get_inspect(
self,
show_all: bool = False,
show_all_metadata: bool = False,
get_field: str = "",
dryrun: bool = False,
) -> None:
as_json = True
model_name = self.filename
model_registry = self.type.lower()
model_path = self._get_entry_model_path(False, False, dryrun)
if GGUFInfoParser.is_model_gguf(model_path):
if not show_all_metadata and get_field == "":
gguf_info: GGUFModelInfo = GGUFInfoParser.parse(model_name, model_registry, model_path)
return gguf_info.serialize(json=as_json, all=show_all)
metadata = GGUFInfoParser.parse_metadata(model_path)
if show_all_metadata:
return metadata.serialize(json=as_json)
elif get_field != "": # If a specific field is requested, print only that field
field_value = metadata.get(get_field)
if field_value is None:
raise KeyError(f"Field '{get_field}' not found in GGUF model metadata")
return field_value
if SafetensorInfoParser.is_model_safetensor(model_name):
safetensor_info: SafetensorModelInfo = SafetensorInfoParser.parse(model_name, model_registry, model_path)
return safetensor_info.serialize(json=as_json, all=show_all)
return ModelInfoBase(model_name, model_registry, model_path).serialize(json=as_json)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Replaces an empty collection equality with a boolean operation ([`simplify-empty-collection-comparison`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/simplify-empty-collection-comparison/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
- Remove redundant conditional ([`remove-redundant-if`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-redundant-if/))
</issue_to_address>
### Comment 13
<location> `ramalama/transports/oci.py:286` </location>
<code_context>
def push(self, source_model, args):
target = self.model
source = source_model.model
conman_args = [self.conman, "push"]
type = "image"
if args.type == "artifact":
type = args.type
conman_args.insert(1, "artifact")
perror(f"Pushing {type} {self.model} ...")
if args.authfile:
conman_args.extend([f"--authfile={args.authfile}"])
if str(args.tlsverify).lower() == "false":
conman_args.extend([f"--tls-verify={args.tlsverify}"])
conman_args.extend([target])
if source != target:
self._convert(source_model, args)
try:
run_cmd(conman_args)
except subprocess.CalledProcessError as e:
try:
if args.type != "artifact":
perror(f"Pushing artifact {self.model} ...")
conman_args.insert(1, "artifact")
run_cmd(conman_args)
except subprocess.CalledProcessError:
perror(f"Failed to push OCI {target} : {e}")
raise e
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Explicitly raise from a previous error ([`raise-from-previous-error`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/raise-from-previous-error/))
- Don't assign to builtin variable `type` [×2] ([`avoid-builtin-shadow`](https://docs.sourcery.ai/Reference/Default-Rules/comments/avoid-builtin-shadow/))
<br/><details><summary>Explanation</summary>
Python has a number of `builtin` variables: functions and constants that
form a part of the language, such as `list`, `getattr`, and `type`
(See https://docs.python.org/3/library/functions.html).
It is valid, in the language, to re-bind such variables:
```python
list = [1, 2, 3]
```
However, this is considered poor practice.
- It will confuse other developers.
- It will confuse syntax highlighters and linters.
- It means you can no longer use that builtin for its original purpose.
How can you solve this?
Rename the variable something more specific, such as `integers`.
In a pinch, `my_list` and similar names are colloquially-recognized
placeholders.
Python has a number of `builtin` variables: functions and constants that
form a part of the language, such as `list`, `getattr`, and `type`
(See https://docs.python.org/3/library/functions.html).
It is valid, in the language, to re-bind such variables:
```python
list = [1, 2, 3]
```
However, this is considered poor practice.
- It will confuse other developers.
- It will confuse syntax highlighters and linters.
- It means you can no longer use that builtin for its original purpose.
How can you solve this?
Rename the variable something more specific, such as `integers`.
In a pinch, `my_list` and similar names are colloquially-recognized
placeholders.</details>
</issue_to_address>
### Comment 14
<location> `ramalama/transports/oci.py:376-377` </location>
<code_context>
def inspect(
self,
show_all: bool = False,
show_all_metadata: bool = False,
get_field: str = "",
as_json: bool = False,
dryrun: bool = False,
) -> None:
out = super().get_inspect(show_all, show_all_metadata, get_field, dryrun)
conman_args = [self.conman, "image", "inspect", self.model]
try:
out += run_cmd(conman_args, ignore_stderr=True).stdout.decode('utf-8').strip().split('\n')
print("OCI Image")
except Exception as e:
conman_args = [self.conman, "artifact", "inspect", self.model]
try:
out += run_cmd(conman_args, ignore_stderr=True).stdout.decode('utf-8').strip()
print("OCI Artifact")
except Exception:
raise e
print(out)
</code_context>
<issue_to_address>
**suggestion (code-quality):** Explicitly raise from a previous error ([`raise-from-previous-error`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/raise-from-previous-error/))
```suggestion
except Exception as exc:
raise e from exc
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| if source != target: | ||
| self._convert(source_model, args) | ||
| try: | ||
| run_cmd(conman_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: The error handling in push retries pushing as artifact if the initial push fails and type is not 'artifact'.
Log the original error before retrying, and clarify the fallback logic to ensure maintainers understand when and why the retry occurs.
ramalama/oci_tools.py
Outdated
| if output == "": | ||
| return [] | ||
|
|
||
| artifacts = json.loads("[" + output[:-1] + "]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Parsing output with output[:-1] assumes a trailing comma, which may be fragile.
This approach may fail if the output format changes. Use a parsing method that does not rely on a trailing comma to avoid potential errors.
| models += list_manifests(args) | ||
| models += list_artifacts(args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: The models list is extended with both manifests and artifacts, which may result in duplicate entries if the same model exists in both forms.
Deduplicate models by name or ID to prevent listing the same model multiple times.
Suggested implementation:
# Combine manifests and artifacts, then deduplicate by 'name'
manifest_models = list_manifests(args)
artifact_models = list_artifacts(args)
combined_models = manifest_models + artifact_models
# Deduplicate by 'name'
seen_names = set()
deduped_models = []
for model in combined_models:
model_name = model.get("name")
if model_name and model_name not in seen_names:
deduped_models.append(model)
seen_names.add(model_name)
models += deduped_models
for model in models:
# Convert to ISO 8601 format- If your models use a different unique key (e.g., "id" instead of "name"), replace
"name"with the appropriate key in the deduplication logic. - If
modelsis not empty before this block, you may want to deduplicate the entire list (including existing entries).
| @test "ramalama artifact - concurrent operations" { | ||
| skip_if_nocontainer | ||
| skip_if_docker | ||
| skip_if_ppc64le | ||
| skip_if_s390x | ||
|
|
||
| echo "test model 1" > $RAMALAMA_TMPDIR/testmodel1 | ||
| echo "test model 2" > $RAMALAMA_TMPDIR/testmodel2 | ||
|
|
||
| # Create two artifacts concurrently |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Concurrency test for artifact creation and removal.
Consider adding assertions to verify data integrity and detect race conditions, ensuring artifacts remain intact after concurrent creation.
ramalama/oci_tools.py
Outdated
| for unit, size in sizes: | ||
| if input.endswith(unit) or input.endswith(unit.lower()): | ||
| return float(input[: -len(unit)]) * size | ||
|
|
||
| return input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Use the built-in function next instead of a for-loop (use-next)
| for unit, size in sizes: | |
| if input.endswith(unit) or input.endswith(unit.lower()): | |
| return float(input[: -len(unit)]) * size | |
| return input | |
| return next( | |
| ( | |
| float(input[: -len(unit)]) * size | |
| for unit, size in sizes | |
| if input.endswith(unit) or input.endswith(unit.lower()) | |
| ), | |
| input, | |
| ) |
|
|
||
| model_path = self._get_entry_model_path(False, False, dryrun) | ||
| if GGUFInfoParser.is_model_gguf(model_path): | ||
| if not show_all_metadata and get_field == "": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): We've found these issues:
- Replaces an empty collection equality with a boolean operation (
simplify-empty-collection-comparison) - Remove unnecessary else after guard condition (
remove-unnecessary-else) - Remove redundant conditional (
remove-redundant-if)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This PR introduces support for OCI artifacts, which is a great addition. The changes span across documentation, CLI, configuration, and transports, and include a comprehensive new test suite. The implementation looks promising but is marked as a WIP, and there are a few areas that need attention. I've found some critical issues in the oci transport implementation, particularly in artifact removal and inspection logic, which could lead to runtime errors. There are also some high-severity issues related to incorrect data being presented (e.g., modification timestamps) and potential bugs from object mutation. The new test file has a typo in its shebang that could prevent it from running. I've provided specific comments and suggestions to address these points.
ramalama/transports/oci.py
Outdated
| def _rm_artifact(self, ignore): | ||
| rm_cmd = [ | ||
| self.conman, | ||
| "artifact", | ||
| "rm", | ||
| self.model, | ||
| ] | ||
| if ignore: | ||
| rm_cmd.extend("--ignore") | ||
| rm_cmd.append(self.model) | ||
|
|
||
| run_cmd( | ||
| rm_cmd, | ||
| ignore_all=True, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of issues in the _rm_artifact method:
rm_cmd.extend("--ignore")will add each character of the string as a separate argument to the command list (e.g.,..., '-', '-', 'i', ...). It should berm_cmd.append("--ignore")orrm_cmd.extend(["--ignore"]).self.modelis added torm_cmdtwice: once during initialization and again withrm_cmd.append(self.model).
This will result in an incorrect command.
def _rm_artifact(self, ignore):
rm_cmd = [
self.conman,
"artifact",
"rm",
]
if ignore:
rm_cmd.append("--ignore")
rm_cmd.append(self.model)
run_cmd(
rm_cmd,
ignore_all=True,
)
ramalama/transports/oci.py
Outdated
| def inspect( | ||
| self, | ||
| show_all: bool = False, | ||
| show_all_metadata: bool = False, | ||
| get_field: str = "", | ||
| as_json: bool = False, | ||
| dryrun: bool = False, | ||
| ) -> None: | ||
| out = super().get_inspect(show_all, show_all_metadata, get_field, dryrun) | ||
| conman_args = [self.conman, "image", "inspect", self.model] | ||
| try: | ||
| out += run_cmd(conman_args, ignore_stderr=True).stdout.decode('utf-8').strip().split('\n') | ||
| print("OCI Image") | ||
| except Exception as e: | ||
| conman_args = [self.conman, "artifact", "inspect", self.model] | ||
| try: | ||
| out += run_cmd(conman_args, ignore_stderr=True).stdout.decode('utf-8').strip() | ||
| print("OCI Artifact") | ||
| except Exception: | ||
| raise e | ||
|
|
||
| print(out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inspect method has a bug that will cause a TypeError. On line 369, out += ... .split('\n') attempts to concatenate a list (the result of .split('\n')) to a string (out), which is not a valid operation. The logic for combining the output from super().get_inspect() and the podman inspect commands needs to be reworked to handle the data correctly, likely by parsing the JSON strings into Python objects and merging them before printing.
ramalama/cli.py
Outdated
| newargs=args | ||
| newargs.ignore=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line newargs=args creates a new reference to the args object, not a copy. Modifying newargs.ignore will mutate the original args object, which can lead to unexpected side effects elsewhere in the program. It's safer to work on a shallow copy of the args object.
| newargs=args | |
| newargs.ignore=False | |
| import copy | |
| newargs = copy.copy(args) | |
| newargs.ignore=False |
ramalama/oci_tools.py
Outdated
| models += [ | ||
| { | ||
| "name": artifact["name"], | ||
| "modified": str(datetime.now()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The modified timestamp for an artifact is being set to the current time using datetime.now(). This is inaccurate as it reflects when the list_artifacts function was run, not when the artifact was actually created or modified. You can get the creation timestamp from podman artifact ls by adding {{ .CreatedAt }} to the format string on line 29, and then using that value here.
| "modified": str(datetime.now()), | |
| "modified": artifact["modified"], |
docs/ramalama.conf
Outdated
| # | ||
| # artifact: Store AI Models as artifacts | ||
| # car: Traditional OCI image including base image with the model stored in a /models subdir | ||
| # raw Traditional OCI image including only the model and a link file `model.file` pointed at it stored at / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ramalama/cli.py
Outdated
| default="raw", | ||
| choices=["car", "raw"], | ||
| choices=["artifact", "car", "raw"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The push command's --type argument defaults to "raw", while the convert command's default is configurable via CONFIG.convert_type. For consistency, it would be better if the push command also respected the configured default type.
| default="raw", | |
| choices=["car", "raw"], | |
| choices=["artifact", "car", "raw"], | |
| default=CONFIG.convert_type, |
ramalama/oci_tools.py
Outdated
| def convert_from_human_readable_size(input): | ||
| sizes = [("KB", 1024), ("MB", 1024**2), ("GB", 1024**3), ("TB", 1024**4), ("B", 1)] | ||
| for unit, size in sizes: | ||
| if input.endswith(unit) or input.endswith(unit.lower()): | ||
| return float(input[: -len(unit)]) * size | ||
|
|
||
| return input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function convert_from_human_readable_size returns the input as a string if it doesn't match any of the size units. If the input is a plain number string (e.g., representing bytes), this will result in a string being returned instead of a numeric type, which could cause issues for consumers of this function that expect a number. It would be more robust to attempt to convert the input to a float if no units are found.
| def convert_from_human_readable_size(input): | |
| sizes = [("KB", 1024), ("MB", 1024**2), ("GB", 1024**3), ("TB", 1024**4), ("B", 1)] | |
| for unit, size in sizes: | |
| if input.endswith(unit) or input.endswith(unit.lower()): | |
| return float(input[: -len(unit)]) * size | |
| return input | |
| def convert_from_human_readable_size(input): | |
| sizes = [("KB", 1024), ("MB", 1024**2), ("GB", 1024**3), ("TB", 1024**4), ("B", 1)] | |
| for unit, size in sizes: | |
| if input.endswith(unit) or input.endswith(unit.lower()): | |
| return float(input[: -len(unit)]) * size | |
| try: | |
| return float(input) | |
| except ValueError: | |
| return input |
ramalama/transports/base.py
Outdated
| show_all_metadata: bool = False, | ||
| get_field: str = "", | ||
| dryrun: bool = False, | ||
| ) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test/system/056-artifact.bats
Outdated
| @@ -0,0 +1,545 @@ | |||
| #!/Us/bin/env bats | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7f55115 to
6d468dc
Compare
|
/gemini retest |
|
It looks like you tried to use the |
|
@sourcery-ai review |
|
/gemini review |
|
Warning Gemini encountered an error creating the review. You can try again by commenting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and found some issues that need to be addressed.
- The push/pull/remove/exists methods have a lot of duplicated branching for 'artifact' vs 'image' commands—consider centralizing this logic into a helper or dispatch table to reduce complexity and repetition.
- The inspect implementations in both OCI and base transports contain deeply nested try/except blocks and decode/formatting logic—refactor them into a shared helper or strategy to simplify the flow.
- Listing functions in oci_tools.py repeat JSON parsing and size conversion logic for images and artifacts—extract common parsing routines to keep behavior consistent and avoid code duplication.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The push/pull/remove/exists methods have a lot of duplicated branching for 'artifact' vs 'image' commands—consider centralizing this logic into a helper or dispatch table to reduce complexity and repetition.
- The inspect implementations in both OCI and base transports contain deeply nested try/except blocks and decode/formatting logic—refactor them into a shared helper or strategy to simplify the flow.
- Listing functions in oci_tools.py repeat JSON parsing and size conversion logic for images and artifacts—extract common parsing routines to keep behavior consistent and avoid code duplication.
## Individual Comments
### Comment 1
<location> `ramalama/transports/oci.py:67-68` </location>
<code_context>
# Keep this in sync with docs/ramalama-oci.5.md !
is_car = args.type == "car"
+ is_raw = args.type == "raw"
+ if not is_car and not is_raw:
+ raise ValueError(f"argument --type: invalid choice: '{args.type}' (choose from artifact, car, raw)")
has_gguf = getattr(args, 'gguf', None) is not None
content = ""
</code_context>
<issue_to_address>
**issue (bug_risk):** The type validation excludes 'artifact', which is now a valid option.
Please update the validation to accept 'artifact' as a valid type to align with recent CLI and config updates.
</issue_to_address>
### Comment 2
<location> `ramalama/oci_tools.py:11-17` </location>
<code_context>
-def engine_supports_manifest_attributes(engine):
+def convert_from_human_readable_size(input) -> str:
+ sizes = [("KB", 1024), ("MB", 1024**2), ("GB", 1024**3), ("TB", 1024**4), ("B", 1)]
+ for unit, size in sizes:
+ if input.endswith(unit) or input.endswith(unit.lower()):
+ return float(input[: -len(unit)]) * size
+
+ return str(input)
+
+
</code_context>
<issue_to_address>
**issue (bug_risk):** The function convert_from_human_readable_size returns a string for unrecognized units, which may cause downstream issues.
Consider raising an exception or returning a default numeric value for unrecognized units to maintain consistent return types and prevent type errors.
</issue_to_address>
### Comment 3
<location> `ramalama/transports/base.py:204-206` </location>
<code_context>
if self.model_type == 'oci':
if use_container or should_generate:
+ if self.artifact:
+ return os.path.join(MNT_DIR, self.artifact_name())
return os.path.join(MNT_DIR, 'model.file')
</code_context>
<issue_to_address>
**issue (bug_risk):** The artifact_name method is called without checking if it is implemented in all subclasses.
This may cause an AttributeError at runtime. Use hasattr to check for the method or add a default implementation in the base class.
</issue_to_address>
### Comment 4
<location> `ramalama/transports/base.py:676-685` </location>
<code_context>
+ as_json = True
</code_context>
<issue_to_address>
**issue (bug_risk):** The as_json variable is always set to True, which may override user intent.
This assignment ignores the user's input. Use the as_json argument value instead.
</issue_to_address>
### Comment 5
<location> `ramalama/cli.py:1384-1385` </location>
<code_context>
m = New(model, args)
- m.remove(args)
+ # Don't ignore missing so that we attempt OCI as well.
+ newargs = args
+ newargs.ignore = False
+ m.remove(newargs)
+ continue
</code_context>
<issue_to_address>
**issue (bug_risk):** Modifying args in-place may have unintended side effects elsewhere in the code.
Copy args before modifying to prevent side effects in other parts of the code.
</issue_to_address>
### Comment 6
<location> `test/system/056-artifact.bats:405` </location>
<code_context>
+ assert "$output" !~ ".*concurrent-artifact2" "second concurrent artifact was removed"
+}
+
+@test "ramalama artifact - error handling for invalid source" {
+ skip_if_nocontainer
+ skip_if_docker
</code_context>
<issue_to_address>
**suggestion (testing):** Invalid source error handling is covered.
Please add a test case for sources with insufficient permissions, such as unreadable files, to enhance error handling coverage.
</issue_to_address>
### Comment 7
<location> `test/system/056-artifact.bats:416-417` </location>
<code_context>
+ is "$output" ".*Error.*" "directory as source is handled gracefully"
+}
+
+@test "ramalama artifact - size reporting accuracy" {
+ skip_if_nocontainer
+ skip_if_docker
</code_context>
<issue_to_address>
**suggestion (testing):** Artifact size reporting is validated.
Please add a test for multi-file artifacts to confirm the reported size matches the total size of all files.
```suggestion
is "$output" ".*Error.*" "directory as source is handled gracefully"
}
@test "ramalama artifact - multi-file size reporting accuracy" {
skip_if_nocontainer
skip_if_docker
# Create a directory with multiple files
mkdir -p $RAMALAMA_TMPDIR/multifile_artifact
echo "abc" > $RAMALAMA_TMPDIR/multifile_artifact/file1.txt # 3 bytes
echo "defgh" > $RAMALAMA_TMPDIR/multifile_artifact/file2.txt # 5 bytes
echo "ijklmno" > $RAMALAMA_TMPDIR/multifile_artifact/file3.txt # 7 bytes
artifact=artifact-multifile:latest
run_ramalama ? rm ${artifact}
run_ramalama convert --type artifact file://$RAMALAMA_TMPDIR/multifile_artifact ${artifact}
# Get reported size from output (assuming output contains "Size: <bytes>")
reported_size=$(echo "$output" | grep -Eo 'Size: [0-9]+' | awk '{print $2}')
# Calculate expected size
expected_size=$((3 + 5 + 7))
[ "$reported_size" -eq "$expected_size" ]
assert_success "Reported artifact size ($reported_size) matches total file size ($expected_size)"
}
```
</issue_to_address>
### Comment 8
<location> `test/system/056-artifact.bats:472-474` </location>
<code_context>
+ assert "$output" !~ ".*config-test-artifact" "artifact was removed"
+}
+
+@test "ramalama config - convert_type validation" {
+ skip_if_nocontainer
+
</code_context>
<issue_to_address>
**suggestion (testing):** Config validation for convert_type is tested.
Please add tests for missing config values and malformed config files to further strengthen config parsing robustness.
```suggestion
@test "ramalama config - convert_type validation" {
skip_if_nocontainer
}
@test "ramalama config - missing required config value" {
skip_if_nocontainer
# Create a config missing a required value (e.g., missing 'convert_type')
config_file="$(mktemp)"
cat > "$config_file" <<EOF
artifact:
name: config-test-artifact
# convert_type is intentionally missing
EOF
run ramalama convert --config "$config_file"
# Expect failure and a message about missing config value
[ "$status" -ne 0 ]
[[ "$output" == *"missing required config value"* ]]
rm -f "$config_file"
}
@test "ramalama config - malformed config file" {
skip_if_nocontainer
# Create a malformed config file
config_file="$(mktemp)"
echo "artifact: name: config-test-artifact convert_type: [unclosed" > "$config_file"
run ramalama convert --config "$config_file"
# Expect failure and a message about malformed config
[ "$status" -ne 0 ]
[[ "$output" == *"malformed config"* || "$output" == *"parse error"* ]]
rm -f "$config_file"
}
```
</issue_to_address>
### Comment 9
<location> `test/system/056-artifact.bats:351` </location>
<code_context>
+ assert "$output" !~ ".*large-artifact" "large artifact was removed"
+}
+
+@test "ramalama artifact - multiple files in artifact" {
+ skip_if_nocontainer
+ skip_if_docker
</code_context>
<issue_to_address>
**suggestion (testing):** Multi-file artifact creation is tested.
Please add a test to check that all expected files are present in the artifact and correctly named.
</issue_to_address>
### Comment 10
<location> `ramalama/transports/oci.py:340` </location>
<code_context>
+ except subprocess.CalledProcessError:
+ self._rm_artifact(args.ignore)
def exists(self) -> bool:
if self.conman is None:
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting a generic helper for conman command invocation to eliminate duplicated logic and simplify method implementations.
```markdown
You’ve got a lot of 1:1 duplication around building/running `conman` commands for “image” vs “artifact” and retry‐handling. Extract a small generic helper and then simplify each operation to one‐liner calls. For example:
```python
# add near top of class
def _invoke(self,
kind: Literal["image", "artifact"],
action: str,
*args: str,
ignore_stderr: bool | None = None,
retry: bool = False) -> subprocess.CompletedProcess:
"""
Generic conman invoker for either image or artifact commands.
"""
cmd = [self.conman, kind, action, *args]
try:
return run_cmd(cmd, ignore_stderr=(self.ignore_stderr if ignore_stderr is None else ignore_stderr))
except subprocess.CalledProcessError:
if retry:
return run_cmd(cmd, ignore_stderr=True)
raise
# refactor exists() to use helper
def exists(self) -> bool:
if self.conman is None:
return False
# try image.inspect, fallback to artifact.inspect
try:
self._invoke("image", "inspect", self.model)
return True
except subprocess.CalledProcessError:
try:
self._invoke("artifact", "inspect", self.model)
return True
except subprocess.CalledProcessError:
return False
# refactor remove() to use helper
def remove(self, args, ignore_stderr=False):
if self.conman is None:
raise NotImplementedError("OCI Images require a container engine")
# try manifest.rm, else rmi, else artifact.rm
try:
self._invoke("manifest", "rm", self.model, ignore_stderr=ignore_stderr)
except subprocess.CalledProcessError:
try:
self._invoke("image", "rmi", f"--force={args.ignore}", self.model, ignore_stderr=ignore_stderr)
except subprocess.CalledProcessError:
self._invoke("artifact", "rm", self.model, ignore_stderr=ignore_stderr)
```
Then collapse your other methods (`push`, `_inspect`, `mount_cmd`, etc.) to calls like:
```python
kind = "artifact" if args.type == "artifact" else "image"
self._invoke(kind, "push", target, ignore_stderr=False, retry=True)
```
This removes per‐method branching, centralizes retry/stderr logic, and keeps all functionality intact.
</issue_to_address>
### Comment 11
<location> `ramalama/oci_tools.py:20` </location>
<code_context>
+ return str(input)
+
+
+def list_artifacts(args: EngineArgType):
+ if args.engine == "docker":
+ return []
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring by extracting shared logic into helper functions to reduce duplication and simplify list_models and list_artifacts.
Here’s one way to pull out the common bits into small helpers and keep both list_models and list_artifacts thin. You can
1. factor out the “run‐and‐JSON‐parse” step
2. unify the manifest‐based filtering/inspect loop
3. share your size‐conversion and timestamp‐normalization
```python
def _run_and_parse(cmd: list[str]) -> list[dict]:
out = run_cmd(cmd).stdout.decode().strip()
return json.loads(f"[{out[:-1]}]") if out else []
def _normalize_timestamp(ts: str) -> str:
dt = datetime.fromisoformat(ts.replace(" UTC","")
.replace("+0000", "+00:00")
.replace(" ", "T"))
return dt.isoformat()
def _inspect_and_filter(engine: str, items: list[dict], subcmd: str, want_type: str) -> list[dict]:
result = []
for item in items:
info = run_cmd([engine, subcmd, "inspect", item["ID"]]).stdout.decode().strip()
if not info:
continue
manifest = json.loads(info).get("Manifest", {})
if manifest.get("artifactType") != want_type:
continue
result.append(item)
return result
```
Then your two listing functions become:
```python
def list_models(args):
# 1) grab image‐list
models = _run_and_parse([args.engine, "images", "--filter", f"label={ocilabeltype}", "--format", formatLine])
models = [m for m in models if m["name"] != "oci://<none>:<none>"]
# 2) inspect/filter for model‐manifests
models = _inspect_and_filter(args.engine, models, "image", annotations.ArtifactTypeModelManifest)
# 3) size for docker
if args.engine == "docker":
for m in models:
m["size"] = int(run_cmd([args.engine, "image", "inspect", m["ID"], "--format", "{{.Size}}"])
.stdout.decode().strip())
# 4) normalize
for m in models:
m["modified"] = _normalize_timestamp(m["modified"])
m["name"], m.pop("ID")
return models
def list_artifacts(args):
if args.engine == "docker":
return []
# 1) grab artifact‐list
arts = _run_and_parse([
args.engine, "artifact", "ls", "--format",
'{"name":"oci://{{ .Repository }}:{{ .Tag }}","modified":"{{ .CreatedAt }}","size":"{{ .Size }}","ID":"{{ .Digest }}"},'
])
# 2) inspect/filter for model‐manifests
models = _inspect_and_filter(args.engine, arts, "artifact", annotations.ArtifactTypeModelManifest)
# 3) convert sizes & normalize
for m in models:
m["size"] = convert_from_human_readable_size(m["size"])
m["modified"] = _normalize_timestamp(m["modified"])
m.pop("ID")
return models
```
This removes ALL the duplicated loops/JSON hacks and keeps each function focused on just “which subcommand to call” and “what to filter on.”
</issue_to_address>
### Comment 12
<location> `ramalama/oci_tools.py:13-17` </location>
<code_context>
def convert_from_human_readable_size(input) -> str:
sizes = [("KB", 1024), ("MB", 1024**2), ("GB", 1024**3), ("TB", 1024**4), ("B", 1)]
for unit, size in sizes:
if input.endswith(unit) or input.endswith(unit.lower()):
return float(input[: -len(unit)]) * size
return str(input)
</code_context>
<issue_to_address>
**suggestion (code-quality):** Use the built-in function `next` instead of a for-loop ([`use-next`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-next/))
```suggestion
return next(
(
float(input[: -len(unit)]) * size
for unit, size in sizes
if input.endswith(unit) or input.endswith(unit.lower())
),
str(input),
)
```
</issue_to_address>
### Comment 13
<location> `ramalama/oci_tools.py:40` </location>
<code_context>
def list_artifacts(args: EngineArgType):
if args.engine == "docker":
return []
conman_args = [
args.engine,
"artifact",
"ls",
"--format",
(
'{"name":"oci://{{ .Repository }}:{{ .Tag }}",\
"created":"{{ .CreatedAt }}", \
"size":"{{ .Size }}", \
"ID":"{{ .Digest }}"},'
),
]
output = run_cmd(conman_args).stdout.decode("utf-8").strip()
if output == "":
return []
artifacts = json.loads("[" + output[:-1] + "]")
models = []
for artifact in artifacts:
conman_args = [
args.engine,
"artifact",
"inspect",
artifact["ID"],
]
output = run_cmd(conman_args).stdout.decode("utf-8").strip()
if output == "":
continue
inspect = json.loads(output)
if "Manifest" not in inspect:
continue
if "artifactType" not in inspect["Manifest"]:
continue
if inspect["Manifest"]['artifactType'] != annotations.ArtifactTypeModelManifest:
continue
models += [
{
"name": artifact["name"],
"modified": artifact["created"],
"size": convert_from_human_readable_size(artifact["size"]),
}
]
return models
</code_context>
<issue_to_address>
**suggestion (code-quality):** Use f-string instead of string concatenation [×2] ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
```suggestion
artifacts = json.loads(f"[{output[:-1]}]")
```
</issue_to_address>
### Comment 14
<location> `ramalama/transports/base.py:681` </location>
<code_context>
def get_inspect(
self,
show_all: bool = False,
show_all_metadata: bool = False,
get_field: str = "",
dryrun: bool = False,
) -> Any:
as_json = True
model_name = self.filename
model_registry = self.type.lower()
model_path = self._get_entry_model_path(False, False, dryrun)
if GGUFInfoParser.is_model_gguf(model_path):
if not show_all_metadata and get_field == "":
gguf_info: GGUFModelInfo = GGUFInfoParser.parse(model_name, model_registry, model_path)
return gguf_info.serialize(json=as_json, all=show_all)
metadata = GGUFInfoParser.parse_metadata(model_path)
if show_all_metadata:
return metadata.serialize(json=as_json)
elif get_field != "": # If a specific field is requested, print only that field
field_value = metadata.get(get_field)
if field_value is None:
raise KeyError(f"Field '{get_field}' not found in GGUF model metadata")
return field_value
if SafetensorInfoParser.is_model_safetensor(model_name):
safetensor_info: SafetensorModelInfo = SafetensorInfoParser.parse(model_name, model_registry, model_path)
return safetensor_info.serialize(json=as_json, all=show_all)
return ModelInfoBase(model_name, model_registry, model_path).serialize(json=as_json)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Replaces an empty collection equality with a boolean operation ([`simplify-empty-collection-comparison`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/simplify-empty-collection-comparison/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
- Remove redundant conditional ([`remove-redundant-if`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-redundant-if/))
</issue_to_address>
### Comment 15
<location> `ramalama/transports/oci.py:284` </location>
<code_context>
def push(self, source_model, args):
target = self.model
source = source_model.model
conman_args = [self.conman, "push"]
type = "image"
if args.type == "artifact":
type = args.type
conman_args.insert(1, "artifact")
perror(f"Pushing {type} {self.model} ...")
if args.authfile:
conman_args.extend([f"--authfile={args.authfile}"])
if str(args.tlsverify).lower() == "false":
conman_args.extend([f"--tls-verify={args.tlsverify}"])
conman_args.extend([target])
if source != target:
self._convert(source_model, args)
try:
run_cmd(conman_args)
except subprocess.CalledProcessError as e:
try:
if args.type != "artifact":
perror(f"Pushing artifact {self.model} ...")
conman_args.insert(1, "artifact")
run_cmd(conman_args)
except subprocess.CalledProcessError:
perror(f"Failed to push OCI {target} : {e}")
raise e
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Explicitly raise from a previous error ([`raise-from-previous-error`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/raise-from-previous-error/))
- Don't assign to builtin variable `type` [×2] ([`avoid-builtin-shadow`](https://docs.sourcery.ai/Reference/Default-Rules/comments/avoid-builtin-shadow/))
<br/><details><summary>Explanation</summary>
Python has a number of `builtin` variables: functions and constants that
form a part of the language, such as `list`, `getattr`, and `type`
(See https://docs.python.org/3/library/functions.html).
It is valid, in the language, to re-bind such variables:
```python
list = [1, 2, 3]
```
However, this is considered poor practice.
- It will confuse other developers.
- It will confuse syntax highlighters and linters.
- It means you can no longer use that builtin for its original purpose.
How can you solve this?
Rename the variable something more specific, such as `integers`.
In a pinch, `my_list` and similar names are colloquially-recognized
placeholders.
Python has a number of `builtin` variables: functions and constants that
form a part of the language, such as `list`, `getattr`, and `type`
(See https://docs.python.org/3/library/functions.html).
It is valid, in the language, to re-bind such variables:
```python
list = [1, 2, 3]
```
However, this is considered poor practice.
- It will confuse other developers.
- It will confuse syntax highlighters and linters.
- It means you can no longer use that builtin for its original purpose.
How can you solve this?
Rename the variable something more specific, such as `integers`.
In a pinch, `my_list` and similar names are colloquially-recognized
placeholders.</details>
</issue_to_address>
### Comment 16
<location> `ramalama/transports/oci.py:374-375` </location>
<code_context>
def _inspect(
self,
show_all: bool = False,
show_all_metadata: bool = False,
get_field: str = "",
as_json: bool = False,
dryrun: bool = False,
) -> (str, str):
out = super().get_inspect(show_all, show_all_metadata, get_field, dryrun)
conman_args = [self.conman, "image", "inspect", self.model]
type = "Image"
try:
out += run_cmd(conman_args, ignore_stderr=True).stdout.decode('utf-8').strip()
except Exception as e:
conman_args = [self.conman, "artifact", "inspect", self.model]
try:
out += run_cmd(conman_args, ignore_stderr=True).stdout.decode('utf-8').strip()
type = "Artifact"
except Exception:
raise e
return out, type
</code_context>
<issue_to_address>
**suggestion (code-quality):** We've found these issues:
- Explicitly raise from a previous error ([`raise-from-previous-error`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/raise-from-previous-error/))
- Don't assign to builtin variable `type` [×2] ([`avoid-builtin-shadow`](https://docs.sourcery.ai/Reference/Default-Rules/comments/avoid-builtin-shadow/))
```suggestion
except Exception as exc:
raise e from exc
```
<br/><details><summary>Explanation</summary>
Python has a number of `builtin` variables: functions and constants that
form a part of the language, such as `list`, `getattr`, and `type`
(See https://docs.python.org/3/library/functions.html).
It is valid, in the language, to re-bind such variables:
```python
list = [1, 2, 3]
```
However, this is considered poor practice.
- It will confuse other developers.
- It will confuse syntax highlighters and linters.
- It means you can no longer use that builtin for its original purpose.
How can you solve this?
Rename the variable something more specific, such as `integers`.
In a pinch, `my_list` and similar names are colloquially-recognized
placeholders.
Python has a number of `builtin` variables: functions and constants that
form a part of the language, such as `list`, `getattr`, and `type`
(See https://docs.python.org/3/library/functions.html).
It is valid, in the language, to re-bind such variables:
```python
list = [1, 2, 3]
```
However, this is considered poor practice.
- It will confuse other developers.
- It will confuse syntax highlighters and linters.
- It means you can no longer use that builtin for its original purpose.
How can you solve this?
Rename the variable something more specific, such as `integers`.
In a pinch, `my_list` and similar names are colloquially-recognized
placeholders.</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| assert "$output" !~ ".*concurrent-artifact2" "second concurrent artifact was removed" | ||
| } | ||
|
|
||
| @test "ramalama artifact - error handling for invalid source" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Invalid source error handling is covered.
Please add a test case for sources with insufficient permissions, such as unreadable files, to enhance error handling coverage.
| is "$output" ".*Error.*" "directory as source is handled gracefully" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Artifact size reporting is validated.
Please add a test for multi-file artifacts to confirm the reported size matches the total size of all files.
| is "$output" ".*Error.*" "directory as source is handled gracefully" | |
| } | |
| is "$output" ".*Error.*" "directory as source is handled gracefully" | |
| } | |
| @test "ramalama artifact - multi-file size reporting accuracy" { | |
| skip_if_nocontainer | |
| skip_if_docker | |
| # Create a directory with multiple files | |
| mkdir -p $RAMALAMA_TMPDIR/multifile_artifact | |
| echo "abc" > $RAMALAMA_TMPDIR/multifile_artifact/file1.txt # 3 bytes | |
| echo "defgh" > $RAMALAMA_TMPDIR/multifile_artifact/file2.txt # 5 bytes | |
| echo "ijklmno" > $RAMALAMA_TMPDIR/multifile_artifact/file3.txt # 7 bytes | |
| artifact=artifact-multifile:latest | |
| run_ramalama ? rm ${artifact} | |
| run_ramalama convert --type artifact file://$RAMALAMA_TMPDIR/multifile_artifact ${artifact} | |
| # Get reported size from output (assuming output contains "Size: <bytes>") | |
| reported_size=$(echo "$output" | grep -Eo 'Size: [0-9]+' | awk '{print $2}') | |
| # Calculate expected size | |
| expected_size=$((3 + 5 + 7)) | |
| [ "$reported_size" -eq "$expected_size" ] | |
| assert_success "Reported artifact size ($reported_size) matches total file size ($expected_size)" | |
| } |
| @test "ramalama config - convert_type validation" { | ||
| skip_if_nocontainer | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Config validation for convert_type is tested.
Please add tests for missing config values and malformed config files to further strengthen config parsing robustness.
| @test "ramalama config - convert_type validation" { | |
| skip_if_nocontainer | |
| @test "ramalama config - convert_type validation" { | |
| skip_if_nocontainer | |
| } | |
| @test "ramalama config - missing required config value" { | |
| skip_if_nocontainer | |
| # Create a config missing a required value (e.g., missing 'convert_type') | |
| config_file="$(mktemp)" | |
| cat > "$config_file" <<EOF | |
| artifact: | |
| name: config-test-artifact | |
| # convert_type is intentionally missing | |
| EOF | |
| run ramalama convert --config "$config_file" | |
| # Expect failure and a message about missing config value | |
| [ "$status" -ne 0 ] | |
| [[ "$output" == *"missing required config value"* ]] | |
| rm -f "$config_file" | |
| } | |
| @test "ramalama config - malformed config file" { | |
| skip_if_nocontainer | |
| # Create a malformed config file | |
| config_file="$(mktemp)" | |
| echo "artifact: name: config-test-artifact convert_type: [unclosed" > "$config_file" | |
| run ramalama convert --config "$config_file" | |
| # Expect failure and a message about malformed config | |
| [ "$status" -ne 0 ] | |
| [[ "$output" == *"malformed config"* || "$output" == *"parse error"* ]] | |
| rm -f "$config_file" | |
| } |
| assert "$output" !~ ".*large-artifact" "large artifact was removed" | ||
| } | ||
|
|
||
| @test "ramalama artifact - multiple files in artifact" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Multi-file artifact creation is tested.
Please add a test to check that all expected files are present in the artifact and correctly named.
ramalama/oci_tools.py
Outdated
| for unit, size in sizes: | ||
| if input.endswith(unit) or input.endswith(unit.lower()): | ||
| return float(input[: -len(unit)]) * size | ||
|
|
||
| return str(input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Use the built-in function next instead of a for-loop (use-next)
| for unit, size in sizes: | |
| if input.endswith(unit) or input.endswith(unit.lower()): | |
| return float(input[: -len(unit)]) * size | |
| return str(input) | |
| return next( | |
| ( | |
| float(input[: -len(unit)]) * size | |
| for unit, size in sizes | |
| if input.endswith(unit) or input.endswith(unit.lower()) | |
| ), | |
| str(input), | |
| ) |
841e532 to
1127668
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces comprehensive support for OCI artifacts, a significant and valuable feature. The changes are extensive, touching documentation, configuration, the CLI, and transport layers, and are well-supported by new end-to-end and system tests. The overall implementation is strong, but I have identified a few issues. These include a critical bug in how inspection data is combined, a couple of high-severity bugs related to command construction and data type handling, and a minor formatting inconsistency in the documentation. Addressing these points will help ensure the robustness and correctness of this new functionality.
ramalama/transports/oci.py
Outdated
| def _inspect( | ||
| self, | ||
| show_all: bool = False, | ||
| show_all_metadata: bool = False, | ||
| get_field: str = "", | ||
| as_json: bool = False, | ||
| dryrun: bool = False, | ||
| ) -> (str, str): | ||
| out = super().get_inspect(show_all, show_all_metadata, get_field, dryrun) | ||
| conman_args = [self.conman, "image", "inspect", self.model] | ||
| type = "Image" | ||
| try: | ||
| out += run_cmd(conman_args, ignore_stderr=True).stdout.decode('utf-8').strip() | ||
| except Exception as e: | ||
| conman_args = [self.conman, "artifact", "inspect", self.model] | ||
| try: | ||
| out += run_cmd(conman_args, ignore_stderr=True).stdout.decode('utf-8').strip() | ||
| type = "Artifact" | ||
| except Exception: | ||
| raise e | ||
|
|
||
| return out, type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _inspect method concatenates multiple JSON strings using += on lines 411 and 415. This results in an invalid JSON string (e.g., {"key":"value"}{"other_key":"other_value"}). To correctly combine the information, you should parse each JSON output into a Python dictionary, merge them, and then serialize the result back to a single, valid JSON string.
def _inspect(
self,
show_all: bool = False,
show_all_metadata: bool = False,
get_field: str = "",
as_json: bool = False,
dryrun: bool = False,
) -> (str, str):
out_data = json.loads(super().get_inspect(show_all, show_all_metadata, get_field, dryrun))
conman_args = [self.conman, "image", "inspect", self.model]
oci_type = "Image"
try:
inspect_output = run_cmd(conman_args, ignore_stderr=True).stdout.decode('utf-8').strip()
# podman image inspect returns a list of objects
inspect_data = json.loads(inspect_output)
if inspect_data:
out_data.update(inspect_data[0])
except Exception as e:
conman_args = [self.conman, "artifact", "inspect", self.model]
try:
inspect_output = run_cmd(conman_args, ignore_stderr=True).stdout.decode('utf-8').strip()
# podman artifact inspect returns a single object
out_data.update(json.loads(inspect_output))
oci_type = "Artifact"
except Exception:
raise e
return json.dumps(out_data), oci_type
ramalama/oci_tools.py
Outdated
| def convert_from_human_readable_size(input) -> str: | ||
| sizes = [("KB", 1024), ("MB", 1024**2), ("GB", 1024**3), ("TB", 1024**4), ("B", 1)] | ||
| for unit, size in sizes: | ||
| if input.endswith(unit) or input.endswith(unit.lower()): | ||
| return float(input[: -len(unit)]) * size | ||
|
|
||
| return str(input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has an incorrect return type hint (-> str) and returns inconsistent types. It returns a float if a unit is found, but a str otherwise. This can lead to TypeError exceptions in functions that consume this output and expect a numeric size. The function should consistently return a numeric type, such as float, to ensure type safety and prevent runtime errors.
| def convert_from_human_readable_size(input) -> str: | |
| sizes = [("KB", 1024), ("MB", 1024**2), ("GB", 1024**3), ("TB", 1024**4), ("B", 1)] | |
| for unit, size in sizes: | |
| if input.endswith(unit) or input.endswith(unit.lower()): | |
| return float(input[: -len(unit)]) * size | |
| return str(input) | |
| def convert_from_human_readable_size(input) -> float: | |
| sizes = [("KB", 1024), ("MB", 1024**2), ("GB", 1024**3), ("TB", 1024**4), ("B", 1)] | |
| for unit, size in sizes: | |
| if input.endswith(unit) or input.endswith(unit.lower()): | |
| return float(input[: -len(unit)]) * size | |
| return float(input) |
| def _rm_artifact(self, ignore) -> None: | ||
| rm_cmd = [ | ||
| self.conman, | ||
| "artifact", | ||
| "rm", | ||
| self.model, | ||
| ] | ||
| if ignore: | ||
| rm_cmd.append("--ignore") | ||
| rm_cmd.append(self.model) | ||
|
|
||
| run_cmd( | ||
| rm_cmd, | ||
| ignore_all=True, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _rm_artifact method constructs the rm command incorrectly. It appends self.model twice and places the --ignore flag in the wrong position, which will cause the command to fail. Command-line options should precede arguments. This needs to be corrected to ensure the command is valid.
def _rm_artifact(self, ignore) -> None:
rm_cmd = [
self.conman,
"artifact",
"rm",
]
if ignore:
rm_cmd.append("--ignore")
rm_cmd.append(self.model)
run_cmd(
rm_cmd,
ignore_all=True,
)
docs/ramalama.conf
Outdated
| # | ||
| # artifact: Store AI Models as artifacts | ||
| # car: Traditional OCI image including base image with the model stored in a /models subdir | ||
| # raw: Traditional OCI image including only the model and a link file `model.file` pointed at it stored at / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Daniel J Walsh <[email protected]>
Just showing this off, I am still working on PRs to get into podman 5.7 release.
containers/podman#27329
containers/podman#27328
containers/podman#27325
containers/podman#27324
containers/podman#27319
containers/podman#27253
Summary by Sourcery
Add comprehensive support for storing and managing AI models as OCI artifacts, extending the transport, CLI, and configuration layers, and covering the new functionality with extensive documentation and end-to-end system tests.
New Features:
Enhancements:
Documentation:
Tests: