Skip to content

ci(manfest) updates the commit-latest script and action for the updated repo structure #3304

Closed
mtchoum1 wants to merge 1 commit intomainfrom
mtchoum1/commit-action
Closed

ci(manfest) updates the commit-latest script and action for the updated repo structure #3304
mtchoum1 wants to merge 1 commit intomainfrom
mtchoum1/commit-action

Conversation

@mtchoum1
Copy link
Copy Markdown
Contributor

@mtchoum1 mtchoum1 commented Apr 7, 2026

Description

changed scripts/update-commit-latest-env.py to look at both odh and rhoai folders in manfest for params-latest
changed .github/workflows/update-commit-latest-env.yaml' to utilize the update of scripts/update-commit-latest-env.py`

How Has This Been Tested?

Self checklist (all need to be checked):

  • Ensure that you have run make test (gmake on macOS) before asking for review
  • Changes to everything except Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • Chores
    • Automated manifest update workflow now watches and updates an additional product variant and its config paths.
    • CI workflow installs required tooling (container inspection CLI and Python/runner setup) to ensure reliable manifest generation.
    • Manifest update invocation method adjusted; commit step still detects and commits generated manifest changes with clearer "no changes" reporting.

@openshift-ci openshift-ci Bot requested review from daniellutz and dibryant April 7, 2026 19:54
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 7, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jiridanek for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 7, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 89f825c4-09a5-4597-9532-a4ed141d271e

📥 Commits

Reviewing files that changed from the base of the PR and between b883987 and 8d830f5.

📒 Files selected for processing (2)
  • .github/workflows/update-commit-latest-env.yaml
  • scripts/update-commit-latest-env.py
✅ Files skipped from review due to trivial changes (1)
  • scripts/update-commit-latest-env.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/update-commit-latest-env.yaml

📝 Walkthrough

Walkthrough

The workflow trigger path was changed from manifests/base/params-latest.env to manifests/odh/base/params-latest.env. The job now installs skopeo and configures the uv/Python runtime, and invokes the updater via ./uv run python scripts/update-commit-latest-env.py. The Python script was updated to read manifests/odh/base/params-latest.env and write manifests/odh/base/commit-latest.env. The commit-and-push step logic in the workflow still stages/commits manifests/base/commit-latest.env.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Actionable Issues

  • Commit-path mismatch / missing commits (functional): The workflow still stages/commits manifests/base/commit-latest.env while the script writes manifests/odh/base/commit-latest.env. This will cause produced changes to go uncommitted. Fix: align workflow commit paths with the script output or restore script path. (No CWE — functional correctness)

  • Command injection when invoking CLI tools (CWE-78): The updater relies on skopeo to inspect image refs. If the script constructs shell commands using unescaped image references, an attacker-controlled or malformed reference can result in shell injection. Use exec-style subprocess APIs (no shell), validate/sanitize image refs against an allowlist/regex, and avoid composing shell lines from untrusted data.

  • Unvalidated file/path inputs (CWE-22): The script now reads/writes under manifests/odh/... based on path constants. If any path components are derived from repository content or inputs, ensure they are validated/whitelisted and resolve to expected repo subpaths (compare .resolve() prefixes) to prevent path traversal or accidental writes outside the tree.

  • Tool installation and execution on runner (operational risk): Installing and running skopeo in the runner increases attack surface if inputs are untrusted. Run such tools in a constrained environment, validate inputs beforehand, and avoid printing secrets or untrusted file contents to logs. Cite: CWE-94 (injection risks from interpreted inputs) where applicable.

Address the above before merging.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Title check ⚠️ Warning Title uses imperative mood ('updates') but contains a typo ('manfest' instead of 'manifest') and lacks required issue/ticket reference. Fix the typo to 'manifest' and add a ticket reference (e.g., 'RHAIENG-XXXX' or 'NO-JIRA') following the format: 'NO-JIRA: ci(manifest): update commit-latest script and action for updated repo structure'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed Description includes basic information about changes and merge criteria checked, but the 'How Has This Been Tested?' section lacks detail on actual testing steps, environment, and verification methods.
Branch Prefix Policy ✅ Passed PR title does not start with a branch prefix like [release-] or [rhoai-]; conventional commit type ci() is compliant with policy for main branch.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the size/l label Apr 7, 2026
@github-actions github-actions Bot added the review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel label Apr 7, 2026
@mtchoum1 mtchoum1 force-pushed the mtchoum1/commit-action branch from 3124f74 to 39b5a3c Compare April 7, 2026 19:58
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 7, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 3.50%. Comparing base (eed4d4c) to head (8d830f5).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
scripts/update-commit-latest-env.py 0.00% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #3304   +/-   ##
=====================================
  Coverage   3.50%   3.50%           
=====================================
  Files         30      30           
  Lines       3395    3395           
  Branches     543     543           
=====================================
  Hits         119     119           
  Misses      3274    3274           
  Partials       2       2           
Flag Coverage Δ
python 3.50% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
scripts/update-commit-latest-env.py 0.00% <0.00%> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eed4d4c...8d830f5. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mtchoum1 mtchoum1 force-pushed the mtchoum1/commit-action branch from 39b5a3c to a7acbb9 Compare April 7, 2026 19:59
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 7, 2026
@jiridanek
Copy link
Copy Markdown
Member

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 8, 2026
# Using 'docker://' prefix is required for skopeo to identify the transport.
full_image_url = f"docker://{image_url}"

# Use 'inspect --config' which is much faster as it only fetches the config blob.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove the comments, IMO quite nonobvious info

try:
async with semaphore:
log.info(f"Semaphore acquired, starting skopeo inspect for: {image_url}")
# Create an asynchronous subprocess
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, these can be removed

"""
Main function to orchestrate the concurrent inspection of multiple images.
"""
semaphore = asyncio.Semaphore(22) # Limit concurrent skopeo processes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is somewhat valuable comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for letting me know they we important I'll bring them back

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/update-commit-latest-env.yaml (1)

6-7: ⚠️ Potential issue | 🟡 Minor

Cron comment is incorrect: 0 11 * * * is 11:00 UTC, not midnight.

Either fix the comment or adjust the cron expression.

-    - cron: '0 11 * * *'  # Daily at midnight UTC
+    - cron: '0 11 * * *'  # Daily at 11:00 UTC
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/update-commit-latest-env.yaml around lines 6 - 7, The cron
comment for the schedule block is wrong: the existing schedule: - cron: '0 11 *
* *' runs at 11:00 UTC, not midnight; update either the cron expression to '0 0
* * *' if you want daily midnight UTC, or change the comment to "Daily at 11:00
UTC" to match the current cron, making the edit next to the schedule -> cron: '0
11 * * *' entry.
🧹 Nitpick comments (4)
scripts/update-commit-latest-env.py (3)

174-174: Magic number 22 for semaphore concurrency limit.

Extract to a named constant or CLI argument for clarity and easier tuning.

+MAX_CONCURRENT_INSPECTIONS = 22
+
 async def inspect_urls(urls: typing.Iterable[str]) -> dict[str, str | None]:
     unique = list(dict.fromkeys(urls))
-    semaphore = asyncio.Semaphore(22)
+    semaphore = asyncio.Semaphore(MAX_CONCURRENT_INSPECTIONS)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/update-commit-latest-env.py` at line 174, Replace the hard-coded
semaphore = asyncio.Semaphore(22) with a named constant or configurable
parameter: introduce a constant like MAX_CONCURRENT_TASKS (or add a CLI/ENV
argument parsed in main) and use asyncio.Semaphore(MAX_CONCURRENT_TASKS) so the
concurrency limit is clearly named and can be tuned; update any help text or
defaults where the script parses arguments and use that symbol in place of the
magic number in scripts/update-commit-latest-env.py.

57-65: Missing file existence check before open.

load_params_env will raise FileNotFoundError with a raw traceback. Consider checking existence and providing a clearer error message, especially since the workflow depends on both files being present.

Proposed fix
 def load_params_env(path: pathlib.Path) -> list[tuple[str, str]]:
+    if not path.is_file():
+        log.error("params-latest.env not found", path=str(path))
+        sys.exit(1)
     with open(path, "rt") as file:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/update-commit-latest-env.py` around lines 57 - 65, The function
load_params_env currently opens the given pathlib.Path directly and will raise a
raw FileNotFoundError; add an explicit existence check at the top of
load_params_env (use path.exists() and/or path.is_file()) and raise a
FileNotFoundError (or ValueError) with a clear, descriptive message that
includes the path value if the file is missing or not a regular file, so callers
get a readable error instead of an opaque traceback.

78-82: Inconsistent exit patterns: mix of sys.exit(1) and raise SystemExit(1).

Lines 78, 82, 99, 180 use sys.exit(1) while line 200 uses raise SystemExit(1). Pick one for consistency—sys.exit(1) is more common in scripts.

-            raise SystemExit(1)
+            sys.exit(1)

Also applies to: 99-100, 199-200

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/update-commit-latest-env.py` around lines 78 - 82, The script mixes
exit styles (sys.exit(1) and raise SystemExit(1)); standardize to sys.exit(1)
throughout: replace the raise SystemExit(1) occurrence with sys.exit(1) so all
exits use sys.exit, ensuring the sys module is already imported and leaving
existing sys.exit(1) calls (e.g., in the blocks referencing odh_by_var[var] and
dummy checks) unchanged; update every instance (around the formerly raise
SystemExit usage and the other noted exit calls) to use sys.exit(1) for
consistency.
.github/workflows/update-commit-latest-env.yaml (1)

46-54: git diff may fail if files don't exist yet.

If commit-latest.env files are newly created (first run), git diff on untracked files returns 0 (no error, no diff shown), but git add will stage them. The logic works, but consider using git diff --exit-code explicitly or checking for untracked files to capture new file creation.

More robust change detection
-          if ! git diff --quiet \
-            manifests/odh/base/commit-latest.env \
-            manifests/rhoai/base/commit-latest.env
-          then
-            git add manifests/odh/base/commit-latest.env manifests/rhoai/base/commit-latest.env
+          git add manifests/odh/base/commit-latest.env manifests/rhoai/base/commit-latest.env
+          if ! git diff --cached --quiet; then
             git commit -m "ci: update commit SHAs for image digests changes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/update-commit-latest-env.yaml around lines 46 - 54, The
current check using "git diff --quiet" can miss newly added untracked files;
change the flow to stage the two commit-latest.env files first (git add
manifests/odh/base/commit-latest.env manifests/rhoai/base/commit-latest.env) and
then test staged changes with "git diff --cached --quiet --exit-code" (or "! git
diff --cached --quiet --exit-code" in the if condition) before running git
commit and git push; update the conditional around the staging/commit logic so
new files are detected reliably by referencing the existing manifest file names
and the git add/git diff --cached commands.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/update-commit-latest-env.py`:
- Around line 96-100: The current re.sub(r"-n$", "-commit-n", var) silently
leaves var unchanged when it doesn't end with "-n"; update the logic around the
lines.append call to validate var first (e.g., check var.endswith("-n") or use
re.search(r"-n$") on var) and if it doesn't match, call log.error (include var
and url) and exit or skip with a warning, then only append (modified_var,
vcs[:7]) when the validation passes so you avoid producing malformed output;
refer to the variables/functions vcs_by_url, vcs, var, lines.append and the
re.sub usage to locate where to add the check and error handling.
- Around line 53-54: resolve_manifest_dir currently resolves a user-supplied
path against PROJECT_ROOT but does not validate that the resolved path remains
inside PROJECT_ROOT, allowing path traversal; update resolve_manifest_dir to
resolve the candidate (use .resolve(strict=False) if needed) and then verify the
resulting path is inside PROJECT_ROOT (e.g., using
Path.is_relative_to(PROJECT_ROOT) or comparing parents via a loop) and raise an
exception (ValueError) if it is not; ensure the function still returns the
resolved path when validation passes.

---

Outside diff comments:
In @.github/workflows/update-commit-latest-env.yaml:
- Around line 6-7: The cron comment for the schedule block is wrong: the
existing schedule: - cron: '0 11 * * *' runs at 11:00 UTC, not midnight; update
either the cron expression to '0 0 * * *' if you want daily midnight UTC, or
change the comment to "Daily at 11:00 UTC" to match the current cron, making the
edit next to the schedule -> cron: '0 11 * * *' entry.

---

Nitpick comments:
In @.github/workflows/update-commit-latest-env.yaml:
- Around line 46-54: The current check using "git diff --quiet" can miss newly
added untracked files; change the flow to stage the two commit-latest.env files
first (git add manifests/odh/base/commit-latest.env
manifests/rhoai/base/commit-latest.env) and then test staged changes with "git
diff --cached --quiet --exit-code" (or "! git diff --cached --quiet --exit-code"
in the if condition) before running git commit and git push; update the
conditional around the staging/commit logic so new files are detected reliably
by referencing the existing manifest file names and the git add/git diff
--cached commands.

In `@scripts/update-commit-latest-env.py`:
- Line 174: Replace the hard-coded semaphore = asyncio.Semaphore(22) with a
named constant or configurable parameter: introduce a constant like
MAX_CONCURRENT_TASKS (or add a CLI/ENV argument parsed in main) and use
asyncio.Semaphore(MAX_CONCURRENT_TASKS) so the concurrency limit is clearly
named and can be tuned; update any help text or defaults where the script parses
arguments and use that symbol in place of the magic number in
scripts/update-commit-latest-env.py.
- Around line 57-65: The function load_params_env currently opens the given
pathlib.Path directly and will raise a raw FileNotFoundError; add an explicit
existence check at the top of load_params_env (use path.exists() and/or
path.is_file()) and raise a FileNotFoundError (or ValueError) with a clear,
descriptive message that includes the path value if the file is missing or not a
regular file, so callers get a readable error instead of an opaque traceback.
- Around line 78-82: The script mixes exit styles (sys.exit(1) and raise
SystemExit(1)); standardize to sys.exit(1) throughout: replace the raise
SystemExit(1) occurrence with sys.exit(1) so all exits use sys.exit, ensuring
the sys module is already imported and leaving existing sys.exit(1) calls (e.g.,
in the blocks referencing odh_by_var[var] and dummy checks) unchanged; update
every instance (around the formerly raise SystemExit usage and the other noted
exit calls) to use sys.exit(1) for consistency.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: b9eb554d-0fbe-4fbd-ba63-89960ae7d50a

📥 Commits

Reviewing files that changed from the base of the PR and between 715a38f and a7acbb9.

📒 Files selected for processing (4)
  • .github/workflows/update-commit-latest-env.yaml
  • manifests/odh/base/commit-latest.env
  • manifests/rhoai/base/commit-latest.env
  • scripts/update-commit-latest-env.py

Comment thread scripts/update-commit-latest-env.py Outdated
Comment thread scripts/update-commit-latest-env.py Outdated
Comment thread scripts/update-commit-latest-env.py Outdated
Comment on lines +68 to +72
def resolve_rhoai_urls(
rhoai_pairs: list[tuple[str, str]],
odh_by_var: dict[str, str],
) -> list[tuple[str, str]]:
"""Replace RHOAI dummy placeholders with the same variable's image URL from ODH params."""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, what? @coderabbitai how can this possibly produce the right answer?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

Let me look at the actual env files and the failing tests to understand this fully.


🏁 Script executed:

#!/bin/bash
# Check variable naming in params-latest.env
echo "=== ODH params-latest.env ==="
head -30 manifests/odh/base/params-latest.env 2>/dev/null || echo "NOT FOUND"
echo ""
echo "=== RHOAI params-latest.env ==="
head -30 manifests/rhoai/base/params-latest.env 2>/dev/null || echo "NOT FOUND"
echo ""
# Look at failing tests
echo "=== test_image_manifests_version_alignment ==="
rg -n "test_image_manifests_version_alignment" --type py -A 40
echo ""
echo "=== test_image_pyprojects ==="
rg -n "test_image_pyprojects" --type py -A 20
echo ""
# Check how params-latest and commit-latest are related in tests
echo "=== conftest / fixtures ==="
rg -n "manifest_directory" --type py -A 5 -B 5 | head -100

Repository: opendatahub-io/notebooks
Length of output: 10530


🏁 Script executed:

#!/bin/bash
echo "=== conftest.py manifests_directory fixture ==="
rg -n "manifests_directory" --type py -A 10 -B 2

echo ""
echo "=== load_manifests_file_for ==="
rg -n "load_manifests_file_for" --type py -A 30 -B 2

echo ""
echo "=== Manifest class ==="
rg -n "class Manifest" --type py -A 20

echo ""
echo "=== old manifests/base directory ==="
ls manifests/base/ 2>/dev/null || echo "NOT FOUND (old path)"
ls manifests/odh/base/ 2>/dev/null
ls manifests/rhoai/base/ 2>/dev/null

Repository: opendatahub-io/notebooks
Length of output: 22984


🏁 Script executed:

#!/bin/bash
echo "=== MANIFESTS_ODH_DIR and MANIFESTS_RHOAI_DIR definitions ==="
rg -n "MANIFESTS_ODH_DIR\|MANIFESTS_RHOAI_DIR" --type py -A 2 -B 2

echo ""
echo "=== workflow file ==="
cat .github/workflows/update-commit-latest-env.yaml 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== Check if manifests/odh/base/base exists ==="
ls manifests/odh/base/base/ 2>/dev/null || echo "manifests/odh/base/base/ does NOT exist"
ls manifests/rhoai/base/base/ 2>/dev/null || echo "manifests/rhoai/base/base/ does NOT exist"

Repository: opendatahub-io/notebooks
Length of output: 1877


🏁 Script executed:

#!/bin/bash
echo "=== Search for MANIFESTS constants in tests/manifests.py ==="
cat tests/manifests.py | head -60

echo ""
echo "=== Explicit grep for MANIFESTS dir constants ==="
grep -n "MANIFESTS" tests/manifests.py | head -30

echo ""
echo "=== Check rstudio-notebook-imagestream.yaml in rhoai ==="
ls manifests/rhoai/base/rstudio-notebook-imagestream.yaml 2>/dev/null || echo "MISSING in rhoai"
ls manifests/odh/base/rstudio-notebook-imagestream.yaml 2>/dev/null || echo "MISSING in odh"

Repository: opendatahub-io/notebooks
Length of output: 4324


@jiridanek — here's a full trace of what resolve_rhoai_urls does and why it produces the correct result:

The design: manifests/rhoai/base/params-latest.env uses dummy for every single one of its 18 entries (confirmed from the actual file). The variable names are identical to ODH's. resolve_rhoai_urls substitutes each dummy with the real image URL from ODH params-latest.env by matching on variable name:

RHOAI: odh-workbench-jupyter-minimal-cpu-py312-ubi9-n = dummy
ODH:   odh-workbench-jupyter-minimal-cpu-py312-ubi9-n = quay.io/opendatahub/odh-workbench-jupyter-minimal-cpu-py312-ubi9:3.4-v1.43

After substitution, RHOAI's resolved pairs are the ODH image URLs → skopeo inspect is called on those to get vcs-ref → RHOAI's commit-latest.env ends up with the same commit hashes as ODH's. This matches the PR's actual diff: both odh/base/commit-latest.env and rhoai/base/commit-latest.env were updated to the exact same hashes (7f2342d, 335acea, c2e4c8a, etc.).

So it does produce the right answer — the premise is that RHOAI and ODH are built from the same git source, so ODH image vcs-ref values are authoritative for both. The dummy convention is an explicit declaration of that dependency.


However, the 2 failing test_image_manifests_version_alignment tests appear to be a separate, unrelated issue that this PR has exposed:

  • manifests/odh/base/rstudio-notebook-imagestream.yaml ✅ exists
  • manifests/rhoai/base/rstudio-notebook-imagestream.yamlmissing

The test is parametrized over both MANIFESTS_ODH_DIR (manifests/odh) and MANIFESTS_RHOAI_DIR (manifests/rhoai), and get_source_of_truth_filepath will try to open manifests/rhoai/base/rstudio-notebook-imagestream.yaml which doesn't exist, raising FileNotFoundError. The RHOAI base/ directory appears to be missing several imagestream files compared to ODH.

Could you clarify — is rstudio-notebook-imagestream.yaml intentionally absent from RHOAI (in which case the test needs a guard), or does it need to be added/symlinked?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the premise is that RHOAI and ODH are built from the same git source, so ODH image vcs-ref values are authoritative for both

@coderabbitai exactly, this is a false premise, RHOAI/rhds images are built from rhoai-3.x branch in red-hat-data-services/notebooks repo, the commit hash will be naturally different

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== RHOAI params-latest.env (full) ==="
cat manifests/rhoai/base/params-latest.env 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ODH params-latest.env (first few lines) ==="
head -10 manifests/odh/base/params-latest.env 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== RHOAI commit-latest.env (from this PR) ==="
cat manifests/rhoai/base/commit-latest.env 2>/dev/null || echo "NOT FOUND"

echo ""
echo "=== ODH commit-latest.env (from this PR) ==="
cat manifests/odh/base/commit-latest.env 2>/dev/null || echo "NOT FOUND"

Repository: opendatahub-io/notebooks
Length of output: 4961


@jiridanek — confirmed. The data makes the problem unambiguous: manifests/rhoai/base/commit-latest.env and manifests/odh/base/commit-latest.env are byte-for-byte identical for all 18 shared entries in this PR's diff. That is the smoking gun.

The bug chain in resolve_rhoai_urls:

  1. RHOAI params-latest.env — every entry is dummy
  2. resolve_rhoai_urls substitutes each dummy with the ODH image URL (e.g. quay.io/opendatahub/odh-workbench-jupyter-minimal-cpu-py312-ubi9:3.4-v1.43)
  3. skopeo inspect runs against those ODH images → returns a vcs-ref pointing to a commit in opendatahub-io/notebooks
  4. That ODH commit hash is written into manifests/rhoai/base/commit-latest.env

So commit-latest.env for RHOAI is tracking ODH source commits, not RHOAI/RHDS source commits from red-hat-data-services/notebooks @ rhoai-3.x.

The root cause is that manifests/rhoai/base/params-latest.env must be populated with actual RHOAI image URLs (the images built from red-hat-data-services/notebooks) rather than dummy. Only then will skopeo inspect return the correct vcs-ref values from the RHOAI build branch. The dummy substitution mechanism, as designed here, can never yield correct results for RHOAI because it always redirects to ODH images.


📝 Learnings were identified and not saved (knowledge base disabled). Enable

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai manifests/rhoai/base/params-latest.env must remain dummy, it is populated by rhods-operator during openshift deployment, as described at https://github.com/opendatahub-io/notebooks/blob/main/manifests/README.md. dummy values serve to catch when for some reason the substitutions don't happen correctly; it's better to fail the deployment than to deploy the wrong ("fallback") images

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiridanek — understood and noted. That is an important design constraint.

This actually strengthens the bug finding: the resolve_rhoai_urls function, rather than being a valid workaround, is definitively broken for RHOAI. Because:

  1. RHOAI params-latest.env always has dummy values in the repo (by design, as a deployment-safety guard)
  2. resolve_rhoai_urls replaces those dummy values with ODH image URLs from manifests/odh/base/params-latest.env
  3. skopeo inspect then runs against those ODH images → returns vcs-ref from opendatahub-io/notebooks
  4. Those ODH commit hashes land in manifests/rhoai/base/commit-latest.env — which should track commits from red-hat-data-services/notebooks @ rhoai-3.x

The result is that RHOAI's commit-latest.env silently contains the wrong commit hashes. The CI script, as written in the diff, can never produce correct values for RHOAI — it would need the actual RHOAI image URLs (the real digests served by the registry) to be provided at execution time through a different mechanism (e.g., passed in via CI variable or fetched from the registry catalog), rather than falling back to ODH's params-latest.env.


📝 Learnings were identified and not saved (knowledge base disabled). Enable

@opendatahub-io opendatahub-io deleted a comment from coderabbitai Bot Apr 8, 2026
@mtchoum1 mtchoum1 force-pushed the mtchoum1/commit-action branch from a7acbb9 to f01e078 Compare April 8, 2026 12:59
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 8, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
scripts/update-commit-latest-env.py (2)

68-86: ⚠️ Potential issue | 🔴 Critical

resolve_rhoai_urls is writing RHOAI output from ODH images.

With the current manifests, every rhoai entry is dummy, so this function always swaps in the matching ODH URL and inspects that image. The generated manifests/rhoai/base/commit-latest.env therefore records ODH vcs-ref values instead of the RHDS/RHOAI commits. Fail fast on dummy here until a real RHOAI image source exists; reusing ODH URLs cannot produce correct RHOAI SHAs.

Minimal safe guard
 def resolve_rhoai_urls(
     rhoai_pairs: list[tuple[str, str]],
     odh_by_var: dict[str, str],
 ) -> list[tuple[str, str]]:
     """Replace RHOAI dummy placeholders with the same variable's image URL from ODH params."""
     out: list[tuple[str, str]] = []
     for var, raw in rhoai_pairs:
         if raw.strip().lower() == DUMMY_IMAGE:
-            if var not in odh_by_var:
-                log.error("No ODH params-latest entry to resolve RHOAI dummy image", variable=var)
-                sys.exit(1)
-            url = odh_by_var[var]
-            if url.strip().lower() == DUMMY_IMAGE:
-                log.error("ODH params-latest still has dummy for variable", variable=var)
-                sys.exit(1)
-            out.append((var, url))
+            log.error(
+                "RHOAI params-latest.env entry cannot be 'dummy'; provide the real RHOAI image URL",
+                variable=var,
+            )
+            sys.exit(1)
         else:
             out.append((var, raw))
     return out
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/update-commit-latest-env.py` around lines 68 - 86, The function
resolve_rhoai_urls currently replaces RHOAI DUMMY_IMAGE entries with ODH URLs
(using rhoai_pairs and odh_by_var), which causes RHOAI commits to be recorded
from ODH; change the behavior to fail-fast when a rhoai entry is DUMMY_IMAGE:
for each (var, raw) in resolve_rhoai_urls, if raw.strip().lower() == DUMMY_IMAGE
log an error including the variable name and exit (do not substitute
odh_by_var[var] or append it to out); only allow non-dummy rhoai values to be
appended to out so RHOAI SHAs come from real RHOAI image sources.

53-54: ⚠️ Potential issue | 🔴 Critical

CWE-22: --odh-dir / --rhoai-dir can escape the checkout and overwrite files outside the repo.

Severity: critical. A caller can pass an absolute path or ../../..., and Line 54 will then read params-latest.env and write commit-latest.env outside PROJECT_ROOT. Canonicalize the candidate first and reject anything that does not stay under the repository root.

Proposed fix
 def resolve_manifest_dir(raw: pathlib.Path) -> pathlib.Path:
-    return raw if raw.is_absolute() else (PROJECT_ROOT / raw).resolve()
+    project_root = PROJECT_ROOT.resolve()
+    candidate = raw if raw.is_absolute() else project_root / raw
+    resolved = candidate.resolve(strict=False)
+    try:
+        resolved.relative_to(project_root)
+    except ValueError:
+        log.error("Manifest directory must stay within the repository", path=str(resolved))
+        sys.exit(1)
+    return resolved

As per coding guidelines, "Validate file paths (prevent path traversal)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/update-commit-latest-env.py` around lines 53 - 54, The
resolve_manifest_dir function currently allows absolute or traversal paths to
escape PROJECT_ROOT; change it to canonicalize the candidate path after joining
(use .resolve()) and then verify it is inside PROJECT_ROOT (e.g., using
Path.is_relative_to or try: candidate.relative_to(PROJECT_ROOT) except
ValueError) and raise an exception or reject the input if the resolved path is
not under PROJECT_ROOT; update any callers that use resolve_manifest_dir (e.g.,
where params-latest.env / commit-latest.env are read/written) to handle the
rejection instead of proceeding.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@scripts/update-commit-latest-env.py`:
- Around line 68-86: The function resolve_rhoai_urls currently replaces RHOAI
DUMMY_IMAGE entries with ODH URLs (using rhoai_pairs and odh_by_var), which
causes RHOAI commits to be recorded from ODH; change the behavior to fail-fast
when a rhoai entry is DUMMY_IMAGE: for each (var, raw) in resolve_rhoai_urls, if
raw.strip().lower() == DUMMY_IMAGE log an error including the variable name and
exit (do not substitute odh_by_var[var] or append it to out); only allow
non-dummy rhoai values to be appended to out so RHOAI SHAs come from real RHOAI
image sources.
- Around line 53-54: The resolve_manifest_dir function currently allows absolute
or traversal paths to escape PROJECT_ROOT; change it to canonicalize the
candidate path after joining (use .resolve()) and then verify it is inside
PROJECT_ROOT (e.g., using Path.is_relative_to or try:
candidate.relative_to(PROJECT_ROOT) except ValueError) and raise an exception or
reject the input if the resolved path is not under PROJECT_ROOT; update any
callers that use resolve_manifest_dir (e.g., where params-latest.env /
commit-latest.env are read/written) to handle the rejection instead of
proceeding.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 9e646073-9504-42ec-85a4-e81980939766

📥 Commits

Reviewing files that changed from the base of the PR and between a7acbb9 and f01e078.

📒 Files selected for processing (4)
  • .github/workflows/update-commit-latest-env.yaml
  • manifests/odh/base/commit-latest.env
  • manifests/rhoai/base/commit-latest.env
  • scripts/update-commit-latest-env.py
✅ Files skipped from review due to trivial changes (1)
  • manifests/rhoai/base/commit-latest.env
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/update-commit-latest-env.yaml
  • manifests/odh/base/commit-latest.env

@openshift-ci openshift-ci Bot removed the size/l label Apr 8, 2026
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 8, 2026
@mtchoum1 mtchoum1 force-pushed the mtchoum1/commit-action branch from b31a687 to e8af878 Compare April 8, 2026 20:14
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 8, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
scripts/update-commit-latest-env.py (2)

174-185: ⚠️ Potential issue | 🟠 Major

Fail fast on malformed commit.env rows.

_load_env_dict() still swallows non-comment lines without =, then _write_env() rewrites the file without them. A typo in commit.env becomes silent data loss on the next run. Mirror _parse_env_lines() here.

Proposed fix
 def _load_env_dict(path: pathlib.Path) -> dict[str, str]:
     d: dict[str, str] = {}
     if not path.is_file():
         return d
-    for line in path.read_text().splitlines():
-        line = line.strip()
+    for lineno, raw_line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
+        line = raw_line.strip()
         if not line or line.startswith("#"):
             continue
         if "=" not in line:
-            continue
+            raise ValueError(f"_load_env_dict: {path}:{lineno}: expected KEY=VALUE, got {raw_line!r}")
         k, _, v = line.partition("=")
         d[k.strip()] = v.strip()
     return d

As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/update-commit-latest-env.py` around lines 174 - 185, The
_load_env_dict function currently ignores non-comment lines missing an "=" which
silently drops malformed rows; update _load_env_dict to fail fast by raising an
explicit ValueError (including the offending line text and the provided path)
when a non-empty, non-comment line does not contain "=", mirroring the behavior
of _parse_env_lines; keep existing trimming and comment/blank skipping logic and
ensure callers like _write_env will no longer rewrite away malformed entries
silently.

189-212: ⚠️ Potential issue | 🟠 Major

--all cannot succeed for rhoai with placeholder URLs.

This function assumes every params-latest.env value is directly inspectable. With the workflow now always running --all, the rhoai branch will hit checked-in dummy placeholders and call skopeo inspect docker://dummy, then return False before commit.env is refreshed. rhoai needs a CI-time source of real image refs, or an explicit skip/fail-fast path before inspection.

As per coding guidelines, "REVIEW PRIORITIES: 2. Architectural issues and anti-patterns".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/update-commit-latest-env.py` around lines 189 - 212, In
sync_commit_latest, detect and handle placeholder image URLs before calling
_inspect_pairs: after building pairs from params_latest (using _PARAM_KEY_RE),
filter out entries whose url is a placeholder (e.g., "dummy" or known
placeholder pattern used for rhoai), log a message indicating those keys were
skipped, and if that filtering leaves no remaining inspectable pairs return True
(or otherwise short-circuit appropriately) so rhoai's checked-in placeholders
don't trigger skopeo inspection and fail the run; otherwise call _inspect_pairs
only with the filtered list and proceed as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/update-commit-latest-env.py`:
- Around line 322-324: Remove the unreachable
sys.exit(asyncio.run(main_async(variants))) call at module bottom: main()
already calls sys.exit() and variants is local to main, so delete the duplicate
asyncio invocation. Keep only the existing if __name__ == "__main__": main()
entry; do not reintroduce module-level references to variants or call main_async
from top-level. Ensure no other code depends on that removed call.

---

Duplicate comments:
In `@scripts/update-commit-latest-env.py`:
- Around line 174-185: The _load_env_dict function currently ignores non-comment
lines missing an "=" which silently drops malformed rows; update _load_env_dict
to fail fast by raising an explicit ValueError (including the offending line
text and the provided path) when a non-empty, non-comment line does not contain
"=", mirroring the behavior of _parse_env_lines; keep existing trimming and
comment/blank skipping logic and ensure callers like _write_env will no longer
rewrite away malformed entries silently.
- Around line 189-212: In sync_commit_latest, detect and handle placeholder
image URLs before calling _inspect_pairs: after building pairs from
params_latest (using _PARAM_KEY_RE), filter out entries whose url is a
placeholder (e.g., "dummy" or known placeholder pattern used for rhoai), log a
message indicating those keys were skipped, and if that filtering leaves no
remaining inspectable pairs return True (or otherwise short-circuit
appropriately) so rhoai's checked-in placeholders don't trigger skopeo
inspection and fail the run; otherwise call _inspect_pairs only with the
filtered list and proceed as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 6561ae5c-b5c2-4d4f-b670-df70377cf494

📥 Commits

Reviewing files that changed from the base of the PR and between 4f7dc5c and e8af878.

📒 Files selected for processing (2)
  • .github/workflows/update-commit-latest-env.yaml
  • scripts/update-commit-latest-env.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/update-commit-latest-env.yaml

Comment thread scripts/update-commit-latest-env.py Outdated
Comment on lines +322 to +324
if __name__ == "__main__":
main()
sys.exit(asyncio.run(main_async(variants)))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import ast
import pathlib

path = next(pathlib.Path(".").rglob("update-commit-latest-env.py"))
tree = ast.parse(path.read_text(encoding="utf-8"))

class Visitor(ast.NodeVisitor):
    def __init__(self):
        self.scope = []

    def visit_FunctionDef(self, node):
        self.scope.append(f"function {node.name}")
        self.generic_visit(node)
        self.scope.pop()

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Name(self, node):
        if node.id == "variants":
            scope = self.scope[-1] if self.scope else "module scope"
            print(f"Line {node.lineno}: {type(node.ctx).__name__} in {scope}")

Visitor().visit(tree)
PY

Repository: opendatahub-io/notebooks

Length of output: 267


🏁 Script executed:

#!/bin/bash
# Read the relevant section of the file to understand the context
sed -n '310,330p' scripts/update-commit-latest-env.py

Repository: opendatahub-io/notebooks

Length of output: 415


Remove the unreachable sys.exit(asyncio.run(main_async(variants))) on line 324.

The main() function exits via sys.exit() on line 319 and never returns. The duplicate call on line 324 is unreachable dead code, and if sys.exit() is intercepted in tests, it raises NameError because variants is only defined locally within main() and does not exist in module scope.

Proposed fix
 if __name__ == "__main__":
     main()
-    sys.exit(asyncio.run(main_async(variants)))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if __name__ == "__main__":
main()
sys.exit(asyncio.run(main_async(variants)))
if __name__ == "__main__":
main()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/update-commit-latest-env.py` around lines 322 - 324, Remove the
unreachable sys.exit(asyncio.run(main_async(variants))) call at module bottom:
main() already calls sys.exit() and variants is local to main, so delete the
duplicate asyncio invocation. Keep only the existing if __name__ == "__main__":
main() entry; do not reintroduce module-level references to variants or call
main_async from top-level. Ensure no other code depends on that removed call.

Comment thread scripts/update-commit-latest-env.py Outdated
return True


async def sync_commit_released(variant: str) -> bool:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai is this just dupicated async def sync_commit_latest(variant: str) -> bool?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiridanek — Yes, the docstring body of _inspect_pairs is simply duplicated — the same two sentences are pasted twice verbatim:

async def _inspect_pairs(pairs: list[tuple[str, str]]) -> list[tuple[str, str | None]]:
    """
    Orchestrate concurrent inspection of multiple images (param key → digest URL pairs).

    Limits parallelism with a semaphore (same default concurrency as the historical main-branch script).
    Orchestrate concurrent inspection of multiple images (param key → digest URL pairs).         # ← duplicate

    Limits parallelism with a semaphore (same default concurrency as the historical main-branch script).  # ← duplicate
    """

This is a copy-paste artifact; the function itself is not a duplicate of sync_commit_latest — they do different things (_inspect_pairs runs skopeo concurrently, sync_commit_latest orchestrates reading/writing env files).


While reviewing the file I also noticed a separate bug in the __main__ guard at the bottom of the file:

if __name__ == "__main__":
    main()                                   # calls sys.exit() internally → process ends here
    sys.exit(asyncio.run(main_async(variants)))  # dead code; `variants` is also not defined in this scope → NameError

The second sys.exit(...) line is unreachable dead code and would raise a NameError on variants if it were ever reached. It should simply be removed — main() already handles everything including sys.exit.

Comment thread scripts/update-commit-latest-env.py Outdated
if any(commit_hash is None for variable, commit_hash in results):
log.error("Failed to get commit hash for some images. Quitting, please try again to try again, like.")
sys.exit(1)
async def sync_commit_latest(variant: str) -> bool:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai are sync_commit_latest and sync_commit_released duplicates of one another?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiridanek — they are not duplicates, but they are structurally parallel by design. Here's how they differ:

sync_commit_latest sync_commit_released
Input file params-latest.env params.env
Output file commit-latest.env commit.env
Key filter -n suffix only Non--n suffixes (e.g. -1-3)
Missing input Hard error → False Soft skip → True
Write strategy Fresh overwrite Merges into existing entries via _load_env_dict
No applicable keys N/A Skips silently → True

sync_commit_latest handles the nightly/latest image tier (-n tags), while sync_commit_released handles versioned/released images (e.g. -1-3 digests). The merge behavior in sync_commit_released is deliberate — it preserves existing commit.env keys for pipeline images that only appear in params-latest.env.

Two unrelated issues worth flagging while we're here:

  1. Duplicated docstring in _inspect_pairs — the last two sentences in its docstring repeat the first two verbatim (visible in the diff).

  2. Unreachable + broken code at the bottom of the file:

    if __name__ == "__main__":
        main()
        sys.exit(asyncio.run(main_async(variants)))  # unreachable; `variants` not in scope

    The second line can never execute (it follows main() which calls sys.exit), and variants is not defined at module scope — this would raise a NameError if somehow reached.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtchoum1 what's the reason for handling params and params-latest differently, like this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

- 'manifests/base/params-latest.env'
- 'manifests/odh/base/params-latest.env'
- 'manifests/odh/base/params.env'
- 'manifests/rhoai/base/params-latest.env'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtchoum1 as we discussed with @coderabbitai in #3304 (comment), what can the script possibly usefully do with rhoai/base/params-latest.env? It only contains dummy values

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
cat manifests/rhoai/base/params-latest.env
echo "---"
cat manifests/odh/base/params-latest.env 2>/dev/null | head -30

Repository: opendatahub-io/notebooks
Length of output: 3658


🏁 Script executed:

#!/bin/bash
echo "=== rhoai/base/params.env ==="
cat manifests/rhoai/base/params.env 2>/dev/null | head -30
echo "=== rhoai/base/commit-latest.env (if exists) ==="
cat manifests/rhoai/base/commit-latest.env 2>/dev/null | head -10
echo "=== rhoai/base/commit.env (if exists) ==="
cat manifests/rhoai/base/commit.env 2>/dev/null | head -10

Repository: opendatahub-io/notebooks
Length of output: 6360


@jiridanek, the short answer is: nothing useful. Here's what the data shows:

  • manifests/rhoai/base/params-latest.env has all 18 entries set to dummy — there are no real image URLs.
  • When sync_commit_latest("rhoai") runs, it will attempt skopeo inspect dummy for every entry, fail on each, and abort with False. The rhoai/base/commit-latest.env will never get updated by this workflow.

The reason dummy is there is structurally sound: RHOAI doesn't publish nightly/latest images — it only ships versioned releases via registry.redhat.io (which are tracked in params.env, not params-latest.env). So the -n / "latest" concept simply doesn't apply to RHOAI.

Notably though, manifests/rhoai/base/commit-latest.env does exist and contains real commit SHAs (e.g. d3137ca, bff12e2) that match ODH commits — because RHOAI images are built from the same source as ODH. So RHOAI's commit-latest.env is meant to track the ODH source commits.

The fix should be Option B from the earlier discussion: the script should derive RHOAI's commit-latest.env from ODH's params-latest.env (filtered to the keys that exist in rhoai/base/params-latest.env), rather than trying to inspect dummy RHOAI URLs. The rhoai/base/params-latest.env file essentially acts as a key manifest declaring which images are relevant to RHOAI — the actual URLs to inspect are in ODH.

@mtchoum1, does that match the intent? If so, the script's sync_commit_latest for the rhoai variant needs to be rerouted to read image URLs from odh/base/params-latest.env using the keys from rhoai/base/params-latest.env, rather than trying to resolve the dummy values directly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because RHOAI images are built from the same source as ODH. So RHOAI's commit-latest.env is meant to track the ODH source commits

@coderabbitai nope, RHOAI images are built from red-hat-data-services/notebooks#rhoai-3.x branch; has different commit hashes

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiridanek, thank you for the correction — I'll update my analysis.

That changes the picture significantly. Since RHOAI images are built from red-hat-data-services/notebooks#rhoai-3.x with different commit hashes, Option B is also wrong: using ODH's params-latest.env URLs would yield ODH commit SHAs, not RHOAI ones. So both options from the earlier discussion are invalid for rhoai/base/commit-latest.env.

The core issue is that there are no publicly accessible nightly RHOAI image URLsrhoai/base/params-latest.env contains only dummy values because RHOAI ships exclusively through versioned releases on registry.redhat.io. The "latest" concept simply doesn't exist for RHOAI's public-facing images.

This means:

  • sync_commit_latest("rhoai") cannot work in any meaningful way — there's nothing to inspect.
  • rhoai/base/commit-latest.env cannot be automatically maintained by this workflow. It should either be skipped entirely or manually maintained.
  • The workflow's path trigger on manifests/rhoai/base/params-latest.env is a no-op since that file will always contain dummy.

What could still be useful: the script updating rhoai/base/commit.env by inspecting the versioned images from rhoai/base/params.env (which has real registry.redhat.io URLs) — but that requires registry authentication via the pull-secret for those private images.

@mtchoum1, given this, the rhoai commit-latest.env path in the script should likely be guarded with an explicit skip, and the path trigger for manifests/rhoai/base/params-latest.env in the workflow removed. Does that match the intended design?


📝 Learnings were identified and not saved (knowledge base disabled). Enable

@@ -27,21 +32,54 @@ jobs:
token: ${{ secrets.GITHUB_TOKEN }}
persist-credentials: true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai it's probably a bit of a security theater, but this can be written without having to use persist-credentials: true. See #3202

Up to you if this is worth implementing. The credentials are still in memory and malicious code, if it ever gets to run in our action, can introspect runner's memory. So maybe it's useless.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

gh pr view 3202 --repo opendatahub-io/notebooks --json body,files | jq -r '.body, "\n---FILES---", (.files[].path)'

Repository: opendatahub-io/notebooks
Length of output: 2348


🏁 Script executed:

gh pr diff 3202 --repo opendatahub-io/notebooks | grep -A 30 -B 5 "persist-credentials\|x-access-token\|push\|token"

Repository: opendatahub-io/notebooks
Length of output: 2405


@jiridanek — you're right that it's a limited mitigation, but the concrete benefit is that the token is never written to .git/config on disk, so code executing in the workspace (e.g. a compromised build tool, a malicious dep pulled during uv run) cannot read it via a simple file read. Memory introspection is a higher bar. Worth doing for consistency with the pattern established in #3202.

The change is small — apply the same x-access-token push pattern:

      - name: Checkout repository
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          persist-credentials: false   # token never written to .git/config
          # no `token:` here — supplied only at push time below
      - name: Commit and push changes
        run: |
          git config --global user.name 'github-actions[bot]'
          git config --global user.email 'github-actions[bot]@users.noreply.github.com'

          changed=0
          for f in \
            manifests/odh/base/commit-latest.env \
            manifests/odh/base/commit.env \
            manifests/rhoai/base/commit-latest.env \
            manifests/rhoai/base/commit.env; do
            if ! git diff --quiet -- "$f" 2>/dev/null; then
              changed=1
              git add -- "$f"
            fi
          done

          if [[ "$changed" -eq 1 ]]; then
            git commit -m "ci: refresh commit SHAs from image vcs-ref labels"
            # Supply token only at push time — never persisted to .git/config
            git push "https://x-access-token:${GITHUB_TOKEN}@github.com/${{ github.repository }}.git" HEAD
          else
            echo "No effective changes in commit env files, skipping commit."
          fi
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

The job-level permissions: contents: write stays as-is — it governs what GITHUB_TOKEN is allowed to do, which is unchanged. This is the same pattern used in #3202 and aligns with the GitHub Security Lab guidance on preventing pwn requests.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkout@v6, release notes:

Improved credential security: persist-credentials now stores credentials in a separate file under $RUNNER_TEMP instead of directly in .git/config

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiridanek — you're right, and I stand corrected. persist-credentials in v6 now stores credentials in a separate file under $RUNNER_TEMP instead of directly in .git/config, which directly addresses the concern I raised. The token is no longer sitting in a workspace-accessible file where a compromised build tool could trivially read it.

Since the workflow already pins actions/checkout at the SHA corresponding to v6.0.2, the persist-credentials: true + git push pattern here is fine as-is. The x-access-token push workaround I suggested was the right advice for v4/v5, but is redundant with v6's new storage location. Withdrawing that suggestion.


📝 Learnings were identified and not saved (knowledge base disabled). Enable

@mtchoum1 mtchoum1 force-pushed the mtchoum1/commit-action branch from e8af878 to b883987 Compare April 9, 2026 16:54
@openshift-ci openshift-ci Bot added size/l and removed size/l labels Apr 9, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/update-commit-latest-env.py`:
- Around line 208-219: The loop collecting pairs from _parse_env_lines is
allowed to produce an empty pairs list which then leads to calling
_inspect_pairs and ultimately overwriting commit-latest.env with zero rows;
change the flow to treat empty pairs as a hard failure: after the loop that
fills pairs (using _PARAM_KEY_RE and suffix "-n"), check if pairs is empty and
raise an exception or exit with a nonzero status (e.g., raise RuntimeError or
call sys.exit(1)) with a clear message indicating no valid -n keys found, and do
not call _inspect_pairs or proceed to write output; this preserves the existing
file on bad input and mirrors sync_commit_released()'s conservative behavior.
- Around line 124-129: Reject non-SHA vcs-ref values by validating the vcs_ref
variable before accepting it: ensure vcs_ref matches the expected commit-hash
pattern (e.g. a 40-character hex SHA) and if it doesn't, log a warning/error and
treat it as missing (set vcs_ref to None or raise) so malformed values like
"main" or truncated SHAs are not written to commit*.env; update the branch
around the existing vcs_ref check (the block that logs "Successfully found
vcs-ref" / "vcs-ref label not found" and returns image_url, vcs_ref) to perform
this validation and only log success/return a vcs_ref when it passes the SHA
pattern.
- Around line 91-102: The subprocess call created in the block using semaphore
and process = await asyncio.create_subprocess_exec (...) must be wrapped with a
timeout and proper cleanup: call asyncio.wait_for on process.communicate() with
a sensible timeout, catch asyncio.TimeoutError, log the timeout (include
image_url), then kill the process (process.kill() or process.terminate()), await
process.wait() to reap it, and ensure you still capture/handle stdout/stderr or
set returncode appropriately before returning; keep the semaphore context (async
with semaphore) intact and add logging around the timeout and cleanup to avoid
permanently burning a semaphore slot.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Repository UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 77292ed0-a53e-48e3-bff9-06c4798ac8d3

📥 Commits

Reviewing files that changed from the base of the PR and between e8af878 and b883987.

📒 Files selected for processing (2)
  • .github/workflows/update-commit-latest-env.yaml
  • scripts/update-commit-latest-env.py
✅ Files skipped from review due to trivial changes (1)
  • .github/workflows/update-commit-latest-env.yaml

Comment on lines 91 to 102
try:
async with semaphore:
log.info(f"Semaphore acquired, starting skopeo inspect for: {image_url}")
log.info("Semaphore acquired, starting skopeo inspect", image_url=image_url)
# Create an asynchronous subprocess
process = await asyncio.create_subprocess_exec(
*command,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE
stderr=asyncio.subprocess.PIPE,
)
# Wait for the command to complete and capture output
stdout, stderr = await process.communicate()
returncode = process.returncode
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
sed -n '88,108p' scripts/update-commit-latest-env.py
printf '\n--- timeout-related matches ---\n'
rg -n "communicate\(|wait_for\(|timeout-minutes" scripts/update-commit-latest-env.py .github/workflows/update-commit-latest-env.yaml

Repository: opendatahub-io/notebooks

Length of output: 1039


Add a timeout and cleanup around the skopeo subprocess to prevent indefinite hangs.

The call to await process.communicate() (line 101) has no timeout. If the skopeo inspect command stalls on a registry or network failure, that semaphore slot is burned permanently, wedging the entire workflow.

Fix
         async with semaphore:
             log.info("Semaphore acquired, starting skopeo inspect", image_url=image_url)
-            process = await asyncio.create_subprocess_exec(
+            process = await asyncio.create_subprocess_exec(
                 *command,
                 stdout=asyncio.subprocess.PIPE,
                 stderr=asyncio.subprocess.PIPE,
             )
-            # Wait for the command to complete and capture output
-            stdout, stderr = await process.communicate()
+            try:
+                stdout, stderr = await asyncio.wait_for(process.communicate(), timeout=120)
+            except asyncio.TimeoutError:
+                process.kill()
+                stdout, stderr = await process.communicate()
+                log.error("Skopeo inspect timed out", image_url=image_url, timeout_seconds=120)
+                return image_url, None
             returncode = process.returncode
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
async with semaphore:
log.info(f"Semaphore acquired, starting skopeo inspect for: {image_url}")
log.info("Semaphore acquired, starting skopeo inspect", image_url=image_url)
# Create an asynchronous subprocess
process = await asyncio.create_subprocess_exec(
*command,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE
stderr=asyncio.subprocess.PIPE,
)
# Wait for the command to complete and capture output
stdout, stderr = await process.communicate()
returncode = process.returncode
try:
async with semaphore:
log.info("Semaphore acquired, starting skopeo inspect", image_url=image_url)
# Create an asynchronous subprocess
process = await asyncio.create_subprocess_exec(
*command,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
try:
stdout, stderr = await asyncio.wait_for(process.communicate(), timeout=120)
except asyncio.TimeoutError:
process.kill()
stdout, stderr = await process.communicate()
log.error("Skopeo inspect timed out", image_url=image_url, timeout_seconds=120)
return image_url, None
returncode = process.returncode
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/update-commit-latest-env.py` around lines 91 - 102, The subprocess
call created in the block using semaphore and process = await
asyncio.create_subprocess_exec (...) must be wrapped with a timeout and proper
cleanup: call asyncio.wait_for on process.communicate() with a sensible timeout,
catch asyncio.TimeoutError, log the timeout (include image_url), then kill the
process (process.kill() or process.terminate()), await process.wait() to reap
it, and ensure you still capture/handle stdout/stderr or set returncode
appropriately before returning; keep the semaphore context (async with
semaphore) intact and add logging around the timeout and cleanup to avoid
permanently burning a semaphore slot.

Comment on lines 124 to 129
if vcs_ref:
log.info(f"Successfully found 'vcs-ref' for {image_url}: {vcs_ref}")
log.info("Successfully found vcs-ref", image_url=image_url, vcs_ref=vcs_ref)
else:
log.warning(f"'vcs-ref' label not found for {image_url}.")
log.warning("vcs-ref label not found", image_url=image_url)

return image_url, vcs_ref
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reject malformed vcs-ref labels before generating commit files.

Any non-empty label is accepted here. A value like main, unknown, or a truncated SHA would be silently written into commit*.env, even though downstream replacements treat these as commit hashes.

Proposed fix
-        if vcs_ref:
-            log.info("Successfully found vcs-ref", image_url=image_url, vcs_ref=vcs_ref)
-        else:
-            log.warning("vcs-ref label not found", image_url=image_url)
+        if vcs_ref and re.fullmatch(r"[0-9a-fA-F]{7,40}", vcs_ref):
+            log.info("Successfully found vcs-ref", image_url=image_url, vcs_ref=vcs_ref)
+        else:
+            log.warning("Invalid or missing vcs-ref label", image_url=image_url, vcs_ref=vcs_ref)
+            return image_url, None

As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if vcs_ref:
log.info(f"Successfully found 'vcs-ref' for {image_url}: {vcs_ref}")
log.info("Successfully found vcs-ref", image_url=image_url, vcs_ref=vcs_ref)
else:
log.warning(f"'vcs-ref' label not found for {image_url}.")
log.warning("vcs-ref label not found", image_url=image_url)
return image_url, vcs_ref
if vcs_ref and re.fullmatch(r"[0-9a-fA-F]{7,40}", vcs_ref):
log.info("Successfully found vcs-ref", image_url=image_url, vcs_ref=vcs_ref)
else:
log.warning("Invalid or missing vcs-ref label", image_url=image_url, vcs_ref=vcs_ref)
return image_url, None
return image_url, vcs_ref
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/update-commit-latest-env.py` around lines 124 - 129, Reject non-SHA
vcs-ref values by validating the vcs_ref variable before accepting it: ensure
vcs_ref matches the expected commit-hash pattern (e.g. a 40-character hex SHA)
and if it doesn't, log a warning/error and treat it as missing (set vcs_ref to
None or raise) so malformed values like "main" or truncated SHAs are not written
to commit*.env; update the branch around the existing vcs_ref check (the block
that logs "Successfully found vcs-ref" / "vcs-ref label not found" and returns
image_url, vcs_ref) to perform this validation and only log success/return a
vcs_ref when it passes the SHA pattern.

Comment thread scripts/update-commit-latest-env.py Outdated
Comment on lines +208 to +219
pairs: list[tuple[str, str]] = []
for key, url in _parse_env_lines(params_latest):
m = _PARAM_KEY_RE.match(key)
if not m:
log.warning("Skipping param key (unrecognized pattern)", key=key)
continue
if m.group(2) != "-n":
log.warning("Expected only -n keys in params-latest.env", key=key)
continue
pairs.append((key, url))

results = await _inspect_pairs([(k, u) for k, u in pairs])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fail instead of overwriting commit-latest.env with zero rows.

If every row in params-latest.env is skipped by the regex/suffix filters, Line 219 still proceeds and Lines 236-237 replace the existing file with an empty one. sync_commit_released() already preserves the previous output on empty input; this path should fail closed too.

Proposed fix
     for key, url in _parse_env_lines(params_latest):
         m = _PARAM_KEY_RE.match(key)
         if not m:
             log.warning("Skipping param key (unrecognized pattern)", key=key)
             continue
         if m.group(2) != "-n":
             log.warning("Expected only -n keys in params-latest.env", key=key)
             continue
         pairs.append((key, url))
 
-    results = await _inspect_pairs([(k, u) for k, u in pairs])
+    if not pairs:
+        log.error("No valid -n entries found in params-latest.env", path=str(params_latest))
+        return False
+
+    results = await _inspect_pairs(pairs)

As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pairs: list[tuple[str, str]] = []
for key, url in _parse_env_lines(params_latest):
m = _PARAM_KEY_RE.match(key)
if not m:
log.warning("Skipping param key (unrecognized pattern)", key=key)
continue
if m.group(2) != "-n":
log.warning("Expected only -n keys in params-latest.env", key=key)
continue
pairs.append((key, url))
results = await _inspect_pairs([(k, u) for k, u in pairs])
pairs: list[tuple[str, str]] = []
for key, url in _parse_env_lines(params_latest):
m = _PARAM_KEY_RE.match(key)
if not m:
log.warning("Skipping param key (unrecognized pattern)", key=key)
continue
if m.group(2) != "-n":
log.warning("Expected only -n keys in params-latest.env", key=key)
continue
pairs.append((key, url))
if not pairs:
log.error("No valid -n entries found in params-latest.env", path=str(params_latest))
return False
results = await _inspect_pairs(pairs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/update-commit-latest-env.py` around lines 208 - 219, The loop
collecting pairs from _parse_env_lines is allowed to produce an empty pairs list
which then leads to calling _inspect_pairs and ultimately overwriting
commit-latest.env with zero rows; change the flow to treat empty pairs as a hard
failure: after the loop that fills pairs (using _PARAM_KEY_RE and suffix "-n"),
check if pairs is empty and raise an exception or exit with a nonzero status
(e.g., raise RuntimeError or call sys.exit(1)) with a clear message indicating
no valid -n keys found, and do not call _inspect_pairs or proceed to write
output; this preserves the existing file on bad input and mirrors
sync_commit_released()'s conservative behavior.

@opendatahub-io opendatahub-io deleted a comment from coderabbitai Bot Apr 9, 2026
@mtchoum1 mtchoum1 force-pushed the mtchoum1/commit-action branch from b883987 to 2cc663d Compare April 15, 2026 15:12
@openshift-ci openshift-ci Bot added size/s and removed size/l labels Apr 15, 2026
@mtchoum1 mtchoum1 force-pushed the mtchoum1/commit-action branch from 2cc663d to 8d830f5 Compare April 15, 2026 15:15
@openshift-ci openshift-ci Bot added size/s and removed size/s labels Apr 15, 2026
@mtchoum1 mtchoum1 closed this Apr 15, 2026
@mtchoum1 mtchoum1 deleted the mtchoum1/commit-action branch April 15, 2026 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel size/s

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants