ci: introduce renovate to deal with bumping our dependencies #1823
terrykong wants to merge 43 commits into NVIDIA-NeMo:main
Signed-off-by: Terry Kong <terryk@nvidia.com>
[Renovate]: migrate Renovate config
Renovate can't checkout new submodule commits when .gitmodules has shallow=true, because it fetches with --depth=1 (hardcoded). Workaround: Pre-clone the repo to /tmp/renovate/repos/... with full submodule history BEFORE Renovate runs. Renovate will reuse this existing clone instead of cloning fresh with shallow submodules. This keeps shallow=true in .gitmodules for fast regular clones while allowing Renovate to update submodules. Signed-off-by: Terry Kong <terryk@nvidia.com>
The pre-clone step runs as the GitHub runner user, but Renovate's Docker container runs as a different user. Git refuses to operate on repos owned by different users for security. Fix by creating a .gitconfig that marks the directory as safe and passing it to Renovate via GIT_CONFIG_GLOBAL env var. Signed-off-by: Terry Kong <terryk@nvidia.com>
The /tmp/renovate/cache directory needs to be writable by Renovate's container user. Move gitconfig to /tmp/.gitconfig and use chmod 777 to ensure all directories are accessible. Signed-off-by: Terry Kong <terryk@nvidia.com>
The renovate container runs as a different user than the runner, causing 'dubious ownership' errors. Fix by:
1. Running the container as root with docker-user input
2. Adding GIT_CONFIG_GLOBAL to env-regex so it gets passed to container
Instead of trying to pass GIT_CONFIG_GLOBAL env var (which wasn't working), use a custom command wrapper script that runs 'git config --global --add safe.directory *' before invoking renovate. This properly handles the dubious ownership check for the pre-cloned repository.
…dition
Renovate's built-in uv manager runs 'uv lock' BEFORE postUpgradeTasks, causing failures when submodule dependencies aren't synced yet. By disabling the uv manager, we rely solely on postUpgradeTasks which runs:
1. sync_submodule_dependencies.py (syncs CACHED_DEPENDENCIES)
2. uv lock (regenerates lock file)
This ensures dependencies are synced before lock file generation.
…el config
The 'uv' manager doesn't have a top-level config option. Disable it through packageRules matchManagers instead.
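As a rough illustration of the change described above, the sketch below appends a packageRules entry that disables one manager. The `packageRules`, `matchManagers`, and `enabled` keys are standard Renovate options; the helper name `disable_manager` is hypothetical, and the manager name 'uv' is taken from the commit message rather than checked against the repository's renovate.json.

```python
import copy
import json


def disable_manager(config: dict, manager: str) -> dict:
    """Return a copy of a Renovate config with `manager` disabled via packageRules.

    Hypothetical helper for illustration; it is not part of this PR.
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    rules = updated.setdefault("packageRules", [])
    rules.append({"matchManagers": [manager], "enabled": False})
    return updated


if __name__ == "__main__":
    # Print the resulting config fragment as JSON.
    print(json.dumps(disable_manager({}, "uv"), indent=2))
```

Expressing the switch as a rule keeps it alongside the other packageRules entries, which is where the commit message says the setting has to live.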
The regex (.*?) was matching the first ] it found, which could be inside a dependency string like 'megatron-core[dev,mlm]'. This caused the script to corrupt the setup.py by inserting content in the wrong place. Now uses proper bracket counting that:
- Tracks nested [ ] brackets
- Ignores brackets inside string literals
- Handles escape sequences
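The bracket-counting approach described in this commit can be sketched roughly as follows. This is a minimal illustration written for this summary, not the code from sync_submodule_dependencies.py; the function name `find_list_end` is borrowed from the review comment further down, and its signature here is an assumption.

```python
def find_list_end(content: str, start: int) -> int:
    """Return the index of the ']' that closes the '[' at content[start].

    Brackets inside single- or double-quoted string literals are ignored,
    and a backslash inside a string escapes the following character.
    Raises ValueError if no matching bracket is found.
    """
    if content[start] != "[":
        raise ValueError("start must point at '['")
    depth = 0
    quote = None  # active quote character, or None when outside a string
    i = start
    while i < len(content):
        char = content[i]
        if quote is not None:
            if char == "\\":
                i += 2  # skip the escaped character
                continue
            if char == quote:
                quote = None  # string literal ends here
        elif char in "\"'":
            quote = char  # string literal starts here
        elif char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unbalanced brackets")
```

For a list containing 'megatron-core[dev,mlm]', a non-greedy regex stops at the `]` inside the string, whereas this scan only returns once the nesting depth is back to zero outside any string literal. Triple-quoted strings are not handled in this sketch.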
The check was looking for pyproject.toml in all *-workspace/ directories, but some workspace members point to submodule paths (e.g., Automodel-workspace/Automodel) not the parent directory. This caused uv lock to be skipped unnecessarily. Just run uv lock directly - it will fail with a clear error if something is wrong.
…llowlist
1. Change Automodel submodule to track 'main' branch instead of 'nemo-rl-submodule'
   - The main branch now supports torch<=2.9.0 (was <=2.8.0)
   - This resolves the torch version conflict
2. Remove torch and ray from Renovate allowlist
   - Let uv lock resolve compatible versions naturally
   - Avoids version conflicts between packages
3. Disable Python version updates (.python-version)
   - Prevents pyenv manager from bumping to latest Python
…o-automodel), disable Python updates in GitHub Actions
TODO: Remove before final merge - upstream fix needed in nemo-automodel and megatron-bridge to align timm versions
- setup.py: strict validation by default, auto-sync when env var is set
- renovate.yml: set NRL_AUTO_SYNC_DEPS=1 to allow submodule updates
This preserves strict validation for developers while allowing Renovate to update submodules without manual intervention.
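One way to picture the strict-by-default behavior is the sketch below. Only the NRL_AUTO_SYNC_DEPS variable name comes from the commit message; the function name `resolve_dependencies`, the order-sensitive comparison, and the error text are assumptions made for illustration, and the real setup.py may differ.

```python
import os


def resolve_dependencies(cached, actual, environ=None):
    """Return the dependency list to install.

    Strict mode (default): raise if the cached list differs from the
    submodule's actual list. Auto-sync mode (NRL_AUTO_SYNC_DEPS=1): adopt
    the actual list instead of failing.
    """
    environ = os.environ if environ is None else environ
    cached, actual = list(cached), list(actual)
    if cached == actual:
        return cached
    if environ.get("NRL_AUTO_SYNC_DEPS") == "1":
        # Renovate path: accept the submodule's dependencies as the new truth.
        return actual
    raise RuntimeError(
        "CACHED_DEPENDENCIES is out of sync with the submodule; "
        "set NRL_AUTO_SYNC_DEPS=1 to sync automatically"
    )
```

Passing the environment as a parameter keeps the sketch easy to exercise without touching the real process environment.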
This ensures the env var is set inside the Docker container where uv lock actually runs, not just passed through env-regex.
Renovate spawns /bin/sh -c for commands which doesn't inherit env vars from the wrapper script. customEnvVariables passes the env var directly to all commands Renovate runs internally.
- update_docker: Enable Docker base image updates (default: false)
- force_run: Ignore schedule restrictions (default: true for manual runs)
📝 Walkthrough
Introduces automated dependency update workflows using Renovate, including configuration, GitHub Actions workflow, synchronization scripts, and documentation. Establishes continuous dependency synchronization across submodules and lock files with optional auto-sync support and comprehensive package rules.
Sequence Diagram(s)
sequenceDiagram
participant Scheduler as Workflow Scheduler
participant GHA as GitHub Actions
participant Scripts as Setup Scripts
participant Sync as sync_submodule_dependencies.py
participant UV as UV Tool
participant Git as Git Repository
participant Renovate as Renovate
Scheduler->>GHA: Trigger daily at 17:00 UTC
GHA->>GHA: Setup Python 3.12 & UV
GHA->>Scripts: Mark Renovate scripts executable
GHA->>Git: Clone repo with full submodule history
Git-->>GHA: Repository ready
GHA->>Scripts: Run renovate_post_update.sh (if triggered)
Scripts->>Sync: Execute sync_submodule_dependencies.py
Sync->>Git: Read submodule pyproject.toml
Sync->>Git: Update CACHED_DEPENDENCIES in setup.py
Git-->>Sync: Sync complete
Scripts->>UV: Ensure UV installed
UV->>UV: Run uv lock
UV-->>Scripts: Lock file regenerated
Scripts->>Git: Stage changes (setup.py, uv.lock)
Git-->>Scripts: Changes staged
GHA->>Renovate: Run Renovate with config
Renovate->>Git: Create consolidated PR with updates
Git-->>Renovate: PR created
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @.github/RENOVATE_SETUP.md:
- Around line 9-13: The README list of allowed Python dependencies is out of
sync with the actual Renovate config: update the bullet list in
RENOVATE_SETUP.md to match renovate.json by removing torch, ray, and
transformers and leaving only vllm, transformer-engine, and flash-attn; ensure
the surrounding text still states that everything else is frozen unless
explicitly requested and that the documented allowlist matches the renovate.json
entries.
In @.github/scripts/renovate_cmd.sh:
- Around line 1-3: Add the required NVIDIA copyright header to the top of the
shell script .github/scripts/renovate_cmd.sh (above the shebang or immediately
after it if policy requires the shebang first) so the script complies with
coding guidelines; update the file header in renovate_cmd.sh to include the
standard multi-line NVIDIA copyright notice used across the repo.
In @.github/workflows/renovate.yml:
- Around line 74-89: The GitHub Actions step writes to $GITHUB_OUTPUT without
quoting which triggers shellcheck SC2086; update the two redirections in the
conditional that builds RENOVATE_FORCE JSON (the echo lines that append
'force_json=...' to $GITHUB_OUTPUT) to redirect to the quoted variable
("$GITHUB_OUTPUT") to prevent word-splitting; leave the echo payloads unchanged
and only change the redirection target in that block.
🧹 Nitpick comments (5)
pyproject.toml (1)
210-217: Address the TODO before merging.
The comment explicitly states "REMOVE timm override before final merge". This temporary hack should either be removed now or tracked with an issue to ensure it doesn't get forgotten.
Would you like me to open an issue to track the removal of this temporary timm override once the upstream alignment is resolved?
.github/scripts/sync_submodule_dependencies.py (1)
62-92: Consider triple-quoted string edge case.
The find_list_end function handles single and double quotes but doesn't account for triple-quoted strings (""" or '''). While unlikely in CACHED_DEPENDENCIES, this could cause incorrect bracket matching if encountered.
Potential enhancement (optional)
If triple-quoted strings are a concern, you could add detection:
    # After checking for quote start
    if char in '"\'':
        # Check for triple quote
        if content[i:i+3] in ('"""', "'''"):
            # Handle triple-quoted string
            ...
However, since CACHED_DEPENDENCIES typically contains simple dependency strings, this is likely not needed.
.github/scripts/renovate_post_update.sh (2)
30-34: Redundant exit-code checks with set -e.
With set -euo pipefail enabled, the script already exits on any non-zero return. The manual if [ $? -ne 0 ] checks never execute because the script exits before reaching them. Either remove set -e and handle errors manually, or use || for custom error messages.
Proposed fix using `||` for error messages
     # Step 1: Sync submodule dependencies to setup.py files
     echo "Step 1: Syncing submodule dependencies..."
    -python3 "$SCRIPT_DIR/sync_submodule_dependencies.py"
    -if [ $? -ne 0 ]; then
    -    echo "ERROR: Failed to sync submodule dependencies"
    -    exit 1
    -fi
    +python3 "$SCRIPT_DIR/sync_submodule_dependencies.py" || {
    +    echo "ERROR: Failed to sync submodule dependencies"
    +    exit 1
    +}
     echo ""
52-59: Same redundant pattern for uv lock.
Proposed fix
     # Run uv lock to regenerate the lock file
     # Note: Workspace members are defined in pyproject.toml [tool.uv.workspace].members
     # Some members point to submodule paths (e.g., Automodel-workspace/Automodel), not the parent dir
    -uv lock
    -if [ $? -ne 0 ]; then
    -    echo "ERROR: uv lock failed"
    -    exit 1
    -fi
    +uv lock || {
    +    echo "ERROR: uv lock failed"
    +    exit 1
    +}
     echo ""
.github/workflows/renovate.yml (1)
91-123: Pre-clone setup is thorough; chmod -R 777 is necessary but worth documenting.
The pre-clone logic correctly handles submodule unshallowing and prepares the repository for Renovate's container. The chmod -R 777 on line 123 is required because Renovate runs as a different UID (1000) inside its container. While this is intentional, a brief inline comment explaining the security trade-off (temporary directory, isolated CI environment) would help future maintainers understand why this permissive setting is acceptable here.
Optional: Add clarifying comment
     # Fix permissions for Renovate container (runs as uid 1000)
     # Make everything world-writable so the container user can access it
    +# This is acceptable since /tmp/renovate is ephemeral and scoped to this CI run
     chmod -R 777 /tmp/renovate
2. **Updates a small allowlist of Python dependencies** in `pyproject.toml`:
   - `vllm`, `torch`, and `ray` for the core training stack
   - `transformer-engine` and `flash-attn` for xformers compatibility
   - `transformers` so we can track upstream releases
   - _Everything else is frozen unless explicitly requested._
🧩 Analysis chain
🏁 Script executed:
cat .github/renovate.json | head -100
Repository: NVIDIA-NeMo/RL
Length of output: 2409
🏁 Script executed:
rg -n "torch|ray" .github/renovate.json -B 2 -A 2
Repository: NVIDIA-NeMo/RL
Length of output: 40
🏁 Script executed:
rg -n "matchPackageNames|matchPackagePatterns" .github/renovate.json -A 10
Repository: NVIDIA-NeMo/RL
Length of output: 953
Fix documentation to match actual Renovate configuration.
The documentation incorrectly lists the allowlist. The actual renovate.json only allows updates for vllm, transformer-engine, and flash-attn. The mentioned packages torch, ray, and transformers are not in the allowlist and should be removed from the documentation (lines 10-13).
#!/bin/bash
# Custom Renovate command that configures git safe.directory before running renovate
# This is needed because the pre-cloned repo is owned by a different user than the container user
Add the NVIDIA copyright header.
As per coding guidelines, shell scripts (excluding tests) require the NVIDIA copyright header at the top.
Proposed fix
#!/bin/bash
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
# Custom Renovate command that configures git safe.directory before running renovate
# This is needed because the pre-cloned repo is owned by a different user than the container user
run: |
  # Enable Docker updates if requested (modify config temporarily)
  if [ "${{ inputs.update_docker }}" = "true" ]; then
    echo "Enabling Docker base image updates..."
    jq '(.packageRules[] | select(.matchManagers == ["dockerfile"])).enabled = true' \
      .github/renovate.json > .github/renovate.json.tmp && \
      mv .github/renovate.json.tmp .github/renovate.json
  fi

  # Build RENOVATE_FORCE JSON if force_run is enabled
  if [ "${{ inputs.force_run }}" = "true" ]; then
    echo "Force run enabled - ignoring schedule restrictions"
    echo 'force_json={"schedule":[]}' >> $GITHUB_OUTPUT
  else
    echo 'force_json=' >> $GITHUB_OUTPUT
  fi
Quote $GITHUB_OUTPUT to satisfy shellcheck SC2086.
The static analysis tool flagged unquoted variable expansion on lines 86 and 88. While $GITHUB_OUTPUT is set by GitHub Actions and unlikely to contain spaces, quoting it is a best practice to prevent potential word splitting issues.
Proposed fix
if [ "${{ inputs.force_run }}" = "true" ]; then
echo "Force run enabled - ignoring schedule restrictions"
- echo 'force_json={"schedule":[]}' >> $GITHUB_OUTPUT
+ echo 'force_json={"schedule":[]}' >> "$GITHUB_OUTPUT"
else
- echo 'force_json=' >> $GITHUB_OUTPUT
+ echo 'force_json=' >> "$GITHUB_OUTPUT"
fi
🧰 Tools
🪛 actionlint (1.7.10)
74-74: shellcheck reported issue in this script: SC2086:info:12:40: Double quote to prevent globbing and word splitting
(shellcheck)
74-74: shellcheck reported issue in this script: SC2086:info:14:25: Double quote to prevent globbing and word splitting
(shellcheck)
What does this PR do?
See example: terrykong#18