ci: introduce renovate to deal with bumping our dependencies#1823

Draft
terrykong wants to merge 43 commits intoNVIDIA-NeMo:mainfrom
terrykong:main

Collaborator

@terrykong terrykong commented Jan 23, 2026

What does this PR do?

See example: terrykong#18

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Chores

    • Set up automated dependency management using Renovate with daily scheduled checks
    • Configured consolidated dependency update PRs with automatic lock file regeneration
    • Added dependency synchronization across submodules with optional automatic sync mode
    • Implemented GitHub Actions workflow for managing dependency updates
  • Documentation

    • Added setup guide documenting Renovate configuration, authentication, and troubleshooting
  • Refactor

    • Updated Torch dependency constraint to allow flexibility in version selection
    • Added temporary override for build tooling dependency


terrykong and others added 30 commits November 20, 2025 00:10
Signed-off-by: Terry Kong <terryk@nvidia.com>
[Renovate]: migrate Renovate config
Signed-off-by: Terry Kong <terryk@nvidia.com>
Renovate can't checkout new submodule commits when .gitmodules has
shallow=true, because it fetches with --depth=1 (hardcoded).

Workaround: Pre-clone the repo to /tmp/renovate/repos/... with full
submodule history BEFORE Renovate runs. Renovate will reuse this
existing clone instead of cloning fresh with shallow submodules.

This keeps shallow=true in .gitmodules for fast regular clones while
allowing Renovate to update submodules.

Signed-off-by: Terry Kong <terryk@nvidia.com>
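
One way to picture the pre-clone workaround from this commit message is the sketch below. It is not taken from the workflow file, which is not shown here: the helper names `build_preclone_commands` and `preclone`, the repository URL, and the destination path are all hypothetical. It relies on `git submodule update --no-recommend-shallow`, which tells git to ignore the `shallow = true` recommendation in `.gitmodules` for that one invocation.

```python
import subprocess


def build_preclone_commands(repo_url: str, dest: str) -> list[list[str]]:
    """Return the git commands for a clone with full submodule history.

    Hypothetical helper: the real workflow may use different commands.
    """
    return [
        # Plain (non-shallow) clone of the superproject.
        ["git", "clone", repo_url, dest],
        # Initialise submodules while ignoring the shallow=true
        # recommendation stored in .gitmodules.
        [
            "git", "-C", dest,
            "submodule", "update", "--init", "--recursive",
            "--no-recommend-shallow",
        ],
    ]


def preclone(repo_url: str, dest: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in build_preclone_commands(repo_url, dest):
        subprocess.run(cmd, check=True)
```

Because the override is passed on the command line, `.gitmodules` itself keeps `shallow=true`, so regular clones stay fast.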
The pre-clone step runs as the GitHub runner user, but Renovate's
Docker container runs as a different user. Git refuses to operate
on repos owned by different users for security.

Fix by creating a .gitconfig that marks the directory as safe and
passing it to Renovate via GIT_CONFIG_GLOBAL env var.

Signed-off-by: Terry Kong <terryk@nvidia.com>
The /tmp/renovate/cache directory needs to be writable by Renovate's
container user. Move gitconfig to /tmp/.gitconfig and use chmod 777
to ensure all directories are accessible.

Signed-off-by: Terry Kong <terryk@nvidia.com>
The renovate container runs as a different user than the runner, causing
'dubious ownership' errors. Fix by:
1. Running the container as root with docker-user input
2. Adding GIT_CONFIG_GLOBAL to env-regex so it gets passed to container
Instead of trying to pass GIT_CONFIG_GLOBAL env var (which wasn't working),
use a custom command wrapper script that runs 'git config --global --add
safe.directory *' before invoking renovate. This properly handles the
dubious ownership check for the pre-cloned repository.
…dition

Renovate's built-in uv manager runs 'uv lock' BEFORE postUpgradeTasks,
causing failures when submodule dependencies aren't synced yet. By
disabling the uv manager, we rely solely on postUpgradeTasks which runs:
1. sync_submodule_dependencies.py (syncs CACHED_DEPENDENCIES)
2. uv lock (regenerates lock file)

This ensures dependencies are synced before lock file generation.
…el config

The 'uv' manager doesn't have a top-level config option. Disable it through
packageRules matchManagers instead.
The regex (.*?) was matching the first ] it found, which could be inside
a dependency string like 'megatron-core[dev,mlm]'. This caused the script
to corrupt the setup.py by inserting content in the wrong place.

Now uses proper bracket counting that:
- Tracks nested [ ] brackets
- Ignores brackets inside string literals
- Handles escape sequences
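
The bracket-counting approach listed above can be sketched roughly as follows. This is a minimal illustration, not the code from `sync_submodule_dependencies.py`: the signature of `find_list_end` and the sample input are assumptions made for the example.

```python
def find_list_end(content: str, start: int) -> int:
    """Return the index of the ']' that closes the '[' at content[start].

    Returns -1 if the list is never closed. Assumed signature; the
    script's real function may differ.
    """
    if content[start] != "[":
        raise ValueError("start must point at an opening bracket")
    depth = 0
    quote = None  # the quote character of the string we are inside, if any
    i = start
    while i < len(content):
        ch = content[i]
        if quote is not None:
            if ch == "\\":
                i += 2  # skip the escaped character
                continue
            if ch == quote:
                quote = None  # string literal ends
        elif ch in "\"'":
            quote = ch  # string literal starts
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1


source = 'CACHED_DEPENDENCIES = [\n    "megatron-core[dev,mlm]",\n]\n'
end = find_list_end(source, source.index("["))
```

Here `end` points at the final `]` of the list rather than at the `]` inside `'megatron-core[dev,mlm]'`, which is the failure mode the non-greedy regex had.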
The check was looking for pyproject.toml in all *-workspace/ directories,
but some workspace members point to submodule paths (e.g., Automodel-workspace/Automodel)
not the parent directory. This caused uv lock to be skipped unnecessarily.

Just run uv lock directly - it will fail with a clear error if something is wrong.
…llowlist

1. Change Automodel submodule to track 'main' branch instead of 'nemo-rl-submodule'
   - The main branch now supports torch<=2.9.0 (was <=2.8.0)
   - This resolves the torch version conflict

2. Remove torch and ray from Renovate allowlist
   - Let uv lock resolve compatible versions naturally
   - Avoids version conflicts between packages

3. Disable Python version updates (.python-version)
   - Prevents pyenv manager from bumping to latest Python
…o-automodel), disable Python updates in GitHub Actions
TODO: Remove before final merge - upstream fix needed in nemo-automodel and megatron-bridge to align timm versions
- setup.py: strict validation by default, auto-sync when env var is set
- renovate.yml: set NRL_AUTO_SYNC_DEPS=1 to allow submodule updates

This preserves strict validation for developers while allowing Renovate
to update submodules without manual intervention.
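
The strict-by-default behaviour with an opt-in auto-sync could look something like the sketch below. Only the environment variable name `NRL_AUTO_SYNC_DEPS` comes from the commit message; the function name `resolve_cached_dependencies`, its parameters, and the error text are hypothetical, and the real `setup.py` logic may differ.

```python
import os

AUTO_SYNC_ENV_VAR = "NRL_AUTO_SYNC_DEPS"  # name taken from the commit message


def resolve_cached_dependencies(cached, upstream, env=None):
    """Return the dependency list to use, or raise if it is out of sync.

    cached:   the list hard-coded in setup.py (CACHED_DEPENDENCIES)
    upstream: the list read from the submodule's pyproject.toml
    env:      mapping of environment variables (defaults to os.environ)
    """
    env = os.environ if env is None else env
    if set(cached) == set(upstream):
        return list(cached)
    if env.get(AUTO_SYNC_ENV_VAR) == "1":
        # Auto-sync mode (e.g. inside the Renovate run): adopt upstream.
        return list(upstream)
    # Strict mode (the default for developers): fail loudly.
    missing = sorted(set(upstream) - set(cached))
    extra = sorted(set(cached) - set(upstream))
    raise RuntimeError(
        f"CACHED_DEPENDENCIES is out of sync (missing={missing}, "
        f"extra={extra}); set {AUTO_SYNC_ENV_VAR}=1 to auto-sync"
    )
```

Passing `env` explicitly keeps the function testable without touching the process environment.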
This ensures the env var is set inside the Docker container where
uv lock actually runs, not just passed through env-regex.
Renovate spawns /bin/sh -c for commands which doesn't inherit env vars
from the wrapper script. customEnvVariables passes the env var directly
to all commands Renovate runs internally.
- update_docker: Enable Docker base image updates (default: false)
- force_run: Ignore schedule restrictions (default: true for manual runs)
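
For readers less familiar with jq, the effect of the `update_docker` input can be expressed in Python. The sketch mirrors the filter quoted later in the review, `(.packageRules[] | select(.matchManagers == ["dockerfile"])).enabled = true`; the function name `enable_docker_updates` and the sample config are hypothetical.

```python
import copy


def enable_docker_updates(config: dict) -> dict:
    """Return a copy of a Renovate config with Dockerfile updates enabled.

    Only rules whose matchManagers is exactly ["dockerfile"] are changed,
    matching the equality test in the jq filter. Unlike jq, a missing
    packageRules key is treated as an empty list instead of an error.
    """
    updated = copy.deepcopy(config)
    for rule in updated.get("packageRules", []):
        if rule.get("matchManagers") == ["dockerfile"]:
            rule["enabled"] = True
    return updated


example = {
    "packageRules": [
        {"matchManagers": ["dockerfile"], "enabled": False},
        {"matchManagers": ["github-actions"], "enabled": False},
    ]
}
patched = enable_docker_updates(example)
```

Working on a deep copy leaves the original config untouched, which matches the "modify config temporarily" intent of the workflow step.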
@terrykong terrykong requested review from a team as code owners January 23, 2026 18:46
@github-actions github-actions bot added the CI Relating to CI label Jan 23, 2026
@terrykong terrykong marked this pull request as draft January 23, 2026 18:46
Contributor

coderabbitai bot commented Jan 23, 2026

Walkthrough

Introduces automated dependency update workflows using Renovate, including configuration, GitHub Actions workflow, synchronization scripts, and documentation. Establishes continuous dependency synchronization across submodules and lock files with optional auto-sync support and comprehensive package rules.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Renovate Documentation: `.github/RENOVATE_SETUP.md` | New file documenting Renovate configuration, setup options (PAT/GitHub App), authentication, workflow usage, update strategy, troubleshooting, and customization guidance. |
| Renovate Configuration: `.github/renovate.json` | New configuration defining dependency update automation with submodule cloning, environment variables, post-upgrade hooks, packageRules (disable by default with allowlist for vllm/transformer-engine/flash-attn), single consolidated PR strategy, and git submodule support. |
| Renovate Workflow & Scripts: `.github/workflows/renovate.yml`, `.github/scripts/renovate_cmd.sh`, `.github/scripts/renovate_post_update.sh`, `.github/scripts/sync_submodule_dependencies.py` | New GitHub Actions workflow with daily scheduling and manual trigger; helper scripts for git config, environment setup, and Python-based dependency synchronization across Megatron-Bridge, Penguin, and Megatron-LM submodules with pyproject.toml parsing and CACHED_DEPENDENCIES list updates. |
| Submodule & Configuration Updates: `.gitmodules`, `.pre-commit-config.yaml`, `3rdparty/Megatron-Bridge-workspace/setup.py`, `3rdparty/Automodel-workspace/Automodel` | Branch reference update for Automodel submodule (nemo-rl-submodule → main); new JSON pre-commit hook for formatting; Megatron-Bridge setup.py now supports environment-variable-controlled auto-sync of CACHED_DEPENDENCIES; Automodel submodule pointer update. |
| Dependency Specification: `pyproject.toml` | Relaxed torch version constraint from "torch==2.8.0" to "torch"; added timm==1.0.16 override dependency with temporary hack note. |

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler as Workflow Scheduler
    participant GHA as GitHub Actions
    participant Scripts as Setup Scripts
    participant Sync as sync_submodule_dependencies.py
    participant UV as UV Tool
    participant Git as Git Repository
    participant Renovate as Renovate

    Scheduler->>GHA: Trigger daily at 17:00 UTC
    GHA->>GHA: Setup Python 3.12 & UV
    GHA->>Scripts: Mark Renovate scripts executable
    GHA->>Git: Clone repo with full submodule history
    Git-->>GHA: Repository ready
    GHA->>Scripts: Run renovate_post_update.sh (if triggered)
    Scripts->>Sync: Execute sync_submodule_dependencies.py
    Sync->>Git: Read submodule pyproject.toml
    Sync->>Git: Update CACHED_DEPENDENCIES in setup.py
    Git-->>Sync: Sync complete
    Scripts->>UV: Ensure UV installed
    UV->>UV: Run uv lock
    UV-->>Scripts: Lock file regenerated
    Scripts->>Git: Stage changes (setup.py, uv.lock)
    Git-->>Scripts: Changes staged
    GHA->>Renovate: Run Renovate with config
    Renovate->>Git: Create consolidated PR with updates
    Git-->>Renovate: PR created

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

CI, CI:L1, documentation

Suggested reviewers

  • parthchadha
  • chtruong814
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Results For Major Changes | ⚠️ Warning | PR introduces major changes without test results or testing documentation, with incomplete description and unresolved review comments indicating insufficient validation. | Complete PR description with testing evidence, address review comments regarding headers and documentation, and re-test to confirm all issues are resolved. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main objective of the PR: introducing Renovate for automated dependency version bumping across the repository. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @.github/RENOVATE_SETUP.md:
- Around line 9-13: The README list of allowed Python dependencies is out of
sync with the actual Renovate config: update the bullet list in
RENOVATE_SETUP.md to match renovate.json by removing torch, ray, and
transformers and leaving only vllm, transformer-engine, and flash-attn; ensure
the surrounding text still states that everything else is frozen unless
explicitly requested and that the documented allowlist matches the renovate.json
entries.

In @.github/scripts/renovate_cmd.sh:
- Around line 1-3: Add the required NVIDIA copyright header to the top of the
shell script .github/scripts/renovate_cmd.sh (above the shebang or immediately
after it if policy requires the shebang first) so the script complies with
coding guidelines; update the file header in renovate_cmd.sh to include the
standard multi-line NVIDIA copyright notice used across the repo.

In @.github/workflows/renovate.yml:
- Around line 74-89: The GitHub Actions step writes to $GITHUB_OUTPUT without
quoting which triggers shellcheck SC2086; update the two redirections in the
conditional that builds RENOVATE_FORCE JSON (the echo lines that append
'force_json=...' to $GITHUB_OUTPUT) to redirect to the quoted variable
("$GITHUB_OUTPUT") to prevent word-splitting; leave the echo payloads unchanged
and only change the redirection target in that block.
🧹 Nitpick comments (5)
pyproject.toml (1)

210-217: Address the TODO before merging.

The comment explicitly states "REMOVE timm override before final merge". This temporary hack should either be removed now or tracked with an issue to ensure it doesn't get forgotten.

Would you like me to open an issue to track the removal of this temporary timm override once the upstream alignment is resolved?

.github/scripts/sync_submodule_dependencies.py (1)

62-92: Consider triple-quoted string edge case.

The find_list_end function handles single and double quotes but doesn't account for triple-quoted strings (""" or '''). While unlikely in CACHED_DEPENDENCIES, this could cause incorrect bracket matching if encountered.

Potential enhancement (optional)

If triple-quoted strings are a concern, you could add detection:

# After checking for quote start
if char in '"\'':
    # Check for triple quote
    if content[i:i+3] in ('"""', "'''"):
        # Handle triple-quoted string
        ...

However, since CACHED_DEPENDENCIES typically contains simple dependency strings, this is likely not needed.
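
If triple-quoted strings did need handling, one option is to factor string skipping into a helper that the bracket scanner calls whenever it meets a quote. The sketch below is an assumption-laden illustration: `skip_string_literal` is a hypothetical name and is not part of the script under review.

```python
def skip_string_literal(content: str, i: int) -> int:
    """Return the index just past the string literal that starts at content[i].

    Handles single-, double-, and triple-quoted literals plus backslash
    escapes. Returns len(content) if the literal is unterminated.
    """
    quote = content[i]
    if quote not in "\"'":
        raise ValueError("content[i] must be a quote character")
    # A run of three identical quotes opens a triple-quoted literal.
    delim = quote * 3 if content.startswith(quote * 3, i) else quote
    j = i + len(delim)
    while j < len(content):
        if content[j] == "\\":
            j += 2  # skip the escaped character
            continue
        if content.startswith(delim, j):
            return j + len(delim)
        j += 1
    return len(content)


text = 'x = """a ] b""" + "c]"'
first = text.index('"')
after_first = skip_string_literal(text, first)
```

In this example `text[first:after_first]` is the whole triple-quoted literal, so a scanner that jumps to `after_first` never sees the `]` inside it.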

.github/scripts/renovate_post_update.sh (2)

30-34: Redundant exit-code checks with set -e.

With set -euo pipefail enabled, the script already exits as soon as a command returns non-zero. The error branches of the manual if [ $? -ne 0 ] checks therefore never run: on failure the script exits before reaching them, and on success the condition is false. Either remove set -e and handle errors manually, or use || for custom error messages.

Proposed fix using `||` for error messages
 # Step 1: Sync submodule dependencies to setup.py files
 echo "Step 1: Syncing submodule dependencies..."
-python3 "$SCRIPT_DIR/sync_submodule_dependencies.py"
-if [ $? -ne 0 ]; then
-    echo "ERROR: Failed to sync submodule dependencies"
-    exit 1
-fi
+python3 "$SCRIPT_DIR/sync_submodule_dependencies.py" || {
+    echo "ERROR: Failed to sync submodule dependencies"
+    exit 1
+}
 echo ""

52-59: Same redundant pattern for uv lock.

Proposed fix
 # Run uv lock to regenerate the lock file
 # Note: Workspace members are defined in pyproject.toml [tool.uv.workspace].members
 # Some members point to submodule paths (e.g., Automodel-workspace/Automodel), not the parent dir
-uv lock
-if [ $? -ne 0 ]; then
-    echo "ERROR: uv lock failed"
-    exit 1
-fi
+uv lock || {
+    echo "ERROR: uv lock failed"
+    exit 1
+}
 echo ""
.github/workflows/renovate.yml (1)

91-123: Pre-clone setup is thorough; chmod -R 777 is necessary but worth documenting.

The pre-clone logic correctly handles submodule unshallowing and prepares the repository for Renovate's container. The chmod -R 777 on line 123 is required because Renovate runs as a different UID (1000) inside its container. While this is intentional, a brief inline comment explaining the security trade-off (temporary directory, isolated CI environment) would help future maintainers understand why this permissive setting is acceptable here.

Optional: Add clarifying comment
           # Fix permissions for Renovate container (runs as uid 1000)
           # Make everything world-writable so the container user can access it
+          # This is acceptable since /tmp/renovate is ephemeral and scoped to this CI run
           chmod -R 777 /tmp/renovate

Comment on lines +9 to +13
2. **Updates a small allowlist of Python dependencies** in `pyproject.toml`:
- `vllm`, `torch`, and `ray` for the core training stack
- `transformer-engine` and `flash-attn` for xformers compatibility
- `transformers` so we can track upstream releases
- _Everything else is frozen unless explicitly requested._

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat .github/renovate.json | head -100

Repository: NVIDIA-NeMo/RL

Length of output: 2409


🏁 Script executed:

rg -n "torch|ray" .github/renovate.json -B 2 -A 2

Repository: NVIDIA-NeMo/RL

Length of output: 40


🏁 Script executed:

rg -n "matchPackageNames|matchPackagePatterns" .github/renovate.json -A 10

Repository: NVIDIA-NeMo/RL

Length of output: 953


Fix documentation to match actual Renovate configuration.

The documentation incorrectly lists the allowlist. The actual renovate.json only allows updates for vllm, transformer-engine, and flash-attn. The mentioned packages torch, ray, and transformers are not in the allowlist and should be removed from the documentation (lines 10-13).
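
A drift like this could be caught mechanically. The following is only a sketch under stated assumptions: it presumes the allowlist is expressed as `packageRules` entries with `"enabled": true` and a `matchPackageNames` list, which is a guess about the layout of `renovate.json`, and the helper names are hypothetical.

```python
def allowlisted_packages(config: dict) -> set[str]:
    """Collect package names from rules that explicitly enable updates."""
    names: set[str] = set()
    for rule in config.get("packageRules", []):
        if rule.get("enabled") is True:
            names.update(rule.get("matchPackageNames", []))
    return names


def allowlist_drift(documented: list[str], config: dict) -> dict[str, list[str]]:
    """Report names that appear on only one side (docs vs. config)."""
    allowed = allowlisted_packages(config)
    return {
        "only_in_docs": sorted(set(documented) - allowed),
        "only_in_config": sorted(allowed - set(documented)),
    }


# Hypothetical config shaped like "disable by default, then allowlist".
sample_config = {
    "packageRules": [
        {"matchPackageNames": ["*"], "enabled": False},
        {
            "matchPackageNames": ["vllm", "transformer-engine", "flash-attn"],
            "enabled": True,
        },
    ]
}
documented = ["vllm", "torch", "ray", "transformer-engine", "flash-attn", "transformers"]
drift = allowlist_drift(documented, sample_config)
```

With the package names quoted in this comment, `drift["only_in_docs"]` contains `ray`, `torch`, and `transformers`, which are exactly the entries the review asks to remove from the documentation.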


Comment on lines +1 to +3
#!/bin/bash
# Custom Renovate command that configures git safe.directory before running renovate
# This is needed because the pre-cloned repo is owned by a different user than the container user

⚠️ Potential issue | 🟡 Minor

Add the NVIDIA copyright header.

As per coding guidelines, shell scripts (excluding tests) require the NVIDIA copyright header at the top.

Proposed fix
 #!/bin/bash
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 # Custom Renovate command that configures git safe.directory before running renovate
 # This is needed because the pre-cloned repo is owned by a different user than the container user

Comment on lines +74 to +89
run: |
# Enable Docker updates if requested (modify config temporarily)
if [ "${{ inputs.update_docker }}" = "true" ]; then
echo "Enabling Docker base image updates..."
jq '(.packageRules[] | select(.matchManagers == ["dockerfile"])).enabled = true' \
.github/renovate.json > .github/renovate.json.tmp && \
mv .github/renovate.json.tmp .github/renovate.json
fi

# Build RENOVATE_FORCE JSON if force_run is enabled
if [ "${{ inputs.force_run }}" = "true" ]; then
echo "Force run enabled - ignoring schedule restrictions"
echo 'force_json={"schedule":[]}' >> $GITHUB_OUTPUT
else
echo 'force_json=' >> $GITHUB_OUTPUT
fi

⚠️ Potential issue | 🟡 Minor

Quote $GITHUB_OUTPUT to satisfy shellcheck SC2086.

The static analysis tool flagged unquoted variable expansion on lines 86 and 88. While $GITHUB_OUTPUT is set by GitHub Actions and unlikely to contain spaces, quoting it is a best practice to prevent potential word splitting issues.

Proposed fix
           if [ "${{ inputs.force_run }}" = "true" ]; then
             echo "Force run enabled - ignoring schedule restrictions"
-            echo 'force_json={"schedule":[]}' >> $GITHUB_OUTPUT
+            echo 'force_json={"schedule":[]}' >> "$GITHUB_OUTPUT"
           else
-            echo 'force_json=' >> $GITHUB_OUTPUT
+            echo 'force_json=' >> "$GITHUB_OUTPUT"
           fi
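
As an aside, the same step output can be produced from Python, where the path is passed as a single argument and shell word splitting does not arise. This is a sketch rather than a proposed change to the workflow: `write_force_output` is a hypothetical helper, while the `force_json` key and the `{"schedule":[]}` payload are copied from the step above.

```python
import json
import os


def write_force_output(force_run: bool, output_path: str) -> None:
    """Append the force_json step output in GitHub Actions' key=value format."""
    # Compact separators reproduce the exact payload used in the workflow.
    value = json.dumps({"schedule": []}, separators=(",", ":")) if force_run else ""
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"force_json={value}\n")


if __name__ == "__main__" and "GITHUB_OUTPUT" in os.environ:
    # Inside a workflow step, GITHUB_OUTPUT holds the path of the output file.
    write_force_output(True, os.environ["GITHUB_OUTPUT"])
```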
🧰 Tools
🪛 actionlint (1.7.10)

74-74: shellcheck reported issue in this script: SC2086:info:12:40: Double quote to prevent globbing and word splitting

(shellcheck)


74-74: shellcheck reported issue in this script: SC2086:info:14:25: Double quote to prevent globbing and word splitting

(shellcheck)



Labels

CI Relating to CI


2 participants