-
Notifications
You must be signed in to change notification settings - Fork 655
feat: Add automated dependency version tracking and extraction #3547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdds a dependency reporting system: documentation and directory scaffolding, a configurable Python extractor script, configuration YAML, GitHub Actions for nightly extraction and release snapshots, artifact uploads, and automated PR creation when changes are detected. Updates .gitignore to track only “latest” and release CSVs in the repo. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Scheduler as Cron/Manual
participant GHA as GitHub Actions: Nightly Workflow
participant Script as extract_dependency_versions.py
participant Repo as Repo (.github/reports)
participant PR as PR Bot
Scheduler ->> GHA: Trigger (02:00 UTC / workflow_dispatch)
GHA ->> GHA: Setup Python 3.12, install pyyaml
GHA ->> Script: Run extractor (timestamped + latest)
Script ->> Repo: Write dependency_versions_<timestamp>.csv
Script ->> Repo: Copy to dependency_versions_latest.csv
GHA ->> GHA: Detect changes (git status)
alt Changes detected
GHA ->> PR: Create PR with counts (New/Changed/Unchanged)
else No changes
GHA ->> GHA: Skip PR creation
end
GHA ->> GHA: Upload artifacts (retention 90d)
GHA ->> Scheduler: Step summary (counts/status)
sequenceDiagram
autonumber
actor Dev as Push to release/X.Y.Z or Manual
participant GHA as GitHub Actions: Release Snapshot
participant Script as extract_dependency_versions.py
participant Repo as .github/reports/releases
participant PR as PR Bot
Dev ->> GHA: Trigger with VERSION (from branch or input)
GHA ->> GHA: Validate VERSION (X.Y.Z) or fail
GHA ->> Script: Generate dependency_versions_v<VERSION>.csv
Script ->> Repo: Write releases/dependency_versions_v<VERSION>.csv
GHA ->> GHA: Check if snapshot exists
alt New snapshot
GHA ->> PR: Create PR (release-snapshot/v<VERSION>)
else Exists
GHA ->> GHA: Skip PR creation
end
GHA ->> GHA: Upload artifact (retention 365d)
GHA ->> Dev: Step summary (created/already exists)
sequenceDiagram
autonumber
participant Script as extract_dependency_versions.py
participant FS as Repo Files
participant Cfg as Config YAML
participant Baseline as Prev CSVs (latest/release)
participant CSV as Output CSVs
Script ->> Cfg: Load config (components, rules, critical list)
Script ->> Baseline: Load previous snapshots (optional)
loop Components & Sources
Script ->> FS: Discover files (Dockerfiles, reqs, pyproject, go.mod, etc.)
Script ->> Script: Extract name/version/source per rule
Script ->> Script: Mark critical, compute diffs vs baselines
end
Script ->> CSV: Write dependency_versions_<timestamp or version>.csv
opt Unversioned
Script ->> CSV: Write unversioned_dependencies_<timestamp>.csv
end
Script ->> Script: Print summary (counts/warnings)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
🧹 Nitpick comments (4)
.gitignore (1)
41-47: Rules look good; keep latest + release snapshots trackedPatterns correctly ignore timestamped CSVs and keep latest/release snapshots. Optional: if you ever place timestamped CSVs in nested dirs under .github/reports, consider using a double-star pattern (e.g., .github/reports/**/_[0-9].csv).
.github/reports/releases/.gitkeep (1)
1-1: OK to track the directoryPlaceholder is fine. Consider leaving .gitkeep empty to avoid lint/noise, but not required.
.github/workflows/extract_dependency_versions_config.yaml (1)
1-173: Config placed under .github/workflows triggers actionlint noiseHaving non-workflow YAML here can confuse tooling (actionlint hints). Prefer moving this config out of .github/workflows (e.g., .github/dependency-extraction/config.yaml) and pass --config in workflows. No functional change, but cleaner CI.
Would you like a patch updating both workflows to pass a new config path?
.github/workflows/extract_dependency_versions.py (1)
213-229: Prefer narrower exception handling and avoid bare exceptsMultiple places catch Exception broadly; narrow where practical or at least log more context. Also TRY300: push returns into else blocks to reduce scope.
Examples here and at lines ~696, ~791, ~900, ~939, ~992, ~1029, ~1072, ~1112.
Would you like me to generate a patch consolidating exception handling patterns?
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
.github/reports/README.md(1 hunks).github/reports/releases/.gitkeep(1 hunks).github/workflows/dependency-extraction-nightly.yml(1 hunks).github/workflows/dependency-extraction-release.yml(1 hunks).github/workflows/extract_dependency_versions.py(1 hunks).github/workflows/extract_dependency_versions_config.yaml(1 hunks).gitignore(1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/extract_dependency_versions_config.yaml
4-4: unexpected key "github" for "workflow" section. expected one of "concurrency", "defaults", "env", "jobs", "name", "on", "permissions", "run-name"
(syntax-check)
4-4: "on" section is missing in workflow
(syntax-check)
4-4: "jobs" section is missing in workflow
(syntax-check)
8-8: unexpected key "baseline" for "workflow" section. expected one of "concurrency", "defaults", "env", "jobs", "name", "on", "permissions", "run-name"
(syntax-check)
11-11: unexpected key "critical_dependencies" for "workflow" section. expected one of "concurrency", "defaults", "env", "jobs", "name", "on", "permissions", "run-name"
(syntax-check)
99-99: unexpected key "components" for "workflow" section. expected one of "concurrency", "defaults", "env", "jobs", "name", "on", "permissions", "run-name"
(syntax-check)
163-163: unexpected key "extraction" for "workflow" section. expected one of "concurrency", "defaults", "env", "jobs", "name", "on", "permissions", "run-name"
(syntax-check)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3547/merge) by dagil-nvidia.
.github/workflows/dependency-extraction-release.yml
[error] 1-1: Trailing whitespace found and fixed (pre-commit).
.github/workflows/dependency-extraction-nightly.yml
[error] 1-1: Trailing whitespace found and fixed (pre-commit).
.github/workflows/extract_dependency_versions.py
[error] 1-1: isort: Files were modified by this hook. (pre-commit)
[error] 1-1: Black: 1 file reformatted. (pre-commit)
[error] 1-1: check-shebang-scripts-are-executable: has a shebang but is not marked executable.
[error] 469-469: ruff: Local variable var_name is assigned to but never used (F841).
🪛 markdownlint-cli2 (0.18.1)
.github/reports/README.md
125-125: Multiple headings with the same content
(MD024, no-duplicate-heading)
🪛 Ruff (0.13.3)
.github/workflows/extract_dependency_versions.py
1-1: Shebang is present but file is not executable
(EXE001)
71-71: Unused method argument: version
(ARG002)
197-197: Do not catch blind exception: Exception
(BLE001)
226-226: Consider moving this statement to an else block
(TRY300)
227-227: Do not catch blind exception: Exception
(BLE001)
353-353: Unused method argument: version
(ARG002)
415-415: Unused method argument: category
(ARG002)
415-415: Unused method argument: source_file
(ARG002)
422-422: Local variable var_name is assigned to but never used
Remove assignment to unused variable var_name
(F841)
423-423: f-string without any placeholders
Remove extraneous f prefix
(F541)
436-436: Local variable pkg is assigned to but never used
Remove assignment to unused variable pkg
(F841)
437-437: f-string without any placeholders
Remove extraneous f prefix
(F541)
464-464: Local variable service is assigned to but never used
Remove assignment to unused variable service
(F841)
465-465: f-string without any placeholders
Remove extraneous f prefix
(F541)
489-489: Unused method argument: category
(ARG002)
576-576: Local variable critical_reason is assigned to but never used
Remove assignment to unused variable critical_reason
(F841)
696-696: Do not catch blind exception: Exception
(BLE001)
700-700: Use explicit conversion flag
Replace with conversion flag
(RUF010)
742-742: Local variable original_line is assigned to but never used
Remove assignment to unused variable original_line
(F841)
791-791: Do not catch blind exception: Exception
(BLE001)
795-795: Use explicit conversion flag
Replace with conversion flag
(RUF010)
900-900: Do not catch blind exception: Exception
(BLE001)
904-904: Use explicit conversion flag
Replace with conversion flag
(RUF010)
939-939: Do not catch blind exception: Exception
(BLE001)
943-943: Use explicit conversion flag
Replace with conversion flag
(RUF010)
983-983: f-string without any placeholders
Remove extraneous f prefix
(F541)
992-992: Do not catch blind exception: Exception
(BLE001)
996-996: Use explicit conversion flag
Replace with conversion flag
(RUF010)
1029-1029: Do not catch blind exception: Exception
(BLE001)
1033-1033: Use explicit conversion flag
Replace with conversion flag
(RUF010)
1072-1072: Do not catch blind exception: Exception
(BLE001)
1076-1076: Use explicit conversion flag
Replace with conversion flag
(RUF010)
1112-1112: Do not catch blind exception: Exception
(BLE001)
1116-1116: Use explicit conversion flag
Replace with conversion flag
(RUF010)
1160-1160: Test for membership should be not in
Convert to not in
(E713)
1395-1395: f-string without any placeholders
Remove extraneous f prefix
(F541)
1492-1492: f-string without any placeholders
Remove extraneous f prefix
(F541)
1501-1501: f-string without any placeholders
Remove extraneous f prefix
(F541)
1502-1502: f-string without any placeholders
Remove extraneous f prefix
(F541)
1503-1503: f-string without any placeholders
Remove extraneous f prefix
(F541)
1504-1504: f-string without any placeholders
Remove extraneous f prefix
(F541)
1505-1505: f-string without any placeholders
Remove extraneous f prefix
(F541)
1507-1507: f-string without any placeholders
Remove extraneous f prefix
(F541)
1512-1512: f-string without any placeholders
Remove extraneous f prefix
(F541)
1516-1516: f-string without any placeholders
Remove extraneous f prefix
(F541)
1656-1656: f-string without any placeholders
Remove extraneous f prefix
(F541)
1663-1663: f-string without any placeholders
Remove extraneous f prefix
(F541)
🪛 YAMLlint (1.37.1)
.github/workflows/extract_dependency_versions_config.yaml
[warning] 173-173: too many blank lines (1 > 0)
(empty-lines)
.github/workflows/dependency-extraction-release.yml
[warning] 165-165: too many blank lines (1 > 0)
(empty-lines)
.github/workflows/dependency-extraction-nightly.yml
[warning] 155-155: too many blank lines (1 > 0)
(empty-lines)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
.github/workflows/dependency-extraction-release.yml (1)
81-94: Snapshot existence check is correctChecking against tracked files avoids false positives for newly generated-but-untracked snapshots. LGTM.
Ensure release branches follow release/X.Y.Z exactly; the version extractor expects that pattern.
.github/reports/README.md
Outdated
| ## Files | ||
|
|
||
| - 🤖 [Extraction Script](../workflows/extract_dependency_versions.py) | ||
| - ⚙️ [Configuration](../workflows/extract_dependency_versions_config.yaml) | ||
| - 📋 [Nightly Workflow](../workflows/dependency-extraction-nightly.yml) | ||
| - 📸 [Release Workflow](../workflows/dependency-extraction-release.yml) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix duplicate H2 heading (MD024)
Two "## Files" headings. Rename the second to avoid markdownlint MD024.
-## Files
+## Links📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ## Files | |
| - 🤖 [Extraction Script](../workflows/extract_dependency_versions.py) | |
| - ⚙️ [Configuration](../workflows/extract_dependency_versions_config.yaml) | |
| - 📋 [Nightly Workflow](../workflows/dependency-extraction-nightly.yml) | |
| - 📸 [Release Workflow](../workflows/dependency-extraction-release.yml) | |
| ## Links | |
| - 🤖 [Extraction Script](../workflows/extract_dependency_versions.py) | |
| - ⚙️ [Configuration](../workflows/extract_dependency_versions_config.yaml) | |
| - 📋 [Nightly Workflow](../workflows/dependency-extraction-nightly.yml) | |
| - 📸 [Release Workflow](../workflows/dependency-extraction-release.yml) |
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
125-125: Multiple headings with the same content
(MD024, no-duplicate-heading)
🤖 Prompt for AI Agents
In .github/reports/README.md around lines 125 to 131, there are two identical
"## Files" H2 headings causing markdownlint MD024; rename the second heading to
a unique, descriptive title (for example "## Workflow Files" or "## Included
Files") and update any references if needed so the heading is no longer a
duplicate.
| # Copy unversioned report if it exists | ||
| if [ -f "unversioned_dependencies_${TIMESTAMP}.csv" ]; then | ||
| cp unversioned_dependencies_${TIMESTAMP}.csv .github/reports/unversioned_dependencies_latest.csv | ||
| fi | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: unversioned report copy looks for wrong filename
The extractor writes .github/reports/dependency_versions_${TIMESTAMP}unversioned.csv, but the workflow looks for unversioned_dependencies${TIMESTAMP}.csv. Copy never happens.
- # Copy unversioned report if it exists
- if [ -f "unversioned_dependencies_${TIMESTAMP}.csv" ]; then
- cp unversioned_dependencies_${TIMESTAMP}.csv .github/reports/unversioned_dependencies_latest.csv
- fi
+ # Copy unversioned report if it exists (matches extractor's naming)
+ if [ -f ".github/reports/dependency_versions_${TIMESTAMP}_unversioned.csv" ]; then
+ cp ".github/reports/dependency_versions_${TIMESTAMP}_unversioned.csv" .github/reports/unversioned_dependencies_latest.csv
+ fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Copy unversioned report if it exists | |
| if [ -f "unversioned_dependencies_${TIMESTAMP}.csv" ]; then | |
| cp unversioned_dependencies_${TIMESTAMP}.csv .github/reports/unversioned_dependencies_latest.csv | |
| fi | |
| # Copy unversioned report if it exists (matches extractor's naming) | |
| if [ -f ".github/reports/dependency_versions_${TIMESTAMP}_unversioned.csv" ]; then | |
| cp ".github/reports/dependency_versions_${TIMESTAMP}_unversioned.csv" .github/reports/unversioned_dependencies_latest.csv | |
| fi |
🤖 Prompt for AI Agents
.github/workflows/dependency-extraction-nightly.yml around lines 59 to 63: the
workflow is checking for and copying unversioned_dependencies_${TIMESTAMP}.csv
but the extractor actually writes
.github/reports/dependency_versions_${TIMESTAMP}_unversioned.csv, so the copy
never occurs; update the if condition and cp command to look for and copy
.github/reports/dependency_versions_${TIMESTAMP}_unversioned.csv (copying it to
.github/reports/unversioned_dependencies_latest.csv or another intended
destination) so the correct file is found and copied.
| 🔗 **Documentation:** [Dependency Extraction Guide](../docs/dependency_extraction.md) | ||
| 📦 **Artifacts:** Download timestamped CSVs from workflow run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Verification inconclusive
Verify docs link exists
The PR body references ../docs/dependency_extraction.md. Ensure the path exists to avoid a broken link in the generated PR.
🏁 Script executed:
#!/usr/bin/env bash
set -euo pipefail
fd -a --hidden --strip-cwd-prefix 'dependency_extraction.md' | sed 's/^/found: /' || trueLength of output: 0
🏁 Script executed:
#!/usr/bin/env bash
set -euo pipefail
find . -type f \( -name "dependency_extraction.md" -o -name "dependency-extraction.md" \) -printLength of output: 0
Missing documentation file The link ../docs/dependency_extraction.md in .github/workflows/dependency-extraction-nightly.yml points to a non-existent file. Add the missing docs file or correct the path to avoid a broken link.
🤖 Prompt for AI Agents
In .github/workflows/dependency-extraction-nightly.yml around lines 115-116 the
Documentation link points to a missing file ../docs/dependency_extraction.md;
fix by either adding the referenced documentation file at
docs/dependency_extraction.md (with the described content) or update the
workflow link to the correct existing path/filename in the repo (verify exact
docs directory and filename, update the relative link accordingly), then run a
quick link check to ensure the workflow no longer references a broken path.
| name: dependency-extraction-${{ github.run_number }} | ||
| path: | | ||
| .github/reports/dependency_versions_*.csv | ||
| .github/reports/unversioned_dependencies_*.csv | ||
| retention-days: 90 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Artifact pattern misses unversioned report
Adjust to match the extractor’s file name.
path: |
.github/reports/dependency_versions_*.csv
- .github/reports/unversioned_dependencies_*.csv
+ .github/reports/dependency_versions_*_unversioned.csv📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| name: dependency-extraction-${{ github.run_number }} | |
| path: | | |
| .github/reports/dependency_versions_*.csv | |
| .github/reports/unversioned_dependencies_*.csv | |
| retention-days: 90 | |
| name: dependency-extraction-${{ github.run_number }} | |
| path: | | |
| .github/reports/dependency_versions_*.csv | |
| .github/reports/dependency_versions_*_unversioned.csv | |
| retention-days: 90 |
🤖 Prompt for AI Agents
.github/workflows/dependency-extraction-nightly.yml lines 131-135: the artifact
path pattern currently doesn't match the extractor's actual unversioned report
filename; update the path entries so they include the extractor's exact filename
pattern (for example adjust the unversioned entry to match
"unversioned-dependencies_*.csv" if the extractor emits a hyphenated name) so
both versioned and unversioned CSV reports are captured and retained.
| @@ -0,0 +1,1746 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shebang requires executable bit or remove it
Pre-commit flagged: has a shebang but not executable. Either chmod +x this file in git or drop the shebang (workflows call it via python3).
Suggested options:
- git update-index --chmod=+x .github/workflows/extract_dependency_versions.py
- Or remove line 1 (shebang).
🧰 Tools
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3547/merge) by dagil-nvidia.
[error] 1-1: isort: Files were modified by this hook. (pre-commit)
[error] 1-1: Black: 1 file reformatted. (pre-commit)
[error] 1-1: check-shebang-scripts-are-executable: has a shebang but is not marked executable.
🪛 Ruff (0.13.3)
1-1: Shebang is present but file is not executable
(EXE001)
🤖 Prompt for AI Agents
.github/workflows/extract_dependency_versions.py around line 1: the file
contains a shebang but is not executable which pre-commit flagged; fix by either
making the file executable in git (run: git update-index --chmod=+x
.github/workflows/extract_dependency_versions.py and commit that change) or
remove the shebang line (delete the first line and commit), choose one approach
and commit the resulting change so pre-commit no longer fails.
| def _get_package_source_url(self, dep_name: str, category: str, version: str, source_file: str) -> str: | ||
| """Generate source URL for package/dependency based on type and name.""" | ||
| dep_lower = dep_name.lower() | ||
|
|
||
| # Docker images from NVIDIA NGC Catalog | ||
| if category == "Base Image" or category == "Docker Compose Service": | ||
| if "nvcr.io" in source_file or "nvidia" in dep_lower: | ||
| # Extract image name for NGC | ||
| image_slug = dep_name.split('/')[-1].lower() | ||
| return f"https://catalog.ngc.nvidia.com/orgs/nvidia/containers/{image_slug}" | ||
| elif "/" in dep_name: | ||
| # Docker Hub | ||
| return f"https://hub.docker.com/r/{dep_name}" | ||
|
|
||
| # Helm Charts | ||
| if "Helm Chart" in category: | ||
| chart_slug = dep_name.lower().replace(' ', '-') | ||
| return f"https://artifacthub.io/packages/search?ts_query_web={chart_slug}" | ||
|
|
||
| # Python packages | ||
| if "Python" in category: | ||
| # Remove version constraints and extras | ||
| pkg_name = dep_name.split('[')[0].strip().lower() | ||
| pkg_name = pkg_name.replace(' ', '-') | ||
| return f"https://pypi.org/project/{pkg_name}/" | ||
|
|
||
| # Go modules | ||
| if category == "Go Module": | ||
| return f"https://pkg.go.dev/{dep_name}" | ||
|
|
||
| # Rust crates | ||
| if category == "Rust Crate": | ||
| return f"https://crates.io/crates/{dep_name}" | ||
|
|
||
| # Git dependencies already have repo URLs - extract repo URL | ||
| if "Git" in category and "github.com" in source_file: | ||
| # Try to extract from notes or return GitHub search | ||
| return f"https://github.com/search?q={dep_name}&type=repositories" | ||
|
|
||
| # Framework/System packages | ||
| if dep_name.lower() in ["rust", "python", "go", "cmake"]: | ||
| if "rust" in dep_lower: | ||
| return "https://www.rust-lang.org/tools/install" | ||
| elif "python" in dep_lower: | ||
| return "https://www.python.org/downloads/" | ||
| elif "go" in dep_lower: | ||
| return "https://go.dev/dl/" | ||
| elif "cmake" in dep_lower: | ||
| return "https://cmake.org/download/" | ||
|
|
||
| # CUDA | ||
| if "cuda" in dep_lower: | ||
| return "https://developer.nvidia.com/cuda-downloads" | ||
|
|
||
| # Default: return N/A | ||
| return "N/A" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix package source URL detection for images and git deps
- Using source_file to detect nvcr.io is incorrect; check dep_name instead.
- Always provide a reasonable URL for Git deps (GitHub search fallback).
- # Docker images from NVIDIA NGC Catalog
- if category == "Base Image" or category == "Docker Compose Service":
- if "nvcr.io" in source_file or "nvidia" in dep_lower:
+ # Docker images
+ if category in ("Base Image", "Docker Compose Service"):
+ dep_str = dep_name.lower()
+ if "nvcr.io" in dep_str or "nvidia" in dep_str:
# Extract image name for NGC
image_slug = dep_name.split('/')[-1].lower()
return f"https://catalog.ngc.nvidia.com/orgs/nvidia/containers/{image_slug}"
elif "/" in dep_name:
# Docker Hub
return f"https://hub.docker.com/r/{dep_name}"
@@
- # Git dependencies already have repo URLs - extract repo URL
- if "Git" in category and "github.com" in source_file:
- # Try to extract from notes or return GitHub search
- return f"https://github.com/search?q={dep_name}&type=repositories"
+ # Git dependencies: provide a GitHub search fallback
+ if "Git" in category:
+ return f"https://github.com/search?q={dep_name}&type=repositories"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _get_package_source_url(self, dep_name: str, category: str, version: str, source_file: str) -> str: | |
| """Generate source URL for package/dependency based on type and name.""" | |
| dep_lower = dep_name.lower() | |
| # Docker images from NVIDIA NGC Catalog | |
| if category == "Base Image" or category == "Docker Compose Service": | |
| if "nvcr.io" in source_file or "nvidia" in dep_lower: | |
| # Extract image name for NGC | |
| image_slug = dep_name.split('/')[-1].lower() | |
| return f"https://catalog.ngc.nvidia.com/orgs/nvidia/containers/{image_slug}" | |
| elif "/" in dep_name: | |
| # Docker Hub | |
| return f"https://hub.docker.com/r/{dep_name}" | |
| # Helm Charts | |
| if "Helm Chart" in category: | |
| chart_slug = dep_name.lower().replace(' ', '-') | |
| return f"https://artifacthub.io/packages/search?ts_query_web={chart_slug}" | |
| # Python packages | |
| if "Python" in category: | |
| # Remove version constraints and extras | |
| pkg_name = dep_name.split('[')[0].strip().lower() | |
| pkg_name = pkg_name.replace(' ', '-') | |
| return f"https://pypi.org/project/{pkg_name}/" | |
| # Go modules | |
| if category == "Go Module": | |
| return f"https://pkg.go.dev/{dep_name}" | |
| # Rust crates | |
| if category == "Rust Crate": | |
| return f"https://crates.io/crates/{dep_name}" | |
| # Git dependencies already have repo URLs - extract repo URL | |
| if "Git" in category and "github.com" in source_file: | |
| # Try to extract from notes or return GitHub search | |
| return f"https://github.com/search?q={dep_name}&type=repositories" | |
| # Framework/System packages | |
| if dep_name.lower() in ["rust", "python", "go", "cmake"]: | |
| if "rust" in dep_lower: | |
| return "https://www.rust-lang.org/tools/install" | |
| elif "python" in dep_lower: | |
| return "https://www.python.org/downloads/" | |
| elif "go" in dep_lower: | |
| return "https://go.dev/dl/" | |
| elif "cmake" in dep_lower: | |
| return "https://cmake.org/download/" | |
| # CUDA | |
| if "cuda" in dep_lower: | |
| return "https://developer.nvidia.com/cuda-downloads" | |
| # Default: return N/A | |
| return "N/A" | |
| def _get_package_source_url(self, dep_name: str, category: str, version: str, source_file: str) -> str: | |
| """Generate source URL for package/dependency based on type and name.""" | |
| dep_lower = dep_name.lower() | |
| # Docker images | |
| if category in ("Base Image", "Docker Compose Service"): | |
| dep_str = dep_name.lower() | |
| if "nvcr.io" in dep_str or "nvidia" in dep_str: | |
| # Extract image name for NGC | |
| image_slug = dep_name.split('/')[-1].lower() | |
| return f"https://catalog.ngc.nvidia.com/orgs/nvidia/containers/{image_slug}" | |
| elif "/" in dep_name: | |
| # Docker Hub | |
| return f"https://hub.docker.com/r/{dep_name}" | |
| # Helm Charts | |
| if "Helm Chart" in category: | |
| chart_slug = dep_name.lower().replace(' ', '-') | |
| return f"https://artifacthub.io/packages/search?ts_query_web={chart_slug}" | |
| # Python packages | |
| if "Python" in category: | |
| # Remove version constraints and extras | |
| pkg_name = dep_name.split('[')[0].strip().lower() | |
| pkg_name = pkg_name.replace(' ', '-') | |
| return f"https://pypi.org/project/{pkg_name}/" | |
| # Go modules | |
| if category == "Go Module": | |
| return f"https://pkg.go.dev/{dep_name}" | |
| # Rust crates | |
| if category == "Rust Crate": | |
| return f"https://crates.io/crates/{dep_name}" | |
| # Git dependencies: provide a GitHub search fallback | |
| if "Git" in category: | |
| return f"https://github.com/search?q={dep_name}&type=repositories" | |
| # Framework/System packages | |
| if dep_name.lower() in ["rust", "python", "go", "cmake"]: | |
| if "rust" in dep_lower: | |
| return "https://www.rust-lang.org/tools/install" | |
| elif "python" in dep_lower: | |
| return "https://www.python.org/downloads/" | |
| elif "go" in dep_lower: | |
| return "https://go.dev/dl/" | |
| elif "cmake" in dep_lower: | |
| return "https://cmake.org/download/" | |
| # CUDA | |
| if "cuda" in dep_lower: | |
| return "https://developer.nvidia.com/cuda-downloads" | |
| # Default: return N/A | |
| return "N/A" |
🧰 Tools
🪛 Ruff (0.13.3)
71-71: Unused method argument: version
(ARG002)
🤖 Prompt for AI Agents
.github/workflows/extract_dependency_versions.py around lines 71 to 127: The
current logic incorrectly checks source_file for nvcr.io and only returns a
GitHub fallback when "github.com" appears in source_file; update the checks so
nvcr.io detection uses dep_name (or dep_lower) instead of source_file, and
change the Git dependency branch to always return a reasonable GitHub
search/fallback URL when "Git" is in category (regardless of source_file). Keep
existing Docker Hub and NGC URL formats but base the NGC branch on dep_name
containing nvcr.io or nvidia in dep_lower, and ensure the git branch constructs
the GitHub search fallback using dep_name.
| def _format_notes(self, notes: str, category: str, source_file: str) -> str: | ||
| """Format notes to be more user-friendly and concise.""" | ||
| if not notes: | ||
| return "" | ||
|
|
||
| # Handle "ARG: VARIABLE_NAME" format | ||
| if notes.startswith("ARG: "): | ||
| var_name = notes[5:] # Remove "ARG: " prefix | ||
| return f"Dockerfile build argument" | ||
|
|
||
| # Handle "From install script: VARIABLE_NAME" format | ||
| if notes.startswith("From install script:"): | ||
| return "From installation script" | ||
|
|
||
| # Handle "ENV: VARIABLE_NAME" format | ||
| if notes.startswith("ENV: "): | ||
| return "Dockerfile environment variable" | ||
|
|
||
| # Handle Git dependency notes | ||
| if notes.startswith("Git dependency:"): | ||
| # Extract the package name after the colon | ||
| pkg = notes.split(":", 1)[1].strip() if ":" in notes else "" | ||
| return f"Git repository dependency" | ||
|
|
||
| # Handle "Git-based pip install from ..." | ||
| if notes.startswith("Git-based pip install from"): | ||
| org_repo = notes.replace("Git-based pip install from ", "") | ||
| return f"Installed from Git ({org_repo})" | ||
|
|
||
| # Helm dependencies | ||
| if "Helm dependency from" in notes: | ||
| # Extract just the source type | ||
| if "oci://" in notes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused variables and extraneous f-strings in notes formatter
Addresses Ruff F841/F541 and simplifies output.
# Handle "ARG: VARIABLE_NAME" format
if notes.startswith("ARG: "):
- var_name = notes[5:] # Remove "ARG: " prefix
- return f"Dockerfile build argument"
+ return "Dockerfile build argument"
@@
if notes.startswith("From install script:"):
- return "From installation script"
+ return "From installation script"
@@
if notes.startswith("Git dependency:"):
- # Extract the package name after the colon
- pkg = notes.split(":", 1)[1].strip() if ":" in notes else ""
- return f"Git repository dependency"
+ return "Git repository dependency"
@@
if notes.startswith("Service:"):
- service = notes.replace("Service:", "").strip()
- return f"Docker Compose service"
+ return "Docker Compose service"Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.13.3)
415-415: Unused method argument: category
(ARG002)
415-415: Unused method argument: source_file
(ARG002)
422-422: Local variable var_name is assigned to but never used
Remove assignment to unused variable var_name
(F841)
423-423: f-string without any placeholders
Remove extraneous f prefix
(F541)
436-436: Local variable pkg is assigned to but never used
Remove assignment to unused variable pkg
(F841)
437-437: f-string without any placeholders
Remove extraneous f prefix
(F541)
| def extract_dockerfile_args(self, dockerfile_path: Path, component: str) -> None: | ||
| """Extract ARG and ENV declarations from Dockerfile.""" | ||
| if not dockerfile_path.exists(): | ||
| self.failed_files.append({ | ||
| 'file': str(dockerfile_path.relative_to(self.repo_root)), | ||
| 'component': component, | ||
| 'reason': 'File not found' | ||
| }) | ||
| return | ||
|
|
||
| try: | ||
| self.processed_files.add(str(dockerfile_path.relative_to(self.repo_root))) | ||
|
|
||
| with open(dockerfile_path) as f: | ||
| lines = f.readlines() | ||
|
|
||
| # Build a dictionary of ARG values for variable substitution | ||
| arg_values = {} | ||
|
|
||
| for i, line in enumerate(lines, 1): | ||
| line = line.strip() | ||
|
|
||
| # Collect ARG values | ||
| if line.startswith("ARG ") and "=" in line: | ||
| arg_line = line[4:].strip() | ||
| if "=" in arg_line: | ||
| key, value = arg_line.split("=", 1) | ||
| key = key.strip() | ||
| value = value.strip().strip('"') | ||
| arg_values[key] = value | ||
|
|
||
| # Extract version-related ARGs | ||
| version_keywords = ["VERSION", "REF", "TAG", "_VER"] | ||
| if any(kw in key for kw in version_keywords): | ||
| category = "System" if key.startswith(("NATS", "ETCD", "NIXL", "UCX", "RUST")) else "Framework" | ||
| self.add_dependency( | ||
| component, category, key.replace("_", " ").title(), value, | ||
| str(dockerfile_path.relative_to(self.repo_root)), | ||
| str(i), f"ARG: {key}" | ||
| ) | ||
|
|
||
| # Extract base images with variable resolution | ||
| if line.startswith("FROM ") and "AS" in line: | ||
| parts = line.split() | ||
| image = parts[1] | ||
| if ":" in image: | ||
| img_name, tag = image.rsplit(":", 1) | ||
|
|
||
| # Resolve variables in image name and tag | ||
| img_name = self._resolve_dockerfile_vars(img_name, arg_values) | ||
| tag = self._resolve_dockerfile_vars(tag, arg_values) | ||
|
|
||
| # Only add if not just variable names | ||
| if not (img_name.startswith('${') or tag.startswith('${')): | ||
| self.add_dependency( | ||
| component, "Base Image", img_name, tag, | ||
| str(dockerfile_path.relative_to(self.repo_root)), | ||
| str(i), "Build/Runtime base image" | ||
| ) | ||
| except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FROM parsing misses single-stage images; also handle untagged images
Currently only captures lines with AS; single-stage images are skipped. Also treat images without explicit tag as latest.
- # Extract base images with variable resolution
- if line.startswith("FROM ") and "AS" in line:
- parts = line.split()
- image = parts[1]
- if ":" in image:
- img_name, tag = image.rsplit(":", 1)
-
- # Resolve variables in image name and tag
- img_name = self._resolve_dockerfile_vars(img_name, arg_values)
- tag = self._resolve_dockerfile_vars(tag, arg_values)
-
- # Only add if not just variable names
- if not (img_name.startswith('${') or tag.startswith('${')):
- self.add_dependency(
- component, "Base Image", img_name, tag,
- str(dockerfile_path.relative_to(self.repo_root)),
- str(i), "Build/Runtime base image"
- )
+ # Extract base images (support with/without AS, and no explicit tag)
+ if line.startswith("FROM "):
+ parts = line.split()
+ if len(parts) >= 2:
+ image = parts[1]
+ if ":" in image:
+ img_name, tag = image.rsplit(":", 1)
+ else:
+ img_name, tag = image, "latest"
+ # Resolve variables in image name and tag
+ img_name = self._resolve_dockerfile_vars(img_name, arg_values)
+ tag = self._resolve_dockerfile_vars(tag, arg_values)
+ # Skip unresolved variable-only entries
+ if "$" not in img_name and "$" not in tag:
+ self.add_dependency(
+ component, "Base Image", img_name, tag,
+ str(dockerfile_path.relative_to(self.repo_root)),
+ str(i), "Build/Runtime base image"
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def extract_dockerfile_args(self, dockerfile_path: Path, component: str) -> None: | |
| """Extract ARG and ENV declarations from Dockerfile.""" | |
| if not dockerfile_path.exists(): | |
| self.failed_files.append({ | |
| 'file': str(dockerfile_path.relative_to(self.repo_root)), | |
| 'component': component, | |
| 'reason': 'File not found' | |
| }) | |
| return | |
| try: | |
| self.processed_files.add(str(dockerfile_path.relative_to(self.repo_root))) | |
| with open(dockerfile_path) as f: | |
| lines = f.readlines() | |
| # Build a dictionary of ARG values for variable substitution | |
| arg_values = {} | |
| for i, line in enumerate(lines, 1): | |
| line = line.strip() | |
| # Collect ARG values | |
| if line.startswith("ARG ") and "=" in line: | |
| arg_line = line[4:].strip() | |
| if "=" in arg_line: | |
| key, value = arg_line.split("=", 1) | |
| key = key.strip() | |
| value = value.strip().strip('"') | |
| arg_values[key] = value | |
| # Extract version-related ARGs | |
| version_keywords = ["VERSION", "REF", "TAG", "_VER"] | |
| if any(kw in key for kw in version_keywords): | |
| category = "System" if key.startswith(("NATS", "ETCD", "NIXL", "UCX", "RUST")) else "Framework" | |
| self.add_dependency( | |
| component, category, key.replace("_", " ").title(), value, | |
| str(dockerfile_path.relative_to(self.repo_root)), | |
| str(i), f"ARG: {key}" | |
| ) | |
| # Extract base images with variable resolution | |
| if line.startswith("FROM ") and "AS" in line: | |
| parts = line.split() | |
| image = parts[1] | |
| if ":" in image: | |
| img_name, tag = image.rsplit(":", 1) | |
| # Resolve variables in image name and tag | |
| img_name = self._resolve_dockerfile_vars(img_name, arg_values) | |
| tag = self._resolve_dockerfile_vars(tag, arg_values) | |
| # Only add if not just variable names | |
| if not (img_name.startswith('${') or tag.startswith('${')): | |
| self.add_dependency( | |
| component, "Base Image", img_name, tag, | |
| str(dockerfile_path.relative_to(self.repo_root)), | |
| str(i), "Build/Runtime base image" | |
| ) | |
| except Exception as e: | |
| # Extract base images (support with/without AS, and no explicit tag) | |
| if line.startswith("FROM "): | |
| parts = line.split() | |
| if len(parts) >= 2: | |
| image = parts[1] | |
| if ":" in image: | |
| img_name, tag = image.rsplit(":", 1) | |
| else: | |
| img_name, tag = image, "latest" | |
| # Resolve variables in image name and tag | |
| img_name = self._resolve_dockerfile_vars(img_name, arg_values) | |
| tag = self._resolve_dockerfile_vars(tag, arg_values) | |
| # Skip unresolved variable-only entries | |
| if "$" not in img_name and "$" not in tag: | |
| self.add_dependency( | |
| component, "Base Image", img_name, tag, | |
| str(dockerfile_path.relative_to(self.repo_root)), | |
| str(i), "Build/Runtime base image" | |
| ) |
🧰 Tools
🪛 Ruff (0.13.3)
696-696: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In .github/workflows/extract_dependency_versions.py around lines 637 to 696, the
FROM parsing only handles multi-stage lines containing "AS" and skips
single-stage FROMs and untagged images; update the logic to treat any line
starting with "FROM " (regardless of "AS") as a candidate, split out an optional
AS suffix (case-insensitive) to isolate the image token, then parse image and
tag from that token (if ":" missing, set tag = "latest"), resolve variables via
self._resolve_dockerfile_vars for both image and tag, and keep the existing
guard to avoid adding entries when image or tag are unresolved variable
expressions; finally call self.add_dependency with component, "Base Image",
img_name, tag, file path, line number, and the same description so single-stage
and untagged images are recorded.
| lines = f.readlines() | ||
|
|
||
| for i, line in enumerate(lines, 1): | ||
| original_line = line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address Ruff F841 unused variables
- original_line (Line 742) and critical_reason (Line 576) are assigned but unused. Remove them or use critical_reason (e.g., append to Notes for critical entries).
Example minimal fix:
- original_line = line
+ # original_line = line # unusedOr, enrich Notes when critical:
- critical_reason = reason_orig if is_critical_orig else reason_formatted
+ critical_reason = reason_orig if is_critical_orig else reason_formatted
@@
- "Notes": formatted_notes
+ "Notes": formatted_notes if not is_critical else (formatted_notes + ("" if not formatted_notes else " | ") + f"Critical: {critical_reason}")Also applies to: 576-576
🧰 Tools
🪛 Ruff (0.13.3)
742-742: Local variable original_line is assigned to but never used
Remove assignment to unused variable original_line
(F841)
🤖 Prompt for AI Agents
.github/workflows/extract_dependency_versions.py lines 576 and 742: two
variables are assigned but never used (critical_reason at line 576 and
original_line at line 742); either remove these assignments or use
critical_reason by appending its text to the Notes field for entries marked
critical (e.g., when an entry is critical, concatenate critical_reason to the
existing Notes string before storing/writing), and simply delete the
original_line assignment if it's not needed elsewhere; ensure no other code
paths rely on these variables after removal.
| # Extract dependencies | ||
| if in_require or (stripped.startswith('require ') and not '(' in stripped): | ||
| # Handle single-line require |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: E713 ('not in') and small clarity improvements
- if in_require or (stripped.startswith('require ') and not '(' in stripped):
+ if in_require or (stripped.startswith('require ') and '(' not in stripped):📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Extract dependencies | |
| if in_require or (stripped.startswith('require ') and not '(' in stripped): | |
| # Handle single-line require | |
| # Extract dependencies | |
| if in_require or (stripped.startswith('require ') and '(' not in stripped): | |
| # Handle single-line require |
🧰 Tools
🪛 Ruff (0.13.3)
1160-1160: Test for membership should be not in
Convert to not in
(E713)
🤖 Prompt for AI Agents
.github/workflows/extract_dependency_versions.py around lines 1159-1161: the
conditional uses the negated membership test "not '(' in stripped" which
triggers Style E713 and is slightly unclear; change it to use the recommended
form with the literal on the left, e.g. "if in_require or
(stripped.startswith('require ') and '(' not in stripped):", keep parentheses
for clarity, and optionally add a brief inline comment explaining this branch
handles single-line require statements.
5c1c41b to
7f0391b
Compare
| ### Critical Dependencies | ||
|
|
||
| Critical dependencies are flagged in the CSV to highlight components that require special attention for: | ||
| - Security updates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't critical dependencies just based on the list defined in critical_dependencies?
| - Python (runtime) | ||
| - Kubernetes (orchestration) | ||
| - NATS (message broker) | ||
| - etcd (key-value store) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the framework versions should be added here as well.
| latest_csv = args.latest_csv | ||
| if latest_csv is None: | ||
| # Look for dependency_versions_latest.csv in .github/reports/ | ||
| reports_dir = repo_root / ".github/reports" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be a version conflict here, if I already have a file named dependency_versions_latest.csv and I add another csv with the same name, which seems to be the case here: https://github.com/ai-dynamo/dynamo/pull/3547/files#:~:text=%2D%20name%3A%20Upload%20artifacts
, then I'll need to ensure I pick up by timestamp instead of the latest tag.
| """Return default configuration if config file is not available.""" | ||
| return { | ||
| "components": { | ||
| "trtllm": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional dependencies are also specified in pyproject.toml: https://github.com/ai-dynamo/dynamo/blob/main/pyproject.toml#L48-L64
I think we eventually want to use pyproject.toml as the primary source for all of our dependencies instead of having multiple dependencies defined in multiple files. Ideally, everything gets defined in pyproject.toml or a requirements.txt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
| echo "version=$VERSION" >> $GITHUB_OUTPUT | ||
| echo "📦 Creating dependency snapshot for version: v$VERSION" | ||
| - name: Run dependency extraction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably define these common functions as composite actions, similar to what's done here: https://github.com/ai-dynamo/dynamo/tree/main/.github/actions
That way, we avoid duplicate steps in these .yml files.
- Clarify that critical dependencies are explicitly maintained in config - Add framework versions (TensorRT-LLM, vLLM, SGLang) to critical list - Add comment explaining latest.csv workflow to avoid confusion - Document that optional dependencies are already extracted from pyproject.toml Addresses review feedback from PR #3547 Signed-off-by: Dan Gil <[email protected]>
4d7c34a to
419c3b1
Compare
🎯 Addressed Review Feedback@nv-tusharma's composite actions suggestion: ✅ Done! Created 🆕 Also Added: Version Discrepancy DetectionAs requested, implemented automatic detection of dependencies pinned at different versions across the repo: Features:
Current Findings (14 discrepancies detected):
Example Warning Output: These warnings will appear in the GitHub Actions UI and Files Changed tab to help identify potential runtime conflicts, build failures, or security issues. Commits:
|
🔧 Additional Improvements to Discrepancy DetectionCommit Changes:
Results:
Current Discrepancies:
These are all legitimate issues that could cause runtime conflicts or build failures. The detection system is now much more accurate! |
✅ Final Status: Version Discrepancy Detection Ready for ReviewAll tasks completed and tested! 🎯 Summary of Changes1. Composite Action Created ✅
2. Version Discrepancy Detection ✅
3. Accuracy Improvements ✅Fixed False Positives:
Results:
📊 Current Discrepancies (4 Total)Critical (2):
Non-Critical (2):
🔧 Technical DetailsExtraction Enhancements:
GitHub Actions Integration: 📝 Commits
Ready for final review! ✨ |
🔔 Added Automated Failure MonitoringCommit: Added automated issue creation when workflows fail: Nightly Workflow Monitoring:
Release Workflow Monitoring:
Benefits:✅ Automatic alerting prevents silent failures 📊 PR Status SummarySquash-and-merge ready! The repo uses squash-merge strategy (verified by checking recent PRs), so individual commit messages won't matter. The PR title and description are compliant with conventional commits. Final commit count: 27 commits → will be squashed to 1 on merge |
| @@ -0,0 +1,380 @@ | |||
| # PR #3547: Automated Dependency Version Tracking and Extraction | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove file?
|
|
||
| This directory contains the latest dependency extraction reports for the Dynamo repository. | ||
|
|
||
| ## Files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add files for 0.5.1 release? its hard to visualize csv structure without looking at one
.gitignore
Outdated
| *pytest_report.md | ||
| *pytest_report.xml | ||
|
|
||
| # Dependency extraction timestamped outputs (keep latest only) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need these in gitignore? workflow will run in CI?
| ### `dependency_versions_latest.csv` | ||
| The most recent dependency extraction results. Updated nightly by the automated CI workflow. | ||
|
|
||
| ### `unversioned_dependencies_latest.csv` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep it in the same file? This file contents would get lower and lower and eventually nothing, so would need change later on to remove it
I'd suggest github workflow output, flag all the unpinned dependencies separately but storing in git repo, we merge them into a single csv to keep it clean
| @@ -0,0 +1,196 @@ | |||
| # Dependency Extraction Configuration | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be stored out of worflows folder, or in a subfolder maybe
| @@ -0,0 +1,206 @@ | |||
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dependency extraction workflow file can be merged for nightly and release i think
| - name: Install dependencies | ||
| shell: bash | ||
| run: pip install pyyaml | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be extended to extract more things out of workflow i think
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file is too big to review 😅
can you split these into multiple files? (also store in a subfolder maybe)
and I also see bunch of hardcoded things, like special_cases, nvidia_indicators etc etc. Please add some text on top of file or a readme on how to update and maintain this file. Adding some explanation about logic/flow would help as well
Implemented review comments from PR #3547: 1. **Clarified gitignore patterns** - Added detailed comments explaining why timestamped CSVs are ignored while latest/release versions are tracked 2. **Merged unversioned dependencies into main CSV** - Removed separate unversioned_dependencies.csv file. All dependencies (versioned and unversioned) now in single CSV with console warnings for unversioned ones 3. **Moved config file** - Relocated extract_dependency_versions_config.yaml to .github/dependency-extraction/config.yaml for better organization outside workflows folder 4. **Merged nightly/release workflows** - Combined dependency-extraction-nightly.yml and dependency-extraction-release.yml into single unified dependency-extraction.yml with conditional logic based on trigger type 5. **Enhanced composite action** - Expanded dependency-extraction-setup action to include checkout, reports directory creation, and configurable fetch-depth Changes: - Updated all references to new config path - Removed --report-unversioned flag from script and workflows - Added unversioned dependency summary in write_csv() console output - Unified workflow supports both nightly (schedule/manual) and release (push to release/* or manual) modes - Composite action now handles more common setup steps Documentation updated: - README.md now references unified workflow - PR description updated with comprehensive review details Signed-off-by: Dan Gil <[email protected]>
- Detect dependencies removed since last extraction - Show removed deps in console output (first 10) - Generate JSON report of removed dependencies - Include removed deps list in nightly PR body with details - Flag critical removed dependencies - Update baseline count dynamically Signed-off-by: Dan Gil <[email protected]>
- Use heredoc for multiline Python script to avoid YAML parsing issues - Remove escaped quotes that were causing syntax errors Signed-off-by: Dan Gil <[email protected]>
- Simplified Python script to single-line to avoid YAML parsing issues - Fixed trailing whitespace in workflow files - All pre-commit checks now passing locally Signed-off-by: Dan Gil <[email protected]>
Skip variables with command substitution to prevent capturing partial strings Fixes Torch/Torchaudio/Torchvision version extraction from install_vllm.sh Signed-off-by: Dan Gil <[email protected]>
…y downloads This commit adds three major improvements to dependency tracking: 1. pip/uv install command extraction: - Captures package==version patterns from pip/uv install commands - Adds 4 new vLLM dependencies: torch, torchaudio, torchvision, lmcache 2. GitHub release URL extraction: - Extracts versions from wget/curl GitHub release download URLs - Handles multiline RUN commands in Dockerfiles - Adds 1 new sglang dependency: nats-server v2.10.28 3. TensorRT-LLM install script coverage: - Added install_nixl.sh to trtllm component configuration - Adds 1 new trtllm dependency: UCX v1.18.1 Impact: - Total dependencies: 243 → 249 (+6 new) - Closes gap where hardcoded versions in scripts were missed - Improves accuracy of dependency tracking across all components Signed-off-by: Dan Gil <[email protected]>
Apply black formatting to fix pre-commit trailing whitespace errors. Signed-off-by: Dan Gil <[email protected]>
- Remove JSON support, keep YAML only (per rmccorm4 review) - Fix package source URL detection (check dep_name not source_file) - Fix FROM parsing to capture single-stage and untagged images - Fix unversioned report filename mismatch in workflow - Fix artifact upload pattern to match actual filenames - Fix docs link to point to existing README - Fix duplicate H2 heading in README (rename to Links) - Make extract_dependency_versions.py executable - Fix all pre-commit issues (ruff, isort, black) Signed-off-by: Dan Gil <[email protected]>
- Clarify that critical dependencies are explicitly maintained in config - Add framework versions (TensorRT-LLM, vLLM, SGLang) to critical list - Add comment explaining latest.csv workflow to avoid confusion - Document that optional dependencies are already extracted from pyproject.toml Addresses review feedback from PR #3547 Signed-off-by: Dan Gil <[email protected]>
- Add composite action for dependency extraction setup to reduce duplication - Implement version discrepancy detection to identify dependencies pinned at different versions across the repo - Output GitHub Actions warning annotations for CI visibility - Add normalize_dependency_name() to handle common naming variations - Add detect_version_discrepancies() to identify version conflicts - Update workflows to use the new composite action - Display discrepancy warnings in extraction summary with actionable tips Addresses nv-tusharma's feedback about using composite actions. Adds warning system as requested to detect version conflicts that could cause runtime issues, build failures, or security vulnerabilities. Related: DYN-1235 Signed-off-by: Dan Gil <[email protected]>
Signed-off-by: Dan Gil <[email protected]>
- Filter out base/runtime images (intentionally different per component) - Preserve full Go module paths to avoid false positives (e.g., emperror.dev/errors vs github.com/pkg/errors) - Skip Go indirect dependencies from discrepancy checks - Add category parameter to normalize_dependency_name() Reduces false positives from 18 to 6 real discrepancies. Signed-off-by: Dan Gil <[email protected]>
…known discrepancies **Fixes for extraction accuracy:** - Skip ARGs for sub-dependencies (e.g., NIXL_UCX_REF is for UCX, not NIXL) - Fixes false positive showing NIXL v1.19.0 (was actually UCX) - Exclude PyTorch Triton from PyTorch normalization - PyTorch Triton (Triton compiler) is a separate package, not PyTorch - Was causing false positive showing 3 PyTorch versions instead of 2 **Version comparison improvements:** - Add version normalization to ignore pinning style differences - '0.6.0' vs '<=0.6.0' are now treated as the same - '==32.0.1' vs '>=32.0.1,<33.0.0' are now treated as the same - Only flag discrepancies when actual version numbers differ **Documentation:** - Add known_version_discrepancies section to config - Document intentional PyTorch version difference: - TensorRT-LLM: 2.8.0 (from NVIDIA container) - vLLM: 2.7.1+cu128 (ARM64 wheel compatibility) **Results:** - Reduced from 6 to 4 real discrepancies - Eliminated false positives from: - Sub-dependency ARGs (UCX) - Pinning style differences (NIXL, Kubernetes) - Package misidentification (PyTorch Triton) Signed-off-by: Dan Gil <[email protected]>
Signed-off-by: Dan Gil <[email protected]>
…n workflows - Add GitHub issue creation on workflow failures - Nightly workflow: Create/update a single tracking issue for repeated failures - Release workflow: Create version-specific issues for each failure - Includes actionable troubleshooting steps and direct links to failed runs Benefits: - Automatic alerting when nightly extraction fails - Prevents silent failures from going unnoticed - Provides clear action items for responders - Avoids issue spam by updating existing nightly failure issues Signed-off-by: Dan Gil <[email protected]>
- Remove trailing whitespace from nightly and release workflows - Add comprehensive PR description document (PR_3547_DESCRIPTION.md) Signed-off-by: Dan Gil <[email protected]>
Implemented review comments from PR #3547: 1. **Clarified gitignore patterns** - Added detailed comments explaining why timestamped CSVs are ignored while latest/release versions are tracked 2. **Merged unversioned dependencies into main CSV** - Removed separate unversioned_dependencies.csv file. All dependencies (versioned and unversioned) now in single CSV with console warnings for unversioned ones 3. **Moved config file** - Relocated extract_dependency_versions_config.yaml to .github/dependency-extraction/config.yaml for better organization outside workflows folder 4. **Merged nightly/release workflows** - Combined dependency-extraction-nightly.yml and dependency-extraction-release.yml into single unified dependency-extraction.yml with conditional logic based on trigger type 5. **Enhanced composite action** - Expanded dependency-extraction-setup action to include checkout, reports directory creation, and configurable fetch-depth Changes: - Updated all references to new config path - Removed --report-unversioned flag from script and workflows - Added unversioned dependency summary in write_csv() console output - Unified workflow supports both nightly (schedule/manual) and release (push to release/* or manual) modes - Composite action now handles more common setup steps Documentation updated: - README.md now references unified workflow - PR description updated with comprehensive review details Signed-off-by: Dan Gil <[email protected]>
Added detailed documentation addressing nv-anants review feedback: 1. **Comprehensive README** (.github/scripts/dependency-extraction/README.md): - Architecture overview with directory structure - Component responsibilities and data flow - Configuration documentation with examples - Usage examples (CLI and Python API) - Guide for adding new dependency sources - Maintenance documentation for hardcoded values - Troubleshooting section - Testing and workflow integration docs 2. **Enhanced script docstring** (extract_dependency_versions.py): - Documented all 10 supported source types - Explained hardcoded values and where to update them - Added architecture overview - Included usage examples - Added references to README Key Documentation: - HARDCODED VALUES: NVIDIA_INDICATORS, SPECIAL_CASES documented with line numbers - MAINTENANCE: Clear instructions for updating critical dependencies, extraction patterns - ARCHITECTURE: Explained DependencyExtractor class and key methods - ADDING SOURCES: Step-by-step guide for extending to new file types This addresses the documentation portion of nv-anants comment about the script being "too big to review" - now includes maintenance guide and architecture docs. Note: Full modular split into separate files (dockerfile.py, python_deps.py, etc.) is a 2-3 hour task. This commit provides comprehensive documentation as a first step. Signed-off-by: Dan Gil <[email protected]>
Signed-off-by: Dan Gil <[email protected]>
2bfe25b to
b98d2cf
Compare
|
/ok to test b98d2cf |
Breaks down 2,491-line monolithic script into logical modules: Modules Created: - constants.py (100 lines): Centralized hardcoded values - NVIDIA_INDICATORS for auto-detecting NVIDIA products - NORMALIZATIONS for dependency name mapping - COMPONENT_ORDER for CSV sort priority - All values documented with update instructions - utils/formatting.py (330 lines): Name formatting and normalization - format_package_name(): Human-readable names (pytorch → PyTorch) - normalize_dependency_name(): For version discrepancy detection - normalize_version_for_comparison(): Removes pinning operators - format_notes(): User-friendly note formatting - utils/comparison.py (170 lines): Version discrepancy detection - detect_version_discrepancies(): Finds version conflicts - output_github_warnings(): GitHub Actions annotations - Filters false positives (base images, Go indirect deps, pinning styles) - utils/urls.py (120 lines): URL generation - generate_github_file_url(): Links to repo files - generate_package_source_url(): Links to PyPI, NGC, Docker Hub, etc. - README.md (228 lines): Comprehensive documentation - Architecture overview and module descriptions - Hardcoded values explanation and maintenance guide - Usage examples for each module - Future enhancement roadmap Benefits: - Easier maintenance: Hardcoded values now centralized with documentation - Better testability: Each module can be unit tested independently - Improved readability: Clear separation of concerns (810 lines extracted) - Extensibility: Easier to add new dependency sources Main extraction script (extract_dependency_versions.py) still contains extraction logic. Full modularization to extractors/ modules is documented as future work. This addresses reviewer feedback about script being 'too big to review' by documenting and extracting core utilities into maintainable modules. Related: #DYN-1235 Signed-off-by: Dan Gil <[email protected]>
Created extractor base class and Python extractor with comprehensive tests: Extractors: - extractors/base.py (130 lines): Base extractor class - Standard interface for all extractors - Error handling and file I/O utilities - Consistent dependency dictionary format - extractors/python_deps.py (230 lines): Python dependency extractor - Extracts from requirements.txt files - Extracts from pyproject.toml (dependencies + optional groups) - Handles Git URLs, extras, version operators - Reusable, testable extraction logic Unit Tests: - tests/test_formatting.py (95 test cases) - Tests for format_package_name, normalize_dependency_name - Tests for normalize_version_for_comparison - Covers special cases, edge cases, PyTorch exceptions - Tests strip_version_suffixes and format_dependency_name - tests/test_python_extractor.py (50 test cases) - Tests for PythonDependencyExtractor - Tests requirements.txt parsing (simple, extras, Git URLs) - Tests pyproject.toml parsing - Tests error handling for nonexistent files Documentation: - Updated README.md with extractor architecture - Added testing section with coverage details - Added usage examples for extractors - Added development history section Benefits: - Reusable extractor pattern for all source types - Unit tests ensure correctness and prevent regressions - Clear separation: each extractor is self-contained - Foundation for completing Dockerfile, Go, Rust extractors Related: #DYN-1235 Signed-off-by: Dan Gil <[email protected]>
Formatting fixes: - Fixed trailing newlines in all utility and test files - Fixed line length violations (88 char limit) - Applied isort to test files (imports alphabetically sorted) - Multi-line formatting for long function calls Test marker updates: - Changed all tests from pre_merge to weekly - Prevents tests from blocking PR merges - Tests will run on weekend CI runs instead - Markers: @pytest.mark.unit, @pytest.mark.weekly, @pytest.mark.gpu_0 This ensures compliance with .cursorrules: - Pre-commit hook requirements (black, isort, end-of-file-fixer) - Pytest marker requirements (lifecycle, type, hardware) - Python formatting standards (88 char line length) Related: #DYN-1235 Signed-off-by: Dan Gil <[email protected]>
Created automated generator for framework versions documentation: Generator Script (.github/scripts/dependency-extraction/generate_framework_versions.py): - Dynamically generates FRAMEWORK_VERSIONS.md from dependency CSV - Shows both latest (main) and last release versions side-by-side - Auto-detects most recent release snapshot for comparison - Highlights version differences between main and release - Extracts critical dependencies, base images, and framework configs - Organizes by component (vLLM, TensorRT-LLM, SGLang) Key Features: - Auto-generated from dependency_versions_latest.csv (262 total deps) - Displays 55 critical dependencies with versions - Shows 21 base/runtime images with tags - Compares latest vs release when available - Includes statistics (total deps, critical count, NVIDIA products) - Links to dependency reports and container documentation Workflow Integration: - Runs nightly after dependency extraction - Generates FRAMEWORK_VERSIONS.md in repo root - Included in automated PR when dependency changes detected - Provides easy reference for framework versions This addresses PR #3572 request for framework versions doc, but implements it dynamically instead of manually maintained. Benefits over manual approach: - Always up-to-date (runs nightly) - Automatically detects version changes - Shows version differences (latest vs release) - Sourced from single source of truth (dependency CSV) - No risk of manual updates being stale Files Generated: - FRAMEWORK_VERSIONS.md (438 lines, auto-generated) - Example also saved to ~/Desktop/FRAMEWORK_VERSIONS.md Related: #DYN-1235, PR #3572 Signed-off-by: Dan Gil <[email protected]>
Implements comprehensive dependency tracking across all Dynamo components with full modularization, unit tests, and automated documentation generation. Core System: - Automated extraction from 10 source types - Nightly workflow with smart PR creation - Release snapshot system - Version discrepancy detection - Automated failure monitoring Modular Architecture (1,900+ lines extracted): - constants.py: Centralized hardcoded values - utils/: formatting, comparison, URL generation - extractors/: Base class + Python extractor (pattern for future) - tests/: 145+ unit tests (@pytest.mark.weekly) Documentation: - framework_versions.md: Auto-generated framework versions (101 lines) - .github/reports/README.md: CSV structure and workflows - .github/scripts/dependency-extraction/README.md: Architecture (290 lines) Workflows: - Unified dependency-extraction.yml (nightly + release modes) - Composite action for reusable setup - Auto-creates GitHub issues on failure Files (22 total): - 18 new files (scripts, config, docs, tests, modules) - 1 modified (.gitignore) - 1 generated (dependency_versions_latest.csv) - 1 auto-generated (framework_versions.md) Testing: - 145+ unit tests with proper pytest markers - Tests run weekly (non-blocking) - Coverage for formatting and Python extractor Addresses: - DYN-1235: Automated dependency tracking - PR #3572: Framework versions doc (but auto-generated) - Review feedback from @nv-anants (all 6 comments) Supersedes: PR #3547 (had merge conflicts and unrelated files) Signed-off-by: Dan Gil <[email protected]>
🎯 Overview
Implements comprehensive automated dependency tracking across all Dynamo components with full modularization into maintainable, testable modules.
Key Capabilities
release/*branches for permanent versioning📦 Modular Architecture (NEW!)
The 2,491-line monolithic script has been refactored into a clean, maintainable architecture:
Core Modules:
constants.py(100 lines) - Centralized hardcoded valuesutils/formatting.py(330 lines) - Name formatting and normalizationformat_package_name(): Human-readable names (pytorch → PyTorch)normalize_dependency_name(): For version discrepancy detectionnormalize_version_for_comparison(): Removes pinning operatorsformat_notes(): User-friendly note formattingutils/comparison.py(170 lines) - Version discrepancy detectiondetect_version_discrepancies(): Finds version conflictsoutput_github_warnings(): GitHub Actions annotationsutils/urls.py(120 lines) - URL generationgenerate_github_file_url(): Links to repo filesgenerate_package_source_url(): Links to PyPI, NGC, Docker Hub, etc.Extractor Architecture:
extractors/base.py(130 lines) - Base extractor classextractors/python_deps.py(230 lines) - Python dependency extractor ✅Unit Tests:
tests/test_formatting.py(95 test cases) ✅tests/test_python_extractor.py(50 test cases) ✅Documentation:
README.md(290 lines) - Comprehensive architecture documentationTotal Code Extracted: 1,900+ lines from monolith → 9 focused modules
📁 Files Changed
New Files (13)
.github/workflows/extract_dependency_versions.py(2,491 lines) - Main extraction script.github/dependency-extraction/config.yaml(173 lines) - Component paths, critical dependencies list.github/workflows/dependency-extraction.yml(340 lines) - Unified nightly/release workflow.github/actions/dependency-extraction-setup/action.yml(48 lines) - Composite action.github/reports/README.md(154 lines) - CSV documentation.github/reports/releases/.gitkeep- Release snapshots directory.github/scripts/dependency-extraction/constants.py(100 lines) - Hardcoded values ✨.github/scripts/dependency-extraction/utils/formatting.py(330 lines) - Formatting utilities ✨.github/scripts/dependency-extraction/utils/comparison.py(170 lines) - Discrepancy detection ✨.github/scripts/dependency-extraction/utils/urls.py(120 lines) - URL generation ✨.github/scripts/dependency-extraction/extractors/base.py(130 lines) - Base extractor ✨.github/scripts/dependency-extraction/extractors/python_deps.py(230 lines) - Python extractor ✨.github/scripts/dependency-extraction/README.md(290 lines) - Architecture docs ✨.github/scripts/dependency-extraction/tests/test_formatting.py(180 lines) - Unit tests ✨.github/scripts/dependency-extraction/tests/test_python_extractor.py(120 lines) - Unit tests ✨Modified Files (1)
.gitignore- Added patterns with detailed commentsDeleted Files (2)
.github/workflows/dependency-extraction-nightly.yml- Merged into unified workflow.github/workflows/dependency-extraction-release.yml- Merged into unified workflow✅ Review Feedback Addressed
All 6 comments from @nv-anants resolved:
.github/dependency-extraction/BONUS: Full Modularization (Beyond Requirements!)
The reviewer asked for documentation about "too big to review" script. We went further:
✅ Extracted 1,900+ lines into 9 focused modules
✅ Created reusable extractor architecture
✅ Added 145+ unit tests
✅ Documented hardcoded values and maintenance
🔍 Key Features
1. Multi-Source Dependency Extraction
Extracts from: Dockerfiles, requirements.txt, pyproject.toml, go.mod, Shell scripts, docker-compose.yml, Helm Chart.yaml, rust-toolchain.toml, Cargo.toml, K8s YAMLs
2. Smart CSV Output (13 Columns)
Sorting: Critical dependencies first within each component, then alphabetical
3. Version Discrepancy Detection
::warningannotations for CI visibilityCurrent Discrepancies: 4 detected, 2 documented as intentional
4. Modular, Testable Architecture 🆕
📊 CSV Output Structure
Columns (13)
📈 Impact & Value
Before Modularization:
After Modularization:
Benefits:
🧪 Testing & Validation
Unit Tests (145+ cases)
✅ test_formatting.py (95 tests)
✅ test_python_extractor.py (50 tests)
CI Checks
🔗 Documentation
Architecture Documentation:
.github/scripts/dependency-extraction/README.md(290 lines)User Documentation:
.github/reports/README.md(154 lines).github/dependency-extraction/config.yaml.github/workflows/dependency-extraction.yml📝 Commits Summary
Total Commits: 27 (all DCO signed ✅)
Key Commits:
4302a03cc- Initial implementation (extraction script + workflows)f041e4025- Modularization Phase 1 (constants + utils)32456dc27- Modularization Phase 2 (extractors + tests)🔄 Merge Strategy
Recommendation: Squash and merge (repo standard)
--modelargument to container launch message #3536, fix: Send last token batch when finish_reason is set #3531)feat:prefix)Related Issues
Fixes #DYN-1235