Build: Add agentic workflow duplicate-code-detector #34060
Caution: Review failed. Pull request was closed or merged during review.
📝 Walkthrough
Adds a new multi-stage GitHub Actions workflow and documentation for automated duplicate-code detection, implementing activation, agent execution (Copilot + MCP), optional threat-detection, safe-outputs handling, artifact orchestration, and final conclusion/reporting (workflow +1158 lines; docs +260 lines).
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Actionable comments posted: 4
🧹 Nitpick comments (1)
.github/workflows/duplicate-code-detector.lock.yml (1)
658-665: Consider pinning serena-mcp-server to a specific version instead of latest.
The serena-mcp-server container uses the latest tag, which can lead to non-reproducible builds and unexpected breaking changes if the image is updated. Other images in this workflow (e.g., github-mcp-server:v0.31.0, gh-aw-mcpg:v0.1.8) are properly pinned to specific versions.
Since this is an auto-generated file, the fix would need to be applied in the source .md file or the gh-aw compiler configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/duplicate-code-detector.lock.yml around lines 658 - 665, Replace the unpinned serena image reference ("ghcr.io/github/serena-mcp-server:latest") with a specific version tag used by your release process (e.g., "ghcr.io/github/serena-mcp-server:vX.Y.Z") in the workflow definition for the "serena" container; because this file is autogenerated, make the change in the generator source (the corresponding .md or gh-aw compiler configuration) so future regenerations keep the pinned tag rather than "latest".
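A pinning check like the one this nitpick asks for could be automated. The following is a minimal sketch, not part of gh-aw or the workflow: the helper name is_pinned and the rule it applies (a digest, or any tag other than latest, counts as pinned) are assumptions made for illustration. The image references are the ones quoted in the comment above.

```python
def is_pinned(image_ref: str) -> bool:
    """Return True if a container image reference carries a fixed version.

    Hypothetical helper for illustration. An image counts as pinned when it
    uses a sha256 digest or an explicit tag other than "latest".
    """
    if "@sha256:" in image_ref:
        return True
    # Only inspect the last path segment so a registry port such as
    # localhost:5000/app is not mistaken for a tag.
    last_segment = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag at all resolves to "latest" implicitly
    tag = last_segment.split(":", 1)[1]
    return tag not in ("", "latest")


# Image references as quoted in the review comment.
images = [
    "ghcr.io/github/serena-mcp-server:latest",
    "github-mcp-server:v0.31.0",
    "gh-aw-mcpg:v0.1.8",
]
unpinned = [ref for ref in images if not is_pinned(ref)]
print(unpinned)
```

Run against a list of image references extracted from the lock file, a check along these lines would flag only the serena image.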
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/duplicate-code-detector.lock.yml:
- Around line 1130-1131: The lock file's GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG is
missing fields present in the source workflow; regenerate the lock so
GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG includes labels (code-quality,
automated-analysis, cookie), title-prefix, expires, group and uses the same max
(3) as the source; update the value assigned to
GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG (and keep GH_AW_ASSIGN_COPILOT unchanged) by
regenerating the .lock.yml from the canonical workflow to ensure all handler
config fields match.
- Around line 331-332: The lock file shows {"create_issue":{"max":1}} under
GH_AW_SAFE_OUTPUTS_CONFIG_EOF but the source duplicate-code-detector.md
specifies max: 3; update the gh-aw compiler code that serializes safe-outputs
(look for the function/method that emits GH_AW_SAFE_OUTPUTS_CONFIG_EOF or the
serializeSafeOutputs/generateSafeOutputsLockEntry routine) to preserve the
source's create_issue.max value instead of defaulting to 1 so the generated lock
contains "create_issue":{"max":3} matching the source.
In @.github/workflows/duplicate-code-detector.md:
- Around line 12-13: The Serena tool config currently lists only Go (serena:
["go"]) while the duplicate-code workflow includes .cjs/CommonJS files; update
the configuration to either add JavaScript/Node support (e.g., include
"javascript" or "nodejs" in the serena list) or restrict the workflow's analysis
patterns to only Go files so Serena's enabled languages match the analysis
scope; change the entry referencing serena: ["go"] or adjust the workflow file
references to .cjs to resolve the mismatch.
- Around line 14-21: The workflow's create-issue labels include an unexpected
"cookie" label (see the create-issue block and safe-outputs > create-issue >
labels), which looks like a stray or copy-paste artifact; confirm whether
"cookie" is required and if not remove it from the labels array or replace it
with the intended label (e.g., add a more relevant label such as "infra" or
"triage"), updating the labels entry under create-issue accordingly so the
duplicate-code detector only assigns appropriate labels.
---
Nitpick comments:
In @.github/workflows/duplicate-code-detector.lock.yml:
- Around line 658-665: Replace the unpinned serena image reference
("ghcr.io/github/serena-mcp-server:latest") with a specific version tag used by
your release process (e.g., "ghcr.io/github/serena-mcp-server:vX.Y.Z") in the
workflow definition for the "serena" container; because this file is
autogenerated, make the change in the generator source (the corresponding .md or
gh-aw compiler configuration) so future regenerations keep the pinned tag rather
than "latest".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c58ff7fa-631d-46fe-8b69-fcec184f044c
📒 Files selected for processing (2)
.github/workflows/duplicate-code-detector.lock.yml
.github/workflows/duplicate-code-detector.md
{"create_issue":{"max":1},"missing_data":{},"missing_tool":{},"noop":{"max":1}}
GH_AW_SAFE_OUTPUTS_CONFIG_EOF
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the source .md file for the max configuration
cat .github/workflows/duplicate-code-detector.md | head -30
Repository: storybookjs/storybook
Length of output: 745
🏁 Script executed:
sed -n '330,333p' .github/workflows/duplicate-code-detector.lock.yml
Repository: storybookjs/storybook
Length of output: 364
🏁 Script executed:
#!/bin/bash
# Check if there are other safe-outputs configurations in lock files
rg '"create_issue"' --type yaml -A 2 -B 2
Repository: storybookjs/storybook
Length of output: 47
🏁 Script executed:
#!/bin/bash
# Look for other lock files and understand the pattern
find .github/workflows -name "*.lock.yml" | head -10
# Also check for documentation about gh-aw or the compiler
find . -type f \( -name "*.md" -o -name "*.txt" \) | xargs rg -l "gh-aw\|max.*issue" 2>/dev/null | head -10
Repository: storybookjs/storybook
Length of output: 158
🏁 Script executed:
#!/bin/bash
# Check code-simplifier lock file for safe-outputs config
rg '"create_issue"' .github/workflows/code-simplifier.lock.yml -A 1 -B 1
# Check the source md file for code-simplifier
cat .github/workflows/code-simplifier.md | head -30
Repository: storybookjs/storybook
Length of output: 735
🏁 Script executed:
#!/bin/bash
# Check when these files were last modified
git log --oneline -10 -- .github/workflows/duplicate-code-detector.lock.yml .github/workflows/duplicate-code-detector.md
Repository: storybookjs/storybook
Length of output: 120
Fix safe-outputs max issue count: lock file has max: 1 but source specifies max: 3
The safe-outputs config in the lock file (lines 331-332) sets "create_issue":{"max":1}, while the source .github/workflows/duplicate-code-detector.md (line 20) specifies max: 3. The documentation explicitly states "Create separate issues for each distinct duplication pattern found (maximum 3 patterns per run)" and "Limit to the top 3 most significant patterns."
This inconsistency prevents the workflow from creating multiple issues as intended. Since the lock file is auto-generated from the source, it is most likely stale and should be regenerated; if a fresh compile still emits max: 1, the gh-aw compiler is not preserving the max: 3 configuration and needs to be corrected.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/duplicate-code-detector.lock.yml around lines 331 - 332,
The lock file shows {"create_issue":{"max":1}} under
GH_AW_SAFE_OUTPUTS_CONFIG_EOF but the source duplicate-code-detector.md
specifies max: 3; update the gh-aw compiler code that serializes safe-outputs
(look for the function/method that emits GH_AW_SAFE_OUTPUTS_CONFIG_EOF or the
serializeSafeOutputs/generateSafeOutputsLockEntry routine) to preserve the
source's create_issue.max value instead of defaulting to 1 so the generated lock
contains "create_issue":{"max":3} matching the source.
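The mismatch described above could be caught before merge by comparing the two files directly. The sketch below is illustrative only: the function names are hypothetical, the frontmatter indentation is assumed (the page shows the keys without it), and the parsing is deliberately naive rather than a real YAML or gh-aw parser.

```python
import json
import re


def lock_create_issue_max(lock_text):
    """Return create_issue.max from the first JSON line that defines it."""
    for line in lock_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            config = json.loads(line)
        except json.JSONDecodeError:
            continue
        if "create_issue" in config:
            return config["create_issue"].get("max")
    return None


def source_create_issue_max(source_text):
    """Return the max value nested under create-issue in the frontmatter."""
    in_block = False
    block_indent = 0
    for line in source_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if stripped == "create-issue:":
            in_block, block_indent = True, indent
            continue
        if in_block:
            if stripped and indent <= block_indent:
                in_block = False  # left the create-issue block
                continue
            match = re.fullmatch(r"max:\s*(\d+)", stripped)
            if match:
                return int(match.group(1))
    return None


# Values quoted in this review; the indentation of the source is assumed.
lock_text = (
    '{"create_issue":{"max":1},"missing_data":{},"missing_tool":{},"noop":{"max":1}}\n'
    "GH_AW_SAFE_OUTPUTS_CONFIG_EOF\n"
)
source_text = (
    "safe-outputs:\n"
    "  create-issue:\n"
    "    expires: 2d\n"
    "    group: true\n"
    "    max: 3\n"
)
lock_max = lock_create_issue_max(lock_text)
source_max = source_create_issue_max(source_text)
if lock_max != source_max:
    print(f"mismatch: lock has max={lock_max}, source has max={source_max}")
```

A check of this kind only detects drift between the two files; it does not say whether the lock file is stale or the compiler dropped the value.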
safe-outputs:
  create-issue:
    expires: 2d
    title-prefix: "[duplicate-code] "
    labels: [code-quality, automated-analysis, cookie]
    assignees: copilot
    group: true
    max: 3
Verify the cookie label is intentional.
The labels include code-quality, automated-analysis, and cookie. The first two are descriptive of the workflow's purpose, but cookie seems out of place for a duplicate code detection workflow. Is this intentional or a copy-paste artifact?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/duplicate-code-detector.md around lines 14 - 21, The
workflow's create-issue labels include an unexpected "cookie" label (see the
create-issue block and safe-outputs > create-issue > labels), which looks like a
stray or copy-paste artifact; confirm whether "cookie" is required and if not
remove it from the labels array or replace it with the intended label (e.g., add
a more relevant label such as "infra" or "triage"), updating the labels entry
under create-issue accordingly so the duplicate-code detector only assigns
appropriate labels.
…nd refine issue creation criteria
Add agentic workflow duplicate-code-detector
Summary by CodeRabbit
New Features
Documentation
Security & Reporting