Build: Add agentic workflow duplicate-code-detector by valentinpalkovic · Pull Request #34060 · storybookjs/storybook

valentinpalkovic · 2026-03-07T19:57:14Z

Add agentic workflow duplicate-code-detector

Summary by CodeRabbit

New Features
- Added an automated duplicate-code detection workflow with daily runs and manual triggering
- Detects exact, structural, functional, and copy-paste duplication patterns and reports up to three distinct findings
- Automatically files issues for detected duplicates using a standardized report template
Documentation
- Added detailed workflow documentation describing analysis scope, reporting thresholds, and operational guidance
Security & Reporting
- Added optional threat-detection and safe-output processing, plus consolidated analysis artifacts and summaries

github-actions · 2026-03-07T19:57:35Z

	Fails
🚫	PR description is missing the mandatory "#### Manual testing" section. Please add it so that reviewers know how to manually test your changes.

Generated by 🚫 dangerJS against 30a1d20

coderabbitai · 2026-03-07T20:01:02Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Adds a new multi-stage GitHub Actions workflow and documentation for automated duplicate-code detection, implementing activation, agent execution (Copilot + MCP), optional threat-detection, safe-outputs handling, artifact orchestration, and final conclusion/reporting (workflow +1158 lines; docs +260 lines).

Changes

Cohort / File(s)	Summary
Workflow file `.github/workflows/duplicate-code-detector.lock.yml`	New, large GitHub Actions workflow implementing: activation (setup, token/metadata, artifact upload), agent phase (install Copilot CLI/awf, start/configure MCP gateway, run Copilot agent, redact secrets), inline conditional threat-detection session, safe-outputs orchestration, extensive artifact upload/download and error handling.
Documentation `.github/workflows/duplicate-code-detector.md`	New documentation describing workflow triggers (manual, daily), analysis scope (ts/tsx files excluding tests/workflows), tool sequence (activate_project, list_dir, find_file, get_symbols_overview, read_file, search_for_pattern, find_symbol, find_referencing_symbols), duplication classification (exact, structural, functional, copy-paste), reporting thresholds and issue templates.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant GH as GitHub Actions
participant Act as Activation step
participant AG as Agent (Copilot CLI)
participant MCP as MCP Gateway/Server
participant TS as Threat-Detection Session
participant AS as Artifact Storage
participant CL as Conclusion step

GH->>Act: start activation (setup, tokens, prompts)
Act->>AS: upload activation artifacts
GH->>MCP: provision MCP gateway/config
GH->>AG: install/run Copilot agent (connects to MCP)
AG->>MCP: fetch configs/images / execute analysis
AG->>AS: upload agent outputs (logs, redacted results)
GH->>TS: conditional threat-detection run (restricted tools)
TS->>AS: upload threat-detection logs
GH->>CL: download artifacts (activation, agent, threat)
CL->>AS: produce final summary & upload final artifacts

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

.github/workflows/duplicate-code-detector.lock.yml (1)
658-665: Consider pinning serena-mcp-server to a specific version instead of latest.

The serena-mcp-server container uses the latest tag, which can lead to non-reproducible builds and unexpected breaking changes if the image is updated. Other images in this workflow (e.g., github-mcp-server:v0.31.0, gh-aw-mcpg:v0.1.8) are properly pinned to specific versions.

Since this is an auto-generated file, the fix would need to be applied in the source .md file or the gh-aw compiler configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/duplicate-code-detector.lock.yml around lines 658 - 665,
Replace the unpinned serena image reference
("ghcr.io/github/serena-mcp-server:latest") with a specific version tag used by
your release process (e.g., "ghcr.io/github/serena-mcp-server:vX.Y.Z") in the
workflow definition for the "serena" container; because this file is
autogenerated, make the change in the generator source (the corresponding .md or
gh-aw compiler configuration) so future regenerations keep the pinned tag rather
than "latest".
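
The pinning concern above can be checked mechanically. The following is a minimal sketch, not part of the PR: it assumes images are hosted on ghcr.io, the function name `find_unpinned_images` is hypothetical, and the registry prefix shown for the two pinned images is an assumption (the review only gives their short names and tags). Digest-pinned references (`@sha256:...`) are not handled.

```python
import re

# Matches references such as ghcr.io/owner/name or ghcr.io/owner/name:tag.
# Restricting the pattern to ghcr.io is an assumption made for this sketch.
IMAGE_RE = re.compile(r"(?P<image>ghcr\.io/[\w.\-/]+)(?::(?P<tag>[\w.\-]+))?")


def find_unpinned_images(workflow_text: str) -> list:
    """Return image references that use `latest` or carry no tag at all."""
    unpinned = []
    for match in IMAGE_RE.finditer(workflow_text):
        tag = match.group("tag")
        if tag is None or tag == "latest":
            unpinned.append(match.group(0))
    return unpinned


# Sample input: the serena reference is quoted from the review; the full
# paths of the two pinned images are assumed for illustration.
SAMPLE = """
container: ghcr.io/github/github-mcp-server:v0.31.0
container: ghcr.io/github/gh-aw-mcpg:v0.1.8
container: ghcr.io/github/serena-mcp-server:latest
"""

if __name__ == "__main__":
    for ref in find_unpinned_images(SAMPLE):
        print(f"unpinned image: {ref}")
```

Run against the generated lock file, a check like this would only report the problem; the fix itself still belongs in the source .md or the compiler configuration, as noted above.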

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/duplicate-code-detector.lock.yml:
- Around line 1130-1131: The lock file's GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG is
missing fields present in the source workflow; regenerate the lock so
GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG includes labels (code-quality,
automated-analysis, cookie), title-prefix, expires, group and uses the same max
(3) as the source; update the value assigned to
GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG (and keep GH_AW_ASSIGN_COPILOT unchanged) by
regenerating the .lock.yml from the canonical workflow to ensure all handler
config fields match.
- Around line 331-332: The lock file shows {"create_issue":{"max":1}} under
GH_AW_SAFE_OUTPUTS_CONFIG_EOF but the source duplicate-code-detector.md
specifies max: 3; update the gh-aw compiler code that serializes safe-outputs
(look for the function/method that emits GH_AW_SAFE_OUTPUTS_CONFIG_EOF or the
serializeSafeOutputs/generateSafeOutputsLockEntry routine) to preserve the
source's create_issue.max value instead of defaulting to 1 so the generated lock
contains "create_issue":{"max":3} matching the source.

In @.github/workflows/duplicate-code-detector.md:
- Around line 12-13: The Serena tool config currently lists only Go (serena:
["go"]) while the duplicate-code workflow includes .cjs/CommonJS files; update
the configuration to either add JavaScript/Node support (e.g., include
"javascript" or "nodejs" in the serena list) or restrict the workflow's analysis
patterns to only Go files so Serena's enabled languages match the analysis
scope; change the entry referencing serena: ["go"] or adjust the workflow file
references to .cjs to resolve the mismatch.
- Around line 14-21: The workflow's create-issue labels include an unexpected
"cookie" label (see the create-issue block and safe-outputs > create-issue >
labels), which looks like a stray or copy-paste artifact; confirm whether
"cookie" is required and if not remove it from the labels array or replace it
with the intended label (e.g., add a more relevant label such as "infra" or
"triage"), updating the labels entry under create-issue accordingly so the
duplicate-code detector only assigns appropriate labels.

---

Nitpick comments:
In @.github/workflows/duplicate-code-detector.lock.yml:
- Around line 658-665: Replace the unpinned serena image reference
("ghcr.io/github/serena-mcp-server:latest") with a specific version tag used by
your release process (e.g., "ghcr.io/github/serena-mcp-server:vX.Y.Z") in the
workflow definition for the "serena" container; because this file is
autogenerated, make the change in the generator source (the corresponding .md or
gh-aw compiler configuration) so future regenerations keep the pinned tag rather
than "latest".

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c58ff7fa-631d-46fe-8b69-fcec184f044c

📥 Commits

Reviewing files that changed from the base of the PR and between e68b5f8 and 9ad05b8.

📒 Files selected for processing (2)

.github/workflows/duplicate-code-detector.lock.yml
.github/workflows/duplicate-code-detector.md

coderabbitai · 2026-03-07T20:01:05Z

+          {"create_issue":{"max":1},"missing_data":{},"missing_tool":{},"noop":{"max":1}}
+          GH_AW_SAFE_OUTPUTS_CONFIG_EOF


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the source .md file for the max configuration
cat .github/workflows/duplicate-code-detector.md | head -30

Repository: storybookjs/storybook

Length of output: 745

🏁 Script executed:

sed -n '330,333p' .github/workflows/duplicate-code-detector.lock.yml

Repository: storybookjs/storybook

Length of output: 364

🏁 Script executed:

#!/bin/bash
# Check if there are other safe-outputs configurations in lock files
rg '"create_issue"' --type yaml -A 2 -B 2

Repository: storybookjs/storybook

Length of output: 47

🏁 Script executed:

#!/bin/bash
# Look for other lock files and understand the pattern
find .github/workflows -name "*.lock.yml" | head -10
# Also check for documentation about gh-aw or the compiler
find . -type f \( -name "*.md" -o -name "*.txt" \) | xargs rg -l "gh-aw\|max.*issue" 2>/dev/null | head -10

Repository: storybookjs/storybook

Length of output: 158

🏁 Script executed:

#!/bin/bash
# Check code-simplifier lock file for safe-outputs config
rg '"create_issue"' .github/workflows/code-simplifier.lock.yml -A 1 -B 1
# Check the source md file for code-simplifier
cat .github/workflows/code-simplifier.md | head -30

Repository: storybookjs/storybook

Length of output: 735

🏁 Script executed:

#!/bin/bash
# Check when these files were last modified
git log --oneline -10 -- .github/workflows/duplicate-code-detector.lock.yml .github/workflows/duplicate-code-detector.md

Repository: storybookjs/storybook

Length of output: 120

Fix safe-outputs max issue count: lock file has max: 1 but source specifies max: 3

The safe-outputs config in the lock file (lines 331-332) sets "create_issue":{"max":1}, while the source .github/workflows/duplicate-code-detector.md (line 20) specifies max: 3. The documentation explicitly states "Create separate issues for each distinct duplication pattern found (maximum 3 patterns per run)" and "Limit to the top 3 most significant patterns."

This inconsistency prevents the workflow from creating multiple issues as intended. Since the lock file is auto-generated from the source, the gh-aw compiler must be corrected to properly preserve the max: 3 configuration.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/duplicate-code-detector.lock.yml around lines 331 - 332, The lock file shows {"create_issue":{"max":1}} under GH_AW_SAFE_OUTPUTS_CONFIG_EOF but the source duplicate-code-detector.md specifies max: 3; update the gh-aw compiler code that serializes safe-outputs (look for the function/method that emits GH_AW_SAFE_OUTPUTS_CONFIG_EOF or the serializeSafeOutputs/generateSafeOutputsLockEntry routine) to preserve the source's create_issue.max value instead of defaulting to 1 so the generated lock contains "create_issue":{"max":3} matching the source.
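
The mismatch reported above can be reproduced outside the compiler. The sketch below is illustrative only and is not the gh-aw implementation: the helper names `source_max` and `lock_max` are hypothetical, and the front matter is read with a small indentation-based scan rather than a YAML parser so that the example stays standard-library only. The two inputs are the snippets quoted in this review.

```python
import json
import re
from typing import Optional

# Front matter as quoted from duplicate-code-detector.md in this review.
SOURCE_FRONT_MATTER = """\
safe-outputs:
  create-issue:
    expires: 2d
    title-prefix: "[duplicate-code] "
    labels: [code-quality, automated-analysis, cookie]
    assignees: copilot
    group: true
    max: 3
"""

# Safe-outputs config as quoted from the generated lock file.
LOCK_CONFIG = '{"create_issue":{"max":1},"missing_data":{},"missing_tool":{},"noop":{"max":1}}'


def source_max(front_matter: str) -> Optional[int]:
    """Return `max` from the create-issue block, or None if it is absent."""
    block_indent = None
    for line in front_matter.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip())
        if stripped == "create-issue:":
            block_indent = indent
            continue
        if block_indent is not None:
            if indent <= block_indent:
                block_indent = None  # dedent: left the create-issue block
                continue
            found = re.fullmatch(r"max:\s*(\d+)", stripped)
            if found:
                return int(found.group(1))
    return None


def lock_max(config_json: str) -> Optional[int]:
    """Return create_issue.max from the lock file's JSON config."""
    return json.loads(config_json).get("create_issue", {}).get("max")


if __name__ == "__main__":
    src, lock = source_max(SOURCE_FRONT_MATTER), lock_max(LOCK_CONFIG)
    if src != lock:
        print(f"mismatch: source max={src}, lock max={lock}")
```

A comparison of this kind could run in CI after the lock file is regenerated, so that a drift between source and lock is caught before merge.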

coderabbitai · 2026-03-07T20:01:05Z

+safe-outputs:
+  create-issue:
+    expires: 2d
+    title-prefix: "[duplicate-code] "
+    labels: [code-quality, automated-analysis, cookie]
+    assignees: copilot
+    group: true
+    max: 3


⚠️ Potential issue | 🟡 Minor

Verify the cookie label is intentional.

The labels include code-quality, automated-analysis, and cookie. The first two are descriptive of the workflow's purpose, but cookie seems out of place for a duplicate code detection workflow. Is this intentional or a copy-paste artifact?

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.github/workflows/duplicate-code-detector.md around lines 14 - 21, The workflow's create-issue labels include an unexpected "cookie" label (see the create-issue block and safe-outputs > create-issue > labels), which looks like a stray or copy-paste artifact; confirm whether "cookie" is required and if not remove it from the labels array or replace it with the intended label (e.g., add a more relevant label such as "infra" or "triage"), updating the labels entry under create-issue accordingly so the duplicate-code detector only assigns appropriate labels.


nx-cloud · 2026-03-07T20:25:10Z

View your CI Pipeline Execution ↗ for commit 30a1d20

Command	Status	Duration	Result
`nx run-many -t compile,check,knip,test,pretty-d...`	❌ Failed	6m 12s	View ↗

☁️ Nx Cloud last updated this comment at 2026-03-07 20:32:56 UTC

Add agentic workflow duplicate-code-detector

9ad05b8

coderabbitai Bot reviewed Mar 7, 2026


Update duplicate code detector workflow to analyze TypeScript files and refine issue creation criteria

30a1d20

valentinpalkovic changed the title ~~Add agentic workflow duplicate-code-detector~~ Build: Add agentic workflow duplicate-code-detector Mar 7, 2026

valentinpalkovic self-assigned this Mar 7, 2026

valentinpalkovic added this to Core Team Projects Mar 7, 2026

valentinpalkovic moved this to In Progress in Core Team Projects Mar 7, 2026

valentinpalkovic added the build (Internal-facing build tooling & test updates) and ci:normal labels Mar 7, 2026

valentinpalkovic merged commit 2a44aa7 into next Mar 7, 2026
14 of 24 checks passed

valentinpalkovic deleted the add-workflow-duplicate-code-detector-3466 branch March 7, 2026 20:25

github-project-automation Bot moved this from In Progress to Done in Core Team Projects Mar 7, 2026

This was referenced Mar 7, 2026

Release: Prerelease 10.3.0-alpha.15 #34047

Closed

[aw] No-Op Runs #34059

Open

Release: Prerelease 10.3.0-alpha.15 #34074

Merged

valentinpalkovic merged 2 commits into next from add-workflow-duplicate-code-detector-3466
