Build: Add agentic workflow duplicate-code-detector #34060

Merged
valentinpalkovic merged 2 commits into next from add-workflow-duplicate-code-detector-3466 on Mar 7, 2026

valentinpalkovic (Contributor) commented Mar 7, 2026

Add agentic workflow duplicate-code-detector

Summary by CodeRabbit

  • New Features

    • Added an automated duplicate-code detection workflow with daily runs and manual triggering
    • Detects exact, structural, functional, and copy-paste duplication patterns and reports up to three distinct findings
    • Automatically files issues for detected duplicates using a standardized report template
  • Documentation

    • Added detailed workflow documentation describing analysis scope, reporting thresholds, and operational guidance
  • Security & Reporting

    • Added optional threat-detection and safe-output processing, plus consolidated analysis artifacts and summaries

github-actions Bot (Contributor) commented Mar 7, 2026

Fails
🚫 PR description is missing the mandatory "#### Manual testing" section. Please add it so that reviewers know how to manually test your changes.

Generated by 🚫 dangerJS against 30a1d20

coderabbitai Bot (Contributor) commented Mar 7, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Adds a new multi-stage GitHub Actions workflow and documentation for automated duplicate-code detection, implementing activation, agent execution (Copilot + MCP), optional threat-detection, safe-outputs handling, artifact orchestration, and final conclusion/reporting (workflow +1158 lines; docs +260 lines).

Changes

| Cohort / File(s) | Summary |
|---|---|
| Workflow file: .github/workflows/duplicate-code-detector.lock.yml | New, large GitHub Actions workflow implementing: activation (setup, token/metadata, artifact upload), agent phase (install Copilot CLI/awf, start/configure MCP gateway, run Copilot agent, redact secrets), inline conditional threat-detection session, safe-outputs orchestration, extensive artifact upload/download and error handling. |
| Documentation: .github/workflows/duplicate-code-detector.md | New documentation describing workflow triggers (manual, daily), analysis scope (ts/tsx files excluding tests/workflows), tool sequence (activate_project, list_dir, find_file, get_symbols_overview, read_file, search_for_pattern, find_symbol, find_referencing_symbols), duplication classification (exact, structural, functional, copy-paste), reporting thresholds and issue templates. |

Sequence Diagram(s)

mermaid
sequenceDiagram
participant GH as GitHub Actions
participant Act as Activation step
participant AG as Agent (Copilot CLI)
participant MCP as MCP Gateway/Server
participant TS as Threat-Detection Session
participant AS as Artifact Storage
participant CL as Conclusion step

GH->>Act: start activation (setup, tokens, prompts)
Act->>AS: upload activation artifacts
GH->>MCP: provision MCP gateway/config
GH->>AG: install/run Copilot agent (connects to MCP)
AG->>MCP: fetch configs/images / execute analysis
AG->>AS: upload agent outputs (logs, redacted results)
GH->>TS: conditional threat-detection run (restricted tools)
TS->>AS: upload threat-detection logs
GH->>CL: download artifacts (activation, agent, threat)
CL->>AS: produce final summary & upload final artifacts

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
.github/workflows/duplicate-code-detector.lock.yml (1)

658-665: Consider pinning serena-mcp-server to a specific version instead of latest.

The serena-mcp-server container uses the latest tag, which can lead to non-reproducible builds and unexpected breaking changes if the image is updated. Other images in this workflow (e.g., github-mcp-server:v0.31.0, gh-aw-mcpg:v0.1.8) are properly pinned to specific versions.

Since this is an auto-generated file, the fix would need to be applied in the source .md file or the gh-aw compiler configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/duplicate-code-detector.lock.yml around lines 658 - 665,
Replace the unpinned serena image reference
("ghcr.io/github/serena-mcp-server:latest") with a specific version tag used by
your release process (e.g., "ghcr.io/github/serena-mcp-server:vX.Y.Z") in the
workflow definition for the "serena" container; because this file is
autogenerated, make the change in the generator source (the corresponding .md or
gh-aw compiler configuration) so future regenerations keep the pinned tag rather
than "latest".
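
A floating tag like this can be caught with a small check over the generated workflow. The following is a minimal sketch and not part of this PR: `find_unpinned_images` is a hypothetical helper name, and it assumes image references appear in the lock file as plain `name:tag` strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from this PR): read workflow text on stdin and
# print every distinct image reference that ends in the floating ":latest" tag.
find_unpinned_images() {
  # -E: extended regex, -o: print only the matching part of each line.
  # The character class covers registry hosts, path segments, and image names.
  grep -Eo '[A-Za-z0-9._/-]+:latest' | sort -u
}

# Possible usage against the generated file named in the review:
#   find_unpinned_images < .github/workflows/duplicate-code-detector.lock.yml
```

An empty result means no `:latest` references were found. Because the lock file is generated, any hit would still need to be fixed in the source .md file or the gh-aw configuration, as the comment above notes.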
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/duplicate-code-detector.lock.yml:
- Around line 1130-1131: The lock file's GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG is
missing fields present in the source workflow; regenerate the lock so
GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG includes labels (code-quality,
automated-analysis, cookie), title-prefix, expires, group and uses the same max
(3) as the source; update the value assigned to
GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG (and keep GH_AW_ASSIGN_COPILOT unchanged) by
regenerating the .lock.yml from the canonical workflow to ensure all handler
config fields match.
- Around line 331-332: The lock file shows {"create_issue":{"max":1}} under
GH_AW_SAFE_OUTPUTS_CONFIG_EOF but the source duplicate-code-detector.md
specifies max: 3; update the gh-aw compiler code that serializes safe-outputs
(look for the function/method that emits GH_AW_SAFE_OUTPUTS_CONFIG_EOF or the
serializeSafeOutputs/generateSafeOutputsLockEntry routine) to preserve the
source's create_issue.max value instead of defaulting to 1 so the generated lock
contains "create_issue":{"max":3} matching the source.

In @.github/workflows/duplicate-code-detector.md:
- Around line 12-13: The Serena tool config currently lists only Go (serena:
["go"]) while the duplicate-code workflow includes .cjs/CommonJS files; update
the configuration to either add JavaScript/Node support (e.g., include
"javascript" or "nodejs" in the serena list) or restrict the workflow's analysis
patterns to only Go files so Serena's enabled languages match the analysis
scope; change the entry referencing serena: ["go"] or adjust the workflow file
references to .cjs to resolve the mismatch.
- Around line 14-21: The workflow's create-issue labels include an unexpected
"cookie" label (see the create-issue block and safe-outputs > create-issue >
labels), which looks like a stray or copy-paste artifact; confirm whether
"cookie" is required and if not remove it from the labels array or replace it
with the intended label (e.g., add a more relevant label such as "infra" or
"triage"), updating the labels entry under create-issue accordingly so the
duplicate-code detector only assigns appropriate labels.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c58ff7fa-631d-46fe-8b69-fcec184f044c

📥 Commits

Reviewing files that changed from the base of the PR and between e68b5f8 and 9ad05b8.

📒 Files selected for processing (2)
  • .github/workflows/duplicate-code-detector.lock.yml
  • .github/workflows/duplicate-code-detector.md

Comment on lines +331 to +332
{"create_issue":{"max":1},"missing_data":{},"missing_tool":{},"noop":{"max":1}}
GH_AW_SAFE_OUTPUTS_CONFIG_EOF

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the source .md file for the max configuration
cat .github/workflows/duplicate-code-detector.md | head -30

Repository: storybookjs/storybook

Length of output: 745


🏁 Script executed:

sed -n '330,333p' .github/workflows/duplicate-code-detector.lock.yml

Repository: storybookjs/storybook

Length of output: 364


🏁 Script executed:

#!/bin/bash
# Check if there are other safe-outputs configurations in lock files
rg '"create_issue"' --type yaml -A 2 -B 2

Repository: storybookjs/storybook

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Look for other lock files and understand the pattern
find .github/workflows -name "*.lock.yml" | head -10
# Also check for documentation about gh-aw or the compiler
find . -type f \( -name "*.md" -o -name "*.txt" \) | xargs rg -l "gh-aw\|max.*issue" 2>/dev/null | head -10

Repository: storybookjs/storybook

Length of output: 158


🏁 Script executed:

#!/bin/bash
# Check code-simplifier lock file for safe-outputs config
rg '"create_issue"' .github/workflows/code-simplifier.lock.yml -A 1 -B 1
# Check the source md file for code-simplifier
cat .github/workflows/code-simplifier.md | head -30

Repository: storybookjs/storybook

Length of output: 735


🏁 Script executed:

#!/bin/bash
# Check when these files were last modified
git log --oneline -10 -- .github/workflows/duplicate-code-detector.lock.yml .github/workflows/duplicate-code-detector.md

Repository: storybookjs/storybook

Length of output: 120


Fix safe-outputs max issue count: lock file has max: 1 but source specifies max: 3

The safe-outputs config in the lock file (lines 331-332) sets "create_issue":{"max":1}, while the source .github/workflows/duplicate-code-detector.md (line 20) specifies max: 3. The documentation explicitly states "Create separate issues for each distinct duplication pattern found (maximum 3 patterns per run)" and "Limit to the top 3 most significant patterns."

This inconsistency prevents the workflow from creating multiple issues as intended. Since the lock file is auto-generated from the source, the gh-aw compiler must be corrected to properly preserve the max: 3 configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/duplicate-code-detector.lock.yml around lines 331 - 332,
The lock file shows {"create_issue":{"max":1}} under
GH_AW_SAFE_OUTPUTS_CONFIG_EOF but the source duplicate-code-detector.md
specifies max: 3; update the gh-aw compiler code that serializes safe-outputs
(look for the function/method that emits GH_AW_SAFE_OUTPUTS_CONFIG_EOF or the
serializeSafeOutputs/generateSafeOutputsLockEntry routine) to preserve the
source's create_issue.max value instead of defaulting to 1 so the generated lock
contains "create_issue":{"max":3} matching the source.
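
The mismatch described above can also be checked mechanically. This is a rough sketch under stated assumptions, not the gh-aw compiler's own validation: the helper names `lock_issue_max`, `source_issue_max`, and `check_issue_max` are hypothetical, the lock file is assumed to carry the JSON line quoted in this comment, and the source is assumed to list `max:` under `create-issue:` as in the frontmatter excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from this PR or from gh-aw).

# Read lock-file text on stdin; print the number after "create_issue":{"max":
lock_issue_max() {
  grep -o '"create_issue":{"max":[0-9]*' | head -n 1 | sed 's/.*://'
}

# Read the source .md on stdin; print the first `max:` value that appears
# after a `create-issue:` line in the frontmatter.
source_issue_max() {
  awk '/create-issue:/ {found=1; next}
       found && /^[[:space:]]*max:/ {print $2; exit}'
}

# $1 = source .md path, $2 = lock file path.
# Succeeds only when both values were found and are equal.
check_issue_max() {
  cim_src="$(source_issue_max < "$1")"
  cim_lock="$(lock_issue_max < "$2")"
  [ -n "$cim_src" ] && [ "$cim_src" = "$cim_lock" ]
}

# Possible usage with the two files from this review:
#   check_issue_max .github/workflows/duplicate-code-detector.md \
#                   .github/workflows/duplicate-code-detector.lock.yml \
#     || echo "create_issue max differs between source and lock file"
```

The text matching here is deliberately simple. A real check would more likely parse the YAML frontmatter and the JSON properly.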

Comment thread .github/workflows/duplicate-code-detector.lock.yml
Comment thread .github/workflows/duplicate-code-detector.md Outdated
Comment on lines +14 to +21
safe-outputs:
create-issue:
expires: 2d
title-prefix: "[duplicate-code] "
labels: [code-quality, automated-analysis, cookie]
assignees: copilot
group: true
max: 3

⚠️ Potential issue | 🟡 Minor

Verify the cookie label is intentional.

The labels include code-quality, automated-analysis, and cookie. The first two are descriptive of the workflow's purpose, but cookie seems out of place for a duplicate code detection workflow. Is this intentional or a copy-paste artifact?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/duplicate-code-detector.md around lines 14 - 21, The
workflow's create-issue labels include an unexpected "cookie" label (see the
create-issue block and safe-outputs > create-issue > labels), which looks like a
stray or copy-paste artifact; confirm whether "cookie" is required and if not
remove it from the labels array or replace it with the intended label (e.g., add
a more relevant label such as "infra" or "triage"), updating the labels entry
under create-issue accordingly so the duplicate-code detector only assigns
appropriate labels.

@valentinpalkovic valentinpalkovic changed the title Add agentic workflow duplicate-code-detector Build: Add agentic workflow duplicate-code-detector Mar 7, 2026
@valentinpalkovic valentinpalkovic self-assigned this Mar 7, 2026
@valentinpalkovic valentinpalkovic moved this to In Progress in Core Team Projects Mar 7, 2026
@valentinpalkovic valentinpalkovic added the build (Internal-facing build tooling & test updates) and ci:normal labels Mar 7, 2026
@valentinpalkovic valentinpalkovic merged commit 2a44aa7 into next Mar 7, 2026
14 of 24 checks passed
@valentinpalkovic valentinpalkovic deleted the add-workflow-duplicate-code-detector-3466 branch March 7, 2026 20:25
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Core Team Projects Mar 7, 2026
nx-cloud Bot commented Mar 7, 2026

View your CI Pipeline Execution ↗ for commit 30a1d20

| Command | Status | Duration | Result |
|---|---|---|---|
| nx run-many -t compile,check,knip,test,pretty-d... | ❌ Failed | 6m 12s | View ↗ |

☁️ Nx Cloud last updated this comment at 2026-03-07 20:32:56 UTC
