fix(core): Replace strip-comments with GoManipulator to resolve parser hang and preserve Go directives#814

Merged
yamadashy merged 6 commits into main from fix/go-strip-comments
Aug 30, 2025

@yamadashy yamadashy commented Aug 30, 2025

This PR works around a critical parser hang in the strip-comments library by replacing it, for Go files, with a custom comment processor that preserves Go compiler directives while removing regular comments.

Problem Analysis

Strip-comments Library Issues

The strip-comments library (v2.0.1) has several fundamental issues when processing Go code:

  1. Parser Hang Issue:

    • Occurs when Go code contains backtick-delimited raw string literals within function calls
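    • Caused by a flawed regex pattern: /^(['"])((?:\\1|[^\1])+?)(\1)/
    • Backreferences do not work inside character classes, so [^\1] does not mean "any character except the opening quote"; without the u flag JavaScript reads \1 there as a legacy octal escape (U+0001), so the class also matches the quote itself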
    • Results in infinite loops and memory consumption spikes
  2. Language Support Mismatch:

    • Go language support maps to JavaScript rules: exports.c = exports.javascript
    • Go's raw string literals (backticks) don't exist in C/JavaScript syntax
    • Multiline string handling is inadequate for Go's syntax
  3. Problematic Go Code Pattern:

    func example() {
        fmt.Fprintln(opts.io.StdOut, heredoc.Doc(`
            --------------------------------------------
            Comments / Notes
            --------------------------------------------
            `))
    }
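The character-class problem from item 1 can be reproduced in isolation. The sketch below is not code from strip-comments; it builds a simplified pattern with the RegExp constructor (variable names are illustrative) and assumes the legacy, non-`u` JavaScript regex mode, in which `\1` inside a character class is not a backreference.

```typescript
// Sketch: how JavaScript treats `[^\1]` when the `u` flag is absent.
// The pattern is built from a string so it reaches the regex engine unchanged.
const negatedClass = new RegExp("[^\\1]");

// The class does not exclude quote characters, whichever quote opened the literal.
const matchesDoubleQuote = negatedClass.test('"');
const matchesSingleQuote = negatedClass.test("'");

// Consequence: after an opening quote, the class can consume that same quote
// and run past the end of the intended string literal.
const quotedPrefix = new RegExp("^(['\"])([^\\1]+)");
const overrun = quotedPrefix.exec("'a' + 'b'");

console.log(matchesDoubleQuote, matchesSingleQuote);
console.log(overrun ? overrun[2] : null);
```

Because the class cannot stop at the closing quote, the lazy quantifier in the original pattern is left to try many split points on long inputs, which is consistent with the slowdown described above.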

Go Directives Issue

Additionally, strip-comments removes critical Go compiler directives:

  • //go:build (build constraints/tags)
  • //go:generate (code generation commands)
  • //go:embed (file embedding)
  • Other //go: directives

These are compiler instructions, not regular comments, and must be preserved.
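As a rough illustration of the distinction, the check below treats a line as a directive only when `//go:` is preceded by nothing but whitespace. `isGoDirectiveLine` is a hypothetical helper written for this example, not a function from the PR, and the sample lines are made up.

```typescript
// Hypothetical helper (not the PR's code): true when a line is a //go: directive
// that a comment stripper should keep.
function isGoDirectiveLine(line: string): boolean {
  // Only spaces or tabs may precede the marker; no space is allowed after "//".
  return /^[ \t]*\/\/go:/.test(line);
}

const directiveSamples = [
  "//go:build linux",
  "\t//go:generate something",
  "// go:build linux", // space after the slashes: an ordinary comment
  "x := 1 //go:noinline", // trailing position: treated as an ordinary comment here
];

for (const line of directiveSamples) {
  console.log(JSON.stringify(line), isGoDirectiveLine(line));
}
```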

Solution

Implemented GoManipulator class that addresses both issues:

Parser Stability

  • Eliminates hang issues: Uses state machine parsing instead of problematic regex patterns
  • Handles Go syntax correctly: Properly processes raw string literals with backticks
  • Performance optimized: Efficient string operations with controlled memory usage

Go Language Compliance

  • Preserves Go directives: Keeps //go: prefixed comments intact
  • Removes regular comments: Strips standard // and /* */ comments appropriately
  • Follows Go specification: Block comments do not nest (first */ closes the comment)
  • Protects string literals: Preserves comment-like text within strings/literals

Implementation Details

  • State management: Tracks parser state (Normal, InLineComment, InBlockComment, InDoubleQuoteString, InRawString, InRuneLiteral)
  • Directive detection: Identifies //go: at line start or after whitespace-only content
  • Block comment parsing: Correctly implements Go spec where block comments do not nest
  • String protection: Handles all Go string literal types including raw strings with backticks
  • Robust parsing: Eliminates regex-based parsing issues that caused hangs
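The points above can be sketched as a small character-by-character scanner. This is a minimal illustration under stated assumptions, not the PR's GoManipulator: the names `GoScanState`, `stripGoComments`, and `lineHasCode` are invented here, legacy `// +build` lines are not handled, and the final trim also removes trailing whitespace on lines inside raw strings.

```typescript
// Minimal sketch of a state-machine comment stripper for Go source (hypothetical names).
type GoScanState = "normal" | "lineComment" | "blockComment" | "string" | "rawString" | "rune";

function stripGoComments(src: string): string {
  let state = "normal" as GoScanState;
  let out = "";
  let lineHasCode = false; // non-whitespace already seen on the current line

  for (let i = 0; i < src.length; i++) {
    const ch = src.charAt(i);
    const next = src.charAt(i + 1); // "" at end of input

    switch (state) {
      case "normal":
        if (ch === "/" && next === "/") {
          if (!lineHasCode && src.startsWith("//go:", i)) {
            // Directive: copy the rest of the line through untouched.
            const newline = src.indexOf("\n", i);
            const stop = newline === -1 ? src.length : newline;
            out += src.slice(i, stop);
            i = stop - 1;
          } else {
            state = "lineComment";
            i++; // skip the second slash
          }
        } else if (ch === "/" && next === "*") {
          state = "blockComment";
          i++; // skip the asterisk
        } else {
          if (ch === '"') state = "string";
          else if (ch === "`") state = "rawString";
          else if (ch === "'") state = "rune";
          out += ch;
          if (ch === "\n") lineHasCode = false;
          else if (ch !== " " && ch !== "\t" && ch !== "\r") lineHasCode = true;
        }
        break;
      case "lineComment":
        if (ch === "\n") {
          out += ch;
          state = "normal";
          lineHasCode = false;
        }
        break;
      case "blockComment":
        if (ch === "*" && next === "/") {
          state = "normal"; // first "*/" closes the comment: no nesting
          i++;
        } else if (ch === "\n") {
          out += ch; // keep the line structure
          lineHasCode = false;
        }
        break;
      case "string":
      case "rune": {
        const quote = state === "string" ? '"' : "'";
        out += ch;
        if (ch === "\\" && next !== "") {
          out += next; // copy the escaped character so \" or \' does not close
          i++;
        } else if (ch === quote) {
          state = "normal";
        }
        break;
      }
      case "rawString":
        out += ch; // raw strings have no escapes; only a backtick closes them
        if (ch === "`") state = "normal";
        break;
    }
  }
  // Drop the trailing spaces left where comments were removed.
  return out.split("\n").map((line) => line.replace(/[ \t]+$/, "")).join("\n");
}

const goSample = [
  "//go:build linux",
  "",
  "func example() {",
  "    // Regular comment",
  "    s := `raw // not a comment`",
  '    /* block */ x := "// also not a comment"',
  "}",
].join("\n");
console.log(stripGoComments(goSample));
```

Because every character is visited once and each state has a single exit condition, there is no backtracking, which is the property that avoids the hang seen with the regex approach.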

Before/After Comparison

Problematic Input (Previously Caused Hang):

//go:build linux
//go:generate something

func example() {
    // Regular comment
    fmt.Fprintln(out, heredoc.Doc(`
        Multi-line raw string
        with // comment-like content
        `))
    /* Block comment */
}

Output (With GoManipulator):

//go:build linux
//go:generate something

func example() {

    fmt.Fprintln(out, heredoc.Doc(`
        Multi-line raw string
        with // comment-like content
        `))

}

Technical Benefits

  1. Eliminates parser hangs: Resolves infinite loop issues in strip-comments
  2. Memory efficient: Controlled memory usage without accumulation
  3. Go-compliant parsing: Follows official Go language specification
  4. Preserves functionality: Maintains essential Go compiler directives
  5. Performance improvement: Faster processing compared to problematic regex patterns

Test Coverage

Added comprehensive test cases covering:

  • Parser hang scenarios (resolved)
  • Go directives preservation
  • Block comment handling (spec-compliant)
  • Raw string literals with backticks
  • Mixed directives and regular comments
  • String literals with comment-like content
  • Edge cases and complex scenarios

Checklist

  • Run npm run test
  • Run npm run lint
  • Verified parser stability with problematic Go code patterns
  • Confirmed Go directive preservation
  • Performance testing completed

…moval

Implement custom Go language comment processor that preserves important
//go: directives (build tags, generate commands) while removing regular
comments. This addresses the issue where strip-comments library was
removing all comments including critical Go compiler directives.

Features:
- Preserve //go:build, //go:generate and other //go: directives
- Remove regular // and /* */ comments appropriately
- Handle nested block comments (Go-specific feature)
- Protect comment-like text within strings, raw strings, and rune literals
- Comprehensive test coverage for edge cases

The GoManipulator uses a state machine to accurately parse Go syntax
and distinguish between directives that must be preserved and comments
that should be removed.

Copilot AI review requested due to automatic review settings August 30, 2025 11:54

coderabbitai bot commented Aug 30, 2025

Walkthrough

Introduces a Go-specific comment remover via a new GoManipulator class and updates the .go file handling to use it. Adds tests covering Go directives, nested block comments, mixed directives/comments, and string literals containing //, verifying correct preservation and removal behaviors.

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| Go comment manipulation implementation (src/core/file/fileManipulate.ts) | Added GoManipulator with state-machine-based removeComments handling Go directives, nested block comments, strings, runes, and line trimming. Updated manipulators map to use GoManipulator for .go instead of C-style stripper. |
| Go-specific tests (tests/core/file/fileManipulate.test.ts) | Added four tests validating directive preservation, nested block comment removal, mixed directives/comments handling, and correct treatment of // inside standard and raw string literals. |

Sequence Diagram(s)

sequenceDiagram
  participant Caller as FileProcessor
  participant Map as manipulators[.ext]
  participant GoMan as GoManipulator
  participant Trim as rtrimLines

  Caller->>Map: select manipulator for ".go"
  Map-->>Caller: GoManipulator instance
  Caller->>GoMan: removeComments(content)
  Note over GoMan: State machine parses:<br/>- code<br/>- inline/block comments<br/>- strings/runes<br/>- //go: directives
  GoMan->>Trim: rtrimLines(processed)
  Trim-->>GoMan: trimmed
  GoMan-->>Caller: result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

cloudflare-workers-and-pages bot commented Aug 30, 2025

Deploying repomix with Cloudflare Pages

Latest commit: 575ae2b
Status: ✅  Deploy successful!
Preview URL: https://08837508.repomix.pages.dev
Branch Preview URL: https://fix-go-strip-comments.repomix.pages.dev

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @yamadashy, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a specialized comment manipulation logic for Go language files. Its primary goal is to accurately remove standard comments while ensuring that critical Go compiler directives, such as //go:build and //go:generate, are preserved, which were previously incorrectly stripped by the generic comment remover.

Highlights

  • Go-specific Comment Handling: Implemented a new GoManipulator class designed specifically for processing comments in Go source files.
  • Preservation of Go Directives: Ensures that essential //go: compiler directives (e.g., //go:build, //go:generate, //go:embed) are identified and retained during comment removal.
  • Accurate Comment Stripping: Correctly removes standard single-line (//) and multi-line (/* */) comments, including handling Go's unique nested block comment syntax.
  • Context-Aware Parsing: Utilizes a state machine to differentiate between actual comments and comment-like text appearing within string literals, raw strings, and rune literals, preventing unintended modifications.
  • Integration: Replaces the generic comment stripper with the new GoManipulator for .go files in the core file manipulation utility.
  • Enhanced Test Coverage: Added comprehensive unit tests to validate the GoManipulator's behavior across various Go comment and directive scenarios.

Copilot AI left a comment

Pull Request Overview

This PR adds a specialized GoManipulator class to handle Go file comment processing that preserves essential Go compiler directives (like //go:build, //go:generate, //go:embed) while removing regular comments.

  • Implements state-machine-based parsing to accurately distinguish between Go directives and regular comments
  • Adds proper handling for Go-specific syntax including nested block comments and string literals
  • Replaces the generic strip-comments library with Go-specific logic for .go files

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/core/file/fileManipulate.ts | Adds GoManipulator class and updates file extension mapping to use it for .go files |
| tests/core/file/fileManipulate.test.ts | Adds comprehensive test cases covering Go directive preservation, nested comments, and string literals |

codecov bot commented Aug 30, 2025

Codecov Report

❌ Patch coverage is 98.19820% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.59%. Comparing base (9e285ff) to head (575ae2b).
⚠️ Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/core/file/fileManipulate.ts | 98.19% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #814      +/-   ##
==========================================
+ Coverage   87.41%   87.59%   +0.17%     
==========================================
  Files         113      113              
  Lines        6493     6603     +110     
  Branches     1331     1372      +41     
==========================================
+ Hits         5676     5784     +108     
- Misses        817      819       +2     


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a GoManipulator to properly handle comment removal in Go files while preserving compiler directives like //go:build. The implementation uses a state machine, which is a robust approach for this kind of parsing. The new functionality is well-supported by a comprehensive set of test cases.

My review includes a couple of suggestions to improve the code's structure and maintainability by moving an enum definition and reducing some code duplication. Overall, this is a solid contribution that addresses an important issue for Go projects.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (7)
tests/core/file/fileManipulate.test.ts (4)

736-782: Good coverage for directive preservation; consider adding embed/whitespace/EOF cases.

  • Add a case for //go:embed since it is common and sensitive to removal.
  • Add a case where directives are indented with spaces/tabs (spec allows leading whitespace).
  • Add a case where a directive is the last line without a trailing newline to validate the lineEnd === -1 path.

I can append these as additional test vectors if you want.


784-806: Clarify nested block comment semantics; test intent is fine.

Go block comments do not nest per spec; you’re intentionally handling nested sequences for robustness. A short comment in the test name or description would avoid confusion for readers.


808-820: Add a legacy build-tag case (// +build) to prevent regressions.

Some repos still use legacy // +build tags. Consider a test ensuring such lines are preserved (or explicitly document that they’re removed).

Happy to add a test variant preserving // +build.


822-832: Nice string/rune coverage; add a raw-string edge and trailing-comment variant.

  • Add a raw string ending at EOF to ensure state closes without newline.
  • Add a line with code followed by //go: to confirm it’s treated as a normal comment (not a directive).
src/core/file/fileManipulate.ts (3)

71-78: Avoid recreating enum per call; hoist or use const enum.

Defining enum State inside removeComments allocates at runtime on each call. Hoist it to module scope or use a const enum to inline values.

Apply (hoist to module scope):

-  removeComments(content: string): string {
-    if (!content) return '';
-
-    enum State {
+enum GoState {
       Normal = 0,
       InLineComment = 1,
       InBlockComment = 2,
       InDoubleQuoteString = 3,
       InRawString = 4,
       InRuneLiteral = 5,
     }
+class GoManipulator extends BaseManipulator {
+  removeComments(content: string): string {
+    if (!content) return '';
-    let state: State = State.Normal;
+    let state: GoState = GoState.Normal;

If you prefer not to change scope, switch to const enum State for zero runtime overhead (subject to TS config).


94-113: Directive detection is correct; optionally preserve legacy // +build.

Current logic preserves //go: when the line start (or only preceding whitespace) matches. Consider also preserving legacy // +build tags to avoid breaking older codebases.

Apply minimal change:

-              if (restOfLine.startsWith('//go:')) {
+              if (restOfLine.startsWith('//go:') || restOfLine.startsWith('// +build')) {

If you intentionally don't support legacy tags, add a brief comment stating that choice.


146-166: Fix misleading comment about nesting.

Go block comments do not nest; you’re adding nesting support for resilience. Update the comment to avoid confusion.

-          // Handle nested block comments (Go supports them)
+          // Handle nested block comment sequences for robustness (Go block comments do not nest per spec)

📥 Commits

Reviewing files that changed from the base of the PR and between 9e285ff and 4b6fc1a.

📒 Files selected for processing (2)
  • src/core/file/fileManipulate.ts (2 hunks)
  • tests/core/file/fileManipulate.test.ts (1 hunks)
🔇 Additional comments (3)
src/core/file/fileManipulate.ts (3)

168-199: String and rune handling LGTM; minor newline note.

Double-quoted strings can’t contain raw newlines, so not updating lineStart here is fine. Raw strings do hit the bottom \n check—good.


202-208: Return trimming is consistent with other manipulators.

rtrimLines aligns outputs with tests and prior behavior.


344-344: Mapping switch to GoManipulator looks good.

.go now uses the Go-specific parser as intended.

Address code review feedback:
- Move GoParserState enum to module level to avoid re-declaration overhead
- Optimize string operations by using content.startsWith() instead of substring
- Improve code readability by hoisting duplicate result += char statements
- Fix comment capitalization for consistency

These changes improve performance by reducing unnecessary string operations
and memory allocations while maintaining the same functionality.

Remove the unused lineStart variable and its assignments from the
GoManipulator class. After optimization with hasNonWhitespaceOnLine
tracking, lineStart is no longer needed for functionality.

This resolves the final lint warning about unused variables.
@yamadashy
/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new GoManipulator to handle comment removal in Go files while preserving compiler directives like //go:build. The implementation uses a state machine to parse the file content, which is a good approach. However, there is a critical issue in the handling of block comments: the code incorrectly implements support for nested block comments, which is not a feature of the Go language. This can lead to incorrect comment stripping. I've provided a detailed comment with a suggested fix to align the behavior with the Go specification. The rest of the implementation, including the handling of directives and string literals, appears correct, and the added tests are comprehensive.
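The non-nesting rule can be shown with a few lines of TypeScript. This is a sketch for illustration only; `findBlockCommentEnd` is a hypothetical name, not a function from the PR, and the sample input is made up.

```typescript
// Sketch only: returns the index just past the block comment that opens at `start`.
function findBlockCommentEnd(src: string, start: number): number {
  // Search after the opening "/*" so that "/*/" is not read as a complete comment.
  const closeIndex = src.indexOf("*/", start + 2);
  // An unterminated comment runs to the end of the input.
  return closeIndex === -1 ? src.length : closeIndex + 2;
}

const nestedLooking = "a /* outer /* inner */ tail */ b";
const commentStart = nestedLooking.indexOf("/*");
const commentEnd = findBlockCommentEnd(nestedLooking, commentStart);

// No depth counter: the inner "/*" is ignored and the first "*/" closes the comment,
// so " tail */ b" is left behind as ordinary source text.
const remainder = nestedLooking.slice(0, commentStart) + nestedLooking.slice(commentEnd);
console.log(JSON.stringify(remainder));
```

A depth-tracking scanner would instead swallow ` tail */` as part of the comment, which is the mismatch with the Go specification that this review points out.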

Go block comments do not nest according to the language specification.
The first */ sequence should close the comment, regardless of any /*
sequences within it. This change removes the blockCommentDepth tracking
and ensures correct parsing behavior for Go code containing sequences
like /* comment with /* nested */ part */.

Updated test expectations to reflect the correct Go language behavior.

Fixed formatting issue identified by Biome linter to ensure consistent
code style across the project.

Removed the Go nested block comments test case as it was unnecessary
and potentially misleading. Go block comments do not nest according
to the language specification, so testing this behavior is not needed
and could cause confusion about the expected behavior.

The remaining tests adequately cover Go comment parsing functionality.
@yamadashy yamadashy merged commit 56264ee into main Aug 30, 2025
57 checks passed
@yamadashy yamadashy deleted the fix/go-strip-comments branch August 30, 2025 15:44
@yamadashy yamadashy changed the title from "feat(core): Add GoManipulator to preserve Go directives in comment removal" to "fix(core): Replace strip-comments with GoManipulator to resolve parser hang and preserve Go directives" Sep 1, 2025