Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code compression #336

Merged
merged 15 commits into from
Feb 16, 2025
Merged

Code compression #336

merged 15 commits into from
Feb 16, 2025

Conversation

huy-trn
Copy link
Contributor

@huy-trn huy-trn commented Feb 4, 2025

Hi @yamadashy,

This PR is still a work in progress, but as per your suggestion, I'm submitting it early for easier discussion.

I would greatly appreciate your feedback, particularly on the CLI options and how we should handle non-parseable content from users' code.

I'll resolve the merge conflicts soon when I get a chance.

Need help:
I'm facing some challenges with the Tree-sitter queries, as I'm not very familiar with either Tree-sitter queries or the wide range of languages we're aiming to support. For instance, the parser is currently unable to identify arrow function declarations in TypeScript code.

Ref:

Checklist

  • Run npm run test
  • Run npm run lint

Copy link
Contributor

coderabbitai bot commented Feb 4, 2025

📝 Walkthrough

Walkthrough

The changes introduce several new features and modifications across the project. Two new dependencies, tree-sitter-wasms and web-tree-sitter, are added to support Tree-sitter’s WASM-based parsing. A new boolean configuration property compress (defaulted to false) is added to the output configuration in both the CLI and the config schema. The CLI has been updated with a new --compress command-line option, and the buildCliConfig function now conditionally adds the compress property. In the file processing workflow, the logic now checks the compress flag—if true, it routes the file content through the new parseFile function to apply Tree-sitter-based parsing. Several new functions and modules are added, including loadLanguage and a new LanguageParser class that handles parser initialization and language guessing. In addition, there are new Tree-sitter query definitions for multiple languages and enhancements in tests to validate the new configuration and functionality.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant Config as buildCliConfig
    participant Worker as fileProcessWorker
    participant Parser as parseFile

    CLI->>Config: Run command with --compress option
    Config-->>CLI: Return configuration with compress:true
    CLI->>Worker: Execute file processing with config
    alt compress is true
        Worker->>Parser: Call parseFile(content, filePath, config)
        Parser-->>Worker: Return parsed/compressed content
    else compress is false
        Worker->>Worker: Process content with default formatting
    end
Loading
sequenceDiagram
    participant App as Caller
    participant LP as LanguageParser
    participant LL as loadLanguage

    App->>LP: init()
    LP->>LP: Execute Parser.init()
    App->>LP: getParserForLang(language)
    alt Parser not cached
        LP->>LL: prepareLang(language)
        LL-->>LP: Return Parser.Language instance
    else Parser cached
        LP-->>App: Return cached parser instance
    end
    App->>LP: getQueryForLang(language)
    LP-->>App: Return corresponding query instance
Loading

Possibly related PRs


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (30)
src/core/tree-sitter/LanguageParser.ts (2)

24-30: Validate loadLanguage error handling.
If loadLanguage(name) fails or throws, consider wrapping this logic in a try/catch to handle the error more gracefully, rather than allowing the exception to bubble up.


56-60: Prevent re-initialization race conditions.
Calls to getReady() in parallel might cause potential race conditions if invoked from different parts of the code. Even though Parser.init() is guarded by isParserInitialized, consider using an internal Promise or a synchronization mechanism to ensure graceful handling if multiple calls take place concurrently.

src/core/tree-sitter/queries/python.ts (1)

5-11: Query enhancements for nested classes or decorators.
While this query captures class and function definitions, consider whether nested classes or decorated functions also need explicit handling. Tree-sitter queries can be extended to match additional constructs if needed.

src/core/tree-sitter/queries/php.ts (1)

6-15: Distinguish methods from functions.
Currently, both function definitions and method declarations are tagged as @definition.function. If you need to differentiate methods, consider using a distinct label for the method declaration. For example:

-  name: (name) @name.definition.function) @definition.function
+  name: (name) @name.definition.method) @definition.method
src/core/tree-sitter/ext2lang.ts (1)

1-20: Consider adding support for additional file extensions.

The mapping is comprehensive but could be enhanced by adding support for more file extensions:

  • JavaScript: .mjs (ES modules)
  • TypeScript: .cts, .mts (modules)
  • C++: .cc, .hxx, .cxx
  • Objective-C/C++: .m, .mm
  • Other languages: .scala, .kt, .ex, .exs, .dart, .lua, .r, .pl, .pm, .sh, .bash
src/core/tree-sitter/queries/java.ts (1)

1-15: Consider adding queries for additional Java constructs.

The queries cover basic Java constructs but could be enhanced by adding support for:

  • Enum declarations
  • Record declarations (Java 16+)
  • Annotation declarations
  • Constructor declarations
  • Field declarations

Example queries:

(enum_declaration
  name: (identifier) @name.definition.enum) @definition.enum

(record_declaration
  name: (identifier) @name.definition.record) @definition.record

(annotation_declaration
  name: (identifier) @name.definition.annotation) @definition.annotation

(constructor_declaration
  name: (identifier) @name.definition.constructor) @definition.constructor

(field_declaration
  declarator: (variable_declarator
    name: (identifier) @name.definition.field)) @definition.field
src/core/tree-sitter/queries/rust.ts (1)

1-16: Consider adding queries for additional Rust constructs.

The queries cover basic Rust constructs but could be enhanced by adding support for:

  • Enum declarations
  • Trait declarations
  • Impl blocks
  • Type aliases
  • Const declarations
  • Static declarations

Example queries:

(enum_item
  name: (type_identifier) @name.definition.enum) @definition.enum

(trait_item
  name: (type_identifier) @name.definition.trait) @definition.trait

(impl_item) @definition.impl

(type_item
  name: (type_identifier) @name.definition.type) @definition.type

(const_item
  name: (identifier) @name.definition.const) @definition.const

(static_item
  name: (identifier) @name.definition.static) @definition.static
src/core/tree-sitter/loadLanguage.ts (1)

10-12: Consider making the WASM path configurable.

The path to WASM modules is hard-coded and lacks validation.

Consider:

  1. Making the path configurable through environment variables or configuration files.
  2. Adding validation to ensure the WASM file exists before attempting to load it.

Example implementation:

import fs from 'node:fs/promises';

const WASM_PATH = process.env.TREE_SITTER_WASM_PATH || path.join('node_modules', 'tree-sitter-wasms', 'out');

async function getWasmPath(langName: string): Promise<string> {
  const wasmPath = path.join(WASM_PATH, `tree-sitter-${langName}.wasm`);
  try {
    await fs.access(wasmPath);
    return wasmPath;
  } catch {
    throw new Error(`WASM file not found for language ${langName}`);
  }
}
src/core/tree-sitter/queries/c.ts (1)

1-6: Enhance documentation with query syntax examples.

The documentation lists the types of declarations but could be more helpful with examples of the C code patterns that each query matches.

Add examples like:

 /*
 - struct declarations
+    Example: struct Point { int x, y; }
 - union declarations
+    Example: union Data { int i; float f; }
 - function declarations
+    Example: void foo(int x);
 - typedef declarations
+    Example: typedef int Integer;
 */
src/core/tree-sitter/queries/c_sharp.ts (2)

1-6: Enhance documentation with query syntax examples.

The documentation lists the types of declarations but could be more helpful with examples of the C# code patterns that each query matches.

Add examples like:

 /*
 - class declarations
+    Example: public class MyClass { }
 - interface declarations
+    Example: public interface IMyInterface { }
 - method declarations
+    Example: public void MyMethod() { }
 - namespace declarations
+    Example: namespace MyNamespace { }
 */

8-22: Add support for additional C# declarations.

The current implementation covers basic declarations but misses some important C# language constructs.

Consider adding patterns for:

  • Property declarations (e.g., public int MyProperty { get; set; })
  • Event declarations (e.g., public event EventHandler MyEvent;)
  • Delegate declarations (e.g., public delegate void MyDelegate();)
  • Enum declarations (e.g., public enum MyEnum { })
 export default `
 (class_declaration
  name: (identifier) @name.definition.class
 ) @definition.class

 (interface_declaration
  name: (identifier) @name.definition.interface
 ) @definition.interface

 (method_declaration
  name: (identifier) @name.definition.method
 ) @definition.method

 (namespace_declaration
  name: (identifier) @name.definition.module
 ) @definition.module
+
+(property_declaration
+ name: (identifier) @name.definition.property
+) @definition.property
+
+(event_field_declaration
+ name: (identifier) @name.definition.event
+) @definition.event
+
+(delegate_declaration
+ name: (identifier) @name.definition.delegate
+) @definition.delegate
+
+(enum_declaration
+ name: (identifier) @name.definition.enum
+) @definition.enum
 `;
src/core/tree-sitter/lang2Query.ts (2)

1-14: Consider using TypeScript's type system for imports.

The imports look good but could benefit from TypeScript's type system.

Consider adding a type for the query functions:

+type QueryFunction = string;
+
 import {
-  cQuery,
+  cQuery: QueryFunction,
-  cppQuery,
+  cppQuery: QueryFunction,
   // ... rest of the imports
 } from './queries/index.js';

16-29: Consider making the mapping type-safe.

The language-to-query mapping looks good but could benefit from TypeScript's type system.

Consider adding type safety:

+type SupportedLanguage =
+  | 'javascript'
+  | 'typescript'
+  | 'c'
+  | 'cpp'
+  | 'python'
+  | 'rust'
+  | 'go'
+  | 'c_sharp'
+  | 'ruby'
+  | 'java'
+  | 'php'
+  | 'swift';
+
-export const lang2Query = {
+export const lang2Query: Record<SupportedLanguage, QueryFunction> = {
   javascript: javascriptQuery,
   typescript: typescriptQuery,
   // ... rest of the mappings
 };
src/core/tree-sitter/queries/index.ts (1)

1-12: Consider using TypeScript's module resolution.

The exports look good but could benefit from TypeScript's module resolution.

Consider:

  1. Using TypeScript's module resolution by removing the .js extension
  2. Adding type information for the exports
-export { default as phpQuery } from './php.js';
+export { default as phpQuery } from './php';
+export type { QueryFunction } from './types';
 // ... rest of the exports

Also, consider adding JSDoc comments to document the purpose of each query:

+/** Query for extracting PHP declarations */
 export { default as phpQuery } from './php.js';
+/** Query for extracting TypeScript declarations */
 export { default as typescriptQuery } from './typescript.js';
 // ... rest of the exports
src/core/tree-sitter/queries/go.ts (1)

25-26: Consider adding more Go-specific patterns.

While the current patterns are correct, consider adding patterns for other Go constructs to make the query more comprehensive:

  • Interface declarations
  • Struct declarations
  • Constant and variable declarations

Example additions:

+(interface_declaration
+  name: (type_identifier) @name.definition.interface) @definition.interface
+
+(struct_declaration
+  name: (type_identifier) @name.definition.class) @definition.class
+
+(const_declaration
+  (const_spec name: (identifier) @name.definition.constant)) @definition.constant
src/core/tree-sitter/queries/cpp.ts (2)

10-10: Consider using distinct captures for structs and unions.

Currently, both structs and unions are captured as @definition.class. Consider using more specific captures to differentiate them:

-(struct_specifier name: (type_identifier) @name.definition.class body:(_)) @definition.class
+(struct_specifier name: (type_identifier) @name.definition.struct body:(_)) @definition.struct

-(declaration type: (union_specifier name: (type_identifier) @name.definition.class)) @definition.class
+(declaration type: (union_specifier name: (type_identifier) @name.definition.union)) @definition.union

Also applies to: 12-12


1-23: Consider adding more C++-specific patterns.

Consider adding patterns for other C++ constructs to make the query more comprehensive:

+// Namespace declarations
+(namespace_definition
+  name: (identifier) @name.definition.namespace) @definition.namespace
+
+// Template declarations
+(template_declaration
+  (class_specifier name: (type_identifier) @name.definition.class)) @definition.class
+
+// Constructor declarations
+(function_declarator
+  declarator: (qualified_identifier
+    name: (identifier) @name.definition.constructor)) @definition.constructor
+  (#match? @name.definition.constructor "^[A-Z]")
+
+// Destructor declarations
+(function_declarator
+  declarator: (qualified_identifier
+    name: (identifier) @name.definition.destructor)) @definition.destructor
+  (#match? @name.definition.destructor "^~")
src/core/tree-sitter/queries/swift.ts (1)

1-45: Consider adding more Swift-specific patterns.

Consider adding patterns for other Swift constructs to make the query more comprehensive:

+// Enum declarations
+(enum_declaration
+  name: (type_identifier) @name.definition.enum) @definition.enum
+
+// Extension declarations
+(extension_declaration
+  name: (type_identifier) @name.definition.extension) @definition.extension
+
+// Struct declarations
+(struct_declaration
+  name: (type_identifier) @name.definition.struct) @definition.struct
+
+// Operator declarations
+(operator_declaration
+  name: (custom_operator) @name.definition.operator) @definition.operator
src/core/tree-sitter/queries/typescript.ts (1)

1-37: Consider adding more TypeScript-specific patterns.

Consider adding patterns for other TypeScript constructs to make the query more comprehensive:

+// Interface declarations
+(interface_declaration
+  name: (type_identifier) @name.definition.interface) @definition.interface
+
+// Type alias declarations
+(type_alias_declaration
+  name: (type_identifier) @name.definition.type) @definition.type
+
+// Enum declarations
+(enum_declaration
+  name: (identifier) @name.definition.enum) @definition.enum
+
+// Decorator declarations
+(decorator
+  "@" @punctuation.decorator
+  ((identifier) @name.definition.decorator)) @definition.decorator
tests/core/tree-sitter/LanguageParser.test.ts (2)

13-22: Add error handling test cases for getReady.

Consider adding test cases to verify error handling when Parser.init fails.

 describe('getReady', () => {
   it('should initialize the parser', async () => {
     // Mock Parser.init to avoid actual initialization during testing
     Parser.init = vi.fn().mockResolvedValue(undefined);

     await parser.getReady();

     expect(Parser.init).toHaveBeenCalled();
   });
+
+  it('should handle initialization errors', async () => {
+    // Mock Parser.init to simulate failure
+    Parser.init = vi.fn().mockRejectedValue(new Error('Init failed'));
+
+    await expect(parser.getReady()).rejects.toThrow('Init failed');
+  });
 });

24-38: Add edge cases for guessTheLang.

Consider adding test cases for:

  1. Files with multiple extensions (e.g., file.test.js)
  2. Case sensitivity in file extensions (e.g., .JS)
  3. Files with no extension
 describe('guessTheLang', () => {
   it('should return the correct language based on file extension', () => {
     const filePath = 'file.js';
     const lang = parser.guessTheLang(filePath);

     expect(lang).toBe('javascript');
   });

   it('should return undefined for unsupported extensions', () => {
     const filePath = 'file.txt';
     const lang = parser.guessTheLang(filePath);

     expect(lang).toBeUndefined();
   });
+
+  it('should handle files with multiple extensions', () => {
+    const filePath = 'file.test.js';
+    const lang = parser.guessTheLang(filePath);
+
+    expect(lang).toBe('javascript');
+  });
+
+  it('should be case-insensitive for file extensions', () => {
+    const filePath = 'file.JS';
+    const lang = parser.guessTheLang(filePath);
+
+    expect(lang).toBe('javascript');
+  });
+
+  it('should handle files without extension', () => {
+    const filePath = 'Dockerfile';
+    const lang = parser.guessTheLang(filePath);
+
+    expect(lang).toBeUndefined();
+  });
 });
src/core/tree-sitter/queries/ruby.ts (2)

1-5: Enhance documentation for Tree-sitter query.

Consider adding more detailed comments explaining:

  1. The purpose and behavior of #strip! and #select-adjacent! predicates
  2. The structure of captured nodes
  3. Examples of Ruby code that matches each pattern
 /*
 - method definitions (including singleton methods and aliases, with associated comments)
 - class definitions (including singleton classes, with associated comments)
 - module definitions
+
+ Query predicates:
+ - #strip!: Removes leading comment characters and whitespace
+ - #select-adjacent!: Associates comments with their corresponding definitions
+
+ Example matches:
+ ```ruby
+ # This is a method comment
+ def foo(bar)
+   # Method body
+ end
+
+ # This is a class comment
+ class MyClass
+   # Class body
+ end
+
+ # This is a module comment
+ module MyModule
+   # Module body
+ end
+ ```
 */

6-52: Consider adding support for more Ruby-specific constructs.

The query could be enhanced to capture:

  1. Module functions (module_function)
  2. Include/extend statements
  3. Attribute accessors (attr_*)
 export default `
 (
   (comment)* @doc
   .
   [
     (method
       name: (_) @name.definition.method) @definition.method
     (singleton_method
       name: (_) @name.definition.method) @definition.method
   ]
   (#strip! @doc "^#\\s*")
   (#select-adjacent! @doc @definition.method)
 )

 (alias
   name: (_) @name.definition.method) @definition.method

 (
   (comment)* @doc
   .
   [
     (class
       name: [
         (constant) @name.definition.class
         (scope_resolution
           name: (_) @name.definition.class)
       ]) @definition.class
     (singleton_class
       value: [
         (constant) @name.definition.class
         (scope_resolution
           name: (_) @name.definition.class)
       ]) @definition.class
   ]
   (#strip! @doc "^#\\s*")
   (#select-adjacent! @doc @definition.class)
 )

 (
   (module
     name: [
       (constant) @name.definition.module
       (scope_resolution
         name: (_) @name.definition.module)
     ]) @definition.module
 )
+
+(
+  (comment)* @doc
+  .
+  (call
+    method: (identifier) @name
+    (#match? @name "^(attr_reader|attr_writer|attr_accessor)$")
+  ) @definition.attribute
+  (#strip! @doc "^#\\s*")
+  (#select-adjacent! @doc @definition.attribute)
+)
+
+(
+  (comment)* @doc
+  .
+  (call
+    method: (identifier) @name
+    (#eq? @name "module_function")
+  ) @definition.module_function
+  (#strip! @doc "^#\\s*")
+  (#select-adjacent! @doc @definition.module_function)
+)
 `;
src/core/tree-sitter/queries/javascript.ts (1)

1-6: Enhance documentation and add TypeScript support.

Consider:

  1. Adding examples of code that matches each pattern
  2. Adding support for TypeScript-specific constructs
  3. Adding support for decorators and static methods
 /*
 - class definitions
 - method definitions
 - named function declarations
 - arrow functions and function expressions assigned to variables
+
+ Examples:
+ ```typescript
+ // Class with decorators and static methods
+ @decorator
+ class Example {
+   static staticMethod() {}
+   method() {}
+ }
+
+ // TypeScript interface
+ interface IExample {
+   method(): void;
+ }
+
+ // TypeScript type alias
+ type Handler = (event: Event) => void;
+ ```
+
+ Note: This query currently supports JavaScript. For TypeScript support,
+ additional patterns for interfaces, type aliases, and decorators are needed.
 */
src/core/tree-sitter/parseFile.ts (2)

51-53: Enhance error handling and logging.

The current error handling is too generic. Consider:

  1. Using specific error types
  2. Adding error context
  3. Using a proper logging framework
-    } catch (error) {
-      console.log(`Error parsing file: ${error}\n`);
+    } catch (error: unknown) {
+      const context = {
+        filePath,
+        language: lang,
+        error: error instanceof Error ? error.message : String(error)
+      };
+      console.error('Error parsing file:', context);
+      throw new Error(`Failed to parse ${filePath}: ${context.error}`);

5-5: Consider renaming the variable for clarity.

The variable name magicBlob is not descriptive of its purpose.

-  const magicBlob = new LanguageParser();
+  const languageParser = new LanguageParser();
src/cli/cliRun.ts (1)

34-34: Enhance the CLI option description.

The description "remove everything except definitions" could be more specific about what constitutes a "definition" and which file types are supported.

-      .option('--only-defs', 'remove everything except definitions')
+      .option('--only-defs', 'extract only type definitions, interfaces, and classes from TypeScript files')

Also applies to: 57-57

src/core/file/fileProcess.ts (2)

8-8: Consider using a more descriptive function name.

The name getFn_parseFile is not very descriptive and doesn't follow TypeScript naming conventions.

-import { getFn_parseFile } from '../tree-sitter/parseFile.js';
+import { getTypeScriptParser } from '../tree-sitter/parseFile.js';

59-61: Add progress indication for parsing operation.

The parsing operation could be time-consuming for large files, but there's no progress indication.

   if (config.output.onlyDefs && filePath.endsWith('.ts')) {
+    logger.trace(`Parsing definitions from ${filePath}`);
     const parseFile = await getFn_parseFile();
     processedContent = (await parseFile(processedContent, filePath, config)) ?? processedContent;
+    logger.trace(`Completed parsing definitions from ${filePath}`);
   }
tests/core/tree-sitter/parseFile.test.ts (1)

5-192: Consider enhancing the test coverage.

While the test suite provides good coverage for basic parsing functionality across multiple languages, consider adding:

  1. Error cases (e.g., invalid syntax, unsupported file extensions)
  2. Tests for the onlyDefs configuration
  3. More complex code examples that test nested functions, classes, and methods

Example test case for onlyDefs:

+test('should respect onlyDefs configuration', async () => {
+  const fileContent = `
+    // A comment that should be ignored
+    function helper() { return 42; }
+    function main() { return helper(); }
+  `;
+  const filePath = 'dummy.js';
+  const config = { output: { onlyDefs: true } };
+  const parseFile = await getFn_parseFile();
+  const result = await parseFile(fileContent, filePath, config as RepomixConfigMerged);
+  expect(typeof result).toBe('string');
+  expect(result).toContain('helper');
+  expect(result).toContain('main');
+  expect(result).not.toContain('comment');
+});
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 836abcd and bfc007f.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (29)
  • package.json (1 hunks)
  • repomix.config.json (1 hunks)
  • src/cli/actions/defaultAction.ts (1 hunks)
  • src/cli/cliRun.ts (2 hunks)
  • src/config/configSchema.ts (2 hunks)
  • src/core/file/fileProcess.ts (2 hunks)
  • src/core/tree-sitter/LanguageParser.ts (1 hunks)
  • src/core/tree-sitter/ext2lang.ts (1 hunks)
  • src/core/tree-sitter/lang2Query.ts (1 hunks)
  • src/core/tree-sitter/loadLanguage.ts (1 hunks)
  • src/core/tree-sitter/parseFile.ts (1 hunks)
  • src/core/tree-sitter/queries/c.ts (1 hunks)
  • src/core/tree-sitter/queries/c_sharp.ts (1 hunks)
  • src/core/tree-sitter/queries/cpp.ts (1 hunks)
  • src/core/tree-sitter/queries/go.ts (1 hunks)
  • src/core/tree-sitter/queries/index.ts (1 hunks)
  • src/core/tree-sitter/queries/java.ts (1 hunks)
  • src/core/tree-sitter/queries/javascript.ts (1 hunks)
  • src/core/tree-sitter/queries/php.ts (1 hunks)
  • src/core/tree-sitter/queries/python.ts (1 hunks)
  • src/core/tree-sitter/queries/ruby.ts (1 hunks)
  • src/core/tree-sitter/queries/rust.ts (1 hunks)
  • src/core/tree-sitter/queries/swift.ts (1 hunks)
  • src/core/tree-sitter/queries/typescript.ts (1 hunks)
  • tests/cli/actions/defaultAction.test.ts (1 hunks)
  • tests/cli/cliRun.test.ts (2 hunks)
  • tests/config/configSchema.test.ts (3 hunks)
  • tests/core/tree-sitter/LanguageParser.test.ts (1 hunks)
  • tests/core/tree-sitter/parseFile.test.ts (1 hunks)
🔇 Additional comments (19)
src/core/tree-sitter/LanguageParser.ts (1)

20-22: Consider edge cases with files lacking extensions.
Currently, getFileExtension returns an empty string for files with no extension or a trailing dot. Ensure that calling workflows handle these edge cases gracefully.

src/core/tree-sitter/queries/c.ts (1)

8-14: Verify query patterns against Tree-sitter C grammar.

The query patterns look correct but should be verified against the Tree-sitter C grammar to ensure they capture all valid C declaration forms.

src/core/tree-sitter/queries/go.ts (2)

8-14: LGTM! Function declaration pattern looks good.

The pattern correctly captures function names and associated comments, with proper stripping of comment whitespace.


17-23: LGTM! Method declaration pattern looks good.

The pattern correctly captures method names and associated comments, maintaining consistency with function declarations.

src/core/tree-sitter/queries/cpp.ts (1)

14-18: LGTM! Function patterns look good.

The patterns correctly handle both simple identifiers and field identifiers for functions.

src/core/tree-sitter/queries/swift.ts (2)

8-12: LGTM! Class and protocol patterns look good.

The patterns correctly capture class and protocol names with appropriate definition types.


14-27: LGTM! Method patterns are comprehensive.

The patterns handle various method types including regular functions, subscripts, initializers, and deinitializers within class scope.

src/core/tree-sitter/queries/typescript.ts (2)

9-10: LGTM! Function patterns are comprehensive.

The patterns correctly handle function signatures, declarations, and variable declarations containing functions.

Also applies to: 24-25, 33-36


12-16: LGTM! Method patterns look good.

The patterns handle both regular and abstract method signatures appropriately.

Also applies to: 27-28

src/config/configSchema.ts (1)

28-28: LGTM! Schema changes are well-structured.

The onlyDefs property is correctly added to both base and default schemas with appropriate types and defaults.

Also applies to: 68-68

src/cli/actions/defaultAction.ts (1)

134-136: LGTM! CLI option handling is consistent.

The implementation follows the established pattern for boolean options and maintains immutability.

tests/config/configSchema.test.ts (3)

72-72: LGTM!

The onlyDefs property is correctly added to the valid configuration test case for repomixConfigDefaultSchema with a default value of false.


157-157: LGTM!

The onlyDefs property is correctly added to the valid configuration test case for repomixConfigMergedSchema with a default value of false.


196-196: LGTM!

The onlyDefs property is correctly included in the invalid configuration test case for type validation.

tests/cli/cliRun.test.ts (1)

57-57: LGTM!

The onlyDefs property is correctly added to the mocked configuration objects with a default value of false for both default and remote actions.

Also applies to: 96-96

tests/core/tree-sitter/parseFile.test.ts (1)

1-4: LGTM!

The test file setup and imports are correctly configured.

repomix.config.json (2)

3-4: Updated Output Configuration to Plain Text Format
The changes to "filePath" and "style" correctly reflect the new plain text output format. This update appears consistent with the intended simplification of the output.


11-11: Added "onlyDefs" Property for Definition-Only Output
The new "onlyDefs": true property has been added to enable output of definitions only. Please ensure that the documentation and CLI help texts are updated to reflect this new option. Also, verify that this override is intentional given that the default in the schema is set to false.

package.json (1)

82-83: New Dependencies for Tree-sitter Integration
The addition of "tree-sitter-wasms": "^0.1.12" and "web-tree-sitter": "^0.24.7" in the dependencies is consistent with the enhancements for parsing capabilities using Tree-sitter. Please ensure that these dependencies are documented in the project README and that any version-specific caveats are noted.

@huy-trn huy-trn marked this pull request as draft February 4, 2025 20:05
@huy-trn huy-trn changed the title Good for rag Code compression Feb 4, 2025
@yamadashy
Copy link
Owner

@huy-trn
I haven't reviewed the entire PR yet, but it looks really promising!

I'll take a more detailed look tomorrow.

Thank you so much for your constant support and contributions! Really appreciate it! 🚀

Copy link

codecov bot commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 81.86528% with 35 lines in your changes missing coverage. Please review.

Project coverage is 89.56%. Comparing base (acc4121) to head (169ea22).
Report is 16 commits behind head on main.

Files with missing lines Patch % Lines
src/core/file/workers/fileProcessWorker.ts 15.38% 11 Missing ⚠️
src/core/tree-sitter/languageParser.ts 84.00% 8 Missing ⚠️
src/core/tree-sitter/loadLanguage.ts 69.56% 7 Missing ⚠️
src/core/tree-sitter/parseFile.ts 86.79% 7 Missing ⚠️
src/cli/actions/defaultAction.ts 33.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #336      +/-   ##
==========================================
- Coverage   90.12%   89.56%   -0.57%     
==========================================
  Files          49       66      +17     
  Lines        2724     2912     +188     
  Branches      562      594      +32     
==========================================
+ Hits         2455     2608     +153     
- Misses        269      304      +35     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (3)
src/core/tree-sitter/parseFile.ts (3)

4-58: 🛠️ Refactor suggestion

Optimize memory usage and improve type safety.

Consider using generators for chunks and adding type guards.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 12-13: src/core/tree-sitter/parseFile.ts#L12-L13
Added lines #L12 - L13 were not covered by tests


[warning] 17-18: src/core/tree-sitter/parseFile.ts#L17-L18
Added lines #L17 - L18 were not covered by tests


[warning] 47-47: src/core/tree-sitter/parseFile.ts#L47
Added line #L47 was not covered by tests


[warning] 53-54: src/core/tree-sitter/parseFile.ts#L53-L54
Added lines #L53 - L54 were not covered by tests


8-8: 🛠️ Refactor suggestion

Add input validation.

The function lacks input validation for its parameters.

-  const parseFile = async (fileContent: string, filePath: string, config: RepomixConfigMerged) => {
+  const parseFile = async (
+    fileContent: string,
+    filePath: string,
+    config: RepomixConfigMerged
+  ): Promise<string | undefined> => {
+    // Validate input
+    if (!fileContent || !filePath) {
+      throw new Error('Invalid input: fileContent and filePath are required');
+    }

52-54: ⚠️ Potential issue

Improve error handling.

The current error handling only logs to console and swallows the error. This could hide issues and make debugging difficult.

-    } catch (error) {
-      console.log(`Error parsing file: ${error}\n`);
+    } catch (error: unknown) {
+      const context = {
+        filePath,
+        language: lang,
+        error: error instanceof Error ? error.message : String(error)
+      };
+      logger.error('Error parsing file:', context);
+      throw new Error(`Failed to parse ${filePath}: ${context.error}`);
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 53-54: src/core/tree-sitter/parseFile.ts#L53-L54
Added lines #L53 - L54 were not covered by tests

🧹 Nitpick comments (10)
src/core/tree-sitter/ext2lang.ts (1)

1-20: Add JSDoc documentation for better maintainability.

The mapping is comprehensive and well-structured. Consider adding JSDoc documentation to describe the purpose and usage of this mapping.

+/**
+ * Maps file extensions to their corresponding programming languages.
+ * Used by LanguageParser to determine the language of a file based on its extension.
+ * @example
+ * ext2lang.js // returns 'javascript'
+ * ext2lang.py // returns 'python'
+ */
 export const ext2lang = {
src/core/tree-sitter/lang2Query.ts (2)

1-14: Use import type for better type safety.

Since the imported types are only used for type checking, use import type to ensure they are removed during compilation.

-import {
+import type {
   cQuery,
   cppQuery,
   csharpQuery,
   goQuery,
   javaQuery,
   javascriptQuery,
   phpQuery,
   pythonQuery,
   rubyQuery,
   rustQuery,
   swiftQuery,
   typescriptQuery,
 } from './queries/index.js';

16-29: Add JSDoc documentation for better maintainability.

The mapping is well-structured. Consider adding JSDoc documentation to describe the purpose and usage of this mapping.

+/**
+ * Maps programming languages to their corresponding query functions.
+ * Used by LanguageParser to retrieve the appropriate query for a language.
+ * @example
+ * lang2Query.javascript // returns javascriptQuery
+ * lang2Query.python // returns pythonQuery
+ */
 export const lang2Query = {
src/core/tree-sitter/queries/ruby.ts (1)

1-5: Enhance documentation for better maintainability.

Consider expanding the documentation to include:

  • Examples of the patterns being matched
  • Description of the capture groups
  • Explanation of the Tree-sitter features being used
-/*
-- method definitions (including singleton methods and aliases, with associated comments)
-- class definitions (including singleton classes, with associated comments)
-- module definitions
-*/
+/**
+ * Tree-sitter query patterns for Ruby code.
+ * 
+ * Captures:
+ * - Method definitions:
+ *   - Regular methods with associated comments
+ *   - Singleton methods with associated comments
+ *   - Method aliases
+ * - Class definitions:
+ *   - Regular classes with associated comments
+ *   - Singleton classes with associated comments
+ * - Module definitions
+ * 
+ * Features:
+ * - Uses #strip! to remove leading comment characters
+ * - Uses #select-adjacent! to link comments to their definitions
+ * 
+ * @example
+ * # This is a method comment
+ * def foo(bar)
+ *   # Method body
+ * end
+ */
src/core/tree-sitter/LanguageParser.ts (3)

1-6: Use import type for better type safety.

Since SupportedLang is only used for type checking, use import type to ensure it is removed during compilation.

 import * as path from 'node:path';
 import Parser from 'web-tree-sitter';
 
 import { ext2lang } from './ext2lang.js';
-import { lang2Query, SupportedLang } from './lang2Query.js';
+import { lang2Query } from './lang2Query.js';
+import type { SupportedLang } from './lang2Query.js';
 import { loadLanguage } from './loadLanguage.js';
🧰 Tools
🪛 Biome (1.9.4)

[error] 5-5: Some named imports are only used as types.

This import is only used as a type.

Importing the types with import type ensures that they are removed by the compilers and avoids loading unnecessary modules.
Safe fix: Add inline type keywords.

(lint/style/useImportType)


17-20: Handle edge cases in getFileExtension.

Consider handling edge cases such as files without extensions or with multiple dots.

 private getFileExtension(filePath: string) {
-  return path.extname(filePath).toLowerCase().slice(1);
+  const ext = path.extname(filePath).toLowerCase().slice(1);
+  return ext || '';  // Return empty string for files without extension
 }

44-51: Improve type safety in guessTheLang.

Consider using type assertions more safely and handling potential type mismatches.

 public guessTheLang(filePath: string) {
   const ext = this.getFileExtension(filePath);
-  if (!Object.keys(ext2lang).includes(ext)) {
+  if (!(ext in ext2lang)) {
     return undefined;
   }
-  const lang = ext2lang[ext as keyof typeof ext2lang] as keyof typeof lang2Query;
+  const lang = ext2lang[ext as keyof typeof ext2lang];
+  if (!(lang in lang2Query)) {
+    return undefined;
+  }
   return lang;
 }
tests/core/tree-sitter/parseFile.test.ts (3)

1-4: Remove unused import.

The vi import from 'vitest' is not used in the test file.

-import { describe, expect, test, vi } from 'vitest';
+import { describe, expect, test } from 'vitest';

7-25: Enhance JavaScript/JSX test coverage.

The current tests only cover basic function declarations. Consider adding test cases for:

  • Arrow functions: const greet = (name) => console.log("Hello, " + name);
  • Class methods: class Greeter { sayHello(name) { console.log("Hello, " + name); } }
  • Async functions: async function greet(name) { await delay(100); console.log("Hello, " + name); }
  • Generator functions: function* counter() { yield 1; yield 2; }
  • Edge cases: empty files, invalid syntax

28-46: Add TypeScript-specific test cases.

The current tests don't cover TypeScript-specific features. Consider adding test cases for:

  • Interfaces: interface Greeter { greet(name: string): void; }
  • Type aliases: type GreetFn = (name: string) => void;
  • Abstract classes: abstract class BaseGreeter { abstract greet(name: string): void; }
  • Generics: function identity<T>(arg: T): T { return arg; }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a94e854 and a5215be.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (29)
  • package.json (1 hunks)
  • repomix.config.json (1 hunks)
  • src/cli/actions/defaultAction.ts (1 hunks)
  • src/cli/cliRun.ts (1 hunks)
  • src/config/configSchema.ts (2 hunks)
  • src/core/file/workers/fileProcessWorker.ts (2 hunks)
  • src/core/tree-sitter/LanguageParser.ts (1 hunks)
  • src/core/tree-sitter/ext2lang.ts (1 hunks)
  • src/core/tree-sitter/lang2Query.ts (1 hunks)
  • src/core/tree-sitter/loadLanguage.ts (1 hunks)
  • src/core/tree-sitter/parseFile.ts (1 hunks)
  • src/core/tree-sitter/queries/c.ts (1 hunks)
  • src/core/tree-sitter/queries/c_sharp.ts (1 hunks)
  • src/core/tree-sitter/queries/cpp.ts (1 hunks)
  • src/core/tree-sitter/queries/go.ts (1 hunks)
  • src/core/tree-sitter/queries/index.ts (1 hunks)
  • src/core/tree-sitter/queries/java.ts (1 hunks)
  • src/core/tree-sitter/queries/javascript.ts (1 hunks)
  • src/core/tree-sitter/queries/php.ts (1 hunks)
  • src/core/tree-sitter/queries/python.ts (1 hunks)
  • src/core/tree-sitter/queries/ruby.ts (1 hunks)
  • src/core/tree-sitter/queries/rust.ts (1 hunks)
  • src/core/tree-sitter/queries/swift.ts (1 hunks)
  • src/core/tree-sitter/queries/typescript.ts (1 hunks)
  • tests/cli/actions/defaultAction.test.ts (1 hunks)
  • tests/cli/cliRun.test.ts (2 hunks)
  • tests/config/configSchema.test.ts (3 hunks)
  • tests/core/tree-sitter/LanguageParser.test.ts (1 hunks)
  • tests/core/tree-sitter/parseFile.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (20)
  • src/core/tree-sitter/queries/c.ts
  • src/core/tree-sitter/queries/python.ts
  • src/core/tree-sitter/loadLanguage.ts
  • src/core/tree-sitter/queries/rust.ts
  • src/core/tree-sitter/queries/typescript.ts
  • tests/config/configSchema.test.ts
  • src/core/tree-sitter/queries/swift.ts
  • src/core/tree-sitter/queries/php.ts
  • src/core/tree-sitter/queries/java.ts
  • src/cli/cliRun.ts
  • package.json
  • tests/cli/cliRun.test.ts
  • repomix.config.json
  • tests/cli/actions/defaultAction.test.ts
  • src/core/tree-sitter/queries/go.ts
  • src/core/tree-sitter/queries/c_sharp.ts
  • src/core/tree-sitter/queries/javascript.ts
  • src/core/tree-sitter/queries/cpp.ts
  • tests/core/tree-sitter/LanguageParser.test.ts
  • src/core/tree-sitter/queries/index.ts
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/core/file/workers/fileProcessWorker.ts

[warning] 38-39: src/core/file/workers/fileProcessWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests

src/core/tree-sitter/LanguageParser.ts

[warning] 31-32: src/core/tree-sitter/LanguageParser.ts#L31-L32
Added lines #L31 - L32 were not covered by tests

src/core/tree-sitter/parseFile.ts

[warning] 12-13: src/core/tree-sitter/parseFile.ts#L12-L13
Added lines #L12 - L13 were not covered by tests


[warning] 17-18: src/core/tree-sitter/parseFile.ts#L17-L18
Added lines #L17 - L18 were not covered by tests


[warning] 47-47: src/core/tree-sitter/parseFile.ts#L47
Added line #L47 was not covered by tests


[warning] 53-54: src/core/tree-sitter/parseFile.ts#L53-L54
Added lines #L53 - L54 were not covered by tests

src/cli/actions/defaultAction.ts

[warning] 171-172: src/cli/actions/defaultAction.ts#L171-L172
Added lines #L171 - L172 were not covered by tests

🪛 Biome (1.9.4)
src/core/tree-sitter/LanguageParser.ts

[error] 5-5: Some named imports are only used as types.

This import is only used as a type.

Importing the types with import type ensures that they are removed by the compilers and avoids loading unnecessary modules.
Safe fix: Add inline type keywords.

(lint/style/useImportType)

⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Test (windows-latest, 23.x)
  • GitHub Check: Test (windows-latest, 22.x)
  • GitHub Check: Test (windows-latest, 21.x)
  • GitHub Check: Test (windows-latest, 20.x)
  • GitHub Check: Test (windows-latest, 19.x)
  • GitHub Check: Test (windows-latest, 18.x)
  • GitHub Check: Test (windows-latest, 18.0.0)
  • GitHub Check: Build and run (windows-latest, 22.x)
  • GitHub Check: Build and run (windows-latest, 21.x)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (7)
src/core/tree-sitter/lang2Query.ts (1)

31-31: LGTM!

The type definition ensures type safety when working with supported languages.

src/core/tree-sitter/queries/ruby.ts (1)

6-52: LGTM!

The query patterns are well-structured and comprehensive, covering all major Ruby constructs including regular and singleton methods/classes, along with their associated comments.

src/core/tree-sitter/LanguageParser.ts (2)

8-16: LGTM!

The class structure with private caches for parsers and queries is well-designed.


29-34: Add test coverage for error handling.

The error handling path in getParserForLang is not covered by tests.

❌ Verification inconclusive

🏁 Script executed:

#!/bin/bash
# Description: Check test coverage for error handling in getParserForLang

# Search for test files related to LanguageParser
fd -e ts -e js "LanguageParser\.test" | xargs rg -l "getParserForLang.*error"

Length of output: 79


Missing Error Handling Test Coverage in getParserForLang
Our search across test files produced no evidence that error scenarios (such as failures from prepareLang) are being exercised. Please add tests that simulate errors from prepareLang so that the error handling path in getParserForLang is covered.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 31-32: src/core/tree-sitter/LanguageParser.ts#L31-L32
Added lines #L31 - L32 were not covered by tests

src/core/file/workers/fileProcessWorker.ts (1)

37-40: Add test coverage for compressed mode.

The new code path for compressed output mode is not covered by tests.

❌ Verification inconclusive

🏁 Script executed:

#!/bin/bash
# Description: Check test coverage for compressed mode
fd -e test.ts -e spec.ts | xargs rg -l "compressed.*true"

Length of output: 59


Let's run an improved search to verify if any tests set the configuration with compressed: true:


🏁 Script executed:

#!/bin/bash
# Searching for test cases that have a configuration object with output.compressed set to true
fd -e ts -e spec.ts | xargs rg -Hn "output\s*:\s*\{[^}]*compressed\s*:\s*true"

Length of output: 80


Action Required: Add tests for compressed mode

The searches did not reveal any tests setting config.output.compressed to true. This confirms that the new code branch in src/core/file/workers/fileProcessWorker.ts handling compressed output is not covered by tests.

  • Location: src/core/file/workers/fileProcessWorker.ts (lines 37–40)
  • Issue: No tests currently trigger the if (config.output.compressed) block.
  • Recommendation: Please add test cases that set config.output.compressed to true to ensure this branch is exercised.
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 38-39: src/core/file/workers/fileProcessWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests

src/config/configSchema.ts (1)

28-28: LGTM!

The schema changes for the compressed option are well-structured and consistent with the existing pattern.

Also applies to: 68-68

src/cli/actions/defaultAction.ts (1)

170-172: Add test coverage for compressed CLI option.

The new code path for handling the compressed CLI option is not covered by tests.

❌ Verification inconclusive

🏁 Script executed:

#!/bin/bash
# Description: Check test coverage for compressed CLI option
fd -e test.ts -e spec.ts | xargs rg -l "compressed.*options"

Length of output: 62


Missing test coverage for the compressed CLI option
Our search did not locate any tests exercising the code path that updates the CLI configuration with the compressed option in src/cli/actions/defaultAction.ts (lines 170–172). Please add tests that invoke the CLI with the compressed option to verify that it behaves as expected.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 171-172: src/cli/actions/defaultAction.ts#L171-L172
Added lines #L171 - L172 were not covered by tests

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (5)
src/core/tree-sitter/LanguageParser.ts (2)

21-27: ⚠️ Potential issue

Add error handling in prepareLang.

The method should handle potential errors from loadLanguage and parser.setLanguage.

 private async prepareLang(name: SupportedLang) {
-  const lang = await loadLanguage(name);
-  const parser = new Parser();
-  parser.setLanguage(lang);
-  this.loadedParsers[name] = parser;
-  this.loadedQueries[name] = lang.query(lang2Query[name]);
+  try {
+    const lang = await loadLanguage(name);
+    const parser = new Parser();
+    parser.setLanguage(lang);
+    this.loadedParsers[name] = parser;
+    this.loadedQueries[name] = lang.query(lang2Query[name]);
+  } catch (error) {
+    throw new Error(`Failed to prepare language ${name}: ${error.message}`);
+  }
 }

53-55: 🛠️ Refactor suggestion

Add cleanup method for proper resource management.

Consider adding a cleanup method to properly dispose of parsers when they are no longer needed.

 public async init() {
   await Parser.init();
 }
+
+/**
+ * Cleanup resources by disposing of all loaded parsers.
+ * Call this method when the parser is no longer needed.
+ */
+public cleanup() {
+  Object.values(this.loadedParsers).forEach(parser => {
+    parser.delete();
+  });
+  this.loadedParsers = {};
+  this.loadedQueries = {};
+}
src/core/tree-sitter/parseFile.ts (1)

8-56: 🛠️ Refactor suggestion

Optimize memory usage and improve type safety.

The function needs input validation, better error handling, and memory optimization.

-const parseFile = async (fileContent: string, filePath: string, config: RepomixConfigMerged) => {
+const parseFile = async (
+  fileContent: string,
+  filePath: string,
+  config: RepomixConfigMerged
+): Promise<string | undefined> => {
+  // Validate input
+  if (!fileContent || !filePath) {
+    throw new Error('Invalid input: fileContent and filePath are required');
+  }

   // Split the file content into individual lines
   const lines = fileContent.split('\n');
   if (lines.length < 1) {
     return '';
   }

   const lang = magicBlob.guessTheLang(filePath);
   if (lang === undefined) {
     // Language not supported
     return undefined;
   }

   const query = await magicBlob.getQueryForLang(lang);
   const parser = await magicBlob.getParserForLang(lang);

-  const chunks = [];
+  // Use a generator to yield chunks
+  async function* generateChunks() {
     try {
       const tree = parser.parse(fileContent);
       const captures = query.captures(tree.rootNode);
       captures.sort((a, b) => a.node.startPosition.row - b.node.startPosition.row);

       for (const capture of captures) {
         const { node, name } = capture;
         const startRow = node.startPosition.row;
         const endRow = node.endPosition.row;

         if (!name.includes('name') || !lines[startRow]) {
           continue;
         }

         const selectedLines = lines.slice(startRow, endRow + 1);
         if (selectedLines.length < 1) {
           continue;
         }

-        const chunk = selectedLines.join('\n');
-        chunks.push(chunk);
+        yield selectedLines.join('\n');
       }
-    } catch (error) {
-      console.log(`Error parsing file: ${error}\n`);
+    } catch (error: unknown) {
+      const context = {
+        filePath,
+        language: lang,
+        error: error instanceof Error ? error.message : String(error)
+      };
+      console.error('Error parsing file:', context);
+      throw new Error(`Failed to parse ${filePath}: ${context.error}`);
     }
-    return chunks.join('\n');
+  }

+  // Collect chunks with reduced memory usage
+  const chunks: string[] = [];
+  for await (const chunk of generateChunks()) {
+    chunks.push(chunk);
+  }
+  return chunks.join('\n');
 };
src/core/file/workers/fileProcessWorker.ts (1)

37-40: ⚠️ Potential issue

Add error handling for parseFile result.

The compressed mode needs better error handling.

 if (config.output.compressed) {
   const parseFile = await getFn_parseFile();
-  processedContent = (await parseFile(processedContent, rawFile.path, config)) ?? processedContent;
+  try {
+    const parsedContent = await parseFile(processedContent, rawFile.path, config);
+    if (parsedContent === undefined) {
+      logger.warn(`Failed to parse ${rawFile.path} in compressed mode. Using original content.`);
+    }
+    processedContent = parsedContent ?? processedContent;
+  } catch (error) {
+    logger.error(`Error parsing ${rawFile.path} in compressed mode: ${error}`);
+    throw error;
+  }
 } else if (config.output.showLineNumbers) {
tests/core/tree-sitter/parseFile.test.ts (1)

5-192: 🛠️ Refactor suggestion

Improve test organization and coverage.

The tests need better organization and more comprehensive coverage.

  1. Test Configuration:

    • Add tests with compressed: true in the config
    • Consider testing other configuration options
  2. Test Setup:

    • Use test fixtures or helper functions to reduce duplicate code:
    function createParseTest(language: string, content: string, expectedIdentifier: string) {
      test(`should parse ${language} correctly`, async () => {
        const filePath = `dummy.${language}`;
        const config = {};
        const parseFile = await getFn_parseFile();
        const result = await parseFile(content, filePath, config as RepomixConfigMerged);
        expect(typeof result).toBe('string');
        expect(result).toContain(expectedIdentifier);
      });
    }
  3. Add tests for:

    • Empty files
    • Files with syntax errors
    • Files with multiple declarations
    • Files with comments and documentation
    • Language-specific features (classes, interfaces, etc.)
🧹 Nitpick comments (3)
src/core/tree-sitter/LanguageParser.ts (1)

5-5: Use import type for type-only imports.

The SupportedLang import is only used as a type. Consider using import type to ensure it's removed during compilation.

-import { ext2lang } from './ext2lang.js';
-import { lang2Query, SupportedLang } from './lang2Query.js';
+import { ext2lang } from './ext2lang.js';
+import { lang2Query } from './lang2Query.js';
+import type { SupportedLang } from './lang2Query.js';
🧰 Tools
🪛 Biome (1.9.4)

[error] 5-5: Some named imports are only used as types.

This import is only used as a type.

Importing the types with import type ensures that they are removed by the compilers and avoids loading unnecessary modules.
Safe fix: Add inline type keywords.

(lint/style/useImportType)

src/core/tree-sitter/parseFile.ts (2)

7-7: Address the TODO comment about unused config parameter.

The config parameter is currently unused. Consider either using it or removing it if not needed.

Would you like me to help implement the configuration handling or remove the unused parameter?


5-5: Use a more descriptive variable name.

The variable name magicBlob is not descriptive. Consider using a more meaningful name like languageParser that reflects its purpose.

-const magicBlob = new LanguageParser();
+const languageParser = new LanguageParser();
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a5215be and 548790e.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (29)
  • package.json (1 hunks)
  • repomix.config.json (1 hunks)
  • src/cli/actions/defaultAction.ts (1 hunks)
  • src/cli/cliRun.ts (1 hunks)
  • src/config/configSchema.ts (2 hunks)
  • src/core/file/workers/fileProcessWorker.ts (2 hunks)
  • src/core/tree-sitter/LanguageParser.ts (1 hunks)
  • src/core/tree-sitter/ext2lang.ts (1 hunks)
  • src/core/tree-sitter/lang2Query.ts (1 hunks)
  • src/core/tree-sitter/loadLanguage.ts (1 hunks)
  • src/core/tree-sitter/parseFile.ts (1 hunks)
  • src/core/tree-sitter/queries/c.ts (1 hunks)
  • src/core/tree-sitter/queries/c_sharp.ts (1 hunks)
  • src/core/tree-sitter/queries/cpp.ts (1 hunks)
  • src/core/tree-sitter/queries/go.ts (1 hunks)
  • src/core/tree-sitter/queries/index.ts (1 hunks)
  • src/core/tree-sitter/queries/java.ts (1 hunks)
  • src/core/tree-sitter/queries/javascript.ts (1 hunks)
  • src/core/tree-sitter/queries/php.ts (1 hunks)
  • src/core/tree-sitter/queries/python.ts (1 hunks)
  • src/core/tree-sitter/queries/ruby.ts (1 hunks)
  • src/core/tree-sitter/queries/rust.ts (1 hunks)
  • src/core/tree-sitter/queries/swift.ts (1 hunks)
  • src/core/tree-sitter/queries/typescript.ts (1 hunks)
  • tests/cli/actions/defaultAction.test.ts (1 hunks)
  • tests/cli/cliRun.test.ts (2 hunks)
  • tests/config/configSchema.test.ts (3 hunks)
  • tests/core/tree-sitter/LanguageParser.test.ts (1 hunks)
  • tests/core/tree-sitter/parseFile.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (25)
  • src/core/tree-sitter/queries/php.ts
  • tests/cli/cliRun.test.ts
  • tests/config/configSchema.test.ts
  • src/core/tree-sitter/queries/ruby.ts
  • src/config/configSchema.ts
  • src/core/tree-sitter/queries/java.ts
  • src/core/tree-sitter/lang2Query.ts
  • src/core/tree-sitter/queries/rust.ts
  • src/core/tree-sitter/queries/swift.ts
  • src/cli/actions/defaultAction.ts
  • src/core/tree-sitter/queries/c_sharp.ts
  • tests/cli/actions/defaultAction.test.ts
  • src/cli/cliRun.ts
  • src/core/tree-sitter/queries/javascript.ts
  • src/core/tree-sitter/queries/python.ts
  • src/core/tree-sitter/queries/cpp.ts
  • repomix.config.json
  • package.json
  • src/core/tree-sitter/queries/typescript.ts
  • src/core/tree-sitter/queries/go.ts
  • src/core/tree-sitter/loadLanguage.ts
  • src/core/tree-sitter/ext2lang.ts
  • src/core/tree-sitter/queries/c.ts
  • tests/core/tree-sitter/LanguageParser.test.ts
  • src/core/tree-sitter/queries/index.ts
🧰 Additional context used
🪛 Biome (1.9.4)
src/core/tree-sitter/LanguageParser.ts

[error] 5-5: Some named imports are only used as types.

This import is only used as a type.

Importing the types with import type ensures that they are removed by the compilers and avoids loading unnecessary modules.
Safe fix: Add inline type keywords.

(lint/style/useImportType)

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Test (macos-latest, 21.x)
  • GitHub Check: Test (windows-latest, 23.x)
  • GitHub Check: Test (windows-latest, 22.x)
  • GitHub Check: Test (windows-latest, 21.x)
  • GitHub Check: Test (windows-latest, 20.x)
  • GitHub Check: Test (windows-latest, 19.x)
  • GitHub Check: Test (windows-latest, 18.x)
  • GitHub Check: Test coverage
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
src/core/tree-sitter/LanguageParser.ts (1)

44-51: LGTM!

The method correctly handles file extension extraction and type-safe language lookup.

tests/core/tree-sitter/parseFile.test.ts (1)

5-192: Add error handling tests.

The test suite needs to verify error handling scenarios.

Add tests for:

  • Malformed code
  • Unsupported file extensions
  • Null or undefined inputs
  • Invalid configurations

Run the following script to verify current error handling:

#!/bin/bash
# Description: Verify error handling across different scenarios

# Test 1: Check if getFn_parseFile is exported correctly
rg -A 5 "export.*getFn_parseFile" src/

# Test 2: Check Tree-sitter query patterns for each language
fd -e scm -x cat {} \; -x echo "---"

# Test 3: Check for any existing error handling
rg "throw|catch|Error" src/ -g "!*.test.ts"

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/core/tree-sitter/parseFile.ts (1)

52-54: 🛠️ Refactor suggestion

Improve error handling and logging.

The current error handling simply logs to console and continues. Consider:

  1. Using proper error logging with severity levels
  2. Adding error context
  3. Allowing error handling configuration
-    } catch (error) {
-      console.log(`Error parsing file: ${error}\n`);
+    } catch (error: unknown) {
+      const context = {
+        filePath,
+        language: lang,
+        error: error instanceof Error ? error.message : String(error)
+      };
+      logger.error('Error parsing file:', context);
+      if (config.failOnError) {
+        throw new Error(`Failed to parse ${filePath}: ${context.error}`);
+      }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 53-54: src/core/tree-sitter/parseFile.ts#L53-L54
Added lines #L53 - L54 were not covered by tests

🧹 Nitpick comments (17)
src/core/tree-sitter/queries/python.ts (1)

1-14: Consider enhancing Python query coverage.

While the current queries cover basic class/function definitions and calls, consider adding support for:

  • Decorated functions/classes
  • Lambda functions
  • Async functions/methods
  • Class methods and static methods

Here's an example of enhanced queries:

 export default `
 (class_definition
   name: (identifier) @name.definition.class) @definition.class

+(decorated_definition
+  definition: (class_definition
+    name: (identifier) @name.definition.class)) @definition.class
+
 (function_definition
   name: (identifier) @name.definition.function) @definition.function

+(decorated_definition
+  definition: (function_definition
+    name: (identifier) @name.definition.function)) @definition.function
+
+(lambda) @definition.function
+
+(async_function_definition
+  name: (identifier) @name.definition.function) @definition.function
+
 (call
   function: [
       (identifier) @name.reference.call
       (attribute
         attribute: (identifier) @name.reference.call)
   ]) @reference.call
 `;
src/core/tree-sitter/queries/java.ts (1)

1-22: Enhance Java query coverage for generics and annotations.

The current queries handle basic Java constructs well, but could be improved by adding support for:

  • Generic type parameters in class/interface declarations
  • Constructor declarations
  • Annotated elements
  • Generic method declarations

Here's an example of enhanced queries:

 export default `
 (class_declaration
   name: (identifier) @name.definition.class) @definition.class

+(generic_type
+  type: (type_identifier) @name.reference.type) @reference.type
+
+(constructor_declaration
+  name: (identifier) @name.definition.constructor) @definition.constructor
+
+(annotation
+  name: (identifier) @name.reference.annotation) @reference.annotation
+
 (method_declaration
   name: (identifier) @name.definition.method) @definition.method

+(generic_method
+  name: (identifier) @name.definition.method) @definition.method
+
 (method_invocation
   name: (identifier) @name.reference.call
   arguments: (argument_list) @reference.call)

 (interface_declaration
   name: (identifier) @name.definition.interface) @definition.interface

 (type_list
   (type_identifier) @name.reference.implementation) @reference.implementation

 (object_creation_expression
   type: (type_identifier) @name.reference.class) @reference.class

 (superclass (type_identifier) @name.reference.class) @reference.class
 `;
src/core/tree-sitter/queries/php.ts (1)

1-28: Enhance PHP query coverage for static and namespaced elements.

The queries handle most PHP constructs well, but could be improved by adding support for:

  • Static method calls and properties
  • Namespaced class and function references
  • Anonymous functions and arrow functions

Here's an example of enhanced queries:

 export default `
 (class_declaration
   name: (name) @name.definition.class) @definition.class

+(namespace_definition
+  name: (namespace_name) @name.definition.namespace) @definition.namespace
+
+(namespace_use_declaration
+  (namespace_name) @name.reference.namespace) @reference.namespace
+
 (function_definition
   name: (name) @name.definition.function) @definition.function

+(arrow_function) @definition.function
+
+(anonymous_function) @definition.function
+
 (method_declaration
   name: (name) @name.definition.function) @definition.function

 (object_creation_expression
   [
     (qualified_name (name) @name.reference.class)
     (variable_name (name) @name.reference.class)
   ]) @reference.class

+(static_call_expression
+  class: [
+    (qualified_name (name) @name.reference.class)
+    (variable_name (name) @name.reference.class)
+  ]
+  name: (name) @name.reference.call) @reference.call
+
 (function_call_expression
   function: [
     (qualified_name (name) @name.reference.call)
     (variable_name (name)) @name.reference.call
   ]) @reference.call

 (scoped_call_expression
   name: (name) @name.reference.call) @reference.call

 (member_call_expression
   name: (name) @name.reference.call) @reference.call
 `;
src/core/tree-sitter/lang2query.ts (2)

1-13: Consider using path aliases for imports.

The relative imports could be simplified using path aliases, making the code more maintainable if the file structure changes.

Example configuration in tsconfig.json:

{
  "compilerOptions": {
    "baseUrl": ".",
    "paths": {
      "@queries/*": ["src/core/tree-sitter/queries/*"]
    }
  }
}

Then update imports:

-import c from './queries/c.js';
-import c_sharp from './queries/c_sharp.js';
+import c from '@queries/c.js';
+import c_sharp from '@queries/c_sharp.js';
// ... and so on

14-27: Consider using Record type for better type safety.

The language mapping could benefit from using TypeScript's Record type for better type safety.

+type QueryModule = string;
+
-export const lang2Query = {
+export const lang2Query: Record<string, QueryModule> = {
   javascript: javascript,
   typescript: typescript,
   // ... rest of the mappings
 };
src/core/tree-sitter/queries/cpp.ts (1)

1-17: Enhance C++ query patterns for better coverage.

The current queries cover basic C++ declarations but could be improved by adding patterns for:

  1. Template declarations and specializations
  2. Namespace declarations
  3. Constructor and destructor declarations
  4. Comments and documentation strings (like in other language queries)

Here's an example of additional patterns to consider:

+// Template declarations
+(template_declaration
+  declaration: (class_specifier name: (type_identifier) @name.definition.class)) @definition.class
+
+// Namespace declarations
+(namespace_definition
+  name: (identifier) @name.definition.module) @definition.module
+
+// Constructor declarations
+(function_declarator
+  declarator: (qualified_identifier
+    scope: (namespace_identifier)
+    name: (identifier) @name.definition.constructor)) @definition.constructor
+
+// Comments
+(
+  (comment)* @doc
+  .
+  (class_specifier name: (type_identifier) @name.definition.class) @definition.class
+  (#strip! @doc "^//\\s*")
+  (#set-adjacent! @doc @definition.class)
+)
src/core/tree-sitter/queries/go.ts (1)

1-32: Add patterns for Go interfaces and structs.

The queries are well-structured but could be enhanced by adding patterns for:

  1. Interface declarations
  2. Struct declarations
  3. Type alias declarations

Here's an example of additional patterns to consider:

+// Interface declarations
+(
+  (comment)* @doc
+  .
+  (type_spec
+    name: (type_identifier) @name.definition.interface
+    type: (interface_type)) @definition.interface
+  (#strip! @doc "^//\\s*")
+  (#set-adjacent! @doc @definition.interface)
+)
+
+// Struct declarations
+(
+  (comment)* @doc
+  .
+  (type_spec
+    name: (type_identifier) @name.definition.class
+    type: (struct_type)) @definition.class
+  (#strip! @doc "^//\\s*")
+  (#set-adjacent! @doc @definition.class)
+)
+
+// Type alias declarations
+(type_spec
+  name: (type_identifier) @name.definition.type
+  type: (type_identifier) @name.reference.type) @definition.type
src/core/tree-sitter/queries/c_sharp.ts (1)

1-48: Add patterns for C# attributes, properties, and events.

The queries are comprehensive but could be enhanced by adding:

  1. Attribute declarations and usage
  2. Property declarations
  3. Event declarations
  4. XML documentation comments

Here's an example of additional patterns to consider:

+// Attribute declarations
+(attribute
+  name: (identifier) @name.reference.class) @reference.class
+
+// Property declarations
+(property_declaration
+  name: (identifier) @name.definition.property) @definition.property
+
+// Event declarations
+(event_declaration
+  name: (identifier) @name.definition.event) @definition.event
+
+// XML documentation comments
+(
+  (documentation_comment)* @doc
+  .
+  [
+    (class_declaration
+      name: (identifier) @name.definition.class) @definition.class
+    (method_declaration
+      name: (identifier) @name.definition.method) @definition.method
+  ]
+  (#strip! @doc "^///\\s*")
+  (#select-adjacent! @doc [@definition.class @definition.method])
+)
src/core/tree-sitter/queries/ruby.ts (1)

1-66: Add patterns for Ruby blocks, procs, and lambdas.

The queries are well-structured but could be enhanced by adding patterns for:

  1. Block declarations
  2. Proc and lambda declarations
  3. Instance and class variable declarations

Here's an example of additional patterns to consider:

+// Block declarations
+(block
+  parameters: (block_parameters
+    (identifier) @name.definition.parameter)) @definition.block
+
+// Proc declarations
+(call
+  method: (identifier) @name
+  arguments: (argument_list
+    (lambda))) @definition.lambda
+(#eq? @name "proc")
+
+// Instance and class variables
+(instance_variable) @name.reference.instance_variable
+(class_variable) @name.reference.class_variable

Additionally, consider adding a pattern to exclude more Ruby keywords in the call pattern on line 64:

-(#not-match? @name.reference.call "^(lambda|load|require|require_relative|__FILE__|__LINE__)$")
+(#not-match? @name.reference.call "^(lambda|load|require|require_relative|__FILE__|__LINE__|include|extend|attr_reader|attr_writer|attr_accessor)$")
src/core/tree-sitter/queries/rust.ts (3)

4-11: Consider adding visibility modifiers to ADT patterns.

The ADT patterns could be more specific by including visibility modifiers (pub, pub(crate), etc.) to better distinguish between public and private types.

 (struct_item
+    visibility: (_)?  ; Optional visibility modifier
     name: (type_identifier) @name.definition.class) @definition.class

 (enum_item
+    visibility: (_)?  ; Optional visibility modifier
     name: (type_identifier) @name.definition.class) @definition.class

 (union_item
+    visibility: (_)?  ; Optional visibility modifier
     name: (type_identifier) @name.definition.class) @definition.class

20-22: Add pattern for associated functions.

The method definition pattern only captures instance methods. Consider adding a pattern for associated functions (static methods).

 (declaration_list
     (function_item
-        name: (identifier) @name.definition.method)) @definition.method
+        name: (identifier) @name.definition.method
+        parameters: (parameters
+            (self_parameter)? @is_method  ; Optional self parameter
+        ))) @definition.method

56-61: Add pattern for generic type parameters.

The implementation patterns could be enhanced to capture generic type parameters.

 (impl_item
+    type_parameters: (type_parameters)?  ; Optional generic parameters
     trait: (type_identifier) @name.reference.implementation) @reference.implementation

 (impl_item
+    type_parameters: (type_parameters)?  ; Optional generic parameters
     type: (type_identifier) @name.reference.implementation
     !trait) @reference.implementation
src/core/tree-sitter/LanguageParser.ts (1)

44-51: Improve type safety for file extensions.

The file extension handling could be more type-safe by using a predefined set of supported extensions.

+type SupportedExtension = keyof typeof ext2lang;
+
 public guessTheLang(filePath: string) {
   const ext = this.getFileExtension(filePath);
-  if (!Object.keys(ext2lang).includes(ext)) {
+  if (!this.isSupportedExtension(ext)) {
     return undefined;
   }
-  const lang = ext2lang[ext as keyof typeof ext2lang] as keyof typeof lang2Query;
+  const lang = ext2lang[ext as SupportedExtension] as keyof typeof lang2Query;
   return lang;
 }
+
+private isSupportedExtension(ext: string): ext is SupportedExtension {
+  return ext in ext2lang;
+}
src/core/tree-sitter/parseFile.ts (1)

7-7: Implement the unused config parameter.

The TODO comment indicates that the config parameter is not used. Consider implementing configuration options such as:

  • Minimum line threshold for compression
  • Custom patterns to include/exclude
  • Compression level settings

Would you like me to help implement these configuration options?

src/core/tree-sitter/queries/javascript.ts (3)

1-10: Consider handling getters and setters in method definitions.

The method definition pattern excludes constructors but doesn't specifically handle getters and setters. Consider adding patterns for these special method types.

 (method_definition
   name: (property_identifier) @name.definition.method) @definition.method
 (#not-eq? @name.definition.method "constructor")
+  (#not-match? @name.definition.method "^(get|set)\\s")

25-38: Enhance function declaration patterns for async/generator functions.

The function declaration pattern could be more specific about async and generator functions to provide better context in the output.

 [
   (function_declaration
     name: (identifier) @name.definition.function)
   (generator_function
     name: (identifier) @name.definition.function)
   (generator_function_declaration
     name: (identifier) @name.definition.function)
+  (async_function_declaration
+    name: (identifier) @name.definition.function)
 ] @definition.function

75-79: Improve call expression pattern robustness.

The call expression pattern could be more robust by handling additional cases like new.target, super(), etc.

 (
   (call_expression
     function: (identifier) @name.reference.call) @reference.call
   (#not-match? @name.reference.call "^(require)$")
+  (#not-match? @name.reference.call "^(super|new\\.target)$")
 )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 548790e and 9cb5469.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (29)
  • package.json (1 hunks)
  • repomix.config.json (1 hunks)
  • src/cli/actions/defaultAction.ts (1 hunks)
  • src/cli/cliRun.ts (1 hunks)
  • src/config/configSchema.ts (2 hunks)
  • src/core/file/workers/fileProcessWorker.ts (2 hunks)
  • src/core/tree-sitter/LanguageParser.ts (1 hunks)
  • src/core/tree-sitter/ext2lang.ts (1 hunks)
  • src/core/tree-sitter/lang2query.ts (1 hunks)
  • src/core/tree-sitter/loadLanguage.ts (1 hunks)
  • src/core/tree-sitter/parseFile.ts (1 hunks)
  • src/core/tree-sitter/queries/README.md (1 hunks)
  • src/core/tree-sitter/queries/c.ts (1 hunks)
  • src/core/tree-sitter/queries/c_sharp.ts (1 hunks)
  • src/core/tree-sitter/queries/cpp.ts (1 hunks)
  • src/core/tree-sitter/queries/go.ts (1 hunks)
  • src/core/tree-sitter/queries/java.ts (1 hunks)
  • src/core/tree-sitter/queries/javascript.ts (1 hunks)
  • src/core/tree-sitter/queries/php.ts (1 hunks)
  • src/core/tree-sitter/queries/python.ts (1 hunks)
  • src/core/tree-sitter/queries/ruby.ts (1 hunks)
  • src/core/tree-sitter/queries/rust.ts (1 hunks)
  • src/core/tree-sitter/queries/swift.ts (1 hunks)
  • src/core/tree-sitter/queries/typescript.ts (1 hunks)
  • tests/cli/actions/defaultAction.test.ts (1 hunks)
  • tests/cli/cliRun.test.ts (2 hunks)
  • tests/config/configSchema.test.ts (3 hunks)
  • tests/core/tree-sitter/LanguageParser.test.ts (1 hunks)
  • tests/core/tree-sitter/parseFile.test.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/core/tree-sitter/queries/README.md
🚧 Files skipped from review as they are similar to previous changes (13)
  • src/cli/cliRun.ts
  • package.json
  • repomix.config.json
  • tests/cli/cliRun.test.ts
  • src/core/tree-sitter/queries/swift.ts
  • tests/cli/actions/defaultAction.test.ts
  • src/core/tree-sitter/queries/typescript.ts
  • src/config/configSchema.ts
  • src/core/tree-sitter/ext2lang.ts
  • src/core/tree-sitter/queries/c.ts
  • src/core/tree-sitter/loadLanguage.ts
  • tests/core/tree-sitter/LanguageParser.test.ts
  • tests/config/configSchema.test.ts
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/core/tree-sitter/parseFile.ts

[warning] 12-13: src/core/tree-sitter/parseFile.ts#L12-L13
Added lines #L12 - L13 were not covered by tests


[warning] 17-18: src/core/tree-sitter/parseFile.ts#L17-L18
Added lines #L17 - L18 were not covered by tests


[warning] 47-47: src/core/tree-sitter/parseFile.ts#L47
Added line #L47 was not covered by tests


[warning] 53-54: src/core/tree-sitter/parseFile.ts#L53-L54
Added lines #L53 - L54 were not covered by tests

src/core/tree-sitter/LanguageParser.ts

[warning] 31-32: src/core/tree-sitter/LanguageParser.ts#L31-L32
Added lines #L31 - L32 were not covered by tests

src/cli/actions/defaultAction.ts

[warning] 171-172: src/cli/actions/defaultAction.ts#L171-L172
Added lines #L171 - L172 were not covered by tests

src/core/file/workers/fileProcessWorker.ts

[warning] 38-39: src/core/file/workers/fileProcessWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Test (windows-latest, 23.x)
  • GitHub Check: Test (windows-latest, 20.x)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (10)
src/core/tree-sitter/lang2query.ts (1)

29-29: LGTM!

The type definition using keyof typeof is a good practice for type safety.

src/core/tree-sitter/LanguageParser.ts (2)

21-27: Add error handling in prepareLang.

The method should handle potential errors from loadLanguage and parser.setLanguage.


53-55: Add cleanup method for proper resource management.

Consider adding a cleanup method to properly dispose of parsers when they are no longer needed.

src/core/file/workers/fileProcessWorker.ts (2)

37-40: Add error handling for parseFile result.

The compressed mode needs better error handling. The current implementation silently falls back to the original content when parsing fails.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 38-39: src/core/file/workers/fileProcessWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests


37-40: Add test coverage for compress functionality.

The new compress functionality is not covered by tests. Consider adding test cases for:

  • Successful compression
  • Failed compression with fallback
  • Unsupported file types
❌ Verification inconclusive

🏁 Script executed:

#!/bin/bash
# Check test coverage for compress functionality
rg -l "test.*compress" "tests/"

Length of output: 31


Missing Test Coverage for Compress Functionality

After re-verifying the tests by searching for the keyword "compress" in the tests directory, there are no test cases covering the compress functionality. The review comment is correct and highlights the need to add tests for:

  • Successful compression
  • Failed compression with fallback
  • Unsupported file types

Please add the necessary tests.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 38-39: src/core/file/workers/fileProcessWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests

src/core/tree-sitter/parseFile.ts (1)

8-56: Optimize memory usage and improve type safety.

Consider using a generator to yield chunks instead of storing them in memory.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 12-13: src/core/tree-sitter/parseFile.ts#L12-L13
Added lines #L12 - L13 were not covered by tests


[warning] 17-18: src/core/tree-sitter/parseFile.ts#L17-L18
Added lines #L17 - L18 were not covered by tests


[warning] 47-47: src/core/tree-sitter/parseFile.ts#L47
Added line #L47 was not covered by tests


[warning] 53-54: src/core/tree-sitter/parseFile.ts#L53-L54
Added lines #L53 - L54 were not covered by tests

src/cli/actions/defaultAction.ts (1)

169-172: Add test coverage for compress option.

The new compress option handling is not covered by tests. Add test cases to verify the behavior when compress option is set to true/false.

❌ Verification inconclusive

🏁 Script executed:

#!/bin/bash
# Description: Verify test coverage for compress option

# Check existing test coverage
rg -l "compress.*options" tests/

Length of output: 34


Test Coverage Missing for Compress Option

The repository lacks tests verifying the new compress option behavior (both true and false scenarios). Please add test cases that explicitly check that the CLI configuration is updated correctly when the compress option is provided.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 171-172: src/cli/actions/defaultAction.ts#L171-L172
Added lines #L171 - L172 were not covered by tests

tests/core/tree-sitter/parseFile.test.ts (3)

1-15: Add test helper to reduce code duplication.

The test setup is repeated across all test cases. Consider extracting it to a helper function.

function createParseTest(language: string, content: string, expectedIdentifier: string) {
  test(`should parse ${language} correctly`, async () => {
    const filePath = `dummy.${language}`;
    const config = {};
    const parseFile = await getFn_parseFile();
    const result = await parseFile(content, filePath, config as RepomixConfigMerged);
    expect(typeof result).toBe('string');
    expect(result).toContain(expectedIdentifier);
  });
}

48-57: Add error handling tests.

The test suite lacks error handling scenarios. Add tests for malformed code, unsupported file extensions, and invalid inputs.

test('should handle malformed code gracefully', async () => {
  const fileContent = 'def greet(name: print(f"Hello, {name}")'; // Missing closing parenthesis
  const filePath = 'dummy.py';
  const config = {};
  const parseFile = await getFn_parseFile();
  await expect(parseFile(fileContent, filePath, config as RepomixConfigMerged)).rejects.toThrow();
});

test('should handle unsupported file extensions', async () => {
  const fileContent = 'function test() {}';
  const filePath = 'dummy.unsupported';
  const config = {};
  const parseFile = await getFn_parseFile();
  await expect(parseFile(fileContent, filePath, config as RepomixConfigMerged)).rejects.toThrow();
});

1-192: Add tests for compressed output format.

The test suite should verify the behavior when compression is enabled.

test('should handle compressed output format', async () => {
  const fileContent = 'function test() { console.log("test"); }';
  const filePath = 'dummy.js';
  const config = { output: { compress: true } };
  const parseFile = await getFn_parseFile();
  const result = await parseFile(fileContent, filePath, config as RepomixConfigMerged);
  expect(result).not.toContain('\n');
  expect(result).toContain('test');
});

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/core/tree-sitter/loadLanguage.ts (1)

7-10: ⚠️ Potential issue

Add type safety, validation, and error handling.

The function needs improvements in several areas:

  1. Missing return type annotation
  2. No input validation
  3. No error handling
  4. No Parser initialization check

Apply this diff:

-export async function loadLanguage(langName: string) {
+export async function loadLanguage(langName: string): Promise<Parser.Language> {
+  if (!langName || typeof langName !== 'string') {
+    throw new Error('Invalid language name');
+  }
+
+  try {
+    if (!Parser.isInitialized) {
+      await Parser.init();
+    }
+
     const wasmPath = require.resolve(`tree-sitter-wasms/out/tree-sitter-${langName}.wasm`);
-    return await Parser.Language.load(wasmPath);
+    return await Parser.Language.load(wasmPath);
+  } catch (error) {
+    throw new Error(`Failed to load language ${langName}: ${error.message}`);
+  }
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9cb5469 and 00e677f.

📒 Files selected for processing (1)
  • src/core/tree-sitter/loadLanguage.ts (1 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
src/core/tree-sitter/loadLanguage.ts

[error] 3-3: A Node.js builtin module should be imported with the node: protocol.

Using the node: protocol is more explicit and signals that the imported module belongs to Node.js.
Unsafe fix: Add the node: protocol.

(lint/style/useNodejsImportProtocol)

🪛 GitHub Actions: CI
src/core/tree-sitter/loadLanguage.ts

[error] 3-3: A Node.js builtin module should be imported with the node: protocol.

🔇 Additional comments (1)
src/core/tree-sitter/loadLanguage.ts (1)

5-5: LGTM!

The require initialization is correctly implemented using createRequire with import.meta.url.

@huy-trn huy-trn marked this pull request as ready for review February 13, 2025 03:48
@huy-trn huy-trn marked this pull request as draft February 13, 2025 04:02
@huy-trn huy-trn marked this pull request as ready for review February 13, 2025 04:19
@huy-trn
Copy link
Contributor Author

huy-trn commented Feb 13, 2025

Hi @yamadashy

I've got some update:

  • The new cli option added is now named "--compress"

  • I have resolved the merge conflicts.

  • Thanks to Aider's tree-sitter queries, the quality got some improvements on typescript code parsing (I haven't tried on other langs).

  • I'm still struggling to make it good enough for a release. The PR is almost good to merge, except the queries, and the algorithm in "parseFile" function which is frankly the most important thing, because it decide s quality of the output.

@yamadashy
Copy link
Owner

yamadashy commented Feb 13, 2025

@huy-trn
Thank you so much for this amazing work! 🎉

I really appreciate the effort you put into this!

In the future, we could extend this by allowing compression types to be specified, something like:

# Default - extracts function/class definitions (current implementation)
repomix --compress

# Most minimal output - only essential structure
repomix --compress minimal

# Extract only API interfaces and types
repomix --compress api

...

But that's for future enhancement. For now, let's focus on getting this solid foundation merged and released.

Great work! I'll take a closer look at the code later, but this is looking really promising! 👍

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
src/core/tree-sitter/loadLanguage.ts (2)

8-19: LGTM! Consider improving error handling.

The function correctly validates input and handles errors. However, consider adding more specific error types for better error handling.

Apply this diff to improve error handling:

+class LanguageLoadError extends Error {
+  constructor(langName: string, cause: Error) {
+    super(`Failed to load language ${langName}: ${cause.message}`);
+    this.name = 'LanguageLoadError';
+    this.cause = cause;
+  }
+}

 export async function loadLanguage(langName: string): Promise<Parser.Language> {
   if (!langName) {
-    throw new Error('Invalid language name');
+    throw new LanguageLoadError(langName, new Error('Invalid language name'));
   }

   try {
     const wasmPath = await getWasmPath(langName);
     return await Parser.Language.load(wasmPath);
   } catch (error: any) {
-    throw new Error(`Failed to load language ${langName}: ${error.message}`);
+    throw new LanguageLoadError(langName, error);
   }
 }
🧰 Tools
🪛 Biome (1.9.4)

[error] 16-16: Unexpected any. Specify a different type.

any disables many type checking rules. Its use should be avoided.

(lint/suspicious/noExplicitAny)

🪛 GitHub Check: codecov/patch

[warning] 10-11: src/core/tree-sitter/loadLanguage.ts#L10-L11
Added lines #L10 - L11 were not covered by tests


[warning] 17-18: src/core/tree-sitter/loadLanguage.ts#L17-L18
Added lines #L17 - L18 were not covered by tests


21-29: LGTM! Consider improving error handling.

The function correctly resolves and verifies the WASM file path. However, consider adding more specific error types for better error handling.

Apply this diff to improve error handling:

+class WasmFileNotFoundError extends Error {
+  constructor(langName: string, wasmPath: string) {
+    super(`WASM file not found for language ${langName}: ${wasmPath}`);
+    this.name = 'WasmFileNotFoundError';
+  }
+}

 async function getWasmPath(langName: string): Promise<string> {
   const wasmPath = require.resolve(`tree-sitter-wasms/out/tree-sitter-${langName}.wasm`);
   try {
     await fs.access(wasmPath);
     return wasmPath;
   } catch {
-    throw new Error(`WASM file not found for language ${langName}: ${wasmPath}`);
+    throw new WasmFileNotFoundError(langName, wasmPath);
   }
 }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 27-28: src/core/tree-sitter/loadLanguage.ts#L27-L28
Added lines #L27 - L28 were not covered by tests

src/core/file/workers/fileProcessWorker.ts (1)

37-50: LGTM! Consider improving error handling.

The function correctly handles compressed output and errors. However, consider adding more specific error types and improving error messages.

Apply this diff to improve error handling:

+class ParseFileError extends Error {
+  constructor(filePath: string, cause: Error) {
+    super(`Error parsing ${filePath} in compressed mode: ${cause.message}`);
+    this.name = 'ParseFileError';
+    this.cause = cause;
+  }
+}

 if (config.output.compress) {
   const parseFile = await getFn_parseFile();
   try {
     const parsedContent = await parseFile(processedContent, rawFile.path, config);
     if (parsedContent === undefined) {
-      logger.trace(`Failed to parse ${rawFile.path} in compressed mode. Using original content.`);
+      logger.warn(`Failed to parse ${rawFile.path} in compressed mode. Using original content.`);
     }
     processedContent = parsedContent ?? processedContent;
   } catch (error: any) {
-    logger.error(`Error parsing ${rawFile.path} in compressed mode: ${error.message}`);
-    //re-throw error
-    throw error;
+    throw new ParseFileError(rawFile.path, error);
   }
 }
🧰 Tools
🪛 Biome (1.9.4)

[error] 45-45: Unexpected any. Specify a different type.

any disables many type checking rules. Its use should be avoided.

(lint/suspicious/noExplicitAny)

🪛 GitHub Check: codecov/patch

[warning] 38-46: src/core/file/workers/fileProcessWorker.ts#L38-L46
Added lines #L38 - L46 were not covered by tests


[warning] 48-49: src/core/file/workers/fileProcessWorker.ts#L48-L49
Added lines #L48 - L49 were not covered by tests

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ecb2f1 and 43525d1.

📒 Files selected for processing (2)
  • src/core/file/workers/fileProcessWorker.ts (2 hunks)
  • src/core/tree-sitter/loadLanguage.ts (1 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
src/core/file/workers/fileProcessWorker.ts

[error] 45-45: Unexpected any. Specify a different type.

any disables many type checking rules. Its use should be avoided.

(lint/suspicious/noExplicitAny)

src/core/tree-sitter/loadLanguage.ts

[error] 16-16: Unexpected any. Specify a different type.

any disables many type checking rules. Its use should be avoided.

(lint/suspicious/noExplicitAny)

🪛 GitHub Check: codecov/patch
src/core/file/workers/fileProcessWorker.ts

[warning] 38-46: src/core/file/workers/fileProcessWorker.ts#L38-L46
Added lines #L38 - L46 were not covered by tests


[warning] 48-49: src/core/file/workers/fileProcessWorker.ts#L48-L49
Added lines #L48 - L49 were not covered by tests

src/core/tree-sitter/loadLanguage.ts

[warning] 10-11: src/core/tree-sitter/loadLanguage.ts#L10-L11
Added lines #L10 - L11 were not covered by tests


[warning] 17-18: src/core/tree-sitter/loadLanguage.ts#L17-L18
Added lines #L17 - L18 were not covered by tests


[warning] 27-28: src/core/tree-sitter/loadLanguage.ts#L27-L28
Added lines #L27 - L28 were not covered by tests

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Test (macos-latest, 22.x)
  • GitHub Check: Test (windows-latest, 23.x)
  • GitHub Check: Test (windows-latest, 22.x)
  • GitHub Check: Test (windows-latest, 21.x)
  • GitHub Check: Test (windows-latest, 20.x)
  • GitHub Check: Test (windows-latest, 19.x)
  • GitHub Check: Test (windows-latest, 18.x)
  • GitHub Check: Test (windows-latest, 18.0.0)
  • GitHub Check: Analyze (javascript-typescript)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (5)
src/core/tree-sitter/loadLanguage.ts (2)

8-20: Enhance input validation.

While the function includes basic input validation, it could be more thorough to prevent potential issues.

Apply this diff to improve input validation:

 export async function loadLanguage(langName: string): Promise<Parser.Language> {
-  if (!langName) {
+  if (!langName || typeof langName !== 'string' || langName.trim() === '') {
     throw new Error('Invalid language name');
   }
+  langName = langName.trim().toLowerCase();

22-30: Add input validation and improve error handling.

The function could benefit from input validation and more descriptive error messages.

Apply this diff to improve the function:

 async function getWasmPath(langName: string): Promise<string> {
+  if (!langName || typeof langName !== 'string' || langName.trim() === '') {
+    throw new Error('Invalid language name');
+  }
+  langName = langName.trim().toLowerCase();
+
   const wasmPath = require.resolve(`tree-sitter-wasms/out/tree-sitter-${langName}.wasm`);
   try {
     await fs.access(wasmPath);
     return wasmPath;
-  } catch {
-    throw new Error(`WASM file not found for language ${langName}: ${wasmPath}`);
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    throw new Error(`WASM file not found for language ${langName} at ${wasmPath}: ${message}`);
   }
 }
src/core/tree-sitter/LanguageParser.ts (2)

18-20: Add input validation and handle edge cases.

The method should validate input and handle edge cases like files without extensions.

Apply this diff to improve the method:

 private getFileExtension(filePath: string) {
+  if (!filePath || typeof filePath !== 'string') {
+    throw new Error('Invalid file path');
+  }
+  const ext = path.extname(filePath).toLowerCase().slice(1);
+  return ext || '';
-  return path.extname(filePath).toLowerCase().slice(1);
 }

50-57: Add input validation and improve type safety.

The method should validate input and use type guards instead of type assertions.

Apply this diff to improve the method:

 public guessTheLang(filePath: string) {
+  if (!filePath || typeof filePath !== 'string') {
+    throw new Error('Invalid file path');
+  }
   const ext = this.getFileExtension(filePath);
-  if (!Object.keys(ext2lang).includes(ext)) {
+  if (!(ext in ext2lang)) {
     return undefined;
   }
-  const lang = ext2lang[ext as keyof typeof ext2lang] as keyof typeof lang2Query;
+  const lang = ext2lang[ext as keyof typeof ext2lang];
+  if (!(lang in lang2Query)) {
+    return undefined;
+  }
   return lang;
 }
src/core/tree-sitter/parseFile.ts (1)

8-8: Address the TODO comment.

The config parameter is marked as unused. Consider removing it if not needed or document why it's reserved for future use.

Do you want me to help implement the config parameter usage or remove it?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43525d1 and 9f5a75d.

📒 Files selected for processing (4)
  • src/core/file/workers/fileProcessWorker.ts (2 hunks)
  • src/core/tree-sitter/LanguageParser.ts (1 hunks)
  • src/core/tree-sitter/loadLanguage.ts (1 hunks)
  • src/core/tree-sitter/parseFile.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Test (macos-latest, 19.x)
  • GitHub Check: Test (windows-latest, 23.x)
  • GitHub Check: Test (windows-latest, 22.x)
  • GitHub Check: Test (windows-latest, 21.x)
  • GitHub Check: Test (windows-latest, 20.x)
  • GitHub Check: Test (windows-latest, 19.x)
  • GitHub Check: Test (windows-latest, 18.x)
  • GitHub Check: Test (windows-latest, 18.0.0)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
src/core/tree-sitter/LanguageParser.ts (1)

43-48: Add error handling and input validation.

The method should validate input and handle errors from prepareLang.

Apply this diff to improve the method:

 public async getQueryForLang(name: SupportedLang) {
+  if (!name) {
+    throw new Error('Invalid language name');
+  }
+  try {
     if (!this.loadedQueries[name]) {
       await this.prepareLang(name);
     }
     return this.loadedQueries[name];
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    throw new Error(`Failed to get query for language ${name}: ${message}`);
+  }
 }

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/core/tree-sitter/LanguageParser.ts (3)

18-20: Add input validation and edge case handling.

The getFileExtension method could be more robust by handling edge cases.

 private getFileExtension(filePath: string) {
+  if (!filePath) {
+    return '';
+  }
   return path.extname(filePath).toLowerCase().slice(1);
 }

35-40: Add input validation to parser and query getters.

Both methods could benefit from input validation.

Apply this pattern to both methods:

 public async getParserForLang(name: SupportedLang) {
+  if (!name || typeof name !== 'string') {
+    throw new RepomixError('Invalid language name');
+  }
   if (!this.loadedParsers[name]) {
     await this.prepareLang(name);
   }
   return this.loadedParsers[name];
 }

Also applies to: 43-48

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 37-38: src/core/tree-sitter/LanguageParser.ts#L37-L38
Added lines #L37 - L38 were not covered by tests


50-57: Improve type safety and add documentation.

The type casting in guessTheLang could be more type-safe, and the method would benefit from documentation.

+/**
+ * Attempts to determine the programming language based on file extension.
+ * @param filePath - The path to the file
+ * @returns The detected language or undefined if not supported
+ */
 public guessTheLang(filePath: string) {
   const ext = this.getFileExtension(filePath);
-  if (!Object.keys(ext2lang).includes(ext)) {
+  if (!ext || !Object.prototype.hasOwnProperty.call(ext2lang, ext)) {
     return undefined;
   }
-  const lang = ext2lang[ext as keyof typeof ext2lang] as keyof typeof lang2Query;
+  const lang = ext2lang[ext as keyof typeof ext2lang];
+  return lang satisfies keyof typeof lang2Query;
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f5a75d and 466343d.

📒 Files selected for processing (1)
  • src/core/tree-sitter/LanguageParser.ts (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/core/tree-sitter/LanguageParser.ts

[warning] 30-32: src/core/tree-sitter/LanguageParser.ts#L30-L32
Added lines #L30 - L32 were not covered by tests


[warning] 37-38: src/core/tree-sitter/LanguageParser.ts#L37-L38
Added lines #L37 - L38 were not covered by tests


[warning] 63-65: src/core/tree-sitter/LanguageParser.ts#L63-L65
Added lines #L63 - L65 were not covered by tests

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Test (windows-latest, 20.x)
🔇 Additional comments (3)
src/core/tree-sitter/LanguageParser.ts (3)

1-17: LGTM! Well-structured class with proper type definitions.

The imports are well-organized, and the class structure follows TypeScript best practices with proper type annotations for the private fields.


22-33: Add test coverage for error handling paths.

The error handling in prepareLang is not covered by tests according to static analysis. Please add test cases to cover error scenarios.

Would you like me to help generate test cases for error handling scenarios? For example:

  1. Language loading failure
  2. Parser initialization failure
  3. Query creation failure
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 30-32: src/core/tree-sitter/LanguageParser.ts#L30-L32
Added lines #L30 - L32 were not covered by tests


59-66: Add test coverage for initialization error handling.

The error handling in init is not covered by tests according to static analysis.

Would you like me to help generate test cases for the initialization error scenario? For example:

  1. Mock Parser.init() to throw an error
  2. Verify that the error is properly caught and wrapped
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 63-65: src/core/tree-sitter/LanguageParser.ts#L63-L65
Added lines #L63 - L65 were not covered by tests

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (3)
src/core/tree-sitter/LanguageParser.ts (3)

35-40: 🛠️ Refactor suggestion

Add error handling and input validation.

The function should validate input and handle errors from prepareLang.

Apply this diff to improve the method:

 public async getParserForLang(name: SupportedLang) {
+  if (!name) {
+    throw new Error('Invalid language name');
+  }
+  try {
     if (!this.loadedParsers[name]) {
       await this.prepareLang(name);
     }
     return this.loadedParsers[name];
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    throw new Error(`Failed to get parser for language ${name}: ${message}`);
+  }
 }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 37-38: src/core/tree-sitter/LanguageParser.ts#L37-L38
Added lines #L37 - L38 were not covered by tests


22-33: 🛠️ Refactor suggestion

Add resource cleanup to prevent memory leaks.

The function should clean up resources in case of errors to prevent memory leaks.

Apply this diff to add cleanup:

 private async prepareLang(name: SupportedLang) {
+  let parser: Parser | undefined;
   try {
     const lang = await loadLanguage(name);
-    const parser = new Parser();
+    parser = new Parser();
     parser.setLanguage(lang);
     this.loadedParsers[name] = parser;
     this.loadedQueries[name] = lang.query(lang2Query[name]);
   } catch (error) {
+    if (parser) {
+      parser.delete();
+    }
     const message = error instanceof Error ? error.message : String(error);
     throw new RepomixError(`Failed to prepare language ${name}: ${message}`);
   }
 }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 30-32: src/core/tree-sitter/LanguageParser.ts#L30-L32
Added lines #L30 - L32 were not covered by tests


43-48: 🛠️ Refactor suggestion

Add error handling and input validation.

The function should validate input and handle errors from prepareLang.

Apply this diff to improve the method:

 public async getQueryForLang(name: SupportedLang) {
+  if (!name) {
+    throw new Error('Invalid language name');
+  }
+  try {
     if (!this.loadedQueries[name]) {
       await this.prepareLang(name);
     }
     return this.loadedQueries[name];
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    throw new Error(`Failed to get query for language ${name}: ${message}`);
+  }
 }
🧹 Nitpick comments (1)
src/core/tree-sitter/parseFile.ts (1)

11-62: Optimize memory usage with generators.

The function should use generators to yield chunks instead of storing them in memory.

Apply this diff to optimize memory usage:

 const parseFile = async (fileContent: string, filePath: string, config: RepomixConfigMerged) => {
+  if (!fileContent || !filePath) {
+    throw new Error('Invalid input: fileContent and filePath are required');
+  }

   const lines = fileContent.split('\n');
   if (lines.length < 1) {
     return '';
   }

   const lang = languageParser.guessTheLang(filePath);
   if (lang === undefined) {
     return undefined;
   }

   const query = await languageParser.getQueryForLang(lang);
   const parser = await languageParser.getParserForLang(lang);
-  const chunks = [];

+  async function* generateChunks() {
     try {
       const tree = parser.parse(fileContent);
       const captures = query.captures(tree.rootNode);
       captures.sort((a, b) => a.node.startPosition.row - b.node.startPosition.row);

       for (const capture of captures) {
         const { node, name } = capture;
         const startRow = node.startPosition.row;
         const endRow = node.endPosition.row;

         if (!name.includes('name') || !lines[startRow]) {
           continue;
         }

         const selectedLines = lines.slice(startRow, endRow + 1);
         if (selectedLines.length < 1) {
           continue;
         }
-        const chunk = selectedLines.join('\n');
-        chunks.push(chunk);
+        yield selectedLines.join('\n');
       }
     } catch (error: unknown) {
-      logger.log(`Error parsing file: ${error}\n`);
+      const message = error instanceof Error ? error.message : String(error);
+      logger.error(`Error parsing file ${filePath}: ${message}`);
+      throw error;
     }
   }

-  return chunks.join('\n');
+  const chunks = [];
+  for await (const chunk of generateChunks()) {
+    chunks.push(chunk);
+  }
+  return chunks.join('\n');
 };
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 15-16: src/core/tree-sitter/parseFile.ts#L15-L16
Added lines #L15 - L16 were not covered by tests


[warning] 21-22: src/core/tree-sitter/parseFile.ts#L21-L22
Added lines #L21 - L22 were not covered by tests


[warning] 53-53: src/core/tree-sitter/parseFile.ts#L53
Added line #L53 was not covered by tests


[warning] 59-60: src/core/tree-sitter/parseFile.ts#L59-L60
Added lines #L59 - L60 were not covered by tests

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 466343d and 21520c5.

📒 Files selected for processing (4)
  • src/core/tree-sitter/LanguageParser.ts (1 hunks)
  • src/core/tree-sitter/parseFile.ts (1 hunks)
  • src/core/tree-sitter/queries/typescript.ts (1 hunks)
  • tests/core/tree-sitter/parseFile.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/core/tree-sitter/queries/typescript.ts
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/core/tree-sitter/parseFile.ts

[warning] 15-16: src/core/tree-sitter/parseFile.ts#L15-L16
Added lines #L15 - L16 were not covered by tests


[warning] 21-22: src/core/tree-sitter/parseFile.ts#L21-L22
Added lines #L21 - L22 were not covered by tests


[warning] 53-53: src/core/tree-sitter/parseFile.ts#L53
Added line #L53 was not covered by tests


[warning] 59-60: src/core/tree-sitter/parseFile.ts#L59-L60
Added lines #L59 - L60 were not covered by tests

src/core/tree-sitter/LanguageParser.ts

[warning] 30-32: src/core/tree-sitter/LanguageParser.ts#L30-L32
Added lines #L30 - L32 were not covered by tests


[warning] 37-38: src/core/tree-sitter/LanguageParser.ts#L37-L38
Added lines #L37 - L38 were not covered by tests


[warning] 63-65: src/core/tree-sitter/LanguageParser.ts#L63-L65
Added lines #L63 - L65 were not covered by tests

🔇 Additional comments (2)
src/core/tree-sitter/LanguageParser.ts (2)

18-20: LGTM!

The function correctly extracts and normalizes file extensions.


50-57: LGTM!

The function correctly handles language detection with proper type safety.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/core/tree-sitter/parseFile.ts (1)

17-71: 🛠️ Refactor suggestion

Optimize memory usage and improve error handling.

The current implementation can be improved in several ways:

  1. Use generators to yield chunks instead of storing them in memory
  2. Add input validation
  3. Improve error handling
  4. Add resource cleanup

Apply this diff to improve the implementation:

-export const parseFile = async (fileContent: string, filePath: string, config: RepomixConfigMerged) => {
+export const parseFile = async (
+  fileContent: string,
+  filePath: string,
+  config: RepomixConfigMerged
+): Promise<string | undefined> => {
+  if (!fileContent || !filePath) {
+    throw new Error('Invalid input: fileContent and filePath are required');
+  }
+
   const languageParser = await getLanguageParserSingleton();
 
   const lines = fileContent.split('\n');
   if (lines.length < 1) {
     return '';
   }
 
   const lang: SupportedLang | undefined = languageParser.guessTheLang(filePath);
   if (lang === undefined) {
     return undefined;
   }
 
   const query = await languageParser.getQueryForLang(lang);
   const parser = await languageParser.getParserForLang(lang);
-  const chunks = [];
 
+  async function* generateChunks() {
   try {
     const tree = parser.parse(fileContent);
     const captures = query.captures(tree.rootNode);
     captures.sort((a, b) => a.node.startPosition.row - b.node.startPosition.row);
 
     for (const capture of captures) {
       const { node, name } = capture;
       const startRow = node.startPosition.row;
       const endRow = node.endPosition.row;
 
       if (!name.includes('name') || !lines[startRow]) {
         continue;
       }
 
       const selectedLines = lines.slice(startRow, endRow + 1);
       if (selectedLines.length < 1) {
         continue;
       }
-      const chunk = selectedLines.join('\n');
-      chunks.push(chunk);
+      yield selectedLines.join('\n');
     }
   } catch (error: unknown) {
-    logger.log(`Error parsing file: ${error}\n`);
+    const message = error instanceof Error ? error.message : String(error);
+    logger.error(`Error parsing file ${filePath}: ${message}`);
+    throw error;
   }
-  return chunks.join('\n');
+  }
+
+  const chunks = [];
+  for await (const chunk of generateChunks()) {
+    chunks.push(chunk);
+  }
+  return chunks.join('\n');
 };
+
+parseFile.cleanup = async () => {
+  const languageParser = await getLanguageParserSingleton();
+  await languageParser.cleanup();
+};
🧹 Nitpick comments (2)
src/core/tree-sitter/parseFile.ts (1)

16-17: Address the TODO comment.

The TODO comment indicates that the config parameter is not being used. Consider removing it if it's not needed, or document its intended future use.

Do you want me to help implement the configuration handling?

tests/core/tree-sitter/parseFile.test.ts (1)

50-50: Remove debug logging.

The console.log statement should be removed as it's not needed in tests.

-    console.log(result);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21520c5 and 4da50f6.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (29)
  • package.json (1 hunks)
  • repomix.config.json (1 hunks)
  • src/cli/actions/defaultAction.ts (1 hunks)
  • src/cli/cliRun.ts (1 hunks)
  • src/config/configSchema.ts (2 hunks)
  • src/core/file/workers/fileProcessWorker.ts (2 hunks)
  • src/core/tree-sitter/LanguageParser.ts (1 hunks)
  • src/core/tree-sitter/ext2lang.ts (1 hunks)
  • src/core/tree-sitter/lang2query.ts (1 hunks)
  • src/core/tree-sitter/loadLanguage.ts (1 hunks)
  • src/core/tree-sitter/parseFile.ts (1 hunks)
  • src/core/tree-sitter/queries/README.md (1 hunks)
  • src/core/tree-sitter/queries/c.ts (1 hunks)
  • src/core/tree-sitter/queries/c_sharp.ts (1 hunks)
  • src/core/tree-sitter/queries/cpp.ts (1 hunks)
  • src/core/tree-sitter/queries/go.ts (1 hunks)
  • src/core/tree-sitter/queries/java.ts (1 hunks)
  • src/core/tree-sitter/queries/javascript.ts (1 hunks)
  • src/core/tree-sitter/queries/php.ts (1 hunks)
  • src/core/tree-sitter/queries/python.ts (1 hunks)
  • src/core/tree-sitter/queries/ruby.ts (1 hunks)
  • src/core/tree-sitter/queries/rust.ts (1 hunks)
  • src/core/tree-sitter/queries/swift.ts (1 hunks)
  • src/core/tree-sitter/queries/typescript.ts (1 hunks)
  • tests/cli/actions/defaultAction.test.ts (1 hunks)
  • tests/cli/cliRun.test.ts (2 hunks)
  • tests/config/configSchema.test.ts (3 hunks)
  • tests/core/tree-sitter/LanguageParser.test.ts (1 hunks)
  • tests/core/tree-sitter/parseFile.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (25)
  • src/cli/cliRun.ts
  • repomix.config.json
  • src/core/tree-sitter/ext2lang.ts
  • src/core/tree-sitter/queries/python.ts
  • src/core/tree-sitter/queries/php.ts
  • src/core/tree-sitter/queries/README.md
  • src/core/tree-sitter/queries/rust.ts
  • tests/cli/cliRun.test.ts
  • src/core/tree-sitter/queries/ruby.ts
  • src/cli/actions/defaultAction.ts
  • src/config/configSchema.ts
  • tests/config/configSchema.test.ts
  • package.json
  • tests/cli/actions/defaultAction.test.ts
  • src/core/tree-sitter/queries/swift.ts
  • src/core/tree-sitter/loadLanguage.ts
  • src/core/tree-sitter/lang2query.ts
  • src/core/tree-sitter/queries/c.ts
  • src/core/tree-sitter/queries/go.ts
  • src/core/tree-sitter/queries/cpp.ts
  • src/core/tree-sitter/queries/typescript.ts
  • src/core/tree-sitter/queries/c_sharp.ts
  • tests/core/tree-sitter/LanguageParser.test.ts
  • src/core/tree-sitter/queries/java.ts
  • src/core/tree-sitter/queries/javascript.ts
🔇 Additional comments (2)
src/core/tree-sitter/LanguageParser.ts (1)

1-67: LGTM!

The implementation looks good. The class provides a robust interface for language parsing with proper error handling and resource cleanup.

src/core/file/workers/fileProcessWorker.ts (1)

1-62: LGTM!

The implementation looks good. The file processing logic is well-structured with proper error handling.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
src/core/tree-sitter/parseFile.ts (1)

67-69: 🛠️ Refactor suggestion

Improve error handling and logging.

The error handling could be improved by:

  1. Using logger.error instead of logger.log
  2. Including more context in the error message
  3. Rethrowing the error for better error propagation

Apply this diff to improve error handling:

   } catch (error: unknown) {
-    logger.log(`Error parsing file: ${error}\n`);
+    const message = error instanceof Error ? error.message : String(error);
+    const context = {
+      filePath,
+      language: lang,
+      error: message
+    };
+    logger.error('Error parsing file:', context);
+    throw new Error(`Failed to parse ${filePath}: ${message}`);
   }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 68-69: src/core/tree-sitter/parseFile.ts#L68-L69
Added lines #L68 - L69 were not covered by tests

🧹 Nitpick comments (6)
src/core/tree-sitter/ext2Lang.ts (1)

4-23: Consider adding more file extensions.

The mapping is comprehensive but could benefit from additional common file extensions:

  • .mjs and .mts for ECMAScript modules
  • .cc for C++
  • .m and .mm for Objective-C/C++

Apply this diff to add more extensions:

 export const ext2Lang = {
   vue: 'javascript',
   cjs: 'javascript',
   js: 'javascript',
+  mjs: 'javascript',
   jsx: 'javascript',
   ts: 'typescript',
+  mts: 'typescript',
   tsx: 'typescript',
   h: 'c',
   c: 'c',
   hpp: 'cpp',
   cpp: 'cpp',
+  cc: 'cpp',
   py: 'python',
   rs: 'rust',
   java: 'java',
   go: 'go',
   cs: 'c_sharp',
   rb: 'ruby',
   php: 'php',
   swift: 'swift',
+  m: 'c',
+  mm: 'cpp',
 };
src/core/tree-sitter/queries/php.ts (1)

1-28: Consider adding more PHP-specific patterns.

The queries cover basic PHP constructs but could benefit from additional patterns:

  • Trait declarations
  • Interface declarations
  • Namespace declarations
  • Use statements

Add these patterns to enhance PHP support:

 export const queryPhp = `
 (class_declaration
   name: (name) @name.definition.class) @definition.class

+; Trait declarations
+(trait_declaration
+  name: (name) @name.definition.trait) @definition.trait
+
+; Interface declarations
+(interface_declaration
+  name: (name) @name.definition.interface) @definition.interface
+
+; Namespace declarations
+(namespace_definition
+  name: (namespace_name) @name.definition.namespace) @definition.namespace
+
+; Use statements
+(use_declaration
+  (qualified_name (name) @name.reference.use)) @reference.use
+
 (function_definition
   name: (name) @name.definition.function) @definition.function

 (method_declaration
   name: (name) @name.definition.function) @definition.function

 (object_creation_expression
   [
     (qualified_name (name) @name.reference.class)
     (variable_name (name) @name.reference.class)
   ]) @reference.class

 (function_call_expression
   function: [
     (qualified_name (name) @name.reference.call)
     (variable_name (name)) @name.reference.call
   ]) @reference.call

 (scoped_call_expression
   name: (name) @name.reference.call) @reference.call

 (member_call_expression
   name: (name) @name.reference.call) @reference.call
 `;
src/core/tree-sitter/queries/go.ts (1)

1-32: Consider adding more Go-specific patterns.

The queries handle basic Go constructs well, especially with comment handling, but could benefit from additional patterns:

  • Interface declarations
  • Struct declarations
  • Package declarations
  • Import statements

Add these patterns to enhance Go support:

 export const queryGo = `
 (
   (comment)* @doc
   .
   (function_declaration
     name: (identifier) @name.definition.function) @definition.function
   (#strip! @doc "^//\\s*")
   (#set-adjacent! @doc @definition.function)
 )

 (
   (comment)* @doc
   .
   (method_declaration
     name: (field_identifier) @name.definition.method) @definition.method
   (#strip! @doc "^//\\s*")
   (#set-adjacent! @doc @definition.method)
 )

+; Interface declarations
+(
+  (comment)* @doc
+  .
+  (type_declaration
+    (type_spec
+      name: (type_identifier) @name.definition.interface
+      type: (interface_type))) @definition.interface
+  (#strip! @doc "^//\\s*")
+  (#set-adjacent! @doc @definition.interface)
+)
+
+; Struct declarations
+(
+  (comment)* @doc
+  .
+  (type_declaration
+    (type_spec
+      name: (type_identifier) @name.definition.struct
+      type: (struct_type))) @definition.struct
+  (#strip! @doc "^//\\s*")
+  (#set-adjacent! @doc @definition.struct)
+)
+
+; Package declarations
+(package_clause
+  name: (identifier) @name.definition.package) @definition.package
+
+; Import declarations
+(import_spec
+  path: (interpreted_string_literal) @name.reference.import) @reference.import
+
 (call_expression
   function: [
     (identifier) @name.reference.call
     (parenthesized_expression (identifier) @name.reference.call)
     (selector_expression field: (field_identifier) @name.reference.call)
     (parenthesized_expression (selector_expression field: (field_identifier) @name.reference.call))
   ]) @reference.call

 (type_spec
   name: (type_identifier) @name.definition.type) @definition.type

 (type_identifier) @name.reference.type @reference.type
 `;
src/core/tree-sitter/queries/cSharp.ts (1)

1-48: Enhance C# query patterns for better code coverage.

The current Tree-sitter query patterns could be enhanced to cover more C# language constructs:

  1. Properties and events
  2. Async/await methods
  3. Generic type parameters
  4. Delegates and lambda expressions

Add these patterns to improve code coverage:

 export const queryCSharp = `
+; property definitions
+(property_declaration
+  name: (identifier) @name.definition.property) @definition.property
+
+; event definitions
+(event_declaration
+  name: (identifier) @name.definition.event) @definition.event
+
+; async method definitions
+(method_declaration
+  modifiers: (modifier_list [(async_modifier)]) @modifiers
+  name: (identifier) @name.definition.method) @definition.method
+
+; generic type parameters
+(type_parameter_list
+  (type_parameter
+    name: (identifier) @name.definition.type_parameter)) @definition.type_parameter
+
+; delegate definitions
+(delegate_declaration
+  name: (identifier) @name.definition.delegate) @definition.delegate
+
+; lambda expressions
+(lambda_expression
+  parameters: (parameter_list) @parameters) @definition.lambda
 ${queryCSharp}`
src/core/tree-sitter/queries/rust.ts (1)

1-62: Enhance Rust query patterns for better code coverage.

The current Tree-sitter query patterns could be enhanced to cover more Rust language constructs:

  1. Async functions and blocks
  2. Lifetime parameters
  3. Const and static items
  4. Associated types in traits

Add these patterns to improve code coverage:

 export const queryRust = `
+; async function definitions
+(async_fn_item
+  name: (identifier) @name.definition.async_function) @definition.async_function
+
+; lifetime parameters
+(lifetime
+  name: (identifier) @name.definition.lifetime) @definition.lifetime
+
+; const items
+(const_item
+  name: (identifier) @name.definition.const) @definition.const
+
+; static items
+(static_item
+  name: (identifier) @name.definition.static) @definition.static
+
+; associated types in traits
+(assoc_type
+  name: (type_identifier) @name.definition.type) @definition.type
 ${queryRust}`
src/core/tree-sitter/parseFile.ts (1)

16-17: Address the TODO comment about unused config parameter.

The config parameter is marked as unused. Consider using it to configure parsing behavior, such as:

  • Filtering specific types of nodes
  • Controlling the depth of parsing
  • Setting parsing options

Would you like me to help implement the configuration options?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4da50f6 and 7c00e21.

📒 Files selected for processing (19)
  • src/cli/actions/defaultAction.ts (2 hunks)
  • src/cli/actions/remoteAction.ts (1 hunks)
  • src/cli/cliSpinner.ts (1 hunks)
  • src/core/tree-sitter/LanguageParser.ts (1 hunks)
  • src/core/tree-sitter/ext2Lang.ts (1 hunks)
  • src/core/tree-sitter/lang2Query.ts (1 hunks)
  • src/core/tree-sitter/parseFile.ts (1 hunks)
  • src/core/tree-sitter/queries/c.ts (1 hunks)
  • src/core/tree-sitter/queries/cSharp.ts (1 hunks)
  • src/core/tree-sitter/queries/cpp.ts (1 hunks)
  • src/core/tree-sitter/queries/go.ts (1 hunks)
  • src/core/tree-sitter/queries/java.ts (1 hunks)
  • src/core/tree-sitter/queries/javascript.ts (1 hunks)
  • src/core/tree-sitter/queries/php.ts (1 hunks)
  • src/core/tree-sitter/queries/python.ts (1 hunks)
  • src/core/tree-sitter/queries/ruby.ts (1 hunks)
  • src/core/tree-sitter/queries/rust.ts (1 hunks)
  • src/core/tree-sitter/queries/swift.ts (1 hunks)
  • src/core/tree-sitter/queries/typescript.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/cli/actions/remoteAction.ts
🚧 Files skipped from review as they are similar to previous changes (8)
  • src/core/tree-sitter/queries/c.ts
  • src/core/tree-sitter/queries/ruby.ts
  • src/core/tree-sitter/queries/swift.ts
  • src/core/tree-sitter/queries/python.ts
  • src/core/tree-sitter/queries/java.ts
  • src/core/tree-sitter/queries/cpp.ts
  • src/core/tree-sitter/queries/typescript.ts
  • src/core/tree-sitter/queries/javascript.ts
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/core/tree-sitter/LanguageParser.ts

[warning] 30-32: src/core/tree-sitter/LanguageParser.ts#L30-L32
Added lines #L30 - L32 were not covered by tests


[warning] 37-38: src/core/tree-sitter/LanguageParser.ts#L37-L38
Added lines #L37 - L38 were not covered by tests


[warning] 63-65: src/core/tree-sitter/LanguageParser.ts#L63-L65
Added lines #L63 - L65 were not covered by tests

src/core/tree-sitter/parseFile.ts

[warning] 23-24: src/core/tree-sitter/parseFile.ts#L23-L24
Added lines #L23 - L24 were not covered by tests


[warning] 29-30: src/core/tree-sitter/parseFile.ts#L29-L30
Added lines #L29 - L30 were not covered by tests


[warning] 62-62: src/core/tree-sitter/parseFile.ts#L62
Added line #L62 was not covered by tests


[warning] 68-69: src/core/tree-sitter/parseFile.ts#L68-L69
Added lines #L68 - L69 were not covered by tests

src/cli/actions/defaultAction.ts

[warning] 171-172: src/cli/actions/defaultAction.ts#L171-L172
Added lines #L171 - L172 were not covered by tests

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Test (windows-latest, 23.x)
  • GitHub Check: Test (windows-latest, 22.x)
  • GitHub Check: Test (windows-latest, 21.x)
  • GitHub Check: Test (windows-latest, 20.x)
  • GitHub Check: Test (windows-latest, 19.x)
  • GitHub Check: Test (windows-latest, 18.x)
  • GitHub Check: Test (windows-latest, 18.0.0)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (5)
src/cli/cliSpinner.ts (1)

6-6: LGTM! Good improvement in export style.

The change from default export to named export is a good practice as it makes imports more explicit and improves IDE auto-import functionality.

src/cli/actions/defaultAction.ts (2)

14-14: LGTM! Import statement updated correctly.

The import statement has been properly updated to use named import syntax, aligning with the changes in cliSpinner.ts.


170-172:

❓ Verification inconclusive

Add test coverage for the compress option.

The compress option handling logic needs test coverage to ensure it works as expected.

Would you like me to help you generate test cases for the compress option handling? The test cases should cover:

  1. Setting compress to true/false via CLI options
  2. Verifying the option is correctly merged into the final config
  3. Ensuring the option affects the output as expected

🏁 Script executed:

#!/bin/bash
# Description: Check for existing test coverage of the compress option

# Search for test files that might contain compress option tests
rg -l 'compress' tests/

Length of output: 163


Verify Complete Test Coverage for the Compress Option

  • It appears that multiple test files (e.g., tests/cli/actions/defaultAction.test.ts, tests/cli/cliRun.test.ts) contain references to "compress." Please ensure these tests explicitly cover:
    • Setting compress to both true and false via CLI options.
    • Verifying that the compress property is correctly merged into the final configuration (i.e., in cliConfig.output).
    • Confirming that the compress option correctly affects the output behavior.

If the current tests don’t sufficiently cover these scenarios, please add targeted test cases. Let me know if you’d like assistance generating those tests.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 171-172: src/cli/actions/defaultAction.ts#L171-L172
Added lines #L171 - L172 were not covered by tests

src/core/tree-sitter/ext2Lang.ts (1)

1-3: LGTM! Documentation is helpful.

The reference to Tree-sitter WASM output helps understand the source of language support.

src/core/tree-sitter/lang2Query.ts (1)

1-30: LGTM! Well-organized implementation.

The implementation is clean and well-structured:

  • Imports are organized logically
  • Language-to-query mapping is comprehensive
  • Type definition using keyof typeof is elegant and type-safe

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
src/core/tree-sitter/languageParser.ts (2)

22-33: Unify error handling approach for consistency.

Currently, prepareLang throws a RepomixError, while init rethrows a generic Error. For consistency across the codebase, consider using RepomixError in both methods to maintain uniform error handling.

Here is a potential fix:

-throw new Error(`Failed to initialize parser: ${message}`);
+throw new RepomixError(`Failed to initialize parser: ${message}`);

36-40: Consider concurrency handling for prepareLang.

When two calls to getParserForLang with the same language occur in parallel, they may trigger prepareLang simultaneously. Currently, nothing prevents duplicative initialization. While Node.js is single-threaded, concurrency through async awaits can still overlap. You may want to employ a locking mechanism or a queue to avoid redundant loads.

src/core/tree-sitter/parseFile.ts (2)

16-16: Address the TODO to utilize config.

A TODO note indicates that config is not used. Decide whether the configuration influences filtering, output formatting, or AST capturing. Otherwise, remove the parameter to avoid confusion.


34-40: Consider logging errors at the error level.

Currently, the code logs parsing errors with logger.log(...). For clarity and structured logging, use logger.error(...) when an exception occurs during parsing, ensuring that errors can be differentiated from regular logs.

-    logger.log(`Error parsing file: ${error}\n`);
+    logger.error(`Error parsing file: ${error}`);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c00e21 and 29e7d9a.

📒 Files selected for processing (3)
  • src/core/tree-sitter/languageParser.ts (1 hunks)
  • src/core/tree-sitter/parseFile.ts (1 hunks)
  • tests/core/tree-sitter/LanguageParser.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/core/tree-sitter/LanguageParser.test.ts
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Test (windows-latest, 23.x)
  • GitHub Check: Test (windows-latest, 22.x)
  • GitHub Check: Test (windows-latest, 21.x)
  • GitHub Check: Test (windows-latest, 20.x)
  • GitHub Check: Test (windows-latest, 19.x)
  • GitHub Check: Test coverage
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
src/core/tree-sitter/languageParser.ts (1)

59-67: Ensure that parser initialization is retried or handled gracefully.

If Parser.init() fails once, future calls to init() might need a way to retry or clean up partial states. Consider whether partial initializations are possible and implement a safe fallback. For now, re-invoking init() might not fix partial failures.

src/core/tree-sitter/parseFile.ts (1)

20-71: Optional: Incorporate streaming or generator approach for large files.

As your captures can iterate through large file contents, consider yielding snippets instead of loading all into memory. This can enhance performance, especially for huge files. Revisit existing bot suggestions about using generators if memory usage is a concern.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
src/core/tree-sitter/parseFile.ts (2)

21-23: ⚠️ Potential issue

Add input validation.

The function should validate input parameters.

Apply this diff to add input validation:

 export const parseFile = async (fileContent: string, filePath: string, config: RepomixConfigMerged) => {
+  if (!fileContent) {
+    throw new Error('Invalid input: fileContent is required');
+  }
+  if (!filePath) {
+    throw new Error('Invalid input: filePath is required');
+  }
+  if (!config) {
+    throw new Error('Invalid input: config is required');
+  }
   const languageParser = await getLanguageParserSingleton();

77-79: 🛠️ Refactor suggestion

Improve error handling.

The current error handling lacks proper error typing and logging structure.

Apply this diff to improve error handling:

-  } catch (error: unknown) {
-    logger.log(`Error parsing file: ${error}\n`);
+  } catch (error: unknown) {
+    const context = {
+      filePath,
+      language: lang,
+      error: error instanceof Error ? error.message : String(error)
+    };
+    logger.error('Error parsing file:', context);
+    throw new Error(`Failed to parse ${filePath}: ${context.error}`);
   }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 78-79: src/core/tree-sitter/parseFile.ts#L78-L79
Added lines #L78 - L79 were not covered by tests

🧹 Nitpick comments (2)
src/core/tree-sitter/parseFile.ts (2)

16-18: Enhance chunk normalization.

The normalizeChunk function could be more robust in handling edge cases.

Apply this diff to improve the implementation:

 function normalizeChunk(chunk: string): string {
-  return chunk.trim();
+  if (!chunk) return '';
+  // Normalize line endings and remove extra whitespace
+  return chunk
+    .replace(/\r\n/g, '\n')
+    .split('\n')
+    .map(line => line.trim())
+    .join('\n')
+    .trim();
 }

20-20: Address the TODO comment.

The config parameter is currently unused. Consider using it to configure parsing behavior.

Would you like me to help implement configuration options for the parser? Some potential options could include:

  • Configurable chunk size limits
  • Filtering options for specific node types
  • Custom normalization rules
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 29e7d9a and 9265066.

📒 Files selected for processing (1)
  • src/core/tree-sitter/parseFile.ts (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/core/tree-sitter/parseFile.ts

[warning] 27-28: src/core/tree-sitter/parseFile.ts#L27-L28
Added lines #L27 - L28 were not covered by tests


[warning] 33-34: src/core/tree-sitter/parseFile.ts#L33-L34
Added lines #L33 - L34 were not covered by tests


[warning] 67-67: src/core/tree-sitter/parseFile.ts#L67
Added line #L67 was not covered by tests


[warning] 78-79: src/core/tree-sitter/parseFile.ts#L78-L79
Added lines #L78 - L79 were not covered by tests

🪛 GitHub Actions: CI
src/core/tree-sitter/parseFile.ts

[error] 68-68: Unexpected whitespace change detected. Please remove unnecessary blank lines.

🔇 Additional comments (1)
src/core/tree-sitter/parseFile.ts (1)

26-28: Add test coverage for edge cases.

Several code paths lack test coverage according to the static analysis:

  • Empty file handling
  • Unsupported language handling
  • Empty chunk handling
  • Error handling

Would you like me to help generate comprehensive test cases for these scenarios?

Also applies to: 33-34, 66-68, 78-79

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 27-28: src/core/tree-sitter/parseFile.ts#L27-L28
Added lines #L27 - L28 were not covered by tests

@SpyC0der77
Copy link
Contributor

How is this going?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/core/tree-sitter/parseFile.ts (1)

20-20: Address the TODO comment.

The TODO comment indicates that the config parameter is not used. Please ensure that the parameter is utilized or remove it if unnecessary.

Do you want me to help implement the configuration handling or open a new issue to track this task?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9265066 and 9d4f260.

📒 Files selected for processing (3)
  • README.md (4 hunks)
  • repomix.config.json (1 hunks)
  • src/core/tree-sitter/parseFile.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • repomix.config.json
🧰 Additional context used
🪛 LanguageTool
README.md

[uncategorized] ~417-~417: Loose punctuation mark.
Context: ...can increase token count. - --compress: Perform intelligent code extraction, fo...

(UNLIKELY_OPENING_PUNCTUATION)

⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Test (windows-latest, 23.x)
  • GitHub Check: Test (windows-latest, 22.x)
  • GitHub Check: Test (windows-latest, 21.x)
  • GitHub Check: Test (windows-latest, 20.x)
  • GitHub Check: Test (windows-latest, 19.x)
🔇 Additional comments (6)
src/core/tree-sitter/parseFile.ts (4)

6-14: Improve type safety and add cleanup method.

The singleton pattern implementation can be improved:

  1. Use undefined instead of null for better TypeScript idioms.
  2. Add a cleanup method for proper resource management.

Apply this diff to improve the implementation:

-let languageParserSingleton: LanguageParser | null = null;
+let languageParserSingleton: LanguageParser | undefined;

 const getLanguageParserSingleton = async () => {
   if (!languageParserSingleton) {
     languageParserSingleton = new LanguageParser();
     await languageParserSingleton.init();
   }
   return languageParserSingleton;
 };
+
+export const cleanupLanguageParser = async () => {
+  if (languageParserSingleton) {
+    await languageParserSingleton.cleanup();
+    languageParserSingleton = undefined;
+  }
+};

16-18: LGTM!

The function is simple, focused, and correctly implements string trimming.


21-23: Add input validation.

The function should validate input parameters.

Apply this diff to add input validation:

 export const parseFile = async (fileContent: string, filePath: string, config: RepomixConfigMerged) => {
+  if (!fileContent) {
+    throw new Error('Invalid input: fileContent is required');
+  }
+  if (!filePath) {
+    throw new Error('Invalid input: filePath is required');
+  }
+  if (!config) {
+    throw new Error('Invalid input: config is required');
+  }
   const languageParser = await getLanguageParserSingleton();

38-81: Optimize memory usage and improve type safety.

Consider the following improvements:

  1. Use a generator to yield chunks instead of storing them in memory
  2. Add type guards for error handling
  3. Add input validation

Apply this diff to improve the function:

 export const parseFile = async (fileContent: string, filePath: string, config: RepomixConfigMerged) => {
   const languageParser = await getLanguageParserSingleton();

   // Split the file content into individual lines
   const lines = fileContent.split('\n');
   if (lines.length < 1) {
     return '';
   }

   const lang = languageParser.guessTheLang(filePath);
   if (lang === undefined) {
     // Language not supported
     return undefined;
   }

   const query = await languageParser.getQueryForLang(lang);
   const parser = await languageParser.getParserForLang(lang);
-  const processedChunks = new Set<string>();
-  const chunks = [];
+
+  async function* generateChunks() {
   try {
     // Parse the file content into an Abstract Syntax Tree (AST)
     const tree = parser.parse(fileContent);
     const captures = query.captures(tree.rootNode);
     captures.sort((a, b) => a.node.startPosition.row - b.node.startPosition.row);

     for (const capture of captures) {
       const { node, name } = capture;
       const startRow = node.startPosition.row;
       const endRow = node.endPosition.row;

       if (!name.includes('name') || !lines[startRow]) {
         continue;
       }

       const selectedLines = lines.slice(startRow, endRow + 1);
       if (selectedLines.length < 1) {
         continue;
       }
       const chunk = selectedLines.join('\n');
       const normalizedChunk = normalizeChunk(chunk);

-      if (!processedChunks.has(normalizedChunk)) {
-        processedChunks.add(normalizedChunk);
-        chunks.push(chunk);
-      }
+      yield chunk;
     }
-  } catch (error: unknown) {
-    logger.log(`Error parsing file: ${error}\n`);
+  } catch (error: unknown) {
+    const message = error instanceof Error ? error.message : String(error);
+    logger.error(`Error parsing file ${filePath}: ${message}`);
+    throw error;
   }
-  return chunks.join('\n');
+  }
+
+  const processedChunks = new Set<string>();
+  const chunks = [];
+  for await (const chunk of generateChunks()) {
+    const normalizedChunk = normalizeChunk(chunk);
+    if (!processedChunks.has(normalizedChunk)) {
+      processedChunks.add(normalizedChunk);
+      chunks.push(chunk);
+    }
+  }
+  return chunks.join('\n');
 };
README.md (2)

417-417: LGTM!

The new --compress option is well-documented with a clear description.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~417-~417: Loose punctuation mark.
Context: ...can increase token count. - --compress: Perform intelligent code extraction, fo...

(UNLIKELY_OPENING_PUNCTUATION)


515-549: LGTM!

The code compression section is well-documented with:

  • Clear explanation of the feature
  • Helpful example demonstrating the compression
  • Note about supported languages

@yamadashy
Copy link
Owner

yamadashy commented Feb 15, 2025

@huy-trn
I've added some commits and README adjustments. I've tested it with TypeScript and PHP, and it's working well!
I believe it's ready to be merged in its current state. If any issues arise, we can address them through issues after the release.

I plan to update the website documentation after this gets merged.

Would you be okay with merging this?
Thank you again for your contribution!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
README.md (2)

424-427: [CLI Option Description Enhancement]
The newly added --compress option is clearly introduced in the CLI options section with a concise description. However, the description would benefit from consistent punctuation. For example, adding a period at the end of the sentence can improve clarity.

- `--compress`: Perform intelligent code extraction, focusing on essential function and class signatures to reduce token count
+ `--compress`: Perform intelligent code extraction, focusing on essential function and class signatures to reduce token count.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~426-~426: Loose punctuation mark.
Context: ...can increase token count. - --compress: Perform intelligent code extraction, fo...

(UNLIKELY_OPENING_PUNCTUATION)


524-548: [Code Compression Section Clarity]
The new "### Code Compression" section provides a thorough explanation of how the --compress option works, including a practical example that demonstrates the transformation from full code to a compressed output. This section helps users understand the benefits and expected behavior of the compression feature. One suggestion is to consider adding a brief note about any limitations or scenarios where the compression might not capture all the necessary contextual details, which can further guide users.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d4f260 and 169ea22.

📒 Files selected for processing (2)
  • README.md (5 hunks)
  • repomix.config.json (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • repomix.config.json
🧰 Additional context used
🪛 LanguageTool
README.md

[uncategorized] ~426-~426: Loose punctuation mark.
Context: ...can increase token count. - --compress: Perform intelligent code extraction, fo...

(UNLIKELY_OPENING_PUNCTUATION)

🔇 Additional comments (1)
README.md (1)

597-603: [Configuration Example Consistency]
The example configuration now includes the "compress": false, property. This update aligns the configuration schema with the new CLI option, ensuring that users can control code compression via their config file. The default value of false is appropriate and clearly documented.

@yamadashy
Copy link
Owner

Hi @SpyC0der77! The project is going well!

@huy-trn
Copy link
Contributor Author

huy-trn commented Feb 15, 2025

Wow! That’s quite an effort, @yamadashy.

I agree that we can merge it at this stage and address any issues later, since it’s a new feature and won’t affect other parts of Repomix.

Sorry I couldn’t find time to finish the PR sooner or write cleaner code, but thanks to you, it’s all covered now!

Again, great work!

@yamadashy
Copy link
Owner

@huy-trn
I just helped finish what you started.
Thank you for starting this great feature! Your initial work made this much easier - without your contribution, this feature would have taken much longer to implement.

I'll merge and release it now!

@yamadashy yamadashy merged commit 1f610a5 into yamadashy:main Feb 16, 2025
53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants