DeX-Group-LLC/repo-serializer


repo-serializer


📄 A command-line tool that creates a human-readable snapshot of your codebase. It generates two files:

  • A tree view of your repository structure
  • A single, well-formatted file containing all text-based source code

Perfect for code reviews, documentation, archiving, or sharing code snippets without sending the entire repository.


Features

  • Respects .gitignore files (including nested ones)
  • Intelligently detects text files with UTF-8 encoding
  • Configurable text file detection sensitivity
  • Optional handling of replacement characters
  • Excludes binary files automatically
  • Customizable ignore patterns
  • Pretty-printed directory structure
  • Clear file content separation
  • Supports nested directories
  • Handles large repositories efficiently

Prerequisites

  • Node.js 18.x or higher
  • npm 9.x or higher
  • Git (optional, for respecting .gitignore patterns)

Installation

npm install -g repo-serializer

Usage

Command Line

# Basic usage (current directory)
repo-serialize

# Specify a different directory
repo-serialize -d /path/to/repo

# Specify output directory
repo-serialize -o /path/to/output

# Custom output filenames
repo-serialize -s structure.txt -c content.txt

# Add additional ignore patterns
repo-serialize -i "*.log" "temp/" "*.tmp"

# Full CLI Options
repo-serialize [options]

Options:
  # Input/Output Options
  -d, --dir <directory>           Target directory to serialize (default: current working directory)
  -o, --output <directory>        Output directory for generated files (default: current working directory)
  -s, --structure-file <filename> Name of the structure output file (default: repo_structure.txt)
  -c, --content-file <filename>   Name of the content output file (default: repo_content.txt)

  # Processing Options
  -m, --max-file-size <size>      Maximum file size to process (512B-4MB). Accepts units: B, KB, MB
                                  Examples: "512B", "1KB", "4MB" (default: 8KB)
  -a, --all                       Disable default ignore patterns (default: false)
  -g, --no-gitignore              Disable .gitignore processing (enabled by default)
  -i, --ignore <patterns...>      Additional patterns to ignore
  --hierarchical                  Use hierarchical (alphabetical) ordering for content file (default: false)
  -r, --max-replacement-ratio <ratio> Maximum ratio of replacement characters allowed (0-1, default: 0)
  --keep-replacement-chars        Keep replacement characters in output (default: false)

  # Behavior Options
  -f, --force                     Overwrite existing files without prompting (default: false)
  --silent                        Suppress all console output (default: false)
  --verbose                       Enable verbose logging of all processed and ignored files (default: false)
                                  Note: Cannot be used with --silent

  # Information Options
  -V, --version                   Display the version number
  -h, --help                      Display help information

Examples:
  # Basic input/output usage
  repo-serialize -d ./my-project -o ./output

  # Use hierarchical content ordering
  repo-serialize --hierarchical

  # Disable default ignore patterns
  repo-serialize -a

  # Processing configuration with file size in KB
  repo-serialize -m 1KB -a --no-gitignore -i "*.log" "temp/"

  # Text file detection configuration
  repo-serialize --max-replacement-ratio 0.1 --keep-replacement-chars

  # Force overwrite without prompting
  repo-serialize -f

  # Run quietly (suppress console output)
  repo-serialize --silent

  # Complete example with all option types
  repo-serialize --dir ./project \
                 --output ./analysis \
                 --structure-file tree.txt \
                 --content-file source.txt \
                 --max-file-size 1MB \
                 --max-replacement-ratio 0.1 \
                 --keep-replacement-chars \
                 --all \
                 --no-gitignore \
                 --ignore "*.log" "temp/" \
                 --force \
                 --silent

  # LLM-optimized snapshot (using 4MB limit)
  repo-serialize -m 4MB -a -o ./llm-analysis
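
The -m, --max-file-size option takes a number followed by B, KB, or MB and is limited to the 512B-4MB range. As a rough illustration of what that implies, here is a minimal sketch of parsing and range-checking such a value. The parseFileSize helper is hypothetical and not part of repo-serializer's API. It assumes binary units (1KB = 1024 bytes), which is consistent with the 8KB default and the maxFileSize: 8192 value in the programmatic example.

```javascript
// Hypothetical helper, not part of repo-serializer's public API.
// Assumes binary units (1KB = 1024 bytes).
const UNITS = { B: 1, KB: 1024, MB: 1024 * 1024 };
const MIN_BYTES = 512;             // documented lower bound (512B)
const MAX_BYTES = 4 * 1024 * 1024; // documented upper bound (4MB)

function parseFileSize(input) {
    const match = /^(\d+(?:\.\d+)?)\s*(B|KB|MB)$/i.exec(String(input).trim());
    if (!match) {
        throw new Error(`Invalid size "${input}": expected a number followed by B, KB, or MB`);
    }
    const bytes = Math.round(Number(match[1]) * UNITS[match[2].toUpperCase()]);
    if (bytes < MIN_BYTES || bytes > MAX_BYTES) {
        throw new RangeError(`Size "${input}" is outside the supported range 512B-4MB`);
    }
    return bytes;
}

console.log(parseFileSize('8KB')); // 8192
console.log(parseFileSize('4MB')); // 4194304
```

Under these assumptions a value such as 5MB (5242880 bytes) is rejected, because it exceeds the documented 4MB ceiling.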

Programmatic Usage

const { serializeRepo } = require('repo-serializer');

// `await` is not allowed at the top level of a CommonJS module,
// so the calls are wrapped in an async function.
async function main() {
    // Basic usage with default options
    await serializeRepo({
        repoRoot: '/path/to/repo',
        outputDir: '/path/to/output'
    });

    // Advanced usage with all options
    await serializeRepo({
        // Input/Output options
        repoRoot: '/path/to/repo',           // Directory to serialize
        outputDir: '/path/to/output',        // Output directory
        structureFile: 'structure.txt',      // Custom structure filename
        contentFile: 'content.txt',          // Custom content filename

        // Processing options
        maxFileSize: 8192,                   // Max file size in bytes (512B-4MB)
        ignoreDefaultPatterns: false,        // Set to true to disable default ignores
        noGitignore: false,                  // Set to true to disable .gitignore processing
        additionalIgnorePatterns: ['*.log'], // Additional patterns to ignore
        hierarchicalContent: false,          // Set to true to use hierarchical (alphabetical) content ordering
        maxReplacementRatio: 0,              // 0 means no replacement characters allowed
        keepReplacementChars: false,         // false means strip replacement characters

        // Behavior options
        force: false,                        // Overwrite without prompting
        silent: false,                       // Suppress all console output
        verbose: false                       // Enable verbose logging (cannot be used with silent)
    });
}

main().catch((err) => {
    console.error(err);
    process.exit(1);
});

Output Format

Structure File

Shows the repository's file and folder structure in a tree format:

repo/
├── src/
│   ├── index.js
│   └── utils/
│       └── helper.js
└── package.json
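
To make the layout concrete, the following is a minimal sketch of how a tree like this could be rendered from a nested object. It is an illustration only, not the tool's implementation: renderTree and renderChildren are hypothetical names, directories are modelled as objects and files as null, and entries keep their insertion order because the tool's ordering rules for the structure file are not documented here.

```javascript
// Hypothetical sketch: directories are plain objects, files are null.
function renderChildren(node, prefix = '') {
    const names = Object.keys(node); // insertion order; no sorting is applied
    const lines = [];
    names.forEach((name, index) => {
        const isLast = index === names.length - 1;
        const child = node[name];
        const isDir = child !== null;
        lines.push(`${prefix}${isLast ? '└── ' : '├── '}${name}${isDir ? '/' : ''}`);
        if (isDir) {
            // Continue the vertical guide only while siblings remain below.
            lines.push(...renderChildren(child, prefix + (isLast ? '    ' : '│   ')));
        }
    });
    return lines;
}

function renderTree(rootName, tree) {
    return [`${rootName}/`, ...renderChildren(tree)].join('\n');
}

const tree = {
    src: { 'index.js': null, utils: { 'helper.js': null } },
    'package.json': null
};
console.log(renderTree('repo', tree));
```

With the sample object, the sketch prints the same five-entry tree as the example in this section.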

Content File

Contains the contents of all text files, clearly marked with separators:

============================================================
START OF FILE: src/index.js
------------------------------------------------------------
[file contents here]
------------------------------------------------------------
END OF FILE: src/index.js
============================================================
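
Because every file is wrapped in the same START/END markers, the content file can be split back into individual files. The sketch below shows one way to do that. parseContentFile is a hypothetical helper, not something the package exports, and it assumes each entry follows the layout above exactly: a START line, a dashed rule, the contents, a dashed rule, and a matching END line.

```javascript
// Hypothetical reader for the content-file layout; not part of repo-serializer.
// Returns a Map of file path -> file contents.
function parseContentFile(text) {
    const lines = text.split(/\r?\n/);
    const files = new Map();
    let i = 0;
    while (i < lines.length) {
        const start = /^START OF FILE: (.+)$/.exec(lines[i]);
        if (!start) {
            i += 1;
            continue;
        }
        const filePath = start[1];
        const end = lines.indexOf(`END OF FILE: ${filePath}`, i + 1);
        if (end === -1) {
            throw new Error(`Missing end marker for ${filePath}`);
        }
        // Skip the dashed rule after START and the dashed rule before END.
        files.set(filePath, lines.slice(i + 2, end - 1).join('\n'));
        i = end + 1;
    }
    return files;
}

// The rule width used in this sample is arbitrary; the parser does not depend on it.
const bar = '='.repeat(60);
const rule = '-'.repeat(60);
const sample = [
    bar,
    'START OF FILE: src/index.js',
    rule,
    "console.log('hi');",
    rule,
    'END OF FILE: src/index.js',
    bar
].join('\n');

console.log(parseContentFile(sample).get('src/index.js')); // console.log('hi');
```

One limitation of this approach: a file whose own contents include a line identical to its END marker would be cut short, so treat the result as best-effort.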

Common Use Cases

Code Review Preparation

# Generate a snapshot before submitting a PR
repo-serialize -o code-review/

Documentation Generation

# Create a snapshot of your project's current state
repo-serialize --structure-file docs/structure.txt --content-file docs/full-source.txt

Project Archiving

# Archive a specific version of your codebase
repo-serialize -d ./project-v1.0 -o ./archives/v1.0

LLM Code Analysis

# Generate files optimized for LLM processing
repo-serialize --max-file-size 4MB --all

Default Ignored Patterns

Always Ignored

These patterns are always ignored and cannot be overridden:

  • .git/

Default Ignored

These patterns are ignored by default but can be included using the -a, --all flag:

  • Hidden files (.*)
  • Hidden directories (.*/)
  • package-lock.json
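
The two tiers can be summarised in code. This is a minimal sketch under stated assumptions, not the tool's actual matcher: isIgnoredByDefault is a hypothetical function, it expects a relative path such as src/index.js, it treats .git and package-lock.json as matching at any depth, and it leaves out .gitignore files and the patterns passed with -i.

```javascript
// Hypothetical sketch of the default ignore rules; not the tool's real matcher.
function isIgnoredByDefault(relativePath, { all = false } = {}) {
    const segments = relativePath
        .split(/[\\/]+/)
        .filter((segment) => segment !== '' && segment !== '.');

    // .git/ is always ignored, even when --all is set.
    if (segments.includes('.git')) return true;

    // --all disables the remaining default patterns.
    if (all) return false;

    // Hidden files and directories: any path segment starting with a dot.
    if (segments.some((segment) => segment.startsWith('.'))) return true;

    // Assumption: package-lock.json is matched by file name at any depth.
    return segments[segments.length - 1] === 'package-lock.json';
}

console.log(isIgnoredByDefault('.env'));                     // true
console.log(isIgnoredByDefault('.env', { all: true }));      // false
console.log(isIgnoredByDefault('.git/config', { all: true })); // true
```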

Contributing

We welcome contributions! Here's how you can help:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/improvement)
  3. Make your changes
  4. Run tests (npm test)
  5. Commit your changes (git commit -am 'Add new feature')
  6. Push to the branch (git push origin feature/improvement)
  7. Create a Pull Request

Please make sure to update tests as appropriate and follow the existing coding style.

Issues

If you encounter any problems or have suggestions, please open an issue on GitHub. Include as much information as possible:

  • Your operating system
  • Node.js version
  • repo-serializer version
  • Steps to reproduce the issue
  • Expected vs actual behavior

License

MIT

Text File Detection

The tool uses UTF-8 encoding detection with configurable sensitivity to identify text files.

Replacement Character Handling

When processing files, the tool may encounter invalid UTF-8 sequences, which are replaced with the Unicode replacement character (U+FFFD). You can control this behavior with two options:

  1. --max-replacement-ratio <ratio> (default: 0)

    • 0: Strictest setting, rejects files with any replacement characters
    • 0.1: Allows up to 10% replacement characters
    • 1: Most lenient, accepts any amount of replacement characters
  2. --keep-replacement-chars (default: false)

    • When false: Strips replacement characters from output
    • When true: Keeps replacement characters in output
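
How the two options interact can be sketched as follows. This is an illustration under assumptions rather than the tool's implementation: decodeText is a hypothetical function, and the ratio is assumed to be the number of replacement characters divided by the decoded text length. It relies on TextDecoder, a global in the Node.js versions listed under Prerequisites, which in its default non-fatal mode substitutes U+FFFD for invalid UTF-8 sequences.

```javascript
const REPLACEMENT_CHAR = '\uFFFD';

// Hypothetical sketch; returns the text to emit, or null when the buffer
// is rejected as non-text.
function decodeText(buffer, { maxReplacementRatio = 0, keepReplacementChars = false } = {}) {
    // Non-fatal decoding: invalid sequences become U+FFFD instead of throwing.
    const text = new TextDecoder('utf-8').decode(buffer);
    if (text.length === 0) return text;

    const replacements = text.split(REPLACEMENT_CHAR).length - 1;
    // Assumption: ratio = replacement characters / decoded length.
    if (replacements / text.length > maxReplacementRatio) return null;

    return keepReplacementChars ? text : text.split(REPLACEMENT_CHAR).join('');
}

const bytes = Buffer.from([0x68, 0x69, 0xff]); // "hi" followed by an invalid byte
console.log(decodeText(bytes));                               // null
console.log(decodeText(bytes, { maxReplacementRatio: 0.5 })); // hi
```

Note that this counting cannot distinguish a U+FFFD produced by decoding from one that was already present in a valid file, so a file that legitimately contains the character would also count toward the ratio in this sketch.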

Examples

# Strict text detection (default)
repo-serialize

# Allow up to 10% invalid characters but strip them from output
repo-serialize --max-replacement-ratio 0.1

# Allow up to 25% invalid characters and keep them in output
repo-serialize --max-replacement-ratio 0.25 --keep-replacement-chars

# Most lenient: accept any text-like file and keep all characters
repo-serialize --max-replacement-ratio 1 --keep-replacement-chars
