feat: add large file detection to pre-commit hook #1011

Merged: rjmurillo merged 1 commit into main from fix/large-file-detection on Mar 7, 2026

@rjmurillo (Owner) commented Mar 7, 2026

Summary

Adds a Test-LargeFiles function to the pre-commit hook infrastructure that prevents committing oversized files.

What it does

  • File size check: Flags any staged file exceeding 500 KB (prevents accidental binary/data commits)
  • Line count check: Flags source files (.cs, .ps1, .sh, .yaml, .yml, .json, .xml, .md) exceeding 1,000 lines (encourages modular code)
  • Both thresholds are configurable via function parameters
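
The two checks and their configurable thresholds can be sketched roughly as follows. This is a minimal illustration, not the PR's Test-LargeFiles: the function name Find-OversizedFile is hypothetical, it takes explicit paths instead of querying git, and it reads files from disk. The parameter names MaxFileSizeKB, MaxLineCount and SourceExtensions mirror those visible in the reviewed code, with the defaults listed above.

```powershell
# Hypothetical helper, not the PR's Test-LargeFiles: checks paths on disk
# against a size limit and, for listed extensions, a line-count limit.
function Find-OversizedFile {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string[]]$Path,
        [int]$MaxFileSizeKB = 500,
        [int]$MaxLineCount = 1000,
        [string[]]$SourceExtensions = @('.cs', '.ps1', '.sh', '.yaml', '.yml', '.json', '.xml', '.md')
    )

    $maxBytes = $MaxFileSizeKB * 1024
    foreach ($file in $Path) {
        $info = Get-Item -LiteralPath $file

        # Size check applies to every file.
        if ($info.Length -gt $maxBytes) {
            $sizeKB = [math]::Round($info.Length / 1024, 1)
            "$file ($sizeKB KB exceeds $MaxFileSizeKB KB limit)"
        }

        # Line-count check applies only to the listed source extensions.
        if ($SourceExtensions -contains $info.Extension) {
            $lineCount = @(Get-Content -LiteralPath $info.FullName).Count
            if ($lineCount -gt $MaxLineCount) {
                "$file ($lineCount lines exceeds $MaxLineCount line limit)"
            }
        }
    }
}

# Usage: tighten both limits for a stricter project.
# Find-OversizedFile -Path 'README.md' -MaxFileSizeKB 100 -MaxLineCount 400
```

The function emits one message per violation, so a caller can treat any output as a failed check.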

Changes

| File | Change |
| --- | --- |
| .githooks/lib/LintHelpers.ps1 | Added Test-LargeFiles function with configurable thresholds and PowerShell doc comments |
| .githooks/hooks/Invoke-PreCommit.ps1 | Calls Test-LargeFiles as the first check in the pre-commit flow |
| CONTRIBUTING.md | Documents the new hook check in the hooks table |

Why

Closes the "Large File Detection" agent readiness gap. The check integrates naturally into the existing PowerShell-based git hook system and runs before formatting/linting so oversized files are caught early.

Testing

  • Verified the function loads and runs without errors via pwsh
  • Confirmed existing files (e.g., CONTRIBUTING.md at 963 lines, 50.9 KB) are below thresholds
  • Bypass available via git commit --no-verify for exceptional cases

Summary by CodeRabbit

  • Chores
    • Added large file detection to pre-commit hooks to prevent oversized files (500 KB limit) and source files with excessive line counts (1000-line limit) from being committed.
    • Updated contributing documentation to reflect new pre-commit validation.

Add Test-LargeFiles function to the shared LintHelpers library that
checks staged files against configurable thresholds:
- Maximum file size: 500 KB (prevents accidental binary/data commits)
- Maximum line count: 1000 lines for source files (encourages modular code)

The check runs first in the pre-commit hook, before formatting and
linting, so oversized files are caught early. Thresholds are
configurable via parameters for project-specific needs.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 7, 2026 00:49
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new pre-commit hook that enhances code quality and repository hygiene by automatically detecting and preventing the commit of excessively large files or source files exceeding a defined line count. This ensures that accidental inclusion of binaries or unmaintainable code structures is caught early in the development workflow, promoting cleaner and more modular codebases.

Highlights

  • Large File Detection: Implemented a new pre-commit hook to prevent committing oversized files, addressing the "Large File Detection" agent readiness gap.
  • File Size Check: Staged files are now checked against a configurable 500 KB size limit to prevent accidental binary or data commits.
  • Line Count Check: Source files (e.g., .cs, .ps1, .sh, .yaml, .yml, .json, .xml, .md) are checked for a configurable maximum of 1,000 lines to encourage modular code.
  • Configurable Thresholds: Both the maximum file size and line count thresholds are configurable via function parameters within the new Test-LargeFiles function.
  • Early Detection: The large file detection runs as the first check in the pre-commit flow, ensuring oversized files are caught early before other formatting or linting checks.
Changelog
  • .githooks/hooks/Invoke-PreCommit.ps1
    • Integrated the Test-LargeFiles function call as the initial check within the pre-commit hook execution flow.
  • .githooks/lib/LintHelpers.ps1
    • Added the Test-LargeFiles function, including configurable parameters for maximum file size and line count, along with PowerShell documentation comments.
  • CONTRIBUTING.md
    • Updated the hooks table to document the newly added large file detection pre-commit check, specifying its mode and default thresholds.

@coderabbitai (Contributor) commented Mar 7, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d63781e8-feec-481e-afb1-09d89be18633

📥 Commits

Reviewing files that changed from the base of the PR and between c61a66a and 6daf25c.

📒 Files selected for processing (3)
  • .githooks/hooks/Invoke-PreCommit.ps1
  • .githooks/lib/LintHelpers.ps1
  • CONTRIBUTING.md

Walkthrough

Introduces a large file detection pre-commit hook feature that scans staged files to enforce size limits (500 KB default) and line count constraints (1000 lines for source files), failing the commit if violations are detected.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Large File Detection Hook Implementation: .githooks/hooks/Invoke-PreCommit.ps1, .githooks/lib/LintHelpers.ps1 | Adds Test-LargeFiles function to validate staged files against configurable size and line-count thresholds, with invocation integrated into the pre-commit hook initialization. |
| Documentation: CONTRIBUTING.md | Documents the new large file detection hook with default parameters (500 KB, 1000 lines) in the pre-commit hooks reference table. |

Sequence Diagram

sequenceDiagram
    participant PreCommit as Pre-commit Hook
    participant LintHelpers as Test-LargeFiles
    participant Git as Git
    participant FileSystem as File System
    participant Reporter as Hook Reporter

    PreCommit->>LintHelpers: Invoke Test-LargeFiles()
    LintHelpers->>Git: Get repository root
    LintHelpers->>Git: Get staged files list
    Git-->>LintHelpers: Staged files
    LintHelpers->>FileSystem: Query file properties
    FileSystem-->>LintHelpers: File size & content
    LintHelpers->>LintHelpers: Check size violations<br/>(MaxFileSizeKB)
    LintHelpers->>LintHelpers: Check line count violations<br/>(SourceExtensions)
    alt Violations Found
        LintHelpers->>Reporter: Print warnings
        LintHelpers->>Reporter: Mark hook failed
        Reporter-->>PreCommit: Hook Failed
    else No Violations
        LintHelpers-->>PreCommit: Hook Passed
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested labels

feature, documentation, build

Suggested reviewers

  • MattKotsenas

@deepsource-io commented Mar 7, 2026

DeepSource Code Review

We reviewed changes in c61a66a...6daf25c on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

See full review on DeepSource ↗

Code Review Summary

| Analyzer | Status | Updated (UTC) | Details |
| --- | --- | --- | --- |
| C# | | Mar 7, 2026 12:49 a.m. | Review ↗ |

@coderabbitai coderabbitai bot requested a review from MattKotsenas March 7, 2026 00:50
@rjmurillo rjmurillo merged commit 6f6dcf9 into main Mar 7, 2026
38 of 40 checks passed
@rjmurillo rjmurillo deleted the fix/large-file-detection branch March 7, 2026 00:51
@rjmurillo rjmurillo added this to the vNext milestone Mar 7, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Adds a new “large file detection” gate to the PowerShell-based git pre-commit hooks to prevent committing oversized files, and documents the new check for contributors.

Changes:

  • Introduces Test-LargeFiles in the hook helper library to enforce file size and line-count thresholds.
  • Runs Test-LargeFiles as the first step in the pre-commit hook flow.
  • Updates contributor documentation to list the new pre-commit check.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| .githooks/lib/LintHelpers.ps1 | Adds Test-LargeFiles implementation for size/line-count enforcement. |
| .githooks/hooks/Invoke-PreCommit.ps1 | Invokes Test-LargeFiles at the start of the pre-commit checks. |
| CONTRIBUTING.md | Documents the new pre-commit large file detection check and thresholds. |

Comment on lines +133 to +146

    $fullPath = Join-Path $repoRoot $file
    if (-not (Test-Path $fullPath)) {
        continue
    }

    $fileInfo = Get-Item $fullPath
    if ($fileInfo.Length -gt $maxBytes) {
        $sizeKB = [math]::Round($fileInfo.Length / 1024, 1)
        $violations += " $file ($($sizeKB) KB exceeds $($MaxFileSizeKB) KB limit)"
    }

    $ext = [System.IO.Path]::GetExtension($file)
    if ($ext -and $SourceExtensions -contains $ext) {
        $lineCount = (Get-Content $fullPath | Measure-Object -Line).Lines

Copilot AI commented Mar 7, 2026

This check reads file size and contents from the working tree (via Get-Item/Get-Content on $fullPath), not from the staged index. With partial staging it can miss oversized staged content: for example, stage a 2 MB file, then edit it down locally without re-staging. The hook sees the smaller working-tree file, but the commit still includes the large staged blob. Consider computing size and line count from the index instead (e.g., git cat-file -s ":$file" for size and git show ":$file" for line counting).

Suggested change

    - $fullPath = Join-Path $repoRoot $file
    - if (-not (Test-Path $fullPath)) {
    -     continue
    - }
    - $fileInfo = Get-Item $fullPath
    - if ($fileInfo.Length -gt $maxBytes) {
    -     $sizeKB = [math]::Round($fileInfo.Length / 1024, 1)
    -     $violations += " $file ($($sizeKB) KB exceeds $($MaxFileSizeKB) KB limit)"
    - }
    - $ext = [System.IO.Path]::GetExtension($file)
    - if ($ext -and $SourceExtensions -contains $ext) {
    -     $lineCount = (Get-Content $fullPath | Measure-Object -Line).Lines
    + # Get size of the staged blob from the index, not the working tree.
    + $sizeOutput = git cat-file -s ":$file" 2>$null
    + if ($LASTEXITCODE -ne 0 -or -not $sizeOutput) {
    +     continue
    + }
    + $fileBytes = [int64]$sizeOutput
    + if ($fileBytes -gt $maxBytes) {
    +     $sizeKB = [math]::Round($fileBytes / 1024, 1)
    +     $violations += " $file ($($sizeKB) KB exceeds $($MaxFileSizeKB) KB limit)"
    + }
    + $ext = [System.IO.Path]::GetExtension($file)
    + if ($ext -and $SourceExtensions -contains $ext) {
    +     # Count lines from the staged content in the index.
    +     $stagedContent = git show ":$file" 2>$null
    +     if ($LASTEXITCODE -ne 0) {
    +         continue
    +     }
    +     $lineCount = ($stagedContent | Measure-Object -Line).Lines
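
The partial-staging scenario described in this comment can be reproduced in a throwaway repository. The sketch below is illustrative only: Get-StagedBlobSize is a hypothetical helper name, the demo assumes git is on PATH, and it writes to a temporary directory that it deletes afterwards.

```powershell
# Sketch only: Get-StagedBlobSize is a hypothetical helper name.
function Get-StagedBlobSize {
    param([Parameter(Mandatory)][string]$RelativePath)
    # ":<path>" names the index (staged) entry for that path.
    $size = git cat-file -s ":$RelativePath" 2>$null
    if ($LASTEXITCODE -ne 0) {
        throw "git cat-file failed for '$RelativePath' (exit code $LASTEXITCODE)"
    }
    return [int64]$size
}

# Reproduce the partial-staging case in a throwaway repository.
$repo = Join-Path ([System.IO.Path]::GetTempPath()) ("lf-demo-" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $repo | Out-Null
Push-Location $repo
try {
    git init --quiet
    $file = Join-Path $repo 'big.txt'
    [System.IO.File]::WriteAllText($file, 'x' * 2048)   # 2048 bytes get staged
    git add big.txt
    [System.IO.File]::WriteAllText($file, 'x')          # shrink without re-staging

    $stagedBytes  = Get-StagedBlobSize -RelativePath 'big.txt'
    $workingBytes = (Get-Item -LiteralPath $file).Length
    "staged: $stagedBytes bytes, working tree: $workingBytes bytes"
}
finally {
    Pop-Location
    Remove-Item -LiteralPath $repo -Recurse -Force
}
```

A check based on Get-Item would see only the working-tree size here, while the commit would carry the larger staged blob.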

Comment on lines +125 to +128

    if ($LASTEXITCODE -ne 0 -or -not $stagedFiles) {
        return
    }

Copilot AI commented Mar 7, 2026

If git diff --cached ... fails (non-zero $LASTEXITCODE), the function silently returns and the hook will allow the commit without running this check. This is inconsistent with Get-StagedFiles, which throws on git failures. Consider failing the hook (throw or Set-HookFailed + warning) when git commands fail, and only return early when there are genuinely no staged files.

Suggested change

    - if ($LASTEXITCODE -ne 0 -or -not $stagedFiles) {
    -     return
    - }
    + if ($LASTEXITCODE -ne 0) {
    +     Write-Warning "git diff --cached failed with exit code $LASTEXITCODE. Large-file check cannot be completed."
    +     Set-HookFailed -Check "large-file-detection"
    +     return
    + }
    + if (-not $stagedFiles) {
    +     return
    + }

Comment on lines +144 to +148

    $ext = [System.IO.Path]::GetExtension($file)
    if ($ext -and $SourceExtensions -contains $ext) {
        $lineCount = (Get-Content $fullPath | Measure-Object -Line).Lines
        if ($lineCount -gt $MaxLineCount) {
            $violations += " $file ($lineCount lines exceeds $MaxLineCount line limit)"

Copilot AI commented Mar 7, 2026

The line-count check reads and allocates the entire file content even if the size check already flagged the file, and Get-Content | Measure-Object -Line can be slow for large files. To keep the hook fast, consider skipping line counting once a file already exceeds the size threshold, and using a more streaming-friendly approach to count lines (avoid materializing all lines into memory).
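
One way to follow this advice is to stream the file and stop counting as soon as the limit is crossed. The sketch below assumes an absolute path is passed in; Test-LineCountExceeded is a hypothetical name, not a function from the PR.

```powershell
# Hypothetical helper: returns $true as soon as the file is known to
# exceed the limit, without reading the remaining lines.
function Test-LineCountExceeded {
    param(
        [Parameter(Mandatory)][string]$FullPath,   # assumed to be absolute
        [int]$MaxLineCount = 1000
    )
    $count = 0
    # ReadLines enumerates lazily, so lines are read one at a time.
    foreach ($line in [System.IO.File]::ReadLines($FullPath)) {
        $count++
        if ($count -gt $MaxLineCount) { return $true }
    }
    return $false
}

# In the per-file loop, the size check could short-circuit the line count:
#   if ($fileInfo.Length -gt $maxBytes) { <record size violation>; continue }
#   if (Test-LineCountExceeded -FullPath $fullPath) { <record line violation> }
```

The trade-off is that an early exit no longer yields the exact line count for the violation message, only the fact that the limit was exceeded.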

Comment on lines +106 to +108

    Prevents accidentally committing oversized files or source files
    that have grown beyond a maintainable line count. Binary and
    generated files are excluded from the line-count check.

Copilot AI commented Mar 7, 2026

Doc comment says "Binary and generated files are excluded from the line-count check", but the implementation only gates line counting by extension ($SourceExtensions). Either adjust the description to match the actual behavior, or add explicit exclusion logic for generated/binary cases (e.g., known generated suffixes/paths).
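
If explicit exclusion is the preferred route, a suffix-based filter could look something like the sketch below. Both the function name Test-GeneratedFile and the pattern list are assumptions for illustration; the PR does not define either, and a real project would pick its own patterns.

```powershell
# Hypothetical helper and pattern list; neither is part of the PR.
function Test-GeneratedFile {
    param(
        [Parameter(Mandatory)][string]$RelativePath,
        [string[]]$GeneratedPatterns = @('*.g.cs', '*.Designer.cs', '*.generated.*')
    )
    # Match on the file name only, so directory names do not affect the result.
    $name = [System.IO.Path]::GetFileName($RelativePath)
    foreach ($pattern in $GeneratedPatterns) {
        if ($name -like $pattern) { return $true }
    }
    return $false
}

# Usage inside the per-file loop (sketch):
#   if (Test-GeneratedFile -RelativePath $file) { continue }
```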

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request introduces a valuable pre-commit check to detect large files, which helps maintain repository health. The implementation is generally solid, but I've identified a couple of areas for improvement in the new PowerShell function. My feedback focuses on enhancing efficiency by using a better method for line counting and improving maintainability by reusing existing helper functions to reduce code duplication. These changes will make the new hook more robust and consistent with the existing codebase.

Comment on lines +123 to +136

    $repoRoot = git rev-parse --show-toplevel
    $stagedFiles = git diff --cached --name-only --diff-filter=d
    if ($LASTEXITCODE -ne 0 -or -not $stagedFiles) {
        return
    }

    $maxBytes = $MaxFileSizeKB * 1024
    $violations = @()

    foreach ($file in $stagedFiles) {
        $fullPath = Join-Path $repoRoot $file
        if (-not (Test-Path $fullPath)) {
            continue
        }

Severity: medium

This function can be simplified by reusing existing functionality and context from the calling script. Specifically:

  1. Reuse Get-StagedFiles: Instead of duplicating git diff logic, use the existing Get-StagedFiles helper function to promote code reuse (DRY principle).
  2. Simplify Path Handling: The Invoke-PreCommit.ps1 script already sets the working directory to the repository root. This makes the git rev-parse call and Join-Path logic redundant. Relative paths can be used directly.

This refactoring improves maintainability and consistency with other hooks.

    $stagedFiles = Get-StagedFiles -DiffFilter 'd'
    if (-not $stagedFiles) {
        return
    }

    $maxBytes = $MaxFileSizeKB * 1024
    $violations = @()

    foreach ($file in $stagedFiles) {
        $fullPath = $file
        if (-not (Test-Path $fullPath)) {
            continue
        }

    $ext = [System.IO.Path]::GetExtension($file)
    if ($ext -and $SourceExtensions -contains $ext) {
        $lineCount = (Get-Content $fullPath | Measure-Object -Line).Lines

Severity: medium

Get-Content loads the entire file into memory before counting lines, which can be inefficient for large text files. Using .NET's [System.IO.File]::ReadLines() provides a more memory-efficient, streaming approach to reading lines.

            $lineCount = ([System.IO.File]::ReadLines($fullPath) | Measure-Object).Count
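
One caveat when combining this with relative paths: .NET file APIs resolve relative paths against the process working directory, which Set-Location does not change. A hedged sketch that resolves the path through PowerShell first is shown below; Get-StreamedLineCount is a hypothetical name, and the path is assumed to exist.

```powershell
# Hypothetical helper: resolves the path via PowerShell before handing it to .NET.
function Get-StreamedLineCount {
    param([Parameter(Mandatory)][string]$Path)
    # Convert-Path resolves against PowerShell's current location and
    # returns an absolute file-system path that .NET can open directly.
    $absolute = Convert-Path -LiteralPath $Path
    return ([System.IO.File]::ReadLines($absolute) | Measure-Object).Count
}

# Usage (sketch): works with a path relative to the current PowerShell location.
#   Get-StreamedLineCount -Path 'CONTRIBUTING.md'
```

When git launches the hook from the repository root the two directories may happen to coincide, so the difference tends to show up only when the script changes location itself.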

@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.

    $stagedFiles = git diff --cached --name-only --diff-filter=d
    if ($LASTEXITCODE -ne 0 -or -not $stagedFiles) {
        return
    }

Unchecked git rev-parse exit code before overwrite

Low Severity

$LASTEXITCODE from git rev-parse on line 123 is silently overwritten by git diff on line 124. The check on line 125 only validates the git diff result, so a git rev-parse failure goes undetected. If $repoRoot ends up $null, subsequent Join-Path calls will error or resolve against the wrong directory. The caller script correctly checks $LASTEXITCODE immediately after git rev-parse (line 4–5 of Invoke-PreCommit.ps1), but Test-LargeFiles doesn't follow this pattern for its own local $repoRoot.

    $stagedFiles = git diff --cached --name-only --diff-filter=d
    if ($LASTEXITCODE -ne 0 -or -not $stagedFiles) {
        return
    }

Git failure silently skips large file check

Medium Severity

When git diff --cached fails ($LASTEXITCODE -ne 0), Test-LargeFiles silently returns without warning. This is inconsistent with Get-StagedFiles, which throws an exception on the same failure, allowing the try-catch in Invoke-PreCommit.ps1 to report the error. Combining the error condition with the "no staged files" condition in one check means a git failure is indistinguishable from an empty staging area, so the large file safety check can be silently bypassed.
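
Both Bugbot findings come down to checking $LASTEXITCODE immediately after each git call and keeping "git failed" separate from "nothing staged". A minimal sketch of that pattern follows; Invoke-GitChecked is a hypothetical wrapper, not the repository's Get-StagedFiles, and it assumes git is on PATH.

```powershell
# Hypothetical wrapper: runs git and throws if it exits with a non-zero code,
# so a failure can never be mistaken for empty output.
function Invoke-GitChecked {
    param([Parameter(Mandatory)][string[]]$Arguments)
    $output = & git @Arguments 2>$null
    if ($LASTEXITCODE -ne 0) {
        throw "git $($Arguments -join ' ') failed with exit code $LASTEXITCODE"
    }
    return @($output)
}

# Usage inside a function such as Test-LargeFiles (sketch):
#   $repoRoot    = Invoke-GitChecked -Arguments 'rev-parse', '--show-toplevel'
#   $stagedFiles = @(Invoke-GitChecked -Arguments 'diff', '--cached', '--name-only', '--diff-filter=d')
#   if ($stagedFiles.Count -eq 0) { return }   # genuinely nothing staged
```

Because each call is checked before the next one runs, the exit code of git rev-parse is no longer overwritten by git diff, and a thrown error can be reported by whatever try/catch the calling script already has.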
