feat: add large file detection to pre-commit hook#1011
Conversation
Add Test-LargeFiles function to the shared LintHelpers library that checks staged files against configurable thresholds: - Maximum file size: 500 KB (prevents accidental binary/data commits) - Maximum line count: 1000 lines for source files (encourages modular code) The check runs first in the pre-commit hook, before formatting and linting, so oversized files are caught early. Thresholds are configurable via parameters for project-specific needs. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new pre-commit hook that enhances code quality and repository hygiene by automatically detecting and preventing the commit of excessively large files or source files exceeding a defined line count. This ensures that accidental inclusion of binaries or unmaintainable code structures is caught early in the development workflow, promoting cleaner and more modular codebases. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughIntroduces a large file detection pre-commit hook feature that scans staged files to enforce size limits (500 KB default) and line count constraints (1000 lines for source files), failing the commit if violations are detected. Changes
Sequence DiagramsequenceDiagram
participant PreCommit as Pre-commit Hook
participant LintHelpers as Test-LargeFiles
participant Git as Git
participant FileSystem as File System
participant Reporter as Hook Reporter
PreCommit->>LintHelpers: Invoke Test-LargeFiles()
LintHelpers->>Git: Get repository root
LintHelpers->>Git: Get staged files list
Git-->>LintHelpers: Staged files
LintHelpers->>FileSystem: Query file properties
FileSystem-->>LintHelpers: File size & content
LintHelpers->>LintHelpers: Check size violations<br/>(MaxFileSizeKB)
LintHelpers->>LintHelpers: Check line count violations<br/>(SourceExtensions)
alt Violations Found
LintHelpers->>Reporter: Print warnings
LintHelpers->>Reporter: Mark hook failed
Reporter-->>PreCommit: Hook Failed
else No Violations
LintHelpers-->>PreCommit: Hook Passed
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
|
Overall Grade |
Security Reliability Complexity Hygiene |
Code Review Summary
| Analyzer | Status | Updated (UTC) | Details |
|---|---|---|---|
| C# | Mar 7, 2026 12:49a.m. | Review ↗ |
There was a problem hiding this comment.
Pull request overview
Adds a new “large file detection” gate to the PowerShell-based git pre-commit hooks to prevent committing oversized files, and documents the new check for contributors.
Changes:
- Introduces
Test-LargeFilesin the hook helper library to enforce file size and line-count thresholds. - Runs
Test-LargeFilesas the first step in the pre-commit hook flow. - Updates contributor documentation to list the new pre-commit check.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
.githooks/lib/LintHelpers.ps1 |
Adds Test-LargeFiles implementation for size/line-count enforcement. |
.githooks/hooks/Invoke-PreCommit.ps1 |
Invokes Test-LargeFiles at the start of the pre-commit checks. |
CONTRIBUTING.md |
Documents the new pre-commit large file detection check and thresholds. |
| $fullPath = Join-Path $repoRoot $file | ||
| if (-not (Test-Path $fullPath)) { | ||
| continue | ||
| } | ||
|
|
||
| $fileInfo = Get-Item $fullPath | ||
| if ($fileInfo.Length -gt $maxBytes) { | ||
| $sizeKB = [math]::Round($fileInfo.Length / 1024, 1) | ||
| $violations += " $file ($($sizeKB) KB exceeds $($MaxFileSizeKB) KB limit)" | ||
| } | ||
|
|
||
| $ext = [System.IO.Path]::GetExtension($file) | ||
| if ($ext -and $SourceExtensions -contains $ext) { | ||
| $lineCount = (Get-Content $fullPath | Measure-Object -Line).Lines |
There was a problem hiding this comment.
This check reads file size/contents from the working tree (via Get-Item/Get-Content on $fullPath), not from the staged index. With partial staging it can miss oversized staged content (e.g., stage a 2MB file, then edit it down locally without re-staging; the hook will see the smaller working-tree file but the commit will still include the large staged blob). Consider computing size/line count from the index instead (e.g., git cat-file -s ":$file" for size and git show ":$file" for line counting).
| $fullPath = Join-Path $repoRoot $file | |
| if (-not (Test-Path $fullPath)) { | |
| continue | |
| } | |
| $fileInfo = Get-Item $fullPath | |
| if ($fileInfo.Length -gt $maxBytes) { | |
| $sizeKB = [math]::Round($fileInfo.Length / 1024, 1) | |
| $violations += " $file ($($sizeKB) KB exceeds $($MaxFileSizeKB) KB limit)" | |
| } | |
| $ext = [System.IO.Path]::GetExtension($file) | |
| if ($ext -and $SourceExtensions -contains $ext) { | |
| $lineCount = (Get-Content $fullPath | Measure-Object -Line).Lines | |
| # Get size of the staged blob from the index, not the working tree. | |
| $sizeOutput = git cat-file -s ":$file" 2>$null | |
| if ($LASTEXITCODE -ne 0 -or -not $sizeOutput) { | |
| continue | |
| } | |
| $fileBytes = [int64]$sizeOutput | |
| if ($fileBytes -gt $maxBytes) { | |
| $sizeKB = [math]::Round($fileBytes / 1024, 1) | |
| $violations += " $file ($($sizeKB) KB exceeds $($MaxFileSizeKB) KB limit)" | |
| } | |
| $ext = [System.IO.Path]::GetExtension($file) | |
| if ($ext -and $SourceExtensions -contains $ext) { | |
| # Count lines from the staged content in the index. | |
| $stagedContent = git show ":$file" 2>$null | |
| if ($LASTEXITCODE -ne 0) { | |
| continue | |
| } | |
| $lineCount = ($stagedContent | Measure-Object -Line).Lines |
| if ($LASTEXITCODE -ne 0 -or -not $stagedFiles) { | ||
| return | ||
| } | ||
|
|
There was a problem hiding this comment.
If git diff --cached ... fails (non-zero $LASTEXITCODE), the function silently returns and the hook will allow the commit without running this check. This is inconsistent with Get-StagedFiles, which throws on git failures. Consider failing the hook (throw or Set-HookFailed + warning) when git commands fail, and only return early when there are genuinely no staged files.
| if ($LASTEXITCODE -ne 0 -or -not $stagedFiles) { | |
| return | |
| } | |
| if ($LASTEXITCODE -ne 0) { | |
| Write-Warning "git diff --cached failed with exit code $LASTEXITCODE. Large-file check cannot be completed." | |
| Set-HookFailed -Check "large-file-detection" | |
| return | |
| } | |
| if (-not $stagedFiles) { | |
| return | |
| } |
| $ext = [System.IO.Path]::GetExtension($file) | ||
| if ($ext -and $SourceExtensions -contains $ext) { | ||
| $lineCount = (Get-Content $fullPath | Measure-Object -Line).Lines | ||
| if ($lineCount -gt $MaxLineCount) { | ||
| $violations += " $file ($lineCount lines exceeds $MaxLineCount line limit)" |
There was a problem hiding this comment.
The line-count check reads and allocates the entire file content even if the size check already flagged the file, and Get-Content | Measure-Object -Line can be slow for large files. To keep the hook fast, consider skipping line counting once a file already exceeds the size threshold, and using a more streaming-friendly approach to count lines (avoid materializing all lines into memory).
| Prevents accidentally committing oversized files or source files | ||
| that have grown beyond a maintainable line count. Binary and | ||
| generated files are excluded from the line-count check. |
There was a problem hiding this comment.
Doc comment says "Binary and generated files are excluded from the line-count check", but the implementation only gates line counting by extension ($SourceExtensions). Either adjust the description to match the actual behavior, or add explicit exclusion logic for generated/binary cases (e.g., known generated suffixes/paths).
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable pre-commit check to detect large files, which helps maintain repository health. The implementation is generally solid, but I've identified a couple of areas for improvement in the new PowerShell function. My feedback focuses on enhancing efficiency by using a better method for line counting and improving maintainability by reusing existing helper functions to reduce code duplication. These changes will make the new hook more robust and consistent with the existing codebase.
| $repoRoot = git rev-parse --show-toplevel | ||
| $stagedFiles = git diff --cached --name-only --diff-filter=d | ||
| if ($LASTEXITCODE -ne 0 -or -not $stagedFiles) { | ||
| return | ||
| } | ||
|
|
||
| $maxBytes = $MaxFileSizeKB * 1024 | ||
| $violations = @() | ||
|
|
||
| foreach ($file in $stagedFiles) { | ||
| $fullPath = Join-Path $repoRoot $file | ||
| if (-not (Test-Path $fullPath)) { | ||
| continue | ||
| } |
There was a problem hiding this comment.
This function can be simplified by reusing existing functionality and context from the calling script. Specifically:
- Reuse
Get-StagedFiles: Instead of duplicatinggit difflogic, use the existingGet-StagedFileshelper function to promote code reuse (DRY principle). - Simplify Path Handling: The
Invoke-PreCommit.ps1script already sets the working directory to the repository root. This makes thegit rev-parsecall andJoin-Pathlogic redundant. Relative paths can be used directly.
This refactoring improves maintainability and consistency with other hooks.
$stagedFiles = Get-StagedFiles -DiffFilter 'd'
if (-not $stagedFiles) {
return
}
$maxBytes = $MaxFileSizeKB * 1024
$violations = @()
foreach ($file in $stagedFiles) {
$fullPath = $file
if (-not (Test-Path $fullPath)) {
continue
}
|
|
||
| $ext = [System.IO.Path]::GetExtension($file) | ||
| if ($ext -and $SourceExtensions -contains $ext) { | ||
| $lineCount = (Get-Content $fullPath | Measure-Object -Line).Lines |
There was a problem hiding this comment.
Get-Content loads the entire file into memory before counting lines, which can be inefficient for large text files. Using .NET's [System.IO.File]::ReadLines() provides a more memory-efficient, streaming approach to reading lines.
$lineCount = ([System.IO.File]::ReadLines($fullPath) | Measure-Object).Count
| $stagedFiles = git diff --cached --name-only --diff-filter=d | ||
| if ($LASTEXITCODE -ne 0 -or -not $stagedFiles) { | ||
| return | ||
| } |
There was a problem hiding this comment.
Unchecked git rev-parse exit code before overwrite
Low Severity
$LASTEXITCODE from git rev-parse on line 123 is silently overwritten by git diff on line 124. The check on line 125 only validates the git diff result, so a git rev-parse failure goes undetected. If $repoRoot ends up $null, subsequent Join-Path calls will error or resolve against the wrong directory. The caller script correctly checks $LASTEXITCODE immediately after git rev-parse (line 4–5 of Invoke-PreCommit.ps1), but Test-LargeFiles doesn't follow this pattern for its own local $repoRoot.
| $stagedFiles = git diff --cached --name-only --diff-filter=d | ||
| if ($LASTEXITCODE -ne 0 -or -not $stagedFiles) { | ||
| return | ||
| } |
There was a problem hiding this comment.
Git failure silently skips large file check
Medium Severity
When git diff --cached fails ($LASTEXITCODE -ne 0), Test-LargeFiles silently returns without warning. This is inconsistent with Get-StagedFiles, which throws an exception on the same failure, allowing the try-catch in Invoke-PreCommit.ps1 to report the error. Combining the error condition with the "no staged files" condition in one check means a git failure is indistinguishable from an empty staging area, so the large file safety check can be silently bypassed.


Summary
Adds a
Test-LargeFilesfunction to the pre-commit hook infrastructure that prevents committing oversized files.What it does
Changes
.githooks/lib/LintHelpers.ps1Test-LargeFilesfunction with configurable thresholds and PowerShell doc comments.githooks/hooks/Invoke-PreCommit.ps1Test-LargeFilesas the first check in the pre-commit flowCONTRIBUTING.mdWhy
Closes the "Large File Detection" agent readiness gap. The check integrates naturally into the existing PowerShell-based git hook system and runs before formatting/linting so oversized files are caught early.
Testing
pwshgit commit --no-verifyfor exceptional casesSummary by CodeRabbit