-
Notifications
You must be signed in to change notification settings - Fork 4
feat: add large file detection to pre-commit hook #1011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -97,3 +97,62 @@ | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Write-Host "FAIL: $Check" -ForegroundColor Red | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| $script:HookExitCode = 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| function Test-LargeFiles { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| <# | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .SYNOPSIS | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Checks staged files against size and line-count thresholds. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .DESCRIPTION | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Prevents accidentally committing oversized files or source files | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| that have grown beyond a maintainable line count. Binary and | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| generated files are excluded from the line-count check. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .PARAMETER MaxFileSizeKB | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Maximum file size in kilobytes. Default 500 KB. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .PARAMETER MaxLineCount | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Maximum number of lines for source files. Default 1000 lines. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .PARAMETER SourceExtensions | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| File extensions treated as source code for line-count checks. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #> | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| [CmdletBinding()] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| param( | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| [int]$MaxFileSizeKB = 500, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| [int]$MaxLineCount = 1000, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| [string[]]$SourceExtensions = @('.cs', '.ps1', '.sh', '.yaml', '.yml', '.json', '.xml', '.md') | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| $repoRoot = git rev-parse --show-toplevel | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| $stagedFiles = git diff --cached --name-only --diff-filter=d | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if ($LASTEXITCODE -ne 0 -or -not $stagedFiles) { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unchecked
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+125
to
+128
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if ($LASTEXITCODE -ne 0 -or -not $stagedFiles) { | |
| return | |
| } | |
| if ($LASTEXITCODE -ne 0) { | |
| Write-Warning "git diff --cached failed with exit code $LASTEXITCODE. Large-file check cannot be completed." | |
| Set-HookFailed -Check "large-file-detection" | |
| return | |
| } | |
| if (-not $stagedFiles) { | |
| return | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function can be simplified by reusing existing functionality and context from the calling script. Specifically:
- Reuse
Get-StagedFiles: Instead of duplicatinggit difflogic, use the existingGet-StagedFileshelper function to promote code reuse (DRY principle). - Simplify Path Handling: The
Invoke-PreCommit.ps1script already sets the working directory to the repository root. This makes thegit rev-parsecall andJoin-Pathlogic redundant. Relative paths can be used directly.
This refactoring improves maintainability and consistency with other hooks.
$stagedFiles = Get-StagedFiles -DiffFilter 'd'
if (-not $stagedFiles) {
return
}
$maxBytes = $MaxFileSizeKB * 1024
$violations = @()
foreach ($file in $stagedFiles) {
$fullPath = $file
if (-not (Test-Path $fullPath)) {
continue
}
Copilot
AI
Mar 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check reads file size/contents from the working tree (via Get-Item/Get-Content on $fullPath), not from the staged index. With partial staging it can miss oversized staged content (e.g., stage a 2MB file, then edit it down locally without re-staging; the hook will see the smaller working-tree file but the commit will still include the large staged blob). Consider computing size/line count from the index instead (e.g., git cat-file -s ":$file" for size and git show ":$file" for line counting).
| $fullPath = Join-Path $repoRoot $file | |
| if (-not (Test-Path $fullPath)) { | |
| continue | |
| } | |
| $fileInfo = Get-Item $fullPath | |
| if ($fileInfo.Length -gt $maxBytes) { | |
| $sizeKB = [math]::Round($fileInfo.Length / 1024, 1) | |
| $violations += " $file ($($sizeKB) KB exceeds $($MaxFileSizeKB) KB limit)" | |
| } | |
| $ext = [System.IO.Path]::GetExtension($file) | |
| if ($ext -and $SourceExtensions -contains $ext) { | |
| $lineCount = (Get-Content $fullPath | Measure-Object -Line).Lines | |
| # Get size of the staged blob from the index, not the working tree. | |
| $sizeOutput = git cat-file -s ":$file" 2>$null | |
| if ($LASTEXITCODE -ne 0 -or -not $sizeOutput) { | |
| continue | |
| } | |
| $fileBytes = [int64]$sizeOutput | |
| if ($fileBytes -gt $maxBytes) { | |
| $sizeKB = [math]::Round($fileBytes / 1024, 1) | |
| $violations += " $file ($($sizeKB) KB exceeds $($MaxFileSizeKB) KB limit)" | |
| } | |
| $ext = [System.IO.Path]::GetExtension($file) | |
| if ($ext -and $SourceExtensions -contains $ext) { | |
| # Count lines from the staged content in the index. | |
| $stagedContent = git show ":$file" 2>$null | |
| if ($LASTEXITCODE -ne 0) { | |
| continue | |
| } | |
| $lineCount = ($stagedContent | Measure-Object -Line).Lines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get-Content loads the entire file into memory before counting lines, which can be inefficient for large text files. Using .NET's [System.IO.File]::ReadLines() provides a more memory-efficient, streaming approach to reading lines.
$lineCount = ([System.IO.File]::ReadLines($fullPath) | Measure-Object).Count
Copilot
AI
Mar 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line-count check reads and allocates the entire file content even if the size check already flagged the file, and Get-Content | Measure-Object -Line can be slow for large files. To keep the hook fast, consider skipping line counting once a file already exceeds the size threshold, and using a more streaming-friendly approach to count lines (avoid materializing all lines into memory).
Check warning on line 154 in .githooks/lib/LintHelpers.ps1
Codacy Production / Codacy Static Code Analysis
.githooks/lib/LintHelpers.ps1#L154
File 'LintHelpers.ps1' uses Write-Host. Avoid using Write-Host because it might not work in all hosts, does not work when there is no host, and (prior to PS 5.0) cannot be suppressed, captured, or redirected. Instead, use Write-Output, Write-Verbose, or Write-Information.
Check warning on line 155 in .githooks/lib/LintHelpers.ps1
Codacy Production / Codacy Static Code Analysis
.githooks/lib/LintHelpers.ps1#L155
File 'LintHelpers.ps1' uses Write-Host. Avoid using Write-Host because it might not work in all hosts, does not work when there is no host, and (prior to PS 5.0) cannot be suppressed, captured, or redirected. Instead, use Write-Output, Write-Verbose, or Write-Information.


There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc comment says "Binary and generated files are excluded from the line-count check", but the implementation only gates line counting by extension (
$SourceExtensions). Either adjust the description to match the actual behavior, or add explicit exclusion logic for generated/binary cases (e.g., known generated suffixes/paths).