Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix logScanner fails to parse pointer file containing unicode chars #5655

Merged
merged 4 commits into from
Mar 1, 2024

Conversation

jochenhz
Copy link
Contributor

@jochenhz jochenhz commented Feb 22, 2024

Git will quote paths using Unicode characters in git log, but the fileHeaderRegex does not account for optional quotes in the log.

This change fixes the regex to account for quotes due to Unicode characters and improves the tests.
Due to this bug, git lfs prune might accidentally prune recent, unpushed, or stashed files that contain unicode character if there is more than one changed file in the commit.

One might see a git log looking like this, for example (note the quotation marks).

Note: This is rather critical because if the file "Source Assets/Audio/Alltagsger\303\244usche - Kamera - Blitzlicht.mp3" would be unpushed or stashed, prune --force would delete the file as well

lfs-commit-sha: cde89bbb531b285170c0119c34e656141dca269a fba0486e7940971a64f87d8ef8e65fb1be44b2ad

diff --git a/Source Assets/Audio/01 - Am Flughafen - Ankunftsbereich.mp3 b/Source Assets/Audio/01 - Am Flughafen - Ankunftsbereich.mp3
new file mode 100644
index 0000000..8bf5d61
--- /dev/null
+++ b/Source Assets/Audio/01 - Am Flughafen - Ankunftsbereich.mp3       
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0b21b27eb6f510ef4da2ef3b35c3e394e42e735913fdd2c9bd3dbfde3be19b7b
+size 1535686
diff --git "a/Source Assets/Audio/Alltagsger\303\244usche - Kamera - Blitzlicht.mp3" "b/Source Assets/Audio/Alltagsger\303\244usche - Kamera - Blitzlicht.mp3"
new file mode 100755
index 0000000..63a83b8
--- /dev/null
+++ "b/Source Assets/Audio/Alltagsger\303\244usche - Kamera - Blitzlicht.mp3"   
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9b265c0416fe1332300bdb08e3d59326b92e948f9a3cab4d4401b11e22100d5b
+size 251836

Git will quote paths using unicode characters in git log but the fileHeaderRegex does not account for optional quotes in the log.

This change fixes the regex to account for quotes due to unicode characters and improves the tests.
@jochenhz jochenhz requested a review from a team as a code owner February 22, 2024 15:52
Copy link
Member

@bk2204 bk2204 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine, but don't we need to handle the octal escapes as well? The example in the PR message shows a couple of octal escapes that we'd probably want to handle in setFilename.

@jochenhz
Copy link
Contributor Author

This seems fine, but don't we need to handle the octal escapes as well? The example in the PR message shows a couple of octal escapes that we'd probably want to handle in setFilename.

They are already matched by the current regex, see

@bk2204
Copy link
Member

bk2204 commented Feb 22, 2024

They are already matched by the current regex, see

Right, but setFilename checks to see if the file is matched by the filter, and the escaped filename wouldn't necessarily match the filter. If the filter is *ç*.bin and the filename is républiquefrançaise.bin, then the encoded name would be r\303\251publiquefran\303\247aise.bin, and that wouldn't match.

@jochenhz
Copy link
Contributor Author

They are already matched by the current regex, see

Right, but setFilename checks to see if the file is matched by the filter, and the escaped filename wouldn't necessarily match the filter. If the filter is *ç*.bin and the filename is républiquefrançaise.bin, then the encoded name would be r\303\251publiquefran\303\247aise.bin, and that wouldn't match.

Ah, good catch!
I have pushed a change that converts the incoming name to utf-8 and added some tests for it. Not sure if this is the most elegant solution in golang though.

lfs/gitscanner_log.go Show resolved Hide resolved
@bk2204 bk2204 merged commit 4467605 into git-lfs:main Mar 1, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants