Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git show result misses newline at end of diff #1377

Closed
markhh80 opened this issue Nov 12, 2021 · 5 comments · Fixed by #1423
Closed

git show result misses newline at end of diff #1377

markhh80 opened this issue Nov 12, 2021 · 5 comments · Fixed by #1423

Comments

@markhh80
Copy link

The result of git.Git.show (...) misses newlines ("\n") at the end of a file.

Reproduction

  1. setup git repo / use existing
  2. add empty textfile "foo.txt", commit file
  3. modify file by inserting string "foo\r\n", commit changes
  4. run git cmd "show [commit sha]:foo.txt"
  5. ... via GitPython, example:
# commits = ...
repo.git.show(f"{commits[0].hexsha}:{testfile}")
  1. ... from Git terminal, pipe through cat etc. to see control characters, example:
git show 74e70e7:foo.txt | cat -vet | less

Observation

The string returned from show via GitPython has a trailing "\r", missing the "\n". The result from terminal execution has a trailing "\r\n".

Expected

Both outputs have the correct trailing "\r\n".

Addendum

  • run GitPython cmd with option "with_extended_output" -> same result

  • run GitPython cmd with option "stdout_as_string" -> same result (but in binary format)

  • crosscheck: run "git show" via subprocess.popen -> output has correct "\r\n"

    # commits = ...
    p = subprocess.Popen(["git", "show", f"{commits[0].hexsha}:{testfile}"], stdout=subprocess.PIPE)
    out, err = p.communicate()
    print(repr(out))
    
@Byron
Copy link
Member

Byron commented Nov 13, 2021

Thanks a lot for the wonderful description of the issue and for spending the time that undoubtedly when into it.

It's likely that GitPython strips git output somewhere in its command execution machinery, and I wonder why it was added in the first place.

@toku-sa-n
Copy link
Contributor

Hello, I've also encountered this problem (see my SO question and the answer). I found the line where GitPython strips the last \n.

GitPython/git/cmd.py

Lines 946 to 948 in 0b33576

# strip trailing "\n"
if stdout_value.endswith(newline): # type: ignore
stdout_value = stdout_value[:-1]

The test below fails for the current main branch, and succeeds after removing [:-1].

    @with_rw_directory
    def test_git_show_last_newline(self, rw_dir):
        r = Repo.init(rw_dir)
        fp = osp.join(rw_dir, 'hello.txt')
        with open(fp, 'w') as fs:
            fs.write("hello\n")
        r.git.add(Git.polish_url(fp))
        r.git.commit(message="init")
        self.assertEqual(r.git.show("HEAD:hello.txt"), 'hello\n')

The last \n is unnecessary for most operations, and this process is reasonable. However, this is problematic for git show, especially for the binary files.
I think there are three options:

  • Do not remove the last \n at any time.
  • Do not remove the last \n for the git show output.
  • Do not change the code, and instead add some notes to the documentation

I'd like to send a PR to resolve this issue, so could you tell me your intention toward this?

@Byron
Copy link
Member

Byron commented Apr 5, 2022

Thanks a lot for contributing a test and additional analysis, @toku-sa-n, I'd love to see this culminate in a PR.

Here are my thoughts:

Do not remove the last \n at any time.

This would be a potentially breaking change even though it would certainly be the correct way to do it.

Do not remove the last \n for the git show output.

This would require a parameter to be added to execute() to control whether or not the newline should be stripped. This is backwards compatible if the default is True and allows callers to avoid the issue after running into it.

Providing a means to alleviate this seems like the minimum action to perform here.

If it's never desirable to strip newlines for binary output it might be possible to detect if something is binary by looking at the first 1024 bytes or so of the output and avoid stripping newlines then. There is probably a package for that, similar to the file program in unix. Maybe this could be investigated and further discussed in the PR. I'd be particularly interested in the performance impact of such doing so.

Thanks everyone.

@SonOfLilit
Copy link

Adding a flag and keeping the default as False even for show, which is what ended up happening in this thread, is not really a solution. People will keep hitting this issue, debugging for a few hours, and some of the times not figuring out there is a flag to address this and doing some horrible workaround instead.

Will you accept a PR to toggle the default for show (and only for show because it's the only place it matters)?

Will you accept a PR to fire a deprecation warning if it's False on show (saying the default is about to change), and then in say a year another PR to toggle the default?

I'm speaking fro personal pain here:

I just encountered a vendor-provided codebase that failed on some edge case in git show, and examining the issue I noticed they used GitPython for everything except git show and a separate library for that, so I reverted to GitPython and after an additional half an hour of debuggning figured out why the file hashes sometimes came out different and googled "gitpython show trailing newlines". Evidently the person who wrote that codebase didn't think to google this, or perhaps they encountered it before this bug was reported and "fixed".

@Byron
Copy link
Member

Byron commented Jul 25, 2024

Please feel free to submit a PR, ideally with an accompanying test to show what it's doing. Breaking changes can't be done though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

4 participants