Print filenames for objects in status output with '--filenames' option #44

Open
jmurty wants to merge 2 commits into master

@jmurty (Contributor) commented Jun 26, 2014

The '--filenames' option makes it easy to view the filename represented
by a git-fat object reference, at the cost of a slight performance
and memory hit compared to the plain git-fat status command.

This option is most helpful when you are thinking about running
git-fat gc to clean up some garbage/unreferenced objects, so you
can check what you are about to delete.

  • Add referenced_objects_with_filenames() method that (optionally)
    stores file name data while looking up git-fat referenced objects.
  • Refactor referenced_objects() method to use the above method while
    providing existing interface.
  • If '--filenames' option is given to the status command, print
    filename(s) next to git-fat object hash values.
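
The refactor described in the list above could look roughly like the sketch below. This is not the code from this pull request: the real methods live on the `GitFat` class and read from a `git rev-list --objects` pipeline, whereas here the `rev_list_lines` and `fat_digest_for` parameters are hypothetical stand-ins so the example runs on its own.

```python
def referenced_objects_with_filenames(rev_list_lines, fat_digest_for,
                                      with_filenames=False):
    """Return {fat_digest: set_of_filenames} for git-fat placeholder blobs.

    rev_list_lines: lines shaped like `git rev-list --objects` output,
        i.e. "<git-hash>" or "<git-hash> <path>".
    fat_digest_for: hypothetical lookup that returns the git-fat digest
        for a git hash, or None if the object is not a placeholder.
    """
    referenced = {}
    for line in rev_list_lines:
        git_hash, _, path = line.rstrip('\n').partition(' ')
        if not git_hash:
            continue
        digest = fat_digest_for(git_hash)
        if digest is None:
            continue  # not a git-fat object, nothing to record
        names = referenced.setdefault(digest, set())
        # Filenames are only stored when asked for, to keep the default cheap.
        if with_filenames and path:
            names.add(path)
    return referenced


def referenced_objects(rev_list_lines, fat_digest_for):
    """Existing interface: only the set of referenced git-fat digests."""
    return set(referenced_objects_with_filenames(rev_list_lines, fat_digest_for))
```

Keeping `referenced_objects()` as a thin wrapper means existing callers are unaffected, and only the status command with `--filenames` pays for storing names.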

@abraithwaite

Hey @jmurty, I did something similar to this in our fork with git fat list.

Your solution of storing hash->filename strings in a dict is the one I tried first too, but you quickly run out of memory on medium- to large-sized repositories. We implemented it by running through rev-list twice, which isn't ideal but is better than nothing at all.

Throw away mappings of git hash value to filename(s) for objects that
are not relevant to git-fat. Since we can do this clean-up during
processing, this change should minimise the memory cost of using the
--filenames option since uninteresting filenames are no longer stored.
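
A minimal sketch of the clean-up idea in this commit message, under stated assumptions: filenames are first recorded per git hash, and each entry is dropped as soon as the object has been classified. The function name `attach_filenames` and the shape of `classified` are hypothetical and are not taken from the actual commit.

```python
def attach_filenames(filenames_by_git_hash, classified):
    """Move filenames of git-fat objects into the result, discarding the rest.

    filenames_by_git_hash: temporary {git_hash: set_of_filenames} map built
        while reading the revision list.
    classified: iterable of (git_hash, fat_digest_or_None) pairs, produced as
        each object is inspected (hypothetical shape).
    """
    by_digest = {}
    for git_hash, digest in classified:
        # pop() removes the entry whether or not the object is interesting,
        # so the temporary map shrinks as processing moves along.
        names = filenames_by_git_hash.pop(git_hash, set())
        if digest is not None:
            by_digest.setdefault(digest, set()).update(names)
    return by_digest
```

With this shape, the only filenames still held at the end are those attached to git-fat objects.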
@jmurty (Contributor, Author) commented Jun 29, 2014

Thanks for the feedback @abraithwaite.

I have just added a small improvement to the proposed feature that clears out uninteresting filenames during processing. This should massively reduce memory consumption, or at least use no more memory than is needed to store the filenames of interest.

Are you able to test this improvement against a medium- or large-size repository to see if it survives?

@@ -298,7 +304,11 @@ class GitFat(object):
         p1 = subprocess.Popen(['git','rev-list','--objects',rev], stdout=subprocess.PIPE)
         def cut_sha1hash(input, output):
             for line in input:
-                output.write(line.split()[0] + '\n')
+                splits = line.split()

I haven't tried this yet, but it looks like this part will fail on files with spaces.
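
To illustrate the concern: `line.split()` breaks on every run of whitespace, so a path containing spaces ends up spread over several list items. One possible fix, sketched here with a hypothetical helper name and assuming each line is the hash, a single space, and then the path, is to split only on the first space:

```python
def parse_rev_list_line(line):
    """Split one `git rev-list --objects` style line into (hash, path).

    Lines for commits carry only a hash, so path may be None.
    """
    sha1, _, path = line.rstrip('\n').partition(' ')
    return sha1, (path or None)


# A plain split() loses part of the filename:
parts = "abc123 my big file.bin\n".split()
# parts[1] is only "my"

# Splitting on the first space keeps the path intact:
sha1, path = parse_rev_list_line("abc123 my big file.bin\n")
# path is "my big file.bin"
```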

2 participants