PGA downloading a higher number of siva files than expected #85

Open
gomesfernanda opened this issue Oct 8, 2018 · 2 comments
Labels
empathy-sessions Issue filed as part of empathy sessions

Comments

@gomesfernanda
Contributor

I want to download all the siva files for "Jupyter Notebook" repositories in PGA.

To find out how many there are, I ran:
$ pga list --lang "Jupyter Notebook" -f csv

After examining the CSV output, I counted 2,606 repos and 3,767 siva files corresponding to them.
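
One way to double-check counts like these is to parse the CSV directly instead of counting by eye. The sketch below is only an illustration: it assumes the output has a `URL` column and a `SIVA_FILENAMES` column holding a comma-separated list of siva file names per repository. Those column names are assumptions about the `pga list -f csv` header, so adjust them to whatever the real header says. The sample data is made up.

```python
import csv
import io


def count_repos_and_sivas(csv_text, url_col="URL", siva_col="SIVA_FILENAMES"):
    """Return (number of repos, number of distinct siva files).

    The column names are assumptions about the CSV header; pass the real
    ones if they differ.
    """
    repos = set()
    sivas = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        repos.add(row[url_col])
        # One repository can map to many siva files, listed in a single cell.
        for name in row[siva_col].split(","):
            name = name.strip()
            if name:
                sivas.add(name)
    return len(repos), len(sivas)


# Tiny made-up sample, not real PGA data.
SAMPLE = (
    "URL,SIVA_FILENAMES\n"
    'https://example.com/a,"aaa.siva,bbb.siva"\n'
    "https://example.com/b,bbb.siva\n"
)

if __name__ == "__main__":
    print(count_repos_and_sivas(SAMPLE))
```

Counting distinct names across all rows, rather than rows or cells, matters here because a single repository may be listed with a very long set of siva files.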

To download the siva files, I ran:
$ pga get --lang "Jupyter Notebook" -v

And the response that I got was:

DEBU[0004] local copy is outdated or non existent
1 / 6349 [>----------------------------------------------------------]   0.02% 40m59s

This means it is downloading 6,349 files, and I have no idea why. Could somebody help me with this?

@gomesfernanda
Contributor Author

I was investigating this today and found out that there are 3,295 siva files for the repo https://github.com/google/skia-buildbot.

So it was my mistake: pga get IS downloading the correct number of siva files. However, I'm intrigued by this extreme number of siva files for one repo. Is that normal?

@ajnavarro
Contributor

@gomesfernanda When you clone the repository with the standard refspecs, you get something like this:

$ git clone git@github.com:google/skia-buildbot.git
Cloning into 'skia-buildbot'...
remote: Enumerating objects: 3543, done.
remote: Counting objects: 100% (3543/3543), done.
remote: Compressing objects: 100% (2652/2652), done.
remote: Total 108260 (delta 1931), reused 1807 (delta 598), pack-reused 104717
Receiving objects: 100% (108260/108260), 51.61 MiB | 398.00 KiB/s, done.
Resolving deltas: 100% (77333/77333), done.

It contains just a few branches and only 4 root commits:

$ git rev-list --all --remotes --max-parents=0 | wc -l
4

But if you fetch with the same refspec that Borges used to fetch that repository:

$ git checkout origin/master

$ git fetch origin +refs/*:refs/*
remote: Enumerating objects: 60611, done.
remote: Counting objects: 100% (60611/60611), done.
remote: Compressing objects: 100% (13044/13044), done.
receiving objects:  53% (82299/155281), 60.64 MiB | 1.14 MiB/s    
[...]

$ git rev-list --all --remotes --max-parents=0 | wc -l
5734

That means that, right now, that repository would be spread across 5,734 different siva files, one per root commit.

This is because they use Gerrit, which generates a new orphan branch for each "pull request".
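
To make the effect concrete, here is a toy model of why fetching every ref raises the root-commit count. It is a sketch, not how Borges or git are implemented: the commit graph, the commit names, and the ref names are all hypothetical, and the only idea it illustrates is that each ref whose history does not join an existing one contributes a new parentless commit.

```python
def root_commits(parents, tip):
    """Return the set of parentless commits reachable from `tip`."""
    roots = set()
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        commit_parents = parents.get(commit, [])
        if not commit_parents:
            roots.add(commit)
        stack.extend(commit_parents)
    return roots


def count_roots(parents, refs):
    """Count distinct root commits reachable from any ref.

    This mirrors what `git rev-list --all --max-parents=0 | wc -l` reports.
    """
    all_roots = set()
    for tip in refs.values():
        all_roots |= root_commits(parents, tip)
    return len(all_roots)


# Hypothetical history: master descends from a single root commit ...
PARENTS = {"m1": [], "m2": ["m1"], "m3": ["m2"], "c1": [], "c2": []}
BRANCH_REFS = {"refs/heads/master": "m3"}
# ... and two extra refs (made-up names) point at orphan commits.
ALL_REFS = {
    **BRANCH_REFS,
    "refs/example/review-1": "c1",
    "refs/example/review-2": "c2",
}

if __name__ == "__main__":
    print(count_roots(PARENTS, BRANCH_REFS))
    print(count_roots(PARENTS, ALL_REFS))
```

With only the branch ref the model sees one root; adding the two orphan refs brings it to three, which is the same pattern as going from 4 to 5734 above.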

@smola added the empathy-sessions label on Oct 24, 2018