Repo is getting large #41

Open
jchodera opened this issue Mar 19, 2020 · 3 comments

@jchodera
Member

This repo is getting pretty large (>1.2 GB), so I'd like to suggest we either break it up into smaller repos or put the files on osf.io and link to them from here, so that it's still possible for people to check it out without losing their connection.
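A rough way to confirm where a local clone's size comes from (just a sketch; the size-pack line reports the compressed size of the packed history):

$ # size-pack in the output below is the on-disk size of the packed history
$ git count-objects -v -H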

@mohe2015

What about Git Large File Storage (LFS)?
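If we went that route, a minimal sketch might look like the following (the *.pdb pattern is just an assumed example of a large structure format; files already committed to history would need a separate migration step):

$ git lfs install
$ # Store matching files as LFS pointers from now on (pattern is an assumed example)
$ git lfs track "*.pdb"
$ git add .gitattributes
$ git commit -m "Track structure files with Git LFS"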

@jchodera
Member Author

That could work, but since LFS alone doesn't shrink the existing history, we'd also need to clean things up with the BFG Repo Cleaner.
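A sketch of the BFG step, assuming we strip blobs over an arbitrary 50M cutoff (rewriting history means everyone would need a fresh clone afterwards):

$ # BFG operates on a bare mirror clone and rewrites history in place
$ git clone --mirror https://github.com/FoldingAtHome/coronavirus.git
$ java -jar bfg.jar --strip-blobs-bigger-than 50M coronavirus.git
$ cd coronavirus.git
$ # Expire old reflogs and repack so the stripped blobs are actually dropped
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ git push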

Other alternatives:

  • gzipped tarballs of input structures instead of uncompressed structures (see the sketch after this list)
  • store input files on osf.io
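A sketch of the tarball option (directory name taken from this repo's layout; note this only shrinks future checkouts, since the old uncompressed blobs stay in history unless we also rewrite it):

$ # Replace a directory of uncompressed structures with one gzipped tarball
$ tar -czf system-preparation.tar.gz system-preparation/
$ git rm -r system-preparation/
$ git add system-preparation.tar.gz
$ git commit -m "Compress input structures into a gzipped tarball"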

@derrickstolee

Another option is microsoft/scalar, or Git with partial clone. Scalar will get you set up with partial clone and Git's sparse-checkout feature automatically. Partial clone saves network time because you are not downloading every version of every file, and sparse-checkout means you can expand the working directory only as you need it.

I tested against this repo with Scalar 20.03.167.1 on Windows. It should work the same on Mac.

Start by cloning:

$ scalar clone https://github.com/FoldingAtHome/coronavirus
Clone parameters:
  Repo URL:     https://github.com/FoldingAtHome/coronavirus
  Branch:       Default
  Cache Server: Default
  Local Cache:  C:\.scalarCache
  Destination:  C:\_git\t\coronavirus
  FullClone:     False
Authenticating...Succeeded
Fetching objects from remote...Succeeded
Checking out 'master'...Succeeded

$ cd coronavirus/src/

$ ls
README.md

Notice that only README.md is on disk. This is because the sparse-checkout is initialized to include only the files at root. If you want the files for a certain directory (or a list of directories), you can use the git sparse-checkout command:

$ git sparse-checkout set system-preparation/6m17
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 19 (delta 5), reused 3 (delta 3), pack-reused 9
Receiving objects: 100% (19/19), 39.51 MiB | 11.23 MiB/s, done.
Resolving deltas: 100% (5/5), done.
Updating files: 100% (19/19), done.

$ ls
README.md  system-preparation/

$ ls system-preparation/
6m17/  README.md

If you really want every file at HEAD, then git sparse-checkout disable will populate the entire working directory. However, this repo is large because of the number of files, not because of a deep history. When I disabled sparse-checkout, I downloaded around 1 GiB of data:

$ git sparse-checkout disable
remote: Enumerating objects: 82, done.
remote: Counting objects: 100% (82/82), done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 576 (delta 68), reused 49 (delta 48), pack-reused 494
Receiving objects: 100% (576/576), 1.04 GiB | 11.14 MiB/s, done.
Resolving deltas: 100% (335/335), done.
Updating files: 100% (741/741), done.

$ ls
potential-targets/  publications/  README.md  system-preparation/

Note that you can do all of this with plain Git (a sketch follows below), but Scalar makes it a bit easier. As the repo continues to grow, Scalar can help in a few extra ways, too.
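For reference, a rough plain-Git equivalent of the flow above, assuming a recent Git with partial clone and sparse-checkout support (a sketch, not exactly what Scalar runs):

$ # Blobless partial clone: fetch commits and trees up front, blobs on demand
$ git clone --filter=blob:none --sparse https://github.com/FoldingAtHome/coronavirus
$ cd coronavirus
$ # Cone mode starts with only root-level files; expand directories as needed
$ git sparse-checkout init --cone
$ git sparse-checkout set system-preparation/6m17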

Good luck!
