Large repo size #1403
Comments
Good catch! I agree this is bad, and that we should fix it ASAP.
Ouch -- I can't believe that not only did I let that rrd file slip through in #1085, but I edited it and uploaded a modified version in #1301 🤦‍♂️ We produce these files in the CI, so there's zero reason for them to be checked in.

It will be annoying, but a history rewrite is probably worth wrangling. Might as well take advantage and see if there are other things we can purge while we're at it -- scanning the list, I see a

Definitely need to set up a GitHub workflow and some pre-commit hooks to reject by default anything larger than a certain threshold. I'm shocked this isn't just a stock configurable behavior in GitHub.

Regarding (2), this should have been obvious, but it is a pretty big flaw of the gh-pages branch deployment model. Thanks for pointing it out! In addition to collapsing commits, we might as well move the entire gh-pages deploy to the rerun-docs repository instead.
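For the size-threshold idea above, a minimal sketch of what such a pre-commit check could look like. The 500 KiB limit and the function names are placeholders, not anything the project has settled on, and for simplicity it measures the working-tree file rather than the staged blob.

```python
import os
import subprocess
import sys

MAX_BYTES = 500 * 1024  # hypothetical threshold; pick whatever fits the repo


def staged_files():
    """Return paths staged for commit (added, copied, or modified)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        check=True, capture_output=True, text=True,
    ).stdout
    # -z separates paths with NUL bytes, so odd filenames survive intact.
    return [p for p in out.split("\0") if p]


def oversized(paths, limit=MAX_BYTES):
    """Return (path, size) pairs for existing files larger than `limit` bytes."""
    hits = []
    for path in paths:
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size > limit:
                hits.append((path, size))
    return hits


def main():
    """Print offending files to stderr; return 1 if any were found, else 0."""
    hits = oversized(staged_files())
    for path, size in hits:
        print(f"{path}: {size} bytes exceeds the {MAX_BYTES} byte limit", file=sys.stderr)
    return 1 if hits else 0
```

Used as a hook, the script would end with `sys.exit(main())`, since git aborts the commit when a pre-commit hook exits non-zero. The same `oversized` check could run in a CI job over the files changed by a pull request.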
Let's:
History has been rewritten as of 03-03-2023. Fresh clone is down to 22MB:
Hi, I just cloned the repo to try a few of the examples and noticed it took quite a while. The current size is around 75MB, which is quite big for a git repo. Apart from being slightly annoying when cloning the repo, it can also have an impact on CI runtimes and costs, so I tried to understand where most of this size comes from.
When using the script from this gist, I get the following result:
Full file: file_sizes.txt
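For anyone who wants to reproduce this kind of listing, here is a rough sketch of the general technique. It is not the gist's script, just one common way to do it: enumerate every object reachable from any ref with `git rev-list --objects --all`, then ask `git cat-file --batch-check` for each object's type and size. The function names are made up for illustration, and the sizes reported are uncompressed object sizes, not on-disk pack sizes.

```python
import subprocess


def parse_batch_check(text):
    """Parse `git cat-file --batch-check` lines into (size, path) for blobs only."""
    blobs = []
    for line in text.splitlines():
        # Expected format per line: "<type> <size> <path>"; the path may contain spaces.
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "blob":
            blobs.append((int(parts[1]), parts[2]))
    # Largest first.
    return sorted(blobs, reverse=True)


def blob_sizes(repo="."):
    """List every blob reachable from any ref in `repo`, largest first."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sizes)
```

Calling `blob_sizes()[:20]` inside a clone gives the twenty largest blobs across all history, which is usually enough to spot checked-in binaries.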
Browsing these results, it seems there are two main sources for this weight:
1. `demo.rrd` files
2. `gh-pages`
Regarding (1), are these demo files worth keeping checked in to the repo? If their main purpose is to make it easy to try the Python APIs, would it make sense to load them into the wheel instead? Is the `.rrd` file format going to stay stable? If not, it might incur further ballooning down the line. Maybe storing these files versioned somewhere else, where they can easily be curled or downloaded via a web button, would make more sense?

Regarding (2), it seems to me from looking at your CI actions that the branch does not contain the docs building logic and is only the build target storage. If that's the case, I can see two rather easy changes. Either publish on something like Netlify instead, which is free for static content, even with your own DNS settings, or use a different setting for publishing on the `gh-pages` branch that makes it an orphan branch and rewrites over the same initial branch commit. That way, the objects do not accumulate in the git history.

I'm mentioning this repo size issue because this is the kind of thing that requires a git history rewrite to fix, and you usually prefer doing these things early on, rather than when the number of contributors starts really growing.
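To make the orphan-branch suggestion concrete, here is a small sketch of one way a deploy step could publish the built site as a single root commit and force-push it over the previous one. It is a variant of the idea rather than a description of the project's actual CI: the function names, the default commit message, and the assumption that the built site sits in its own directory are all mine.

```python
import subprocess


def deploy_commands(remote_url, branch="gh-pages", message="Deploy docs"):
    """Build the git commands that publish a directory as a single root commit."""
    return [
        # A fresh repository has no parent commits, so the commit below is a root commit.
        ["git", "init", "--quiet"],
        ["git", "add", "--all"],
        # Requires user.name and user.email to be configured on the machine.
        ["git", "commit", "--quiet", "-m", message],
        # Force-push replaces the remote branch tip instead of stacking on top of it.
        ["git", "push", "--force", remote_url, f"HEAD:refs/heads/{branch}"],
    ]


def deploy(site_dir, remote_url, branch="gh-pages"):
    """Run the deploy commands inside the built site directory."""
    for cmd in deploy_commands(remote_url, branch):
        subprocess.run(cmd, cwd=site_dir, check=True)
```

Because each deploy replaces the branch tip with a parentless commit, a fresh clone only fetches the latest build; earlier builds become unreachable on the remote rather than piling up in the history.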