-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shortening CI and doc building time #28
Comments
@idomic is working on this one. He found an option to cache conda environments: https://dev.to/epassaro/caching-anaconda-environments-in-github-actions-5hde some thoughts: Since our projects are Python packages, they must have a the solution that Ido found relies on conda environment files (env.yaml), instead of (setup.py) files but the caching action looks like a generic caching feature so I think it'll work. I think that changing the run step here (from the proposed solution in the blog post): - name: Update environment
run: mamba env update -n my-env -f environment.yml
if: steps.cache.outputs.cache-hit != 'true'
for the series of steps that we run to prepare the environment will work for example, for sk-eval, the run step would be all the pip install commands that we have here: https://github.com/ploomber/sklearn-evaluation/blob/3b3f6aa8f568c0b444d132f369e123026fb1c303/.github/workflows/ci.yml#L42 |
If the above doesn't work, we can generate a conda environment file after running the |
@idomic as discussed earlier - in the top comment here I mention a few things about speeding up the doc building process. let's assign someone to take a deep dive on this |
did some research on accelerating docs building parallel builds
we can enable parallel notebook executing with the parallel:
serial:
noteslooks like sphinx offers the One important thing I noted is that this is a "best-effort" thing, if some steps are not parallelizable, the parameter is ignored. I realized this because the flag didn't work I saw some warnings about some latex thing not being parallelizable, then I tried commenting out all the variables in conf.py that reference to latex; ran again and this time it worked (the logs don't say anything but I realized it was working bc the logs printed all at once at the end as opposed to line by line, which happens when running serially) caching
Another option is caching the notebooks. jupyter-book already offers a mechanism to cache previously executed notebooks. The first thing I tried was to enable readthedocs offline formats so each build would produce a zip with all the HTML contents. However, this didn't work and the command triggered a full doc build again since jupyter-book uses a separate folder to store the notebook's state. the only solution I can think of is to use readthedocs config to run a command before and after building the docs (this is supported). after a successful build, we can upload the full folder to S3 and before each build we can fetch the folder. this will speed things up a lot as the build will only have to run notebooks that have changed. |
Which repo should we focus on for this issue? (I saw @edublancas has fixed for skeval). |
let's focus on JupySQL, the tests are pretty fast on ubuntu but slow on windows and mac, although that appears to be a hardware problem. I think let's focus on implementing the documentation caching (as described in my earlier comment) and see how well it works. |
It's also pretty sporadic, sometimes 5m, sometimes 15m, I'm not sure a lot can be done there. Keep updating here. |
following up on the comment from @neelasha23 in our team brainstorming:
shortening the CI and doc building time is essential since we're blocked while waiting for this to finish, sharing a few thoughts
ci
I recently added a flag to our repos so pytest prints tests that take >5 seconds to finish, this can help us spot slow tests and fix them. there might be cases where we have no choice, but in the majority of cases we should strive to have fast tests; we could go beyond printing logs and enforce this by failing slow tests. if we don't enforce it as a rule, then we'd have to run periodic checks to look for slow tests
another solution would be to find a vendor that can run our tests in more powerful hardware. I remember seeing some of them. This can be a quick win.
Finally, can also try running pytest in parallel, there's an extension that allows doing that; the challenge there is that if our tests are not fully isolated, we might sun into issues that will be hard to debug.
docs
similarly, as with the CI, we could print the running time for each notebook so we spot slow ones. I think jupyter-book has an option to timeout slow notebooks so that's another option.
we can also check if readthedocs has an option to build our docs in faster hardware.
running notebooks in parallel is also another option, but in this case, we'll have to implement it ourselves. an alternative can be a caching layer. readthedocs doesn't cache anything since it starts from an empty container but jupyter-book has caching built-in. so we could build something that fetches the latest successful docs build from master/main so we can benefit from it. But I'm unsure if readthedocs has an API for downloading previous builds
The text was updated successfully, but these errors were encountered: