
Build time out #1767

Closed
chebee7i opened this issue Oct 18, 2015 · 24 comments
@chebee7i
Contributor

I'm trying to build, but Sphinx takes more than 900 seconds to finish, so the job is timing out.

https://readthedocs.org/projects/chebee7i-networkx/builds/3408967/

Is there a way to increase this limit?

@martenson

For two weeks or so we also have not been able to build the Galaxy docs on readthedocs.org. Every build fails with a timeout (the limit is 900s), and the suspected offender is the step where pip installs our requirements.txt, which takes ~500s (see the third command on this page: https://readthedocs.org/projects/galaxy/builds/3407847/). Because that step takes so long, there is not enough time left to build the docs before the build times out.

In the last successful build this step took 11 seconds instead of 500 because the cache was used (see https://readthedocs.org/projects/galaxy/builds/3358410/).

How can we solve this? Did something change on the RTD side?

@agjohnson
Contributor

We've recently updated our build backend processes. Among other security-related changes, builds now time out if they take too long. Consider ways to shorten your build if possible -- pruning requirements and mocking modules that aren't needed for documentation generation will speed up install and compilation times.

In the near future, we'd like to grant longer build times and more memory to projects that have donated or have a gold subscription. We're also working on https://github.com/rtfd/sphinx-autoapi to get around the whole issue of Python needing to execute code just to obtain docstrings.
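
For reference, the mocking suggested above is usually configured in Sphinx's conf.py; a minimal sketch, where the module names are placeholders rather than anything specific to the projects in this thread:

    # conf.py -- hedged sketch of mocking heavy imports so autodoc
    # can read docstrings without installing the real packages.
    extensions = ["sphinx.ext.autodoc"]

    # Modules listed here are replaced with mock objects at build time;
    # substitute your project's heavy dependencies.
    autodoc_mock_imports = ["numpy", "scipy", "pandas"]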

@agjohnson added the Support label on Oct 21, 2015
@chebee7i
Contributor Author

How about a more reasonable timeout? I think this is just going to push users/projects away from RTD.

@agjohnson
Contributor

Sure, a more reasonable timeout would be fine, with more reasonable compensation :)

Really though, we're not necessarily against raising the limit here -- what would you consider to be a fair timeout limit?

In the end, we're a free service with a minuscule budget. We need to be able to put restrictions on builds to maintain fair queueing; we can't continue to offer free, unlimited builds without some support -- our budget can't sustain it. We hope that users who find they are heavy users will be willing to support us.

@martenson

@agjohnson Firstly, thank you for your response. Getting support for a free service feels heavenly. We much appreciate the services you are providing, and we give you credit on many occasions, including in our release notes. Thank you for the work you are doing.

I do not consider the Galaxy Project a 'heavy user' -- we do doc builds roughly every two weeks. It is just a 'heavy project' for your process because it has many dependencies. Moreover, if the cache were used, as in the pre-change builds, the build would take ~500s less.

We will explore possible solutions to this situation on our side and update this thread if we come up with something.

@agjohnson
Contributor

@martenson Agreed, I think build frequency would be good to gauge here as well. A project that is built every couple of days is not the same as a project that sees frequent commits and long build times.

Maybe that means keeping a low base timeout limit, but weighting infrequent builds with additional build time.

@martenson

I tried to rebuild our docs with the "Give the virtual environment access to the global site-packages dir" option set to True and got a speedup in the dependency installation. The build still failed, but with a different error and before the 900s timeout (the build took 845 seconds). Is this still the same issue -- the build being killed from outside?

writing... Killed
Command time: 142s Return: 137

link to build: https://readthedocs.org/projects/galaxy/builds/3422092/

@agjohnson
Contributor

@martenson Yeah, that looks like an OOM kill on the build VM, due to limits set up with Docker. We should catch that as a failure and report it; we use metadata from Docker's API to detect an OOM kill, so this might just be an uncaught kill.

Currently, our build memory limit is 1GB; it might be worth figuring out whether that is normal usage for your build. We've done some complex builds that track large API reference sets and still haven't hit that sort of memory limit. I'll see if I can dig up any more info on our end.

Another byproduct of Docker containerization is that we aren't sharing the pip cache anymore; restoring a local cache might speed up the builds in your case. I'll open a ticket about localizing the pip cache for the containers.

@martenson

We touched on this topic at our internal meeting, and the outcome is that:

If the feature were available, we would be happy to pay to be able to build and host on readthedocs.

But as far as I know that option is not possible yet, and with these restrictions on builds we are in a bad spot. :/

@chebee7i
Contributor Author

Sorry I haven't had time to come back to this. I think many projects do not need to build the docs on each commit. Once a day at a pre-specified time would be more than enough...not sure if it is possible to set something like that up though.

@agjohnson
Contributor

I haven't had time to get back to this. I have some thoughts on making the build timeout a bit fairer for users beyond those that have donated. Unfortunately, I haven't had time to work on this much in the last week. I'll see about bumping up the limits for now so that your projects can build in the short term.

@mscuthbert

Would it be appropriate to allow projects that don't need it to opt out of the single-page build (no one will ever read it) or the JSON build? I just moved a project to readthedocs -- wonderful! So thankful for this VOLUNTEER, FREE service -- and I'm running into the timeouts (music21.readthedocs.org), but only when it gets to the second and third builds (JSON and single-page). I'd gladly opt out of those if I could. Thanks!

I'd also be fine with limits on total time per day (though, from experience with other projects, increasing the limit during the "setup" period can help ease frustration; I've probably built 25 times today trying to get the transition set up, and after this I'll probably need a build less than once a week).

@martenson

Galaxy has had a very similar experience to @mscuthbert's: the additional builds are not needed, and one build per week would probably be enough. To be frank, I personally do not understand the direction readthedocs is moving in. This is documentation, not a production fix that needs to be built and deployed ASAP.

In the meantime we have been forced to host our own documentation; a docs build from scratch takes us about 4 minutes on an old machine.

martenson added a commit to martenson/galaxy that referenced this issue Dec 3, 2015
@martenson

Just letting you know that, due to these issues, we had to leave readthedocs and start hosting our own docs... sadly.

https://docs.galaxyproject.org/en/master/index.html

@agjohnson
Contributor

Unfortunate, yes. However, enforcing build timeouts for projects like this has addressed a number of issues that required our constant attention: rampant builds, daily build queue congestion, and resource contention on the build servers. This greatly improves the service for 97% of users, though at the expense of the other 3%. If we had the funds and the time, we could take on the operational costs that unchecked builds incur -- but we have neither.

As I mentioned above, there's room for improvement on fairer build queue timing, but this work currently has a lower priority than the work that is keeping everything moving. Users can already support a project with a gold subscription, and we plan to add a longer timeout for those blessed projects; this will likely happen this month or next. I think the most correct answer for the future is a longer timeout dependent on build queue depth, but that's more work than we can immediately muster.

tl;dr -- a free service can't simultaneously be fair to all of its users and to those donating their time to keep the service running.

@agjohnson
Contributor

For the project mentioned here, I've increased the allowed build timeout. We've recently added per-project overrides for some of the container-level settings, and an additional builder has helped keep build queue congestion down slightly.

Again, this will eventually be a gold subscription feature; if you find it useful, consider donating. We still need to add the ability for gold-subscription-blessed projects to alter their build settings, which is the next piece here.

@agjohnson
Contributor

Closing this here, as we have the ability to increase limits for specific projects now.

@arsenovic

I have also run into this problem.

What is the recommended solution?

@perone

perone commented Apr 18, 2018

I'm also facing the same problem: "Command killed due to excessive memory consumption" when pip is installing dependencies. What can be done? @agjohnson, how can the limit be increased?

@davidfischer
Contributor

@perone, can you post more details, preferably in a separate issue? Which project is it for? How long does the build take to run locally? How much memory does it use locally?

Even large projects do not generally take more than a few hundred MB of memory unless something is wrong.
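
One way to answer the memory question is to measure a local build's peak usage; a quick sketch, assuming GNU time is available and docs/ is the Sphinx source directory:

    # Report peak memory of a local docs build; the -v flag of GNU time
    # prints "Maximum resident set size" in kilobytes.
    /usr/bin/time -v sphinx-build -b html docs docs/_build/html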

@perone

perone commented Apr 18, 2018

@davidfischer Hi David, I'm pretty sure the problem is the PyTorch pip installation. The wheel is very large:

torch-0.3.1-cp27-cp27mu-manylinux1_x86_64.whl (496.9MB)

However, this shouldn't trigger an OOM kill, right?

PS: PyTorch is a pretty common framework, so I wonder how other people have solved this.

@humitos
Member

humitos commented Apr 19, 2018

However, this shouldn't trigger an OOM kill, right?

It seems there are more people with the same problem:

pytorch/pytorch#1022

I also think that pip has some problems with big files. It's not the first time I have read something similar.
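
A workaround often suggested for out-of-memory errors while pip installs very large wheels is to skip pip's cache; a sketch, not an RTD-specific fix, using the package from this thread as the example:

    # Install without writing the ~500 MB wheel into pip's cache,
    # which reduces disk and memory pressure during the install.
    pip install --no-cache-dir torch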

@humitos
Member

humitos commented Apr 19, 2018

Also, @perone, do you need PyTorch to build your docs? Maybe this is a good reason to have a separate requirements.txt file for RTD.

(It would also save a lot of bandwidth :) )
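
A minimal sketch of such a split, assuming the file lives at docs/requirements.txt and that anything omitted is mocked in conf.py as sketched earlier; the RTD project settings (or a readthedocs.yml requirements_file entry) would then point at this file instead of the full requirements.txt:

    # docs/requirements.txt -- only what the documentation build itself needs;
    # heavy runtime dependencies such as torch are left out and mocked instead.
    sphinx
    sphinx_rtd_theme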

@perone

perone commented Apr 19, 2018

Thanks @humitos, I already did that (a separate requirements file for RTD), but the problem is that I then need to mock PyTorch, and mocking creates another issue for the documentation of inherited classes, so it's another problem to solve =(
