Build failing with "text file busy" #1609

Closed
jhamrick opened this issue Aug 28, 2015 · 15 comments
Labels
Operations Operations or server issue Support Support question

Comments

@jhamrick

I've recently been having trouble with my builds failing, usually with the following error message (from https://readthedocs.org/projects/nbgrader/builds/3267954/):

Traceback (most recent call last):
  File "/usr/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.4/dist-packages/virtualenv.py", line 2363, in <module>
    main()
  File "/usr/local/lib/python3.4/dist-packages/virtualenv.py", line 832, in main
    symlink=options.symlink)
  File "/usr/local/lib/python3.4/dist-packages/virtualenv.py", line 994, in create_environment
    site_packages=site_packages, clear=clear, symlink=symlink))
  File "/usr/local/lib/python3.4/dist-packages/virtualenv.py", line 1288, in install_python
    shutil.copyfile(executable, py_executable)
  File "/usr/lib/python3.4/shutil.py", line 108, in copyfile
    with open(dst, 'wb') as fdst:
OSError: [Errno 26] Text file busy: '/home/docs/checkouts/readthedocs.org/user_builds/nbgrader/envs/master/bin/python3'
Using base prefix '/usr'
New python executable in /home/docs/checkouts/readthedocs.org/user_builds/nbgrader/envs/master/bin/python3

I read somewhere else that this could be due to running multiple builds at the same time, but then if that's the case, how can I prevent this from happening? It happens on pretty much every commit (e.g. it will pass on master but fail on latest).
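For background on the error itself: the traceback shows virtualenv calling `shutil.copyfile(executable, py_executable)`, which opens the destination with `open(dst, 'wb')`. On Linux, opening a file for writing while a process is executing it fails with `ETXTBSY` (errno 26, "Text file busy"), so any process still running `envs/master/bin/python3` blocks the copy. A common workaround, sketched here under the assumption of a POSIX filesystem, is to write the new copy to a temporary file in the same directory and rename it over the destination: the rename swaps the directory entry, while a running process keeps using the old inode. The helper name `replace_executable` is hypothetical and is not part of virtualenv.

```python
import os
import shutil
import tempfile


def replace_executable(src, dst):
    """Copy src over dst without ever opening dst itself for writing."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # Create the temp file next to dst so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src, "rb") as source:
            shutil.copyfileobj(source, tmp)
        shutil.copymode(src, tmp_path)  # keep the executable bit
        # Atomic rename: a process still running the old dst keeps its inode.
        os.replace(tmp_path, dst)
    except BaseException:
        # Remove the half-written temp file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```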

@agjohnson
Contributor

Hrm, this shouldn't be happening, as we make an effort to block builds while another task is running for the same version. This might be a stale build on the builders, but is more likely a bug.
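The blocking described here can be pictured as a per-version lock. This is a hypothetical file-based sketch, not Read the Docs' actual implementation: a build creates the lock file atomically with `O_CREAT | O_EXCL`, and a lock older than `max_lock_age` seconds is treated as stale and taken over. That staleness rule is also where the collision described in the commit further down this thread comes from: if `max_lock_age` is shorter than a build can legitimately run, a second build takes over a lock that is still in use.

```python
import os
import time


def try_acquire(lock_path, max_lock_age):
    """Return True if this build now holds the lock at lock_path (hypothetical helper)."""
    for _ in range(2):  # at most one retry after clearing a stale lock
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                age = time.time() - os.path.getmtime(lock_path)
            except FileNotFoundError:
                continue  # the holder released it meanwhile; try again
            if age < max_lock_age:
                return False  # another build of this version is still running
            try:
                os.unlink(lock_path)  # stale: assume the previous holder died
            except FileNotFoundError:
                pass
            continue
        os.close(fd)
        return True
    return False


def release(lock_path):
    """Release a lock previously acquired with try_acquire."""
    os.unlink(lock_path)
```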

@agjohnson agjohnson added Operations Operations or server issue Bug A bug labels Aug 31, 2015
@colons

colons commented Sep 3, 2015

I think this is the same thing?

@jhamrick
Author

jhamrick commented Sep 3, 2015

Looks like it, to me.

@jhamrick
Author

jhamrick commented Sep 3, 2015

I should clarify that I have tried wiping the virtualenvs for all my builds, which usually makes things work a little better for the first build afterwards, but then it goes back to failing pretty much every time.

@agjohnson
Contributor

@jhamrick forgot to look into this on the servers, sorry for the delay. I see the nbgrader project causing a number of defunct python3 processes on our build servers. Any thoughts on why that would be the case?

Here's the specific command that forked python3 and was never cleaned up:

home/docs/checkouts/readthedocs.org/user_builds/nbgrader/envs/master/bin/python /home/docs/checkouts/readthedocs.org/user_builds/nbgrader/envs/master/bin/jupyter-nbconvert --to rst --execute --FilesWriter.build_directory=user_guide user_guide/03_generating_assignments.ipynb

I've killed the processes for now, but it sounds like they will come back.
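On a Linux builder, processes like these can be spotted by reading `/proc/<pid>/stat`: `ps` shows a process as `<defunct>` when its state field is `Z` (zombie), meaning it has exited but its parent has not yet reaped it. A minimal, Linux-only sketch; the helper names are hypothetical and this is not the tooling used on the Read the Docs servers:

```python
import os


def proc_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the *last* ')'.
    after_comm = stat_line[stat_line.rindex(")") + 2:]
    return after_comm.split()[0]


def defunct_pids(proc_root="/proc"):
    """Yield (pid, command) for processes in state 'Z' (shown as <defunct>)."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if proc_state(line) == "Z":
            comm = line[line.index("(") + 1:line.rindex(")")]
            yield int(entry), comm
```

Usage: `for pid, comm in defunct_pids(): print(pid, comm)`. A zombie cannot be killed directly; it disappears once its parent reaps it or the parent itself exits.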

@colons I don't see the same defunct process behavior in your case, unless it has already resolved itself.

@jhamrick
Author

jhamrick commented Sep 6, 2015

@agjohnson Thanks for looking into this! The defunct python processes are definitely odd. There are two other types of errors I've been seeing; both are bizarre to me, but perhaps they are related?

  1. There are some failed builds that don't have any error messages associated with them (e.g. https://readthedocs.org/projects/nbgrader/builds/3285081/), so I have no idea what's going wrong with those. Perhaps those correspond to the defunct processes you found?
  2. There has also been the occasional failed build that, for some reason, seems to be trying to use a different python executable (e.g. https://readthedocs.org/projects/nbgrader/builds/3283087/), though I haven't seen this one in a while. In that error message, you'll notice that it complains about /home/docs/checkouts/readthedocs.org/user_builds/nbgrader/envs/latest/bin/python3 not existing, which is unsurprising given that the build is for the master environment, not latest. I have a little more insight into this one: I suspect it has to do with how I'm building my docs. Some of my docs are Jupyter (IPython) notebooks, and the build executes the code in those notebooks. It should just use whatever python is installed, but because Jupyter notebooks can be executed with different kernels, it's possible that it's trying to use a different executable. The defunct python processes may actually be the result of these failures, though I don't know why the process would hang instead of exiting.

I am going to be away for a couple weeks, but will investigate the second type of failed build further when I get back. Do you have any insight as to what is causing the first type of build (that doesn't seem to have any error messages) to fail?

@agjohnson
Contributor

My guess is that the failures without any output are caused by the task not reporting a response when the python process goes defunct, and that the subsequent "text file busy" errors are a symptom of the defunct process still hanging around.
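If hung notebook execution is what leaves these processes behind, one mitigation on the project side is to put a hard time limit on each execution step, so a hang turns into a visible error instead of a silent failure. A minimal sketch using Python's `subprocess.run(..., timeout=...)`, which kills the child and waits for it when the limit is exceeded; the wrapper name and the 600-second limit are assumptions, and the command in the usage comment is the one quoted earlier in this thread. Only the direct child is killed: processes it spawned itself (such as a Jupyter kernel) may still need separate cleanup.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; raise RuntimeError if it runs longer than timeout_s seconds.

    A non-zero exit status raises subprocess.CalledProcessError (check=True).
    """
    try:
        # On timeout, subprocess.run kills the child and waits for it
        # before re-raising TimeoutExpired.
        return subprocess.run(cmd, check=True, timeout=timeout_s,
                              stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"{cmd[0]} timed out after {timeout_s}s") from exc


# Hypothetical usage (600 s is an arbitrary limit):
# run_with_timeout(["jupyter-nbconvert", "--to", "rst", "--execute",
#                   "--FilesWriter.build_directory=user_guide",
#                   "user_guide/03_generating_assignments.ipynb"], timeout_s=600)
```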

@khrapovs

khrapovs commented Sep 8, 2015

I would love to see some progress on this, as my project (https://readthedocs.org/projects/dataanalysispython/builds/) has been failing to build for several days now. I tried wiping it, to no avail.
Thanks in advance!

@agjohnson
Contributor

@khrapovs your project is also causing defunct processes; I've had to clean out stale tasks from both projects periodically. Your build processes aren't going defunct, though: they seem to be stuck in a loop, eating up a considerable amount of resources.

@agjohnson
Contributor

@khrapovs the fact that you are loading the doctest module and making heavy use of doctest syntax in your examples might be part of the problem -- see http://sphinx-doc.org/ext/doctest.html#confval-doctest_test_doctest_blocks
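For reference, the option linked above is set in `conf.py`. According to the Sphinx documentation, `doctest_test_doctest_blocks` defaults to `'default'`, and setting it to an empty string stops `sphinx.ext.doctest` from also treating plain reST doctest blocks (paragraphs starting with `>>>`) as tests. Those tests are run by the `doctest` builder (`make doctest`), not by the HTML build. A minimal fragment, assuming the extension is otherwise kept:

```python
# conf.py (fragment)
extensions = [
    "sphinx.ext.doctest",
]

# Only test code in explicit doctest/testcode directives,
# not every ">>>" block in the documents.
doctest_test_doctest_blocks = ""
```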

@agjohnson agjohnson added Support Support question and removed Bug A bug labels Sep 8, 2015
@khrapovs

@agjohnson Ok, I have removed doctest extension. I wiped the project. I deleted it completely a couple of times. It still does not compile. Moreover, it "builds" for two hours and then fails. Logs do not report any errors.

@agjohnson
Contributor

@khrapovs Your last build was building for 60m and I see the same behavior on a build of your project. strace on the python process shows it's completely locked up -- perhaps in a loop?

@agjohnson
Contributor

@khrapovs your use of ipython to perform calculations is the problem. I can reproduce this locally. I halted the sphinx build process when it hung locally looping over your calculation:

% make html
sphinx-build -b html -d _build/doctrees   . _build/html
Running Sphinx v1.3.1
loading pickled environment... not yet created
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 8 source files that are out of date
updating environment: 8 added, 0 changed, 0 removed
reading sources... [ 12%] dataio
reading sources... [ 25%] index
reading sources... [ 37%] introduction
reading sources... [ 50%] numpy
reading sources... [ 62%] pandas
reading sources... [ 75%] pythonbasics
^C

>>>-------------------------------------------------------------------------
Exception in /Users/anthony/tmp/dataanalysispython/notes/pythonbasics.rst at block ending on line 658
Specify :okexcept: as an option in the ipython:: block to suppress this message
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-20-e4e8587042a6> in <module>()
      1 while error > 1e-3:
      2     pi *= 4*i**2 / (4*i**2 - 1)
----> 3     error = abs(pi - 3.141592653589793)
      4 print(pi)
      5

KeyboardInterrupt:
<<<-------------------------------------------------------------------------
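The excerpt in that traceback shows why the build never finishes: as far as the listed lines show, nothing in the loop body updates `i`, so every pass multiplies `pi` by the same factor and (as the hung build shows) the loop condition never becomes false. The loop looks like the Wallis product, pi/2 = product over i >= 1 of 4*i**2 / (4*i**2 - 1). A corrected sketch, assuming that was the intent; the starting values and the `max_iter` guard are assumptions, not khrapovs' actual fix. It increments `i` and caps the number of iterations, so a mistake fails the docs build instead of hanging it:

```python
import math


def wallis_pi(tol=1e-3, max_iter=1_000_000):
    """Approximate pi with the Wallis product, stopping once within tol."""
    pi = 2.0  # start at 2 so the running value approximates pi, not pi/2
    i = 1
    error = abs(pi - math.pi)
    while error > tol:
        if i > max_iter:
            # Guard: fail loudly instead of looping forever during a build.
            raise RuntimeError(f"no convergence after {max_iter} terms")
        pi *= 4 * i**2 / (4 * i**2 - 1)
        i += 1  # the update the hung loop appears to be missing
        error = abs(pi - math.pi)
    return pi
```

The Wallis product converges slowly (roughly 800 terms for `tol=1e-3`), which is fine for a docs example but worth knowing before tightening the tolerance.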

@khrapovs

@agjohnson Yep, just found it and already pushed the fix.
Please accept my sincerest apologies.

@gregmuellegger
Contributor

Cool that we could help resolving the issue. Closing this as the builds are back to normal 🌞

humitos added a commit that referenced this issue Oct 11, 2018
When building a project, if the build takes more than `REPO_LOCK_SECONDS`
and, after that time, another build is triggered for the same Version and
picked up by the same builder, the lock will be considered "old", removed,
and taken by the new build.

This will end up in a collision when accessing the files and it could
raise an exception like `IOError: [Errno 26] Text file busy`. It could
also fail for other unexpected reasons.

This PR increases `max_lock_age` to the same time limit the project's
build is given before it is terminated, taken from (in this order):

* custom container time limit or,
* `settings.DOCKER_LIMITS['time']` or,
* `settings.REPO_LOCK_SECONDS` or,
* 30 seconds

Related to #1609
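The fallback order listed in this commit message can be expressed as a small function. This is a hypothetical sketch: the parameter names stand in for the project setting and the settings values mentioned above, and it is not the code from the actual PR. Treating `None` or `0` as "not set" is also an assumption.

```python
def max_lock_age(container_time_limit=None, docker_limits=None,
                 repo_lock_seconds=None):
    """Return the age (seconds) after which a build lock counts as stale.

    Falls back in the order given in the commit message.
    """
    if container_time_limit:
        return container_time_limit  # custom per-project container time limit
    if docker_limits and docker_limits.get("time"):
        return docker_limits["time"]  # settings.DOCKER_LIMITS['time']
    if repo_lock_seconds:
        return repo_lock_seconds  # settings.REPO_LOCK_SECONDS
    return 30  # last-resort default
```

Tying the lock age to the build time limit means a lock can only be considered stale once its build would have been killed anyway.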

5 participants