pex creation fails under pants w/ No such file or directory error #1051

Closed
asherf opened this issue Sep 30, 2020 · 13 comments · Fixed by #1062 or #1080

@asherf
Contributor

asherf commented Sep 30, 2020

23:06:05.36 [INFO] Starting: Resolving 3rdparty/python/constraints.txt
23:06:05.38 [WARN] /home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0b2_py38/lib/python3.8/site-packages/pants/base/exception_sink.py:313: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
  process_title=setproctitle.getproctitle(),
23:06:05.38 [ERROR] 1 Exception encountered:
Engine traceback:
  in select
  in `typecheck` goal
  in Typecheck using MyPy
  in pants.backend.python.typecheck.mypy.rules.mypy_typecheck_partition
  in pants.backend.python.util_rules.pex.create_pex
  in pants.engine.process.fallible_to_exec_result_or_raise
Traceback (most recent call last):
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0b2_py38/lib/python3.8/site-packages/pants/engine/process.py", line 241, in fallible_to_exec_result_or_raise
    raise ProcessExecutionFailure(
pants.engine.process.ProcessExecutionFailure: Process 'Building mypy.pex with 1 requirement: mypy==0.782' failed with exit code 1.
stdout:
stderr:
Failed to spawn a job for DistributionTarget(interpreter=PythonInterpreter('/usr/local/bin/python3.8', PythonIdentity('/usr/local/bin/python3.8', 'cp38', 'cp38', 'manylinux2014_x86_64', (3, 8, 5)))): [Errno 2] No such file or directory: '/home/toolchain/.pex/unzipped_pexes/49b4b32295a60a0a4dfc0659c3a44f93560c4a81/.deps/pex-2.1.16-py2.py3-none-any.whl/pex/__pycache__/variables.cpython-37.pyc.139923989690656'
Traceback (most recent call last):
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0b2_py38/lib/python3.8/site-packages/pants/bin/local_pants_runner.py", line 256, in run
    engine_result = self._run_v2()
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0b2_py38/lib/python3.8/site-packages/pants/bin/local_pants_runner.py", line 168, in _run_v2
    return self._maybe_run_v2_body(goals, poll=False)
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0b2_py38/lib/python3.8/site-packages/pants/bin/local_pants_runner.py", line 185, in _maybe_run_v2_body
    return self.graph_session.run_goal_rules(
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0b2_py38/lib/python3.8/site-packages/pants/init/engine_initializer.py", line 125, in run_goal_rules
    exit_code = self.scheduler_session.run_goal_rule(
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0b2_py38/lib/python3.8/site-packages/pants/engine/internals/scheduler.py", line 568, in run_goal_rule
    self._raise_on_error([t for _, t in throws])
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0b2_py38/lib/python3.8/site-packages/pants/engine/internals/scheduler.py", line 527, in _raise_on_error
    raise ExecutionError(
pants.engine.internals.scheduler.ExecutionError: 1 Exception encountered:
Engine traceback:
  in select
  in `typecheck` goal
  in Typecheck using MyPy
  in pants.backend.python.typecheck.mypy.rules.mypy_typecheck_partition
  in pants.backend.python.util_rules.pex.create_pex
  in pants.engine.process.fallible_to_exec_result_or_raise
Traceback (most recent call last):
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0b2_py38/lib/python3.8/site-packages/pants/engine/process.py", line 241, in fallible_to_exec_result_or_raise
    raise ProcessExecutionFailure(
pants.engine.process.ProcessExecutionFailure: Process 'Building mypy.pex with 1 requirement: mypy==0.782' failed with exit code 1.
stdout:
stderr:
Failed to spawn a job for DistributionTarget(interpreter=PythonInterpreter('/usr/local/bin/python3.8', PythonIdentity('/usr/local/bin/python3.8', 'cp38', 'cp38', 'manylinux2014_x86_64', (3, 8, 5)))): [Errno 2] No such file or directory: '/home/toolchain/.pex/unzipped_pexes/49b4b32295a60a0a4dfc0659c3a44f93560c4a81/.deps/pex-2.1.16-py2.py3-none-any.whl/pex/__pycache__/variables.cpython-37.pyc.139923989690656'

Seeing this randomly from time to time in different CI runs.

@stuhood

stuhood commented Sep 30, 2020

Pants now invokes multiple instances of mypy concurrently: it's possible that this is related. But the contents of the __pycache__ should be concurrency safe, afaik.

@jsirois
Member

jsirois commented Oct 1, 2020

Looking very closely at the last line, I think this may be caused by the issue where the sys.executable value lies on OSX framework builds (#1009, fixed by #1049), since what Pex thinks is a Python 3.8 interpreter is actually launching a Python 3.7 interpreter (the file not found is a 3.7 bytecode file):

Failed to spawn a job for DistributionTarget(interpreter=PythonInterpreter('/usr/local/bin/python3.8', PythonIdentity('/usr/local/bin/python3.8', 'cp38', 'cp38', 'manylinux2014_x86_64', (3, 8, 5)))): [Errno 2] No such file or directory: '/home/toolchain/.pex/unzipped_pexes/49b4b32295a60a0a4dfc0659c3a44f93560c4a81/.deps/pex-2.1.16-py2.py3-none-any.whl/pex/__pycache__/variables.cpython-37.pyc.139923989690656'

I can't provide a causal chain yet, though, so I'll leave this open and dig into it further.

@jsirois jsirois added the bug label Oct 1, 2020
@jsirois jsirois self-assigned this Oct 1, 2020
@asherf
Contributor Author

asherf commented Oct 1, 2020

This happens in CI (CircleCI), not OSX.

@jsirois
Member

jsirois commented Oct 1, 2020

Aha, ok - almost certainly something else then. I'll take a look.

@jsirois jsirois mentioned this issue Oct 1, 2020
4 tasks
@jsirois
Member

jsirois commented Oct 1, 2020

Notes:

  1. If this is a straight-up concurrency problem, the relevant 3.8 code is here:
    https://github.com/python/cpython/blob/4c2e299d80c53591f05de2669c0edeaf8acd8544/Lib/importlib/_bootstrap_external.py#L120-L139
    And the relevant 3.7 code is here and the same:
    https://github.com/python/cpython/blob/4e02981de0952f54bf87967f8e10d169d6946b40/Lib/importlib/_bootstrap_external.py#L105-L124
    As commented, the code is best effort. Two independent processes could have the same object id for the string representing the path of the final pyc file and could thus race. This seems really unlikely, though, and I'm not sure how we'd fix it without turning off bytecode compilation.

  2. This may not be a straight-up concurrency issue since the interpreter (3.8) vs bytecode path (37) indicates something unexpected.
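
For readers following along, here is a minimal sketch of the write-then-rename pattern the linked CPython code uses. It is a paraphrase for illustration, not a copy of the importlib source, and the name `write_atomic` is made up here. It shows where a suffix like `.139923989690656` comes from: the temporary name is the final path plus `id()` of the path string.

```python
import os
import tempfile


def write_atomic(path: str, data: bytes, mode: int = 0o666) -> None:
    """Write data to a sibling temp file, then rename it onto path."""
    # The temp name is the final path plus id(path). In CPython id() is a
    # memory address, so it is only unique within one process.
    path_tmp = "{}.{}".format(path, id(path))
    fd = os.open(path_tmp, os.O_EXCL | os.O_CREAT | os.O_WRONLY, mode & 0o666)
    try:
        with os.fdopen(fd, "wb") as fp:
            fp.write(data)
        # The rename is the step that fails with ENOENT if path_tmp has
        # disappeared between the write and this call.
        os.replace(path_tmp, path)
    except OSError:
        # Best effort cleanup of the temp file, then surface the error.
        try:
            os.unlink(path_tmp)
        except OSError:
            pass
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "variables.cpython-38.pyc")
        write_atomic(target, b"fake bytecode")
        print(os.listdir(d))
```

The window between the write and the `os.replace` call is where anything that removes or replaces the containing directory can make the rename fail.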

@jsirois
Member

jsirois commented Oct 1, 2020

Narrowing down a bit more, this error involves running the Pex PEX, which is the one released by the Pex project and marked as --unzip. For the corresponding points above, this means:

  1. Since the PEX bootstrap does the unzipping into PEX_ROOT/unzipped_pexes using atomic_directory, which is UUID4 best-effort, it could also eagerly bytecode compile all files while under that lock. That would circumvent the much weaker best effort of CPython, which just relies on object id (aka memory address) uniqueness.
  2. This could explain the interpreter mismatch if we're running the Pex PEX with 3.8 but it then re-execs itself with 3.7. That needs more investigation, though, since the Pex PEX interpreter constraints are ">=2.7,<3.9,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*", which allows for 3.8, and since Pex supposedly always uses the current interpreter instead of re-execing if the current interpreter matches the constraints.
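
As a rough illustration of the eager compilation idea in point 1, the standard library's `compileall` module can byte-compile a whole tree in one call. The helper name `precompile` is hypothetical, and where exactly Pex would call something like it is an assumption, not something confirmed in this thread.

```python
import compileall
import importlib.util
import os
import tempfile


def precompile(directory: str) -> bool:
    """Byte-compile every .py file under directory; True if all compiled."""
    # quiet=1 suppresses the per-file listing but still reports errors.
    return bool(compileall.compile_dir(directory, quiet=1))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        src = os.path.join(work_dir, "mod.py")
        with open(src, "w") as fp:
            fp.write("X = 1\n")
        # Assumed usage: compile while the directory is still private to
        # this process, before it is published to other processes.
        print(precompile(work_dir))
        print(os.path.exists(importlib.util.cache_from_source(src)))
```

Note that this only produces bytecode for the interpreter doing the compiling, so a different Python version running against the same directory would still compile lazily on import.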

@jsirois jsirois mentioned this issue Oct 3, 2020
5 tasks
@jsirois
Member

jsirois commented Oct 5, 2020

Alright, explanation in hand. The issue here is AtomicDirectory. That code is robust but should only be used when the contents of the directory you want to create atomically are immutable. In the case of --unzip (~/.pex/unzipped_pexes/...), --not-zip-safe (~/.pex/code/...) and the unpacked wheel cache (~/.pex/installed_wheels/...), where the directory is populated with Python code that will be executed later, the directory contents are appended to later by Python bytecode compilation. Since AtomicDirectory has at-least-once semantics, a race in which more than one process fills the directory simultaneously can go like this:

  1. 1st process creates the directory.
  2. Another process executes against that directory and creates a temporary bytecode file but does not yet rename it: https://github.com/python/cpython/blob/4c2e299d80c53591f05de2669c0edeaf8acd8544/Lib/importlib/_bootstrap_external.py#L126-L132
  3. 2nd process (re)creates the directory, losing the temporary bytecode file from step 2.
  4. The process in 2 finds the temporary bytecode file it wrote is missing when attempting to atomically rename it: https://github.com/python/cpython/blob/4c2e299d80c53591f05de2669c0edeaf8acd8544/Lib/importlib/_bootstrap_external.py#L133
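
The four steps above can be replayed deterministically in a single process. This is only a simulation: the directory re-creation in step 3 is modelled with a plain remove and recreate, which is an assumption and not how AtomicDirectory is implemented, and all names are made up for the example.

```python
import errno
import os
import shutil
import tempfile
from typing import Optional


def simulate_lost_temp_file(root: str) -> Optional[int]:
    """Replay the race in order; return the errno of the failed rename."""
    code_dir = os.path.join(root, "unzipped_pex")  # stand-in code directory
    cache_dir = os.path.join(code_dir, "__pycache__")

    # Step 1: the first process creates the directory.
    os.makedirs(cache_dir)

    # Step 2: an importing process writes its temporary bytecode file but
    # has not renamed it into place yet.
    final = os.path.join(cache_dir, "variables.cpython-38.pyc")
    tmp = "{}.{}".format(final, id(final))
    with open(tmp, "wb") as fp:
        fp.write(b"fake bytecode")

    # Step 3: a second creator (re)creates the directory, dropping tmp.
    shutil.rmtree(code_dir)
    os.makedirs(cache_dir)

    # Step 4: the importing process tries to finish its atomic rename.
    try:
        os.replace(tmp, final)
    except FileNotFoundError as e:
        return e.errno
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(simulate_lost_temp_file(root) == errno.ENOENT)
```

The failed rename surfaces as `[Errno 2] No such file or directory` on the temporary `.pyc.<id>` path, which matches the shape of the error reported in this issue.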

For atomic mutable directory creation we'll need an interprocess lock. Another approach would be to continue with AtomicDirectory but append an identifier to the path that is unique to the bytecode compilation, say cpython-37. This has the disadvantage, though, of keeping a copy of the source code for each unique interpreter run against it.
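
A minimal sketch of the interprocess lock approach, assuming a POSIX system where `fcntl.flock` is available. The function name, the `.lck` lock file name and the overall shape are hypothetical and are not taken from the eventual Pex change.

```python
import fcntl
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def locked_directory(target_dir: str):
    """Yield a work dir to populate, or None if target_dir already exists."""
    target_dir = os.path.abspath(target_dir)
    parent = os.path.dirname(target_dir)
    os.makedirs(parent, exist_ok=True)
    lock_path = target_dir + ".lck"  # hypothetical lock file name
    with open(lock_path, "w") as lock_fp:
        # Blocks until every other process holding the lock releases it.
        fcntl.flock(lock_fp, fcntl.LOCK_EX)
        try:
            if os.path.exists(target_dir):
                # Someone else finished first: nothing to create.
                yield None
                return
            work_dir = tempfile.mkdtemp(
                prefix=os.path.basename(target_dir) + ".", dir=parent
            )
            try:
                yield work_dir
                # Publish the populated directory while still holding the lock.
                os.rename(work_dir, target_dir)
            except BaseException:
                shutil.rmtree(work_dir, ignore_errors=True)
                raise
        finally:
            fcntl.flock(lock_fp, fcntl.LOCK_UN)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "code")
        with locked_directory(target) as work_dir:
            if work_dir is not None:
                with open(os.path.join(work_dir, "mod.py"), "w") as fp:
                    fp.write("X = 1\n")
        print(os.listdir(target))
```

A process that loses the race blocks on the lock, then sees that the directory exists and gets `None`, so the directory is created exactly once and later bytecode files written into it are never thrown away by a second creator.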

@Eric-Arellano
Contributor

Do either of these solutions need to be applied unconditionally, or we could try to detect when it's unsafe and only use these fixes then?

@jsirois
Member

jsirois commented Oct 5, 2020

Do either of these solutions need to be applied unconditionally, or we could try to detect when it's unsafe and only use these fixes then?

They need to be applied unconditionally, since detecting when it's safe amounts to detecting whether other similar processes are racing, which in turn amounts to either implementing an interprocess lock (the proposed solution) or eliminating the need for a lock (the alternative solution).

jsirois added a commit to jsirois/pex that referenced this issue Oct 8, 2020
Use this new mode to ensure directories Pex creates that contain Python
code are created exactly once so that the implicit Python bytecode
compilation process is not thwarted by racing directory creation.

Fixes pex-tool#1051
jsirois added a commit that referenced this issue Oct 8, 2020
Use this new mode to ensure directories Pex creates that contain Python
code are created exactly once so that the implicit Python bytecode
compilation process is not thwarted by racing directory creation.

Fixes #1051
@jsirois jsirois removed the bug label Oct 8, 2020
@asherf
Contributor Author

asherf commented Oct 8, 2020

yay! thanks for fixing @jsirois !

@asherf
Contributor Author

asherf commented Oct 13, 2020

@jsirois need to reopen this... Since I am seeing this with the latest release.


Engine traceback:
  in select
  in pants.core.goals.typecheck.typecheck
  in pants.backend.python.typecheck.mypy.rules.mypy_typecheck
  in pants.backend.python.typecheck.mypy.rules.mypy_typecheck_partition
  in pants.backend.python.util_rules.pex.create_pex
  in pants.engine.process.fallible_to_exec_result_or_raise
Traceback (most recent call last):
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0rc0_py38/lib/python3.8/site-packages/pants/engine/process.py", line 241, in fallible_to_exec_result_or_raise
    raise ProcessExecutionFailure(
pants.engine.process.ProcessExecutionFailure: Process 'Resolving 3rdparty/python/constraints.txt' failed with exit code 1.
stdout:

stderr:
Failed to spawn a job for DistributionTarget(interpreter=PythonInterpreter('/usr/local/bin/python3.8', PythonIdentity('/usr/local/bin/python3.8', 'cp38', 'cp38', 'manylinux2014_x86_64', (3, 8, 5)))): [Errno 2] No such file or directory: '/home/toolchain/.pex/unzipped_pexes/d8bfff1518d5d211d4ec95cf18305ff3939eef68/.deps/pex-2.1.18-py2.py3-none-any.whl/pex/__pycache__/platforms.cpython-37.pyc.139727263750640'


Traceback (most recent call last):
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0rc0_py38/lib/python3.8/site-packages/pants/bin/local_pants_runner.py", line 281, in run
    engine_result = self._run_v2()
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0rc0_py38/lib/python3.8/site-packages/pants/bin/local_pants_runner.py", line 193, in _run_v2
    return self._maybe_run_v2_body(goals, poll=False)
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0rc0_py38/lib/python3.8/site-packages/pants/bin/local_pants_runner.py", line 210, in _maybe_run_v2_body
    return self.graph_session.run_goal_rules(
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0rc0_py38/lib/python3.8/site-packages/pants/init/engine_initializer.py", line 126, in run_goal_rules
    exit_code = self.scheduler_session.run_goal_rule(
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0rc0_py38/lib/python3.8/site-packages/pants/engine/internals/scheduler.py", line 569, in run_goal_rule
    self._raise_on_error([t for _, t in throws])
  File "/home/toolchain/.cache/pants/setup/bootstrap-Linux-x86_64/2.0.0rc0_py38/lib/python3.8/site-packages/pants/engine/internals/scheduler.py", line 528, in _raise_on_error
    raise ExecutionError(
pants.engine.internals.scheduler.ExecutionError: 1 Exception encountered:

@jsirois jsirois reopened this Oct 17, 2020
@jsirois
Member

jsirois commented Oct 17, 2020

Ok. Assuming Linux POSIX APIs are not broken, which is a very safe bet, the only way temporary pyc files created by Python can be observed is via os.walk. Pex has at least one of those.
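
To make the os.walk hazard concrete, here is a sketch of a directory hash that never looks at bytecode, so it cannot list a temporary pyc file and then fail when that file is renamed away before it is read. The function name and the exact filtering rules are assumptions for illustration and are not Pex's code.

```python
import hashlib
import os
import tempfile


def dir_hash_ignoring_bytecode(directory: str) -> str:
    """Hash relative paths and contents of all non-bytecode files."""
    digest = hashlib.sha1()
    for root, dirs, files in os.walk(directory):
        # Prune in place so os.walk never descends into __pycache__.
        dirs[:] = sorted(d for d in dirs if d != "__pycache__")
        for name in sorted(files):
            # Skip finished bytecode (foo.pyc) and in-flight temp files
            # (foo.pyc.<id>) that sit next to the source.
            if name.endswith(".pyc") or ".pyc." in name:
                continue
            path = os.path.join(root, name)
            digest.update(os.path.relpath(path, directory).encode("utf-8"))
            with open(path, "rb") as fp:
                digest.update(fp.read())
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "mod.py"), "w") as fp:
            fp.write("X = 1\n")
        print(dir_hash_ignoring_bytecode(d))
```

Because bytecode is derived from the sources, leaving it out should not weaken the hash as a fingerprint of the code, and it means the result does not depend on which interpreters have already imported from the directory.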

@jsirois jsirois added the bug label Oct 17, 2020
jsirois added a commit to jsirois/pex that referenced this issue Oct 17, 2020
Previously `dir_hash` (and `pex_hash`) were able to observe in-flight
bytecode compilation which would lead to observe-delete-failedhash
sequencing.

Fixes pex-tool#1051
jsirois added a commit that referenced this issue Oct 17, 2020
Previously `dir_hash` (and `pex_hash`) were able to observe in-flight
bytecode compilation which would lead to observe-delete-failedhash
sequencing.

Fixes #1051
@jsirois
Member

jsirois commented Nov 6, 2020

Another instance of this was found in #1098 and fixed in #1099.

4 participants