[Core] Fix uv >=0.10.5 stripping execute permissions on XFS filesystems#8904
Conversation
|
/smoke-test -k test_aws_storage_mounts_cached |
Summary of ChangesHello @zpoint, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical issue where Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly addresses a regression in uv >= 0.10.5 that caused execute permissions to be stripped on XFS filesystems. The fix, which involves setting UV_LINK_MODE=copy for all uv invocations, is applied consistently across all relevant Dockerfiles, Python constants, and Kubernetes templates. The changes are clear and well-explained. I have one suggestion to improve the maintainability of the Kubernetes template by using a variable for a repeated command prefix.
| # set UV_SYSTEM_PYTHON to false in case the user provided docker image set it to true. | ||
| # unset PYTHONPATH and set CWD to $HOME to avoid user image interfering with SkyPilot runtime. | ||
| VIRTUAL_ENV=~/skypilot-runtime UV_SYSTEM_PYTHON=false {{sky_unset_pythonpath_and_set_cwd}} ~/.local/bin/uv pip install skypilot[kubernetes,remote] | ||
| VIRTUAL_ENV=~/skypilot-runtime UV_LINK_MODE=copy UV_SYSTEM_PYTHON=false {{sky_unset_pythonpath_and_set_cwd}} ~/.local/bin/uv pip install skypilot[kubernetes,remote] | ||
| # Wait for `patch` package to be installed before applying ray patches | ||
| until dpkg -l | grep -q "^ii patch "; do | ||
| sleep 0.1 | ||
| echo "Waiting for patch package to be installed..." | ||
| done | ||
| # Apply Ray patches for progress bar fix | ||
| # ~/.sky/python_path is seeded by conda_installation_commands | ||
| VIRTUAL_ENV=~/skypilot-runtime UV_SYSTEM_PYTHON=false {{sky_unset_pythonpath_and_set_cwd}} ~/.local/bin/uv pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null && { | ||
| VIRTUAL_ENV=~/skypilot-runtime UV_LINK_MODE=copy UV_SYSTEM_PYTHON=false {{sky_unset_pythonpath_and_set_cwd}} ~/.local/bin/uv pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null && { |
There was a problem hiding this comment.
To improve maintainability and avoid repetition, you could define a Jinja2 variable for the common uv command prefix and reuse it for both uv invocations.
{%- set uv_cmd_prefix = "VIRTUAL_ENV=~/skypilot-runtime UV_LINK_MODE=copy UV_SYSTEM_PYTHON=false " + sky_unset_pythonpath_and_set_cwd + " ~/.local/bin/uv" %}
# set UV_SYSTEM_PYTHON to false in case the user provided docker image set it to true.
# unset PYTHONPATH and set CWD to $HOME to avoid user image interfering with SkyPilot runtime.
{{ uv_cmd_prefix }} pip install skypilot[kubernetes,remote]
# Wait for `patch` package to be installed before applying ray patches
until dpkg -l | grep -q "^ii patch "; do
sleep 0.1
echo "Waiting for patch package to be installed..."
done
# Apply Ray patches for progress bar fix
# ~/.sky/python_path is seeded by conda_installation_commands
{{ uv_cmd_prefix }} pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null && {
uv 0.10.5 introduced a regression where its reflink/clone link mode strips execute permissions from wheel data files on XFS filesystems. This causes Ray's gcs_server and raylet binaries to be installed as 664 instead of 775, breaking ray start with PermissionError on Amazon Linux 2023 (which uses XFS). Set UV_LINK_MODE=copy in all uv invocations to bypass the broken reflink code path. This is forward-compatible and has negligible performance impact since copy mode is already the default on ext4. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3fdd617 to
1aee297
Compare
|
/smoke-test |
1 similar comment
|
/smoke-test |
Michaelvll
left a comment
There was a problem hiding this comment.
Great catch @zpoint! We may need to cherry pick this to the stable release.
|
/smoke-test --kubernetes |
|
Confirmed that this fixes the issue in 0.11.1 (we patched it as a preDeployHook) Great detective work! |
…ms (#8904) uv 0.10.5 introduced a regression where its reflink/clone link mode strips execute permissions from wheel data files on XFS filesystems. This causes Ray's gcs_server and raylet binaries to be installed as 664 instead of 775, breaking ray start with PermissionError on Amazon Linux 2023 (which uses XFS). Set UV_LINK_MODE=copy in all uv invocations to bypass the broken reflink code path. This is forward-compatible and has negligible performance impact since copy mode is already the default on ext4. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ms (#8904) uv 0.10.5 introduced a regression where its reflink/clone link mode strips execute permissions from wheel data files on XFS filesystems. This causes Ray's gcs_server and raylet binaries to be installed as 664 instead of 775, breaking ray start with PermissionError on Amazon Linux 2023 (which uses XFS). Set UV_LINK_MODE=copy in all uv invocations to bypass the broken reflink code path. This is forward-compatible and has negligible performance impact since copy mode is already the default on ext4. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ms (#8904) uv 0.10.5 introduced a regression where its reflink/clone link mode strips execute permissions from wheel data files on XFS filesystems. This causes Ray's gcs_server and raylet binaries to be installed as 664 instead of 775, breaking ray start with PermissionError on Amazon Linux 2023 (which uses XFS). Set UV_LINK_MODE=copy in all uv invocations to bypass the broken reflink code path. This is forward-compatible and has negligible performance impact since copy mode is already the default on ext4. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
gcs_serverandrayletbinaries to be installed as664instead of775, breakingray startwithPermissionErroron Amazon Linux 2023UV_LINK_MODE=copyin all uv invocations (SKY_UV_CMDand Kubernetes templates) to bypass the broken reflink code pathRoot Cause
uv 0.10.5 PRs #18117 ("Attempt to use reflinks by default on Linux") and #18104 ("Fallback to hardlinks after reflink failure") introduced clone/reflink as the default link mode on Linux. On XFS filesystems (used by Amazon Linux 2023), this strips execute permissions from installed files. Ubuntu (ext4, no reflink support) is unaffected because uv falls back to copy mode.
Evidence — same commit, different uv versions:
775permissions, uv 0.10.5 →664permissionsChanges
sky/skylet/constants.pyUV_LINK_MODE=copytoSKY_UV_CMD(propagates to all downstream uv commands)sky/templates/kubernetes-ray.yml.j2UV_LINK_MODE=copyto 2 hardcoded uv invocations that bypassSKY_UV_CMDTest plan
bash format.sh— all checks pass (yapf, isort, mypy, pylint)pytest tests/unit_tests/test_controller_utils.py— 34/34 passedsky launchon AL2023 AMI (ami-0a5a5b7e2278263e5, us-east-2) with uv 0.10.5 →gcs_serverandrayletpermissions are775/smoke-test -k test_aws_storage_mounts_cached: https://buildkite.com/skypilot-1/smoke-tests/builds/8406#019c8f7e-b307-4f1b-a4a7-09912ed3489b🤖 Generated with Claude Code