[Core] Fix uv >=0.10.5 stripping execute permissions on XFS filesystems #8904

Merged
lloyd-brown merged 1 commit into skypilot-org:master from zpoint:fix/uv-xfs-permission-bug
Feb 24, 2026

zpoint (Collaborator) commented Feb 24, 2026

Summary

  • uv 0.10.5 introduced a regression: its reflink/clone link mode strips execute permissions from wheel data files on XFS filesystems. As a result, Ray's gcs_server and raylet binaries are installed as 664 instead of 775, and ray start fails with PermissionError on Amazon Linux 2023
  • Set UV_LINK_MODE=copy in all uv invocations (SKY_UV_CMD and Kubernetes templates) to bypass the broken reflink code path

Root Cause

uv 0.10.5 PRs #18117 ("Attempt to use reflinks by default on Linux") and #18104 ("Fallback to hardlinks after reflink failure") introduced clone/reflink as the default link mode on Linux. On XFS filesystems (used by Amazon Linux 2023), this strips execute permissions from installed files. Ubuntu (ext4, no reflink support) is unaffected because uv falls back to copy mode.
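To check whether a given host falls into the affected case, it can help to see which filesystem backs the install directory. The following is a minimal sketch, not part of this PR: `parse_mounts` and `fs_type_for` are hypothetical helper names, and the parser assumes `/proc/mounts`-style text without escaped spaces in mount points. It reports only the filesystem type, not whether reflink is actually enabled on it.

```python
import os


def parse_mounts(mounts_text: str) -> list:
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line looks like: device mount_point fs_type options dump pass
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the fs type of the longest mount point that contains `path`."""
    path = os.path.abspath(path)
    best_point, best_type = '', 'unknown'
    for point, fs_type in parse_mounts(mounts_text):
        prefix = point.rstrip('/') + '/'
        matches = path == point or path.startswith(prefix)
        # '>=' lets a later entry for the same mount point win (overmounts).
        if matches and len(point) >= len(best_point):
            best_point, best_type = point, fs_type
    return best_type
```

On a Linux host this could be called as `fs_type_for(os.path.expanduser('~/skypilot-runtime'), open('/proc/mounts').read())`; a result of `xfs` would suggest the machine is a candidate for the behavior described above.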

Evidence — same commit, different uv versions:

  • Build #8390 passed (ran 00:36 UTC, got uv 0.10.4)
  • Build #8401 failed (ran 08:56 UTC, got uv 0.10.5, published 00:55 UTC)
  • A/B test on AL2023 VM confirmed: uv 0.10.4 → 775 permissions, uv 0.10.5 → 664 permissions
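The 775 versus 664 comparison can be reproduced with a small permission check. This is an illustrative sketch only: `mode_string`, `is_executable_by_owner`, and `demo` are hypothetical names, and the demo uses a scratch file rather than the real Ray binaries, whose install paths are not shown in this thread.

```python
import os
import stat
import tempfile


def mode_string(path: str) -> str:
    """Return the permission bits as an octal string such as '775'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), 'o')


def is_executable_by_owner(path: str) -> bool:
    """True if the owner execute bit is set on the file."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def demo() -> list:
    """Set a scratch file to 664 and then 775 and report what is observed."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'fake_binary')
        with open(path, 'w') as f:
            f.write('')
        for mode in (0o664, 0o775):
            os.chmod(path, mode)
            results.append((mode_string(path), is_executable_by_owner(path)))
    return results
```

Pointing `mode_string` at the installed gcs_server and raylet files after installing with each uv version would give the same kind of A/B comparison as reported above.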

Changes

| File | Change |
|---|---|
| sky/skylet/constants.py | Add UV_LINK_MODE=copy to SKY_UV_CMD (propagates to all downstream uv commands) |
| sky/templates/kubernetes-ray.yml.j2 | Add UV_LINK_MODE=copy to 2 hardcoded uv invocations that bypass SKY_UV_CMD |
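The reason a single constant change propagates is that downstream commands are built by string-prefixing the shared uv command. A rough sketch of that pattern follows, under stated assumptions: `UV_BIN`, `UV_CMD`, and `uv_pip_install` are hypothetical stand-ins and do not reproduce the actual contents of sky/skylet/constants.py.

```python
import shlex

# Hypothetical stand-ins, not the real values in sky/skylet/constants.py.
UV_BIN = '~/.local/bin/uv'
# Prefixing the environment assignment applies it to every command
# that is built from this constant.
UV_CMD = f'UV_LINK_MODE=copy {UV_BIN}'


def uv_pip_install(*packages: str) -> str:
    """Build a shell command string that installs packages in copy mode."""
    quoted = ' '.join(shlex.quote(p) for p in packages)
    return f'{UV_CMD} pip install {quoted}'
```

For example, `uv_pip_install('skypilot[kubernetes,remote]')` yields a command that starts with `UV_LINK_MODE=copy`. Commands that hardcode the uv path instead of using the shared constant do not pick up the prefix, which is why the two template invocations needed separate edits.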

Test plan

🤖 Generated with Claude Code

zpoint (Collaborator, Author) commented Feb 24, 2026

/smoke-test -k test_aws_storage_mounts_cached

gemini-code-assist (Contributor) commented

Summary of Changes

Hello @zpoint, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue where uv versions 0.10.5 and newer incorrectly stripped execute permissions from installed binaries on XFS filesystems, leading to PermissionError when running Ray components on Amazon Linux 2023. The fix involves explicitly configuring uv to use the copy link mode, ensuring that file permissions are preserved during installation across various environments.

Highlights

  • Fix uv permission regression: Addresses a regression in uv versions >=0.10.5 that caused execute permissions to be stripped from installed binaries on XFS filesystems, specifically affecting Ray's gcs_server and raylet on Amazon Linux 2023.
  • Force copy link mode for uv: Implements a workaround by setting UV_LINK_MODE=copy in all uv invocations to bypass the problematic reflink/clone link mode.

Changelog
  • Dockerfile
    • Added ENV UV_LINK_MODE=copy to ensure uv uses copy mode during installation.
  • Dockerfile_k8s
    • Added ENV UV_LINK_MODE=copy to ensure uv uses copy mode in Kubernetes Dockerfiles.
  • Dockerfile_k8s_gpu
    • Added ENV UV_LINK_MODE=copy to ensure uv uses copy mode in Kubernetes GPU Dockerfiles.
  • sky/skylet/constants.py
    • Modified SKY_UV_CMD to prepend UV_LINK_MODE=copy to the uv command, affecting all downstream uv calls.
  • sky/templates/kubernetes-ray.yml.j2
    • Updated two uv pip install commands to include UV_LINK_MODE=copy, specifically for Kubernetes deployments that might bypass SKY_UV_CMD.
  • tests/smoke_tests/docker/Dockerfile_test
    • Added ENV UV_LINK_MODE=copy to the Dockerfile used for smoke tests.
Activity
  • The author identified a regression in uv version 0.10.5 causing permission issues on XFS filesystems.
  • The author performed A/B testing on an AL2023 VM, confirming the permission change between uv 0.10.4 (775) and 0.10.5 (664).
  • A detailed test plan was outlined, including format.sh, pytest, pre-commit hooks, and sky launch verification on AL2023.
  • No external review comments or further activity from other users are noted in the provided context.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly addresses a regression in uv >= 0.10.5 that caused execute permissions to be stripped on XFS filesystems. The fix, which involves setting UV_LINK_MODE=copy for all uv invocations, is applied consistently across all relevant Dockerfiles, Python constants, and Kubernetes templates. The changes are clear and well-explained. I have one suggestion to improve the maintainability of the Kubernetes template by using a variable for a repeated command prefix.

Comment on lines 923 to +933
      # set UV_SYSTEM_PYTHON to false in case the user provided docker image set it to true.
      # unset PYTHONPATH and set CWD to $HOME to avoid user image interfering with SkyPilot runtime.
    - VIRTUAL_ENV=~/skypilot-runtime UV_SYSTEM_PYTHON=false {{sky_unset_pythonpath_and_set_cwd}} ~/.local/bin/uv pip install skypilot[kubernetes,remote]
    + VIRTUAL_ENV=~/skypilot-runtime UV_LINK_MODE=copy UV_SYSTEM_PYTHON=false {{sky_unset_pythonpath_and_set_cwd}} ~/.local/bin/uv pip install skypilot[kubernetes,remote]
      # Wait for `patch` package to be installed before applying ray patches
      until dpkg -l | grep -q "^ii  patch "; do
        sleep 0.1
        echo "Waiting for patch package to be installed..."
      done
      # Apply Ray patches for progress bar fix
      # ~/.sky/python_path is seeded by conda_installation_commands
    - VIRTUAL_ENV=~/skypilot-runtime UV_SYSTEM_PYTHON=false {{sky_unset_pythonpath_and_set_cwd}} ~/.local/bin/uv pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null && {
    + VIRTUAL_ENV=~/skypilot-runtime UV_LINK_MODE=copy UV_SYSTEM_PYTHON=false {{sky_unset_pythonpath_and_set_cwd}} ~/.local/bin/uv pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null && {

medium

To improve maintainability and avoid repetition, you could define a Jinja2 variable for the common uv command prefix and reuse it for both uv invocations.

                {%- set uv_cmd_prefix = "VIRTUAL_ENV=~/skypilot-runtime UV_LINK_MODE=copy UV_SYSTEM_PYTHON=false " + sky_unset_pythonpath_and_set_cwd + " ~/.local/bin/uv" %}
                # set UV_SYSTEM_PYTHON to false in case the user provided docker image set it to true.
                # unset PYTHONPATH and set CWD to $HOME to avoid user image interfering with SkyPilot runtime.
                {{ uv_cmd_prefix }} pip install skypilot[kubernetes,remote]
                # Wait for `patch` package to be installed before applying ray patches
                until dpkg -l | grep -q "^ii  patch "; do
                  sleep 0.1
                  echo "Waiting for patch package to be installed..."
                done
                # Apply Ray patches for progress bar fix
                # ~/.sky/python_path is seeded by conda_installation_commands
                {{ uv_cmd_prefix }} pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null && {

uv 0.10.5 introduced a regression where its reflink/clone link mode
strips execute permissions from wheel data files on XFS filesystems.
This causes Ray's gcs_server and raylet binaries to be installed as
664 instead of 775, breaking ray start with PermissionError on
Amazon Linux 2023 (which uses XFS).

Set UV_LINK_MODE=copy in all uv invocations to bypass the broken
reflink code path. This is forward-compatible and has negligible
performance impact since copy mode is already the default on ext4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@zpoint zpoint force-pushed the fix/uv-xfs-permission-bug branch from 3fdd617 to 1aee297 Compare February 24, 2026 12:07
zpoint (Collaborator, Author) commented Feb 24, 2026

/smoke-test

zpoint (Collaborator, Author) commented Feb 24, 2026

/smoke-test

Michaelvll (Collaborator) left a comment

Great catch @zpoint! We may need to cherry pick this to the stable release.

@Michaelvll Michaelvll added this to the 0.11.2 milestone Feb 24, 2026
Michaelvll (Collaborator) commented

/smoke-test --kubernetes
/smoke-test --gcp
/quicktest-core --base-branch releases/0.11.1

ggilley (Contributor) commented Feb 24, 2026

Confirmed that this fixes the issue in 0.11.1 (we patched it as a preDeployHook)

Great detective work!

@lloyd-brown lloyd-brown merged commit 5296b59 into skypilot-org:master Feb 24, 2026
24 of 25 checks passed
lloyd-brown pushed a commit that referenced this pull request Feb 24, 2026
…ms (#8904)

lloyd-brown pushed a commit that referenced this pull request Feb 24, 2026
…ms (#8904)

lloyd-brown pushed a commit that referenced this pull request Feb 24, 2026
…ms (#8904)

@lloyd-brown lloyd-brown mentioned this pull request Feb 25, 2026
Development

Successfully merging this pull request may close these issues.

Cannot Launch Tasks On RunPod due to Ray GCS Server Permission Issue

4 participants