@kevinmingtarja
Collaborator

@kevinmingtarja kevinmingtarja commented Oct 31, 2025

Failing on master: https://buildkite.com/skypilot-1/smoke-tests/builds/5076/steps/canvas?sid=019a3b52-536d-4eeb-affb-8b0f0ec808df

Related to #7531

In get_az_mount_install_cmd, we install fuse3, libfuse3-3, libfuse3-dev. This is a problem when running in k8s, because we symlink fusermount and fusermount3 to our fusermount-shim (see https://github.com/skypilot-org/skypilot/tree/master/addons/fuse-proxy for details):

{% if k8s_fuse_device_required %}
set -e
# Mask fusermount binary before enabling SSH access
FUSERMOUNT_PATH=$(which fusermount)
if [ -z "$FUSERMOUNT_PATH" ]; then
  echo "Error: fusermount binary not found"
  exit 1
fi
$(prefix_cmd) cp -p "$FUSERMOUNT_PATH" "${FUSERMOUNT_PATH}-original"
$(prefix_cmd) ln -sf {{k8s_fusermount_shared_dir}}/fusermount-shim "$FUSERMOUNT_PATH"
# "|| true" because fusermount3 is not always available
FUSERMOUNT3_PATH=$(which fusermount3) || true
if [ -z "$FUSERMOUNT3_PATH" ]; then
  FUSERMOUNT3_PATH="${FUSERMOUNT_PATH}3"
fi

Reinstalling fuse3 overwrites this symlink.

To illustrate what happens before and after get_az_mount_install_cmd:

# Initially, points to the shim
(base) sky@sky-b27a-kevin-7a2eebbf-head:~$ fusermount -V
fusermount3-shim version: 0.1.0
(base) sky@sky-b27a-kevin-7a2eebbf-head:~$ sudo apt-get update
...
(base) sky@sky-b27a-kevin-7a2eebbf-head:~$ sudo apt-get install fuse3 libfuse3-dev -y
...
# Overwritten, now points to the real fuse3 binary
(base) sky@sky-b27a-kevin-7a2eebbf-head:~$ fusermount -V
fusermount3 version: 3.10.3

This will lead to an error the next time we try to call fusermount:

fusermount: failed to open /dev/fuse: Operation not permitted

Because our SkyPilot pods by design do not have root privileges, they have to rely on the shim to talk to the fusermount-server, which is the component that holds the privileges.
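
As a rough illustration of the kind of guard this implies, here is a minimal sketch that checks whether fusermount already resolves to the shim before installing fuse3. It is not the code in this PR: the helper names `is_fusermount_shimmed` and `install_fuse3_if_needed` are hypothetical, and the sketch assumes the shim file is literally named `fusermount-shim` (as in the template above) and that GNU `readlink -f` is available.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the PR): succeeds only if the given path is a
# symlink whose fully resolved target is a file named "fusermount-shim".
is_fusermount_shimmed() {
    local path="$1"
    # A missing path or a regular binary is not shimmed.
    [ -L "$path" ] || return 1
    local target
    target=$(readlink -f "$path") || return 1
    [ "$(basename "$target")" = "fusermount-shim" ]
}

# Hypothetical wrapper (not from the PR): installs fuse3 only when fusermount
# is not already masked by the shim, so the symlink is left untouched.
install_fuse3_if_needed() {
    local fusermount_path
    fusermount_path=$(command -v fusermount || true)
    if is_fusermount_shimmed "$fusermount_path"; then
        echo "fusermount is shimmed; skipping fuse3 install"
    else
        sudo apt-get install -y fuse3
    fi
}
```

The check resolves the symlink rather than parsing `fusermount -V`, since the version string of the shim may change between releases.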

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@kevinmingtarja
Collaborator Author

It's failing with:

  | 2025-10-31 12:13:38 PDT | Mount path /mount_private_mount does not exist. Creating...
  | 2025-10-31 12:13:38 PDT | Mounting  to /mount_private_mount with goofys...
  | 2025-10-31 12:13:38 PDT | Mount failed with exit code 1.
  | 2025-10-31 12:13:38 PDT | Looking for goofys log files...
  | 2025-10-31 12:13:38 PDT | === Goofys log file contents ===
  | 2025-10-31 12:13:38 PDT | 2025/10/31 19:13:36.839566 main.FATAL Mounting file system: Mount: mount: running fusermount: exit status 1
  | 2025-10-31 12:13:38 PDT |  
  | 2025-10-31 12:13:38 PDT | stderr:
  | 2025-10-31 12:13:38 PDT | fusermount: failed to open /dev/fuse: Operation not permitted
  | 2025-10-31 12:13:38 PDT |  
  | 2025-10-31 12:13:38 PDT | 2025/10/31 19:13:37.839802 main.FATAL Unable to mount file system, see syslog for details
  | 2025-10-31 12:13:38 PDT | === End of goofys log file ===
  | 2025-10-31 12:13:38 PDT | command terminated with exit code 1

I think it could be specific to our test infra's k8s setup, because the test runs fine when run against a real k8s cluster.

@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

4 similar comments
@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@kevinmingtarja kevinmingtarja force-pushed the fix-docker-storage-mount branch from 4c7e982 to c643ac6 on October 31, 2025 22:35
@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@kevinmingtarja kevinmingtarja changed the title from "[DEBUG] test_docker_storage_mounts failure on kubernetes" to "[Storage] Fix Azure mount cmd to not re-install fuse3 when fusermount-shim is used" on Oct 31, 2025
@kevinmingtarja
Collaborator Author

/smoke-test -k storage --aws
/smoke-test -k storage --gcp
/smoke-test -k storage --kubernetes

@kevinmingtarja kevinmingtarja marked this pull request as ready for review October 31, 2025 23:53
@kevinmingtarja kevinmingtarja changed the title from "[Storage] Fix Azure mount cmd to not re-install fuse3 when fusermount-shim is used" to "[Storage] Fix Azure mount install cmd to not reinstall fuse3 when fusermount-shim is used" on Oct 31, 2025
Collaborator

@lloyd-brown lloyd-brown left a comment

Looks good. Glad we are also adding logs; they will help a lot if there are future issues! Thanks @kevinmingtarja!

Comment on lines +617 to +636
MOUNT_EXIT_CODE=$?
set -e
if [ $MOUNT_EXIT_CODE -ne 0 ]; then
  echo "Mount failed with exit code $MOUNT_EXIT_CODE."
  if [ "$MOUNT_BINARY" = "goofys" ]; then
    echo "Looking for goofys log files..."
    # Find goofys log files in /tmp (created by mktemp -t goofys.XXXX.log)
    # Note: if /dev/log exists, goofys logs to syslog instead of a file
    GOOFYS_LOGS=$(ls -t /tmp/goofys.*.log 2>/dev/null | head -1)
    if [ -n "$GOOFYS_LOGS" ]; then
      echo "=== Goofys log file contents ==="
      cat "$GOOFYS_LOGS"
      echo "=== End of goofys log file ==="
    else
      echo "No goofys log file found in /tmp"
    fi
  fi
  # TODO(kevin): Print logs from rclone, etc too for observability.
  exit $MOUNT_EXIT_CODE
fi

Collaborator Author

Note: before this, our logs did not give any useful debugging info; they only mentioned the exit code.
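
One way the TODO about other mount binaries might be approached is to factor the goofys branch into a reusable function. The following is only a sketch under assumptions: `print_latest_log` is a hypothetical name that does not appear in the PR, and no claim is made here about where rclone or other tools actually write their logs.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the PR): print the most recently modified file
# matching a glob pattern, framed by the same banner style as the goofys branch.
#   $1 - label used in the banners (e.g. "goofys")
#   $2 - quoted glob pattern for candidate log files
print_latest_log() {
    local label="$1" pattern="$2"
    local latest
    # $pattern is intentionally unquoted so the shell expands the glob.
    # Assumes log paths contain no whitespace.
    latest=$(ls -t $pattern 2>/dev/null | head -1) || true
    if [ -n "$latest" ]; then
        echo "=== $label log file contents ==="
        cat "$latest"
        echo "=== End of $label log file ==="
    else
        echo "No $label log file found matching $pattern"
    fi
}
```

For the goofys case in the diff, the call would be `print_latest_log goofys '/tmp/goofys.*.log'`; the pattern is passed in single quotes so that it is expanded inside the function rather than at the call site.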

@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

1 similar comment
@kevinmingtarja
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@lloyd-brown lloyd-brown merged commit b715574 into master Nov 1, 2025
23 of 24 checks passed
@lloyd-brown lloyd-brown deleted the fix-docker-storage-mount branch November 1, 2025 00:36