Safetensors loading uses mmap with multiple processes sharing the same fd, causing slow gcsfuse performance #10280

Closed
wlhee opened this issue Dec 18, 2024 · 4 comments · Fixed by #10305
Labels
bug Something isn't working

Comments

@wlhee

wlhee commented Dec 18, 2024

Describe the bug

When I use StableDiffusionPipeline.from_single_file to load a safetensors model, loading is extremely slow if the file is served from GCSFuse (https://cloud.google.com/storage/docs/cloud-storage-fuse/overview).

The reason is that the loader creates multiple processes that all share the same fd and its file handle. Because each process reads from a different offset of the file, GCSFuse sees what looks like random reads jumping between offsets on a single handle, and it performs very badly. For example:

connection.go:420] <- ReadFile (inode 2, PID 77, handle 1, offset 529453056, 262144 bytes)
connection.go:420] <- ReadFile (inode 2, PID 78, handle 1, offset 531812352, 262144 bytes)
connection.go:420] <- ReadFile (inode 2, PID 79, handle 1, offset 534171648, 262144 bytes)
connection.go:420] <- ReadFile (inode 2, PID 50, handle 1, offset 527351808, 4096 bytes)
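
To make the pattern in these log lines explicit, here is a minimal sketch that parses them and checks whether each read starts where the previous read on the same handle ended. The regular expression and the helper names (`parse_reads`, `sequential_flags`) are assumptions made for illustration; they are not part of GCSFuse or diffusers.

```python
import re

# The four ReadFile lines quoted in the issue.
LOG = """\
connection.go:420] <- ReadFile (inode 2, PID 77, handle 1, offset 529453056, 262144 bytes)
connection.go:420] <- ReadFile (inode 2, PID 78, handle 1, offset 531812352, 262144 bytes)
connection.go:420] <- ReadFile (inode 2, PID 79, handle 1, offset 534171648, 262144 bytes)
connection.go:420] <- ReadFile (inode 2, PID 50, handle 1, offset 527351808, 4096 bytes)
"""

# Hypothetical pattern written for this log excerpt only.
PATTERN = re.compile(r"PID (\d+), handle (\d+), offset (\d+), (\d+) bytes")


def parse_reads(log):
    """Return a list of (pid, handle, offset, size) tuples."""
    return [tuple(int(x) for x in m.groups()) for m in PATTERN.finditer(log)]


def sequential_flags(reads):
    """Flag each read that starts exactly where the previous read on the same handle ended."""
    next_offset = {}
    flags = []
    for _pid, handle, offset, size in reads:
        flags.append(next_offset.get(handle) == offset)
        next_offset[handle] = offset + size
    return flags


reads = parse_reads(LOG)
pids = {r[0] for r in reads}
handles = {r[1] for r in reads}
flags = sequential_flags(reads)
print(f"{len(pids)} PIDs on {len(handles)} handle(s); sequential reads: {sum(flags)}/{len(flags)}")
```

In this excerpt four different PIDs read through one handle and no read continues the previous one, which is the access pattern described above.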

My question is why the loading processes share the same fd in the first place. Since mmap is already used, even if the processes don't share the same fd, the kernel will still map each process's virtual memory back to the same page cache, so there is no need to share the fd across processes.
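
The two claims in this paragraph can be illustrated with a small sketch of general POSIX behavior. It is not the diffusers or safetensors code path. It assumes a Linux-like system and uses `os.dup` as a stand-in for an fd inherited across `fork()`, since both refer to the same open file description. The function name `fd_sharing_demo` is made up for this example.

```python
import mmap
import os
import tempfile


def fd_sharing_demo():
    """Compare a shared fd, an independently opened fd, and mmaps of the same file."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"A" * 8192)
        path = tmp.name
    fds = []
    try:
        fd_a = os.open(path, os.O_RDWR)
        fds.append(fd_a)
        fd_b = os.open(path, os.O_RDWR)  # separate open(): its own file handle and offset
        fds.append(fd_b)
        fd_dup = os.dup(fd_a)  # same open file description as fd_a, like a fork-inherited fd
        fds.append(fd_dup)

        # Moving the offset through fd_a also moves it for fd_dup, but not for fd_b.
        os.lseek(fd_a, 4096, os.SEEK_SET)
        shared_offset = os.lseek(fd_dup, 0, os.SEEK_CUR)
        independent_offset = os.lseek(fd_b, 0, os.SEEK_CUR)

        # Two mappings created from independent fds are backed by the same cached pages
        # (the default mapping on Unix is MAP_SHARED), so a write via one is visible via the other.
        map_a = mmap.mmap(fd_a, 0)
        map_b = mmap.mmap(fd_b, 0)
        map_a[0:1] = b"Z"
        seen_via_b = bytes(map_b[0:1])
        map_a.close()
        map_b.close()
    finally:
        for fd in fds:
            os.close(fd)
        os.unlink(path)
    return shared_offset, independent_offset, seen_via_b


result = fd_sharing_demo()
print(result)
```

If this holds for the loader as well, giving each process its own `open()` would presumably give each one its own handle on the GCSFuse side, while the mapped pages would still be shared through the page cache.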

If they don't share the fd, GCSFuse will perform much better. Therefore, can we disable the fd sharing?

Reproduction

Simply use GCSFuse to serve a safetensors file to StableDiffusionPipeline.from_single_file.

Logs

No response

System Info

N/A

Who can help?

@yiyixuxu @asomoza

wlhee added the bug label Dec 18, 2024
@sayakpaul
Member

Cc: @DN6 for single file loading.

@DN6
Collaborator

DN6 commented Dec 18, 2024

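@wlhee File loading is handled through the safetensors API. Are you seeing the same issue when you try something like the following?

import safetensors.torch  # "import safetensors" alone does not load the torch submodule

# Load the checkpoint directly through safetensors, bypassing diffusers.
state_dict = safetensors.torch.load_file(<file_path>)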

@wlhee
Author

wlhee commented Dec 19, 2024

Surprisingly, when I tested just safetensors.torch.load_file(<file_path>), I didn't see multiple processes. I saw only one process (a single PID) reading the file sequentially, which is pretty fast because GCSFuse is optimized for this sequential read pattern:

[Image: Screenshot 2024-12-18 at 10 48 10 PM]

I also tried safetensors.torch.load(open(<file_path>, 'rb').read()), as described in comfyanonymous/ComfyUI#1992 (comment). It didn't make much difference.
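
For anyone who wants to compare the two access patterns on a mount without involving diffusers or safetensors, here is a rough sketch using only the standard library. The helpers `time_reads`, `sequential_offsets`, and `interleaved_offsets` are hypothetical names, and the 262144-byte chunk size is taken from the log lines in the issue description. It only emulates several readers taking turns on one fd. It does not reproduce the actual loader, and timings on a local disk or a warm page cache will not be representative.

```python
import os
import time


def time_reads(path, offsets, chunk_size=262144):
    """Read chunk_size bytes at each offset through a single fd; return (seconds, bytes_read)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        for offset in offsets:
            total += len(os.pread(fd, chunk_size, offset))
        return time.perf_counter() - start, total
    finally:
        os.close(fd)


def sequential_offsets(file_size, chunk_size=262144):
    """Offsets for one reader streaming the file from start to end."""
    return list(range(0, file_size, chunk_size))


def interleaved_offsets(file_size, chunk_size=262144, readers=4):
    """Offsets for `readers` workers that each stream their own region, taking turns."""
    chunks = sequential_offsets(file_size, chunk_size)
    per_reader = -(-len(chunks) // readers)  # ceiling division
    regions = [chunks[i * per_reader:(i + 1) * per_reader] for i in range(readers)]
    order = []
    for step in range(per_reader):
        for region in regions:
            if step < len(region):
                order.append(region[step])
    return order
```

A possible use, with the path left as a placeholder for a file on the GCSFuse mount: call `time_reads(path, sequential_offsets(os.path.getsize(path)))` and `time_reads(path, interleaved_offsets(os.path.getsize(path)))` and compare the elapsed times.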

@danhipke
Contributor

Opened #10305 to address this.

The improvement reduces model loading time from 16 minutes to under 1 minute for a 7.2 GB model on a network mount using gcsfuse.
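
As a back-of-the-envelope check of what these numbers imply, the sketch below converts them into average throughput. It assumes decimal units (1 GB = 1000 MB) and treats "<1 min" as an upper bound of 60 seconds, so the "after" figures are lower bounds rather than measurements.

```python
# Figures quoted in the comment above.
size_mb = 7200          # 7.2 GB, assuming 1 GB = 1000 MB
before_s = 16 * 60      # 16 minutes
after_s = 60            # "<1 min", taken as an upper bound

before_mb_per_s = size_mb / before_s
after_mb_per_s = size_mb / after_s      # lower bound on the new throughput
speedup = before_s / after_s            # lower bound on the speedup

print(f"before: {before_mb_per_s} MB/s, after: at least {after_mb_per_s} MB/s, "
      f"speedup: at least {speedup}x")
```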
