[BugFix] FS Offloading: Fallback from O_DIRECT #43674
varun-sundar-rabindranath wants to merge 2 commits into
Conversation
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
cleanup changes
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Code Review
This pull request refactors file I/O operations in the KV offload tiering module to support a fallback mechanism when O_DIRECT is not supported or fails. It introduces helper functions for reading, writing, and executing operations with fallback logic. The reviewer pointed out a potential issue in the file writing cleanup logic, where a temporary file might be removed before its file descriptor is closed, and suggested restructuring the try-except-finally blocks to ensure proper closing order.
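The ordering concern raised in the review can be illustrated with a minimal sketch. The helper name `write_atomically` and the temp-file-then-rename layout are assumptions made for illustration and are not taken from the PR. The point is only that the descriptor is closed in an inner `finally` before any cleanup path removes the temporary file.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Hypothetical helper: write to a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        try:
            view = memoryview(data)
            offset = 0
            # os.write may write fewer bytes than requested, so loop.
            while offset < len(view):
                offset += os.write(fd, view[offset:])
        finally:
            # Close the descriptor first, on success and on failure alike.
            os.close(fd)
        os.replace(tmp_path, path)
    except BaseException:
        # The descriptor is already closed here, so removing the file is safe.
        try:
            os.remove(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Nesting the `try` blocks this way keeps the close-then-remove order independent of where the failure happens.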
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Thanks for the report @varun-sundar-rabindranath!
Thanks @orozery. I agree that taking the fallback path is undesirable for performance. I'll put up a PR to guarantee that SharedOffloadRegion always aligns the memory to page-size bytes. I'd argue for adding this fallback anyway, as,
What do you think?
I found that the root cause of the test failure is an unaligned test tensor in test_fs_tier.py. I have a fix for that in #43689, PTAL 🙌 I think this fallback is still relevant though.
Personally, I believe in enforcing hard assumptions, which are easier to discover, rather than having a fallback that is harder to notice when it occurs.
Sounds good. Thanks for taking a look 🙌
Purpose
VLLM_LOGGING_LEVEL=DEBUG pytest -s tests/v1/kv_offload/test_fs_tier.py sometimes fails when running locally. It fails with an error from vllm/v1/kv_offload/tiering/fs/io.py.
Issue
On closer inspection, I see that the error is coming from os.write, due to buffer alignment violations when using O_DIRECT.
Fix
Fall back to os.open without O_DIRECT when using O_DIRECT fails.
Test Plan
VLLM_LOGGING_LEVEL=DEBUG pytest -s tests/v1/kv_offload/test_fs_tier.py
Test Result
Passes reliably.
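For readers unfamiliar with the pattern, the fallback described above might look roughly like the following sketch. The names `write_with_fallback` and `_write_all` are hypothetical and are not the helpers added in this PR. Treating only EINVAL as the signal to retry is also an assumption, based on EINVAL being the error Linux reports for O_DIRECT alignment violations and for filesystems that reject the flag.

```python
import errno
import os


def _write_all(fd: int, buf) -> None:
    """Write the whole buffer, looping over partial writes."""
    view = memoryview(buf).cast("B")
    offset = 0
    while offset < len(view):
        offset += os.write(fd, view[offset:])


def write_with_fallback(path: str, buf) -> bool:
    """Hypothetical sketch: try O_DIRECT, retry buffered on EINVAL.

    Returns True if the direct path was used, False if it fell back.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # O_DIRECT is not defined on every platform (for example macOS).
    direct = getattr(os, "O_DIRECT", 0)
    if direct:
        fd = None
        try:
            fd = os.open(path, flags | direct, 0o644)
            _write_all(fd, buf)
            return True
        except OSError as exc:
            # Assumption: EINVAL means unaligned buffer/length or an
            # unsupported filesystem; anything else is a real error.
            if exc.errno != errno.EINVAL:
                raise
        finally:
            if fd is not None:
                os.close(fd)
    # Buffered path; O_TRUNC discards any partial direct write.
    fd = os.open(path, flags, 0o644)
    try:
        _write_all(fd, buf)
    finally:
        os.close(fd)
    return False
```

A page-aligned buffer such as an anonymous `mmap.mmap(-1, mmap.PAGESIZE)` is a candidate for the direct path, while an arbitrary `bytes` object typically is not. Returning whether the direct path was taken gives callers a way to log the fallback, which speaks to the concern in the conversation that a silent fallback is hard to notice.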