Skip to content

[BugFix] FS Offloading: Fallback from O_DIRECT#43674

Closed
varun-sundar-rabindranath wants to merge 2 commits into
vllm-project:mainfrom
neuralmagic:varun/try-odirect
Closed

[BugFix] FS Offloading: Fallback from O_DIRECT#43674
varun-sundar-rabindranath wants to merge 2 commits into
vllm-project:mainfrom
neuralmagic:varun/try-odirect

Conversation

@varun-sundar-rabindranath
Copy link
Copy Markdown
Contributor

Purpose

VLLM_LOGGING_LEVEL=DEBUG pytest -s tests/v1/kv_offload/test_fs_tier.py sometimes fails when running locally.
It fails with

FileSystemTierManagerPython: job 1 block I/O failed: [Errno 22] Invalid argument

from vllm/v1/kv_offload/tiering/fs/io.py

Issue

On closer inspection, I see that the error is coming from os.write due buffer alignment violations when using O_DIRECT.

Fix

Fallback to os.open without O_DIRECT when using O_DIRECT fails.

Test Plan

VLLM_LOGGING_LEVEL=DEBUG pytest -s tests/v1/kv_offload/test_fs_tier.py

Test Result

Pass reliably

Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>

cleanup changes

Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
@mergify mergify Bot added v1 bug Something isn't working labels May 26, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors file I/O operations in the KV offload tiering module to support a fallback mechanism when O_DIRECT is not supported or fails. It introduces helper functions for reading, writing, and executing operations with fallback logic. The reviewer pointed out a potential issue in the file writing cleanup logic, where a temporary file might be removed before its file descriptor is closed, and suggested restructuring the try-except-finally blocks to ensure proper closing order.

Comment thread vllm/v1/kv_offload/tiering/fs/io.py Outdated
@varun-sundar-rabindranath
Copy link
Copy Markdown
Contributor Author

cc @rshavitt @orozery PTAL! Thanks 🙌

Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
@orozery
Copy link
Copy Markdown
Collaborator

orozery commented May 26, 2026

Thanks for the report @varun-sundar-rabindranath !
I feel the fallback solution is not the right approach.
I think we should instead assure that the written buffers are aligned.
This is what Claude suggested:

Pad _row_stride to a 4096 multiple at allocation time in SharedOffloadRegion. Since the mmap base is page-aligned and
  every offset is bid * row_stride, making row_stride 4096-aligned guarantees all I/O buffers are aligned. The size written is
  block_size = row_stride, so that's also aligned. This costs a tiny bit of wasted space per block row.

@varun-sundar-rabindranath
Copy link
Copy Markdown
Contributor Author

varun-sundar-rabindranath commented May 26, 2026

Thanks for the report @varun-sundar-rabindranath ! I feel the fallback solution is not the right approach. I think we should instead assure that the written buffers are aligned. This is what Claude suggested:

Pad _row_stride to a 4096 multiple at allocation time in SharedOffloadRegion. Since the mmap base is page-aligned and
  every offset is bid * row_stride, making row_stride 4096-aligned guarantees all I/O buffers are aligned. The size written is
  block_size = row_stride, so that's also aligned. This costs a tiny bit of wasted space per block row.

Thanks @orozery . I agree that taking the fallback path is bad / undesirable for performance. I'll put up a PR to guarantee that SharedOffloadRegion always aligns the memory to page-size bytes.

I'd argue for adding this fallback anyways as,

  • Different filesystems can have different buffer boundary and size alignment requirements.
  • I feel alignment should not be a hard requirement for this utility. It should technically work for any buffer alignment and size.
  • Guaranteeing these requirements will be harder as the FS offloader starts supporting HMA and more complicated offloading (more complicated == offloading arbitrary buffer sizes). [edit : This should probably be fine given that each row in the CPU tensor view should be block and that requirement is enforced already]
  • I have added debug prints (I can change it to warn) that we could monitor and attempt fixes retroactively.

what do you think ?

@varun-sundar-rabindranath
Copy link
Copy Markdown
Contributor Author

I found that the root cause of the test failure is unaligned test tensor in test_fs_tier.py - I have a fix for that here #43689 PTAL 🙌

I think this fallback is still relevant though.

@orozery
Copy link
Copy Markdown
Collaborator

orozery commented May 26, 2026

  • Different filesystems can have different buffer boundary and size alignment requirements.
    (From Claude): Linux O_DIRECT alignment is determined by the logical block size of the block device
    (/sys/block/*/queue/logical_block_size). It's always 512 or 4096. Padding to 4096 covers all cases.
  • I feel alignment should not be a hard requirement for this utility. It should technically work for any buffer alignment and size.

Personally, I believe in enforcing hard assumptions which are easier to discover, then having a fallback which will be harder to notice when it occurs.
We already have an option to disable O_DIRECT completely.
We should correctly determine upfront if we need O_DIRECT or not.

@varun-sundar-rabindranath
Copy link
Copy Markdown
Contributor Author

  • Different filesystems can have different buffer boundary and size alignment requirements.
    (From Claude): Linux O_DIRECT alignment is determined by the logical block size of the block device
    (/sys/block/*/queue/logical_block_size). It's always 512 or 4096. Padding to 4096 covers all cases.
  • I feel alignment should not be a hard requirement for this utility. It should technically work for any buffer alignment and size.

Personally, I believe in enforcing hard assumptions which are easier to discover, then having a fallback which will be harder to notice when it occurs. We already have an option to disable O_DIRECT completely. We should correctly determine upfront if we need O_DIRECT or not.

Sounds good. Thanks for taking a look 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants