feat: validate account / snapshot paths for direct-io capability #10957

Merged: kskalski merged 2 commits into anza-xyz:master from kskalski:ks/check_direcT_io on Mar 6, 2026

@kskalski kskalski commented Mar 4, 2026

Problem

Enabling direct I/O for accounts-db creation from a snapshot archive (or, in the future, for other snapshot operations) may surface a cryptic error to the user.
We could add a bit of code to detect whether direct I/O is supported on the paths where we expect to use it, and report a more descriptive error.

Summary of Changes

  • add a metadata module in agave-fs that detects direct I/O support by finding any file under the specified path and checking it with two approaches:
    • statx(STATX_DIOALIGN) - note that not all kernels a validator could run on support this syscall; additionally, not all filesystems that do support direct I/O implement this syscall flag correctly (e.g. tmpfs), so the statx outcome alone is usually not enough
    • opening a file without and with direct I/O, signaling lack of support only when the former succeeds and the latter doesn't
  • add an accounts-db util function that checks the accounts-db config and ensures that direct-io (if enabled) is supported for the path types we want it on - otherwise it returns a targeted error message mentioning the opt-out flag
  • call the validation function from validator and ledger-tool

This PR enables switching direct-io on for snapshot operations by default in #10790
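
The second approach from the summary (open without and with direct I/O) can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `DirectIoSupport` and `probe_direct_io` are hypothetical names, and `O_DIRECT` is hard-coded for x86 and ARM Linux only so the sketch needs no `libc` dependency (real code would use `libc::O_DIRECT`).

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum DirectIoSupport {
    Supported,
    Unsupported,
}

// Assumption: Linux only. The O_DIRECT value differs per architecture,
// so other targets are deliberately left out and will not compile.
#[cfg(any(target_arch = "x86_64", target_arch = "x86"))]
const O_DIRECT: i32 = 0o40000;
#[cfg(any(target_arch = "aarch64", target_arch = "arm"))]
const O_DIRECT: i32 = 0o200000;

/// Probes an existing file for direct I/O support.
pub fn probe_direct_io(file: &Path) -> io::Result<DirectIoSupport> {
    // Plain open first: if this fails, the problem is unrelated to
    // direct I/O (missing file, permissions), so propagate the error.
    File::open(file)?;

    // The same open with O_DIRECT. Lack of support is signaled only
    // because the plain open above succeeded and this one did not.
    match OpenOptions::new()
        .read(true)
        .custom_flags(O_DIRECT)
        .open(file)
    {
        Ok(_) => Ok(DirectIoSupport::Supported),
        Err(_) => Ok(DirectIoSupport::Unsupported),
    }
}
```

A caller would pass any regular file found under the account or snapshot path and turn `Unsupported` into a descriptive error before the path is used.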

@kskalski kskalski force-pushed the ks/check_direcT_io branch 2 times, most recently from 056a53c to 0be046e Compare March 4, 2026 08:06
codecov-commenter commented Mar 4, 2026

Codecov Report

❌ Patch coverage is 76.51515% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.0%. Comparing base (1f99614) to head (7e082d1).
⚠️ Report is 44 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #10957     +/-   ##
=========================================
- Coverage    83.0%    83.0%   -0.1%     
=========================================
  Files         835      838      +3     
  Lines      317172   317333    +161     
=========================================
+ Hits       263429   263548    +119     
- Misses      53743    53785     +42     

Comment thread fs/src/metadata.rs
};
// statx with STATX_DIOALIGN is the preferred check, but libc does not expose
// statx on musl (requires musl >= 1.2.3), so skip it there.
#[cfg(not(target_env = "musl"))]

@kskalski kskalski Mar 4, 2026

This fails CI because libc gates many features behind some kind of unstable opt-in env var. Since we build without that flag, I disabled the statx check on musl. I guess it's possible to get this working correctly, but it needs more research.

Author

Confirmed: this could work if the build on Alpine / with musl set RUST_LIBC_UNSTABLE_MUSL_V1_2_3: true. I guess we don't want to force that just for the purpose of this call, which we have a fallback for.

Comment thread core/src/validator.rs
}
if let Some(shrink_paths) = &config.accounts_db_config.shrink_paths {
    for shrink_path in shrink_paths {
        move_and_async_delete_path_contents(shrink_path);
Author

I looked at the code in move_and_async_delete_path_contents and it also creates the path at the end of the function. It seems that this create step:

  • doesn't belong in a function that deletes the contents and should be moved out to the call site / another helper
  • in fact seems to fit better in the new validate_account_paths - I will refactor this in a separate PR if that sounds good
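
To illustrate the separation suggested above, here is a simplified, synchronous sketch. The helper names `delete_path_contents` and `ensure_dir_exists` are hypothetical, and the real function moves the contents aside and deletes them asynchronously, which is not modeled here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: removes everything inside `dir` and nothing else.
/// It does not create `dir` and fails if `dir` does not exist.
fn delete_path_contents(dir: &Path) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // file_type() does not follow symlinks, so a symlink to a
        // directory is removed as a link rather than recursed into.
        if entry.file_type()?.is_dir() {
            fs::remove_dir_all(entry.path())?;
        } else {
            fs::remove_file(entry.path())?;
        }
    }
    Ok(())
}

/// Hypothetical helper: the "create" step, kept at the call site
/// (or inside a path validation function) instead of the deletion code.
fn ensure_dir_exists(dir: &Path) -> io::Result<()> {
    fs::create_dir_all(dir)
}
```

With this split the caller decides whether a missing directory is an error or something to create, and the deletion helper has a single responsibility.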

@kskalski kskalski marked this pull request as ready for review March 4, 2026 08:46
@kskalski kskalski requested review from brooksprumo and roryharr March 4, 2026 08:46
Comment thread accounts-db/src/utils.rs Outdated

roryharr commented Mar 4, 2026

Awesome, thanks for putting up with my madness.

roryharr previously approved these changes Mar 4, 2026

@brooksprumo brooksprumo left a comment

accounts-db/validator/ledger-tool changes look good to me

@kskalski kskalski requested a review from alessandrod March 5, 2026 00:08
alessandrod previously approved these changes Mar 6, 2026

@alessandrod alessandrod left a comment

looks good, see nits

Comment thread fs/src/metadata.rs Outdated
#[cfg(target_os = "linux")]
pub fn check_direct_io_capability(path: impl AsRef<Path>) -> io::Result<Option<bool>> {
    let Some(file) = find_any_file_under_path(path.as_ref())? else {
        return Ok(None);

instead of this uncertain condition, couldn't we create a file and then stat it?

Author

This could be an additional way to check. I think some of the passed paths could in fact be read-only (e.g. someone runs ledger-tool on a read-only view to avoid data updates), and even if they are not, ideally we should not mess with the dir, so it's preferable to first try reading.

Maybe if the dir is empty (so exactly this branch) the write check is more plausible - in that case presumably nobody cares if we create a tmp file there.

Hm.. I will add this as a separate PR, as it will make the check slightly more intrusive (otherwise I hope to backport this one).
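
For readers following the thread, the read-only search discussed here could look roughly like the sketch below. The body of the PR's `find_any_file_under_path` is not shown in this conversation, so `find_any_file` is a hypothetical stand-in; it returns `None` for an empty tree, which is the "uncertain" branch the reviewer asks about.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in: returns the first regular file found at or
/// under `path`, or `None` if the tree contains no regular files.
/// It only reads metadata and directory listings and writes nothing.
fn find_any_file(path: &Path) -> io::Result<Option<PathBuf>> {
    if fs::metadata(path)?.is_file() {
        return Ok(Some(path.to_path_buf()));
    }
    for entry in fs::read_dir(path)? {
        let entry = entry?;
        // file_type() does not follow symlinks, which avoids walking
        // into symlink loops.
        let file_type = entry.file_type()?;
        if file_type.is_file() {
            return Ok(Some(entry.path()));
        }
        if file_type.is_dir() {
            if let Some(found) = find_any_file(&entry.path())? {
                return Ok(Some(found));
            }
        }
    }
    Ok(None)
}
```

Only the `None` case would need the more intrusive fallback of creating a temporary file, which is why it can be deferred to a follow-up change.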

Comment thread fs/src/metadata.rs Outdated
/// Returns whether `path` (a file or directory) resides on a filesystem that supports
/// direct I/O (`O_DIRECT`).
///
/// Returns `Ok(Some(true))` if direct I/O is supported, `Ok(Some(false))` if it is not,

nit: this should probably return an ad hoc enum instead of requiring the caller
to do boolean logic between Option and bool 😂

Author

right... done :)
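
The shape of the change requested in this nit might look like the following. The enum actually merged is not visible in this thread, so `DirectIoCapability`, its variants and `require_direct_io` are assumptions made for illustration only.

```rust
/// Hypothetical three-state result replacing `Option<bool>`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DirectIoCapability {
    /// Previously `Some(true)`.
    Supported,
    /// Previously `Some(false)`.
    Unsupported,
    /// Previously `None`, e.g. no file was found to check.
    Unknown,
}

impl From<Option<bool>> for DirectIoCapability {
    fn from(value: Option<bool>) -> Self {
        match value {
            Some(true) => Self::Supported,
            Some(false) => Self::Unsupported,
            None => Self::Unknown,
        }
    }
}

/// Hypothetical caller: only a definite `Unsupported` is an error.
/// The message is a placeholder, not the PR's wording.
pub fn require_direct_io(capability: DirectIoCapability) -> Result<(), String> {
    match capability {
        DirectIoCapability::Supported | DirectIoCapability::Unknown => Ok(()),
        DirectIoCapability::Unsupported => {
            Err("direct I/O is not supported on this path".to_string())
        }
    }
}
```

The caller's `match` names every state explicitly, so there is no boolean logic to get wrong between the `Option` and the `bool`.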

Comment thread fs/src/metadata.rs
@kskalski kskalski added this pull request to the merge queue Mar 6, 2026
Merged via the queue into anza-xyz:master with commit a4b6f4f Mar 6, 2026
51 checks passed
@kskalski kskalski deleted the ks/check_direcT_io branch March 6, 2026 16:27
@kskalski kskalski added the v4.0 Backport to v4.0 branch label Mar 9, 2026

mergify Bot commented Mar 9, 2026

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

mergify Bot pushed a commit that referenced this pull request Mar 9, 2026
)

* feat(fs): add metadata module for queryng fs capabilities

* feat: validate account / snapshot paths for direct-io capability

(cherry picked from commit a4b6f4f)
kskalski added a commit that referenced this pull request Mar 11, 2026
…y (backport of #10957) (#11122)

feat: validate account / snapshot paths for direct-io capability (#10957)

* feat(fs): add metadata module for queryng fs capabilities

* feat: validate account / snapshot paths for direct-io capability

(cherry picked from commit a4b6f4f)

Co-authored-by: Kamil Skalski <kamil.skalski@gmail.com>
Labels

v4.0 Backport to v4.0 branch

5 participants