Skip to content

v4.0: feat: validate account / snapshot paths for direct-io capability (backport of #10957)#11122

Merged
kskalski merged 1 commit into
v4.0from
mergify/bp/v4.0/pr-10957
Mar 11, 2026
Merged

v4.0: feat: validate account / snapshot paths for direct-io capability (backport of #10957)#11122
kskalski merged 1 commit into
v4.0from
mergify/bp/v4.0/pr-10957

Conversation

@mergify
Copy link
Copy Markdown

@mergify mergify Bot commented Mar 9, 2026

Problem

Enabling direct I/O for accounts-db creation from snapshot archive (or other snapshot ops in the future) maybe cause cryptic error surfaced to the user.
We could add a bit of code to detect if direct I/O is supported for paths we expect to use it on and signal more descriptive error.

Summary of Changes

  • add metadata module in agave-fs that detects direct I/O support by trying to find any file under specified path and check it using two approaches:
    • statx(STATX_DIOALIGN) - not that not all kernels validator could run on support this syscall, additionally, not all filesystems that do support direct I/O actually implement this syscall flag correctly (e.g. tmpfs), so it is usually not enough to use outcome from statx and call it a day
    • opening a file without and with direct I/O, signaling lack of support only when the former succeeds and latter doesn't
  • add accounts-db util function that checks accounts-db config and ensure direct-io is (if enabled) supported for path types we want it to be - otherwise return a targeted error message mentioning opt-out flag
  • call the validation function from validator and ledger-tool

This PR enables switching direct-io on for snapshot operations by default in #10790


This is an automatic backport of pull request #10957 done by Mergify.

)

* feat(fs): add metadata module for queryng fs capabilities

* feat: validate account / snapshot paths for direct-io capability

(cherry picked from commit a4b6f4f)
@mergify mergify Bot requested a review from a team as a code owner March 9, 2026 14:43
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 76.51515% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.0%. Comparing base (040a5b0) to head (798d280).

Additional details and impacted files
@@           Coverage Diff            @@
##             v4.0   #11122    +/-   ##
========================================
  Coverage    83.0%    83.0%            
========================================
  Files         837      838     +1     
  Lines      316477   316606   +129     
========================================
+ Hits       262832   262967   +135     
+ Misses      53645    53639     -6     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread accounts-db/src/utils.rs
format!(
"direct-io (O_DIRECT) is not supported for paths `{paths_str}`. Ensure the \
filesystem hosting that path supports direct-io, or disable direct-io with \
--no-accounts-db-snapshots-direct-io flag.",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this flag a development stopgap? something more like assume direct io everywhere, fatal if it fails by default and a process-wide --allow-no-direct-io that converts it to a warning, would prevent config complexity explosion

Copy link
Copy Markdown

@kskalski kskalski Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two things happening here:

  • the flag is thought as escape-hatch for unexpected, but possible issues with direct-io, especially for old kernels + exotic file-systems - those could manifest as outright failure to even open files (the code in this module attempts to detect that) or more subtle issues during runtime (performance, random errors during heavy IO, hopefully no corruption)
  • the file-systems that properly reject direct-io do so as InvalidInput / "invalid argument" IO error during file open, that might be fine to interpret if you know that new version enabled direct-io in the specific area, but we decided it's better to emit more informational message for less informed operators

Given above, I think the schema:

  • statically control if direct-io is enabled through flags
  • if it is enabled, but we are able to prove it's not supported, fail with good message
  • otherwise use direct-io or fail with less useful errors

is quite fine.

Once we eliminate the need for opt-out from direct-io, e.g. after it's used in 1-2 releases in regularly run parts of the validator, we could eliminate the config options. I'm not sure how do you envision having both --allow-no-direct-io and avoiding config explosion - ability to opt-out implies wiring up the config everywhere.

Comment thread core/src/validator.rs
*start_progress.write().unwrap() = ValidatorStartProgress::CleaningAccounts;
let mut timer = Measure::start("clean_accounts_paths");
let mut timer = Measure::start("validate_and_clean_accounts_paths");
validate_account_paths(config)?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think ideally this would be done before Validator::new, but current init is a huge mess and i don't expect you to fix it here/first

@t-nelson
Copy link
Copy Markdown

@kskalski can you tag in a sme?

@kskalski kskalski merged commit 7df341c into v4.0 Mar 11, 2026
50 checks passed
@kskalski kskalski deleted the mergify/bp/v4.0/pr-10957 branch March 11, 2026 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants