v4.0: feat: validate account / snapshot paths for direct-io capability (backport of #10957)#11122
Conversation
) * feat(fs): add metadata module for queryng fs capabilities * feat: validate account / snapshot paths for direct-io capability (cherry picked from commit a4b6f4f)
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## v4.0 #11122 +/- ##
========================================
Coverage 83.0% 83.0%
========================================
Files 837 838 +1
Lines 316477 316606 +129
========================================
+ Hits 262832 262967 +135
+ Misses 53645 53639 -6 🚀 New features to boost your workflow:
|
| format!( | ||
| "direct-io (O_DIRECT) is not supported for paths `{paths_str}`. Ensure the \ | ||
| filesystem hosting that path supports direct-io, or disable direct-io with \ | ||
| --no-accounts-db-snapshots-direct-io flag.", |
There was a problem hiding this comment.
is this flag a development stopgap? something more like assume direct io everywhere, fatal if it fails by default and a process-wide --allow-no-direct-io that converts it to a warning, would prevent config complexity explosion
There was a problem hiding this comment.
There are two things happening here:
- the flag is thought as escape-hatch for unexpected, but possible issues with direct-io, especially for old kernels + exotic file-systems - those could manifest as outright failure to even open files (the code in this module attempts to detect that) or more subtle issues during runtime (performance, random errors during heavy IO, hopefully no corruption)
- the file-systems that properly reject direct-io do so as
InvalidInput/ "invalid argument" IO error during file open, that might be fine to interpret if you know that new version enabled direct-io in the specific area, but we decided it's better to emit more informational message for less informed operators
Given above, I think the schema:
- statically control if direct-io is enabled through flags
- if it is enabled, but we are able to prove it's not supported, fail with good message
- otherwise use direct-io or fail with less useful errors
is quite fine.
Once we eliminate the need for opt-out from direct-io, e.g. after it's used in 1-2 releases in regularly run parts of the validator, we could eliminate the config options. I'm not sure how do you envision having both --allow-no-direct-io and avoiding config explosion - ability to opt-out implies wiring up the config everywhere.
| *start_progress.write().unwrap() = ValidatorStartProgress::CleaningAccounts; | ||
| let mut timer = Measure::start("clean_accounts_paths"); | ||
| let mut timer = Measure::start("validate_and_clean_accounts_paths"); | ||
| validate_account_paths(config)?; |
There was a problem hiding this comment.
i think ideally this would be done before Validator::new, but current init is a huge mess and i don't expect you to fix it here/first
|
@kskalski can you tag in a sme? |
Problem
Enabling direct I/O for accounts-db creation from snapshot archive (or other snapshot ops in the future) maybe cause cryptic error surfaced to the user.
We could add a bit of code to detect if direct I/O is supported for paths we expect to use it on and signal more descriptive error.
Summary of Changes
metadatamodule inagave-fsthat detects direct I/O support by trying to find any file under specified path and check it using two approaches:statx(STATX_DIOALIGN)- not that not all kernels validator could run on support this syscall, additionally, not all filesystems that do support direct I/O actually implement this syscall flag correctly (e.g. tmpfs), so it is usually not enough to use outcome fromstatxand call it a dayThis PR enables switching direct-io on for snapshot operations by default in #10790
This is an automatic backport of pull request #10957 done by Mergify.