Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve crash consistency on Linux #8815

Merged
merged 8 commits into from
Jan 3, 2024
Merged

Conversation

basil
Copy link
Member

@basil basil commented Jan 1, 2024

Problem

See this talk (transcript):

People seem to think that this is safe because the POSIX spec says that rename is atomic, but that only means rename is atomic with respect to normal operation, that doesn't mean it's atomic on crash.

Solution

While the optimal solution would involve implementing an undo log / rollback journal as described in the talk or in SQLite, this would be a drastic redesign of this core subsystem.

Rather, without breaking compatibility or changing the existing design we can fsync the parent directory after fsyncing the new file. This ensures the parent directory's inode is flushed to disk with the new directory entry for the new file. This reduces the impact of a crash. Currently, we fsync the new file itself, but the directory entry in the parent directory is not necessarily persisted to disk until some arbitrary amount of time passes, and if we crash during this window then we could (in some filesystem/mount option combinations) get the old version of the file when we come back up, though we may have taken other actions in the meantime that assume the new file has been persisted, thus leading to inconsistent state. This PR fsyncs the parent directory right away, reducing the window of time where a crash could have negative consequences to the period of time between the fsync of the file and the fsync of its parent directory, which is now a much smaller and bounded amount of time.

This is also the approach taken by default in Sendmail on Linux in https://github.com/Distrotech/sendmail/blob/547129475fc1db35ae9b893a4782884c68b182fb/sendmail/deliver.c#L933-L986 whose README contains the following:

REQUIRES_DIR_FSYNC      Turn on support for file systems that require to
                call fsync() for a directory if the meta-data in it has
                been changed.  This should be turned on at least for older
                versions of ReiserFS; it is enabled by default for Linux.
                According to some information this flag is not needed
                anymore for kernel 2.4.16 and newer.  We would appreciate
                feedback about the semantics of the various file systems
                available for Linux.

This correctly notes that this second fsync isn't required in all combinations of filesystems and mount options but that it is (or was) required in some of them, and to be safe from a data correctness perspective it is enabled by default. In this PR we do the same in Jenkins. In general, filesystem behavior in this regard is poorly documented and changes frequently over time, so it is best to take the safest approach by default and only allow users to pick a more dangerous approach if they are confident their filesystem configuration can tolerate it.

More to the point, the author of ext3 explicitly states the need to fsync the parent directory on Linux in this post:

The squence above exactly provides desired “atomicity without durability”. It doesn’t guarantee which version of the file will appear in the event of an unexpected crash; if the application needs a guarantee that the new version of the file will be present after a crash, it’s necessary to fsync the containing directory.

The performance impact should be negligible for local disk users. Users of pathologically slow NFS servers might have a problem, in which case an escape hatch has been added to restore the old behavior at the price of decreased crash consistency. If accepted, I will update this page of the documentation.

Testing done

Confirmed after this PR with bpftrace's syncsnoop and opensnoop that the parent directory was being fsynced as expected. Confirmed that the old behavior was restored when setting the escape hatch to false.

Proposed changelog entries

  • Reduce the window of time during which a crash may lead to inconsistent state on Linux.

Proposed upgrade guidelines

N/A

Submitter checklist

Desired reviewers

@mention

Before the changes are marked as ready-for-merge:

Maintainer checklist

@basil basil added the rfe For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted label Jan 1, 2024
@basil basil changed the title Improve crash consistency Improve crash consistency on Linux Jan 1, 2024
Copy link
Contributor

@MarkEWaite MarkEWaite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for the clear description of the tested cases and the areas of concern.

Copy link
Member

@NotMyFault NotMyFault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is nice. Thanks for including the technical details!

Copy link
Member

@timja timja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/label ready-for-merge


This PR is now ready for merge, after ~24 hours, we will merge it if there's no negative feedback.

Thanks!

@comment-ops-bot comment-ops-bot bot added the ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback label Jan 2, 2024
@basil basil added the squash-merge-me Unclean or useless commit history, should be merged only with squash-merge label Jan 2, 2024
@NotMyFault NotMyFault merged commit 444f2de into jenkinsci:master Jan 3, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback rfe For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted squash-merge-me Unclean or useless commit history, should be merged only with squash-merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants