Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the recommended File IO API usage pattern with regards to ext4 delayed allocations? #32600

Open
Tragetaschen opened this issue Feb 20, 2020 · 11 comments
Assignees
Labels
area-System.IO os-linux Linux OS (any supported distro) question Answer questions and provide assistance, not an issue with source code or documentation.
Milestone

Comments

@Tragetaschen
Copy link
Contributor

Tragetaschen commented Feb 20, 2020

I have an embedded Linux platform with some flash storage where the file system is ext4 with defaut mount options. I'm running an ASP.NET Core web server with server-side Blazor and this per default stores encryption keys as XML file in the home directory. Recently, I got a system where that XML file had zero length and my application crashed when establishing the Blazor connection.

For reference, the code to store the XML file is here:
https://github.com/dotnet/aspnetcore/blob/v3.1.1/src/DataProtection/DataProtection/src/Repositories/FileSystemXmlRepository.cs#L135-L161

The above code synchronously writes to the file and then renames it from a temporary name to the final name. My file system contained the final file name, but with zero length. The ext4 default options include data=ordered so metadata should be written after file contents, but delayed allocations are active. In this case, the zero length file is THE sign for missing fsync(2)/fdatasync(2) invocations. For reference, here is a LWN article describing what looks to be the issue: https://lwn.net/Articles/322823/

Being an embedded system instead of a proper shut down, power just gets cut frequently and anything not persisted on disk is (expected to be) lost, but the above code pattern properly disposes, so I expect any of three states: no file, a temporary file with contents, or the final file with contents. For example, I never had issues with the SQLite database (except when the file system journaling was disabled by accident…)

Is the above code snippet recommended in combination with ext4 delayed allocations?
Is everything properly fsync'ed in that code path?
Is there a missing pit of success here?

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Feb 20, 2020
@wfurt wfurt added the os-linux Linux OS (any supported distro) label Feb 26, 2020
@wfurt
Copy link
Member

wfurt commented Feb 26, 2020

you can do tempFileStream.Flush(true); With that, you should see fsync() with strace on the underlying file descriptor. This is not done by default as that is pretty expensive operation but I think it does exactly what you want.

@Tragetaschen
Copy link
Contributor Author

I could certainly call Flush(true) to fix this if it was my code, but the mentioned library example doesn't currently and I cannot fix/change this from within my application. This example uses the XElement.Save(string fileName) convenience API, which only ever calls Flush() which is Flush(false). Heck, the entire StreamWriter implementation or File.WriteAllText only ever call Flush().

Beyond these examples I would expect many, many more libraries are omitting calling Flush(true) without a way to intervene. Every such usage is in violation to POSIX guarantees as kernel/file system developers would argue. They all can result in correct metadata and missing file contents when the system crashes / loses power.

FlushAsync() on the other hand always calls down to fsync unconditionally. In that sense, the async code path is always "pretty expensive", certainly more expensive beyond the expected async overhead. That makes it hard to buy into this argument.

Given my environment, I'm willing to pay the "expensive" part to make sure the weird limbo state between no file and a correct file with contents doesn't occur—especially as it avoids bricked embedded instrumentation. If the status quo has to remain (I really think it shouldn't), could there be an opt-in mechanism to make Flush always behave as Flush(true) on FileStream?

For the above library, I have moved its storage folder to a tmpfs as I don't need the file persistent. This of course is a (lucky) special case. For the status quo to be on the safe side it looks like I have to turn off delayed allocations. This is of course an important performance optimization and helps avoiding file system fragmentation in ext4…

@JeremyKuhne JeremyKuhne added this to the Future milestone Mar 5, 2020
@JeremyKuhne JeremyKuhne removed the untriaged New issue has not been triaged by the area owner label Mar 5, 2020
@JeremyKuhne
Copy link
Member

Triage: When making an async flush the cost of the call is going to happen in the background so it isn't equal to the cost of calling Flush(true). You can see the original code comment on this here. Making Flush() always flushToDisk isn't a change we're likely to make.

@carlossanlop carlossanlop added untriaged New issue has not been triaged by the area owner and removed untriaged New issue has not been triaged by the area owner labels Mar 5, 2020
@carlossanlop carlossanlop added the question Answer questions and provide assistance, not an issue with source code or documentation. label Mar 27, 2020
@adamsitnik adamsitnik self-assigned this Jul 19, 2021
@adamsitnik
Copy link
Member

I've taken a closer look at the issue and the referenced blog post (https://lwn.net/Articles/322823/).

I really like this comment:

fd = open("file.new",O_TRUNC|O_WRONLY|O_CREAT);
write(fd, "hi", 2);
close(fd);
rename("file.new", "file");

Are we saying that in ext4, the rename can happen many tens-of-seconds before the data "hi" is actually allocated and written?

If the rename can happen before the data is written, and we get an empty file as the end result I am inclined to say that it's not a .NET bug.

The article also mentions the patch that solved the problem:

Finally, this patch forces block allocation when one file is renamed on top of another. This, too, is aimed at the problem of frequently-rewritten small files.

I was curious to see if I can repro it using Ubuntu 18.04 and an ext4 drive with default mount settings (data=ordered):

adam@adsitnik-ubuntu:~/projects/repros/FileMoveExt4$ cat /proc/mounts | grep test_ext4
/dev/sdb1 /media/adam/test_ext4 ext4 rw,nosuid,nodev,relatime,data=ordered 0 0

Using the following app:

using System;
using System.IO;

namespace FileMoveExt4
{
    class Program
    {
        static void Main(string[] args)
        {
            string directoryPath = args[0];
            string srcFilePath = Path.Combine(directoryPath, $"src_{Guid.NewGuid()}");
            string dstFilePath = Path.Combine(directoryPath, $"dst_{Guid.NewGuid()}");
            
            File.WriteAllText(path: srcFilePath, contents: "abc");
            File.Move(srcFilePath, dstFilePath);
            Console.WriteLine($"The file content: \'{File.ReadAllText(dstFilePath)}\'");
        }
    }
}

And I was not able to repro it:

adam@adsitnik-ubuntu:~/projects/repros/FileMoveExt4$ dotnet run -- "/media/adam/test_ext4"
The file content: 'abc'

@Tragetaschen could you please try my repro app? Is there any chance that you are using an old kernel version that does not contain the mentioned patch? Or am I doing something wrong?

@Tragetaschen
Copy link
Contributor Author

For me, it was a one-time only event, but I don't have access to any of the development environments anymore.

The most important bit was, that the embedded instrument just loses power by pulling the cable frequently. Any tests that do not kill the machine will probably work fine because eventually ext4 will of course write everything as expected if there's power to do it.

Given the date of the OP, this was either 4.19.y or 5.4.y (likely the latter - I pretty much migrated at the same time frame). But since the patch is from 2009, its inclusion should be almost guaranteed. But note that the patch first runs the rename operation and only after that allocates the blocks. I don't know if this is still the case, but this would keep the small window open still.

@TheCollegedude
Copy link

TheCollegedude commented Feb 1, 2022

For me, it was a one-time only event, but I don't have access to any of the development environments anymore.

@Tragetaschen I read your comment in 12830
Once a month all our machines running Ubuntu 20.04 have the same problem with corrupted XML files (cryptography )

@adamsitnik
Copy link
Member

We are rather not going to change the current FileStream.Dispose implementation to always force the OS to flush data to disk (it would be too expensive from performance perspective), but I think that we could introduce an environment variable / app config switch that would enable such behavior.

@Tragetaschen @TheCollegedude would that solve the problem for you?

@Tragetaschen
Copy link
Contributor Author

I would argue that not properly fsync'ing is the macOS approach to performance optimization, but I absolutely get that with the breadth of platforms .NET runs on it's too much to ask. :) Optimizing with these very specific semantics in POSIX in some file systems like ext4 has done a lot of damage over the years.

I'm still not sure I understand the performance argument, especially since fsync is part of the async dispose path, but this doesn't matter.

The app config switch sounds like a very viable option here to make sure .NET in its entirety can ensure the necessary POSIX semantics when needed.

@adamsitnik
Copy link
Member

I would argue that not properly fsync'ing is the macOS approach to performance optimization

I would on the other hand expect close to take care of that for me ;)

especially since fsync is part of the async dispose path,

I don't think it is:

public async override ValueTask DisposeAsync()
{
await _strategy.DisposeAsync().ConfigureAwait(false);
Dispose(false);
GC.SuppressFinalize(this);
}

private async Task FlushAsyncInternal(CancellationToken cancellationToken)

public sealed override ValueTask DisposeAsync()
{
if (_fileHandle != null && !_fileHandle.IsClosed)
{
_fileHandle.ThreadPoolBinding?.Dispose();
_fileHandle.Dispose();
}
return ValueTask.CompletedTask;
}

I'm still not sure I understand the performance argument

Let me start my Linux machine and run some benchmarks for you

@Tragetaschen
Copy link
Contributor Author

Tragetaschen commented May 17, 2022

I would argue that not properly fsync'ing is the macOS approach to performance optimization

I would on the other hand expect close to take care of that for me ;)

…and this nicely summarizes the entire argument about what POSIX guarantees and what it doesn't. Yes, this is what most people (sane people?) would expect, but in corner cases like system crashes or power losses, it's possible that certain data has made into persistent storage and some hasn't because close doesn't actually write it - fsync does.

especially since fsync is part of the async dispose path,

I don't think it is:

[…]

I think I found it two years ago, but the code has changed a lot with the introduction of strategies.

I'm still not sure I understand the performance argument

Let me start my Linux machine and run some benchmarks for you

Just make sure: I get that not fsync'ing is faster than fsyncing - no need to show this.

@adamsitnik
Copy link
Member

I get that not fsync'ing is faster than fsyncing - no need to show this.

I wanted to make sure that we know the actual difference.

Benchmark code:

private const string FilePath = "FileUtils.GetTestFilePath.txt";

[GlobalSetup]
public void Setup() => File.WriteAllBytes(FilePath, new byte[1024]);

[GlobalCleanup]
public void Cleanup() => File.Delete(FilePath);

[Benchmark]
public bool OpenClose()
{
    using (FileStream fileStream = File.OpenWrite(FilePath))
    {
        return fileStream.IsAsync; // return something just to consume the stream
    }
}

[Benchmark]
public bool OpenFlushToDiskClose()
{
    using (FileStream fileStream = File.OpenWrite(FilePath))
    {
        fileStream.Flush(flushToDisk: true);
        
        return fileStream.IsAsync;
    }
}

[Benchmark]
public bool OpenWriteClose()
{
    using (FileStream fileStream = File.OpenWrite(FilePath))
    {
        fileStream.WriteByte(1);
        
        return fileStream.IsAsync;
    }
}

[Benchmark]
public bool OpenWriteFlushToDiskClose()
{
    using (FileStream fileStream = File.OpenWrite(FilePath))
    {
        fileStream.WriteByte(1);
        
        fileStream.Flush(flushToDisk: true);
        
        return fileStream.IsAsync;
    }
}

Results (using ext4 file system):

BenchmarkDotNet=v0.13.1.1786-nightly, OS=ubuntu 18.04
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.5.22263.22
Method Mean
OpenClose 4.668 us
OpenFlushToDiskClose 29.299 us
OpenWriteClose 6.888 us
OpenWriteFlushToDiskClose 6.595 us

Flushing to disk when nothing was actually written is very expensive (x5), but flushing to a file after writing a single byte seems to not add any overhead (it actually does the opposite?!).

Based on this findings I believe that we should perform more benchmarking (flushing after writing various number of bytes to the file, using SSD and non-SSD disks) and based on the results re-consider adding flushing to disk for certain file systems.

We already know how to fetch the file system type:

if (!Interop.Sys.TryGetFileSystemType(this, out Interop.Sys.UnixFileSystemTypes unixFileSystemType))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.IO os-linux Linux OS (any supported distro) question Answer questions and provide assistance, not an issue with source code or documentation.
Projects
None yet
Development

No branches or pull requests

8 participants