On the Performance of FileStream Async Reads on Windows #402

Closed
feO2x opened this issue Mar 26, 2019 · 3 comments

Comments

@feO2x

feO2x commented Mar 26, 2019

Dear Microsoft performance team,

@atruskie and I ran some benchmarks on the read performance of FileStream on Windows (which can be found here and here). The results are interesting: across all our tests, synchronous file access was considerably faster than asynchronous access in total execution time (the async-to-sync ratio ranged from 1.45 to 9.22, with an average of 3.56). We also saw that your FileStream benchmarks lead to similar results.

This surprised us. We had expected sync and async file access to take about the same time, with the added benefit that the calling thread is not blocked while the disk controller retrieves the data in the async case. In general, I thought that I/O should always be performed asynchronously. So why is async reading so much slower than sync reading? We cannot find the reason behind this (and unfortunately, we are not NTFS experts).

Open questions / things that can be done:

  • Benchmark unbuffered FileStreams: according to this Microsoft paper, there is a native way to create FileStreams that do not buffer internally. We could benchmark these, too.
  • Measure the time the calling thread is blocked in sync file access: is there a way to see how long a thread is blocked (maybe with ETW events)? While client apps that occasionally open file streams are not really affected, this could hurt (web) service apps where the calling thread could serve another request.
  • What are general recommendations? Is a statement like "Load your files synchronously by default. Consider async when the file size is greater than X MB" useful?

If you could shed some light on this topic, that would be most appreciated.

@MarcoRossignoli
Member

Are you using server or workstation GC mode? (Server mode could improve throughput because collections are less frequent.) I see a lot of gen0 collections (which slow down performance), I think due to the async/await state machine.
BTW, I think you would see a big difference between async and sync in a "real program" that heavily uses I/O in a "random" way (less use of the [fast] I/O cache, etc.), where actually blocking a thread is a lot worse for scalability (you need more threads for more load, which puts pressure on the thread pool) than paying for GC from time to time.

/cc @stephentoub @benaadams thoughts?

@feO2x
Author

feO2x commented Mar 26, 2019

@MarcoRossignoli The benchmarks were run in workstation mode.

@stephentoub
Member

File reads via overlapped I/O have more overhead than synchronous reads; I'd suggest writing a C benchmark against the Win32 APIs to validate this for yourself. The benefit of using overlapped I/O is that for long-latency reads, e.g. accessing a file from a network share, you can avoid blocking a thread and get the associated scalability benefits. But FileStream has issues here, e.g. https://github.com/dotnet/corefx/issues/6007, that largely defeat this benefit. There are a bunch of FileStream-related issues in the corefx repo.

4 participants