-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eio_linux: make it safe to share FDs across domains #440
Conversation
It was previously not safe to share file descriptors between domains. The `fd_sharing.md` test demonstrates the problem: one domain enqueues an operation using an FD and another domain closes it first. `Eio_unix.Private.Rcfd` provides a ref-counted wrapper for file descriptors to fix this problem. It's lock-free and designed to be fast in the common case where an FD is only used from one domain, but safe in the case where it's used from several. The `bench_fd` benchmark is, if anything, slightly faster with this change, though I don't know why. There is a dscheck test that tests Rcfd with all combinations of two users and two closers. It includes an (unsafe) `peek` operation, as the public API already provided this. We might want to remove that at some point and provide a safe `Eio_unix.FD.use` operation instead. There is support here for safely having two domains try to close an FD at once, although eio_linux itself still doesn't support that (you must close from the domain that opened it). It could also allow us to support a `try_close` operation at some point. Note that closing is now always done with `Unix.close`, not uring. Using uring previously avoided races in the single-domain case, but that's not necessary now that we have a general solution. I did have a `remove_or_close` function that would use uring in the common case, but `Unix.close` if an operation was in progress, but that seemed unnecessary.
Notes:
|
On a preliminary review it seems like a sane approach, but I keep wondering, would it not make it easier to condensce both states (the reference counting and the Open/Close) into one Atomic.t ? This way you don't need to do get/put dances, something like: type state =
| Open of (int * Unix.file_desc)
| Closing of (int * (unit -> unit)) On |
I didn't benchmark that, but doing two allocations on every use seemed bad. The current code is only updating an integer and reading a shared location, so no allocations. Also, I think |
Another idea to avoid it is to use the most significant bit of the reference counting as the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've found a race, see if I'm making any sense
- closing/closed: No need to do anything, but notifying is harmless. | ||
*) | ||
if Atomic.get t.ops > 0 then () (* Someone else will deal with it. *) | ||
else if Atomic.compare_and_set t.fd prev fully_closed then ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume Domain A is here, it did the 1->0 transition, did not see t.ops be incremented and proceeded to call no_users
.
) | ||
|
||
let get t = | ||
Atomic.incr t.ops; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Domain B gets here, meanwhile Domain A already decided to close with no_users
.
Domain B sees Closing
since the atomics are ordered, it aborts by doing put
on line 97.
B sees the same sequence of events of A in put
, and ends up calling no_users
as well, so you end up with a double close. The assumption is that the no_close
of A would be carried over to B, which I think doesn't hold here, both end up calling it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put
always does a CAS to Closing ignore
before calling no_users
, so the real function can only get called once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahA! indeed, I've missed the ignore in fully_closed
That might work... let me give it a try... |
After more thought, I don't think using one bit of
So, it's still two atomic locations, but working with I didn't rewrite it all to check, but suspect it won't be much/any simpler (though you're welcome to have a go - it might turn out OK). |
Ah indeed, the callback complicates it. I understand this solves the re-use race which is dangerous, do you also have something in mind to "cancel the operation on other domains ?" The way it is after this change. If multiple domains block on |
The idea is you use Eio's cancellation system to cancel, not close. This is similar to Unix's behaviour too (calling Normally, if you're using things in the obvious way, the structured concurrency will have cancelled all operations by the time the switch finishes and closes the FD. |
CHANGES: New features: - Add eio_posix backend (@talex5 @haesbaert ocaml-multicore/eio#448 ocaml-multicore/eio#477, reviewed by @avsm @patricoferris @polytypic). This replaces eio_luv on all platforms except Windows (which will later switch to its own backend). It is a lot faster, provides access to more modern features (such as `openat`), and can safely share OS resources between domains. - Add subprocess support (@patricoferris @talex5 ocaml-multicore/eio#461 ocaml-multicore/eio#464 ocaml-multicore/eio#472, reviewed by @haesbaert @avsm). This is the low-level API support for eio_linux and eio_posix. A high-level cross-platform API will be added in the next release. - Add `Fiber.fork_seq` (@talex5 ocaml-multicore/eio#460, reviewed by @avsm). This is a light-weight alternative to using a single-producer, single-consumer, 0-capacity stream, similar to a Python generator function. Bug fixes: - eio_linux: make it safe to share FDs across domains (@talex5 ocaml-multicore/eio#440, reviewed by @haesbaert). It was previously not safe to share file descriptors between domains because if one domain used an FD just as another was closing it, and the FD got reused, then the original operation could act on the wrong file. - eio_linux: release uring if Linux is too old (@talex5 ocaml-multicore/eio#476). Avoids a small resource leak. - eio_linux: improve error handling creating pipes and sockets (@talex5 ocaml-multicore/eio#474, spotted by @avsm). If we get an error (e.g. too many FDs) then report it to the calling fiber, instead of exiting the event loop. - eio_linux: wait for uring to finish before exiting (@talex5 ocaml-multicore/eio#470, reviewed by @avsm). If the main fiber raised an exception then it was possible to exit while a cancellation operation was still in progress. - eio_main: make `EIO_BACKEND` handling more uniform (@talex5 ocaml-multicore/eio#447). Previously this environment variable was only used on Linux. Now all platforms check it. - Tell dune about `EIO_BACKEND` (@talex5 ocaml-multicore/eio#442). If this changes, dune needs to re-run the tests. - eio_linux: add some missing close-on-execs (@talex5 ocaml-multicore/eio#441). - eio_linux: `read_exactly` fails to update file offset (@talex5 ocaml-multicore/eio#438). - Work around dune `enabled_if` bug on non-Linux systems (@polytypic ocaml-multicore/eio#475, reviewed by @talex5). - Use raw system call of `getrandom` for glibc versions before 2.25 (@zenfey ocaml-multicore/eio#482). Documentation: - Add `HACKING.md` with hints for working on Eio (@talex5 ocaml-multicore/eio#443, reviewed by @avsm @polytypic). - Improve worker pool example (@talex5 ocaml-multicore/eio#454). - Add more Conditions documentation (@talex5 ocaml-multicore/eio#436, reviewed by @haesbaert). This adds a discussion of conditions to the README and provides examples using them to handle signals. - Condition: fix the example in the docstring (@avsm ocaml-multicore/eio#468). Performance: - Add a network benchmark using an HTTP-like protocol (@talex5 ocaml-multicore/eio#478, reviewed by @avsm @patricoferris). - Add a benchmark for reading from `/dev/zero` (@talex5 ocaml-multicore/eio#439). Other changes: - Add CI for macOS (@talex5 ocaml-multicore/eio#452). - Add tests for `pread`, `pwrite` and `readdir` (@talex5 ocaml-multicore/eio#451). - eio_linux: split into multiple files (@talex5 ocaml-multicore/eio#465 ocaml-multicore/eio#466, reviewed by @avsm). - Update Dockerfile (@talex5 ocaml-multicore/eio#471). - Use dune.3.7.0 (@patricoferris ocaml-multicore/eio#457). - Mint exclusive IDs across domains (@TheLortex ocaml-multicore/eio#480, reported by @haesbaert, reviewed by @talex5). The tracing currently only works with a single domain anyway, but this will change when OCaml 5.1 is released.
CHANGES: New features: - Add eio_posix backend (@talex5 @haesbaert ocaml-multicore/eio#448 ocaml-multicore/eio#477, reviewed by @avsm @patricoferris @polytypic). This replaces eio_luv on all platforms except Windows (which will later switch to its own backend). It is a lot faster, provides access to more modern features (such as `openat`), and can safely share OS resources between domains. - Add subprocess support (@patricoferris @talex5 ocaml-multicore/eio#461 ocaml-multicore/eio#464 ocaml-multicore/eio#472, reviewed by @haesbaert @avsm). This is the low-level API support for eio_linux and eio_posix. A high-level cross-platform API will be added in the next release. - Add `Fiber.fork_seq` (@talex5 ocaml-multicore/eio#460, reviewed by @avsm). This is a light-weight alternative to using a single-producer, single-consumer, 0-capacity stream, similar to a Python generator function. Bug fixes: - eio_linux: make it safe to share FDs across domains (@talex5 ocaml-multicore/eio#440, reviewed by @haesbaert). It was previously not safe to share file descriptors between domains because if one domain used an FD just as another was closing it, and the FD got reused, then the original operation could act on the wrong file. - eio_linux: release uring if Linux is too old (@talex5 ocaml-multicore/eio#476). Avoids a small resource leak. - eio_linux: improve error handling creating pipes and sockets (@talex5 ocaml-multicore/eio#474, spotted by @avsm). If we get an error (e.g. too many FDs) then report it to the calling fiber, instead of exiting the event loop. - eio_linux: wait for uring to finish before exiting (@talex5 ocaml-multicore/eio#470, reviewed by @avsm). If the main fiber raised an exception then it was possible to exit while a cancellation operation was still in progress. - eio_main: make `EIO_BACKEND` handling more uniform (@talex5 ocaml-multicore/eio#447). Previously this environment variable was only used on Linux. Now all platforms check it. - Tell dune about `EIO_BACKEND` (@talex5 ocaml-multicore/eio#442). If this changes, dune needs to re-run the tests. - eio_linux: add some missing close-on-execs (@talex5 ocaml-multicore/eio#441). - eio_linux: `read_exactly` fails to update file offset (@talex5 ocaml-multicore/eio#438). - Work around dune `enabled_if` bug on non-Linux systems (@polytypic ocaml-multicore/eio#475, reviewed by @talex5). - Use raw system call of `getrandom` for glibc versions before 2.25 (@zenfey ocaml-multicore/eio#482). Documentation: - Add `HACKING.md` with hints for working on Eio (@talex5 ocaml-multicore/eio#443, reviewed by @avsm @polytypic). - Improve worker pool example (@talex5 ocaml-multicore/eio#454). - Add more Conditions documentation (@talex5 ocaml-multicore/eio#436, reviewed by @haesbaert). This adds a discussion of conditions to the README and provides examples using them to handle signals. - Condition: fix the example in the docstring (@avsm ocaml-multicore/eio#468). Performance: - Add a network benchmark using an HTTP-like protocol (@talex5 ocaml-multicore/eio#478, reviewed by @avsm @patricoferris). - Add a benchmark for reading from `/dev/zero` (@talex5 ocaml-multicore/eio#439). Other changes: - Add CI for macOS (@talex5 ocaml-multicore/eio#452). - Add tests for `pread`, `pwrite` and `readdir` (@talex5 ocaml-multicore/eio#451). - eio_linux: split into multiple files (@talex5 ocaml-multicore/eio#465 ocaml-multicore/eio#466, reviewed by @avsm). - Update Dockerfile (@talex5 ocaml-multicore/eio#471). - Use dune.3.7.0 (@patricoferris ocaml-multicore/eio#457). - Mint exclusive IDs across domains (@TheLortex ocaml-multicore/eio#480, reported by @haesbaert, reviewed by @talex5). The tracing currently only works with a single domain anyway, but this will change when OCaml 5.1 is released.
CHANGES: New features: - Add eio_posix backend (@talex5 @haesbaert ocaml-multicore/eio#448 ocaml-multicore/eio#477, reviewed by @avsm @patricoferris @polytypic). This replaces eio_luv on all platforms except Windows (which will later switch to its own backend). It is a lot faster, provides access to more modern features (such as `openat`), and can safely share OS resources between domains. - Add subprocess support (@patricoferris @talex5 ocaml-multicore/eio#461 ocaml-multicore/eio#464 ocaml-multicore/eio#472, reviewed by @haesbaert @avsm). This is the low-level API support for eio_linux and eio_posix. A high-level cross-platform API will be added in the next release. - Add `Fiber.fork_seq` (@talex5 ocaml-multicore/eio#460, reviewed by @avsm). This is a light-weight alternative to using a single-producer, single-consumer, 0-capacity stream, similar to a Python generator function. Bug fixes: - eio_linux: make it safe to share FDs across domains (@talex5 ocaml-multicore/eio#440, reviewed by @haesbaert). It was previously not safe to share file descriptors between domains because if one domain used an FD just as another was closing it, and the FD got reused, then the original operation could act on the wrong file. - eio_linux: release uring if Linux is too old (@talex5 ocaml-multicore/eio#476). Avoids a small resource leak. - eio_linux: improve error handling creating pipes and sockets (@talex5 ocaml-multicore/eio#474, spotted by @avsm). If we get an error (e.g. too many FDs) then report it to the calling fiber, instead of exiting the event loop. - eio_linux: wait for uring to finish before exiting (@talex5 ocaml-multicore/eio#470, reviewed by @avsm). If the main fiber raised an exception then it was possible to exit while a cancellation operation was still in progress. - eio_main: make `EIO_BACKEND` handling more uniform (@talex5 ocaml-multicore/eio#447). Previously this environment variable was only used on Linux. Now all platforms check it. - Tell dune about `EIO_BACKEND` (@talex5 ocaml-multicore/eio#442). If this changes, dune needs to re-run the tests. - eio_linux: add some missing close-on-execs (@talex5 ocaml-multicore/eio#441). - eio_linux: `read_exactly` fails to update file offset (@talex5 ocaml-multicore/eio#438). - Work around dune `enabled_if` bug on non-Linux systems (@polytypic ocaml-multicore/eio#475, reviewed by @talex5). - Use raw system call of `getrandom` for glibc versions before 2.25 (@zenfey ocaml-multicore/eio#482). Documentation: - Add `HACKING.md` with hints for working on Eio (@talex5 ocaml-multicore/eio#443, reviewed by @avsm @polytypic). - Improve worker pool example (@talex5 ocaml-multicore/eio#454). - Add more Conditions documentation (@talex5 ocaml-multicore/eio#436, reviewed by @haesbaert). This adds a discussion of conditions to the README and provides examples using them to handle signals. - Condition: fix the example in the docstring (@avsm ocaml-multicore/eio#468). Performance: - Add a network benchmark using an HTTP-like protocol (@talex5 ocaml-multicore/eio#478, reviewed by @avsm @patricoferris). - Add a benchmark for reading from `/dev/zero` (@talex5 ocaml-multicore/eio#439). Other changes: - Add CI for macOS (@talex5 ocaml-multicore/eio#452). - Add tests for `pread`, `pwrite` and `readdir` (@talex5 ocaml-multicore/eio#451). - eio_linux: split into multiple files (@talex5 ocaml-multicore/eio#465 ocaml-multicore/eio#466, reviewed by @avsm). - Update Dockerfile (@talex5 ocaml-multicore/eio#471). - Use dune.3.7.0 (@patricoferris ocaml-multicore/eio#457). - Mint exclusive IDs across domains (@TheLortex ocaml-multicore/eio#480, reported by @haesbaert, reviewed by @talex5). The tracing currently only works with a single domain anyway, but this will change when OCaml 5.1 is released.
It was previously not safe to share file descriptors between domains. The
fd_sharing.md
test demonstrates the problem: one domain enqueues an operation using an FD and another domain closes it first.Eio_unix.Private.Rcfd
provides a ref-counted wrapper for file descriptors to fix this problem. It's lock-free and designed to be fast in the common case where an FD is only used from one domain, but safe in the case where it's used from several.The
bench_fd
benchmark is, if anything, slightly faster with this change, though I don't know why:There is a dscheck test that tests Rcfd with all combinations of two users and two closers.
It includes an (unsafe)
peek
operation, as the public API already provided this. We might want to remove that at some point and provide a safeEio_unix.FD.use
operation instead.There is support here for safely having two domains try to close an FD at once, although eio_linux itself still doesn't support that (you must close from the domain that opened it).
It could also allow us to support a
try_close
operation at some point.Note that closing is now always done with
Unix.close
, not uring. Using uring previously avoided races in the single-domain case, but that's not necessary now that we have a general solution. I did have aremove_or_close
function that would use uring in the common case, butUnix.close
if an operation was in progress, but that seemed over-complicated.I'm intending to use this for the generic eio_posix backend too.