
Fix: closing system fd is thread unsafe #16289

Merged
straight-shoota merged 3 commits into crystal-lang:master from ysbaddaden:feature/add-crystal-fd-lock
Nov 25, 2025

Conversation

@ysbaddaden
Collaborator

@ysbaddaden commented Oct 28, 2025

This patch implements a reference-counted lock for IO objects that depend on a reusable system fd (IO::FileDescriptor, File and Socket), protecting them against thread-safety issues around close:

  • Thread 1 wants to read from fd 123;
  • The OS preempts Thread 1;
  • Thread 2 closes fd 123;
  • Thread 2 opens something else and the OS reuses fd 123;
  • The OS resumes Thread 1;
  • Thread 1 reads from the reused fd 123!!!

The same issue arises for any operation that would mutate the fd (write, fchown, ftruncate, setsockopt, etc.), since each risks affecting a reused fd instead of the expected one.
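A minimal sketch of such a reference-counted lock follows. It is written in Python rather than the patch's Crystal, and `FdLock`, `reference`, `write` and `file_size` are hypothetical names, not the stdlib API. Each operation with side effects holds a reference for the duration of its syscall. `close` refuses new references and waits for in-flight ones before calling `close(2)`, so the fd number cannot be recycled under a running operation. A mere query such as `fstat` skips the lock. The sketch assumes operations never block indefinitely while holding a reference.

```python
import os
import threading


class FdLock:
    """Reference-counted guard for a reusable fd (hypothetical sketch).

    Operations with side effects hold a reference while their syscall runs.
    close() refuses new references, waits until in-flight ones are returned,
    and only then releases the fd number to the OS.
    """

    def __init__(self, fd):
        self.fd = fd
        self._cond = threading.Condition()
        self._refs = 0
        self._closed = False

    def reference(self, op):
        """Run op(fd) while holding a reference; fail if already closed."""
        with self._cond:
            if self._closed:
                raise OSError("Closed stream")
            self._refs += 1
        try:
            return op(self.fd)
        finally:
            with self._cond:
                self._refs -= 1
                if self._refs == 0:
                    self._cond.notify_all()  # wake a pending close()

    def close(self):
        """Close the fd once; return False if it was already closed."""
        with self._cond:
            if self._closed:
                return False
            self._closed = True
            # Wait until no operation still targets this fd number.
            while self._refs > 0:
                self._cond.wait()
        os.close(self.fd)
        return True


def write(lock, data):
    # Side effect: counted, so close() cannot recycle the fd mid-write.
    return lock.reference(lambda fd: os.write(fd, data))


def file_size(lock):
    # Mere query: no reference taken; if it races with close() it may fail.
    return os.fstat(lock.fd).st_size
```

In the scenario above, Thread 2's close would now block until Thread 1 returns its reference. A read attempted after the close fails with "Closed stream" instead of reaching a reused fd.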

NOTE: The lock is currently implemented on the UNIX target only, but we might want to use it on every target. Go uses its fdMutex on every target.

Extracted from #16209 (follow-up with single reader/writer)
Depends on #16288 (EventLoop#shutdown)
Closes #16127
Obsoletes #16128

Only operations that can affect the file descriptor are counted, for example reading or writing, truncating a file, or changing file permissions.

Mere queries with no side effects go through without taking a reference: at worst they fail, as they would have anyway.

@ysbaddaden force-pushed the feature/add-crystal-fd-lock branch from ef5d08d to 3150772 November 14, 2025 14:28
@ysbaddaden
Collaborator Author

ysbaddaden commented Nov 14, 2025

Rebased on master to drop #16288, which has been merged, along with its fixup (#16366).

@straight-shoota
Member

Is there a particular reason why we're rolling this out only on Unix targets instead of globally?
There is merit in making smaller increments, but that's a bit offset by the extra method overrides only for the Unix implementations (system_read & co).

@ysbaddaden
Collaborator Author

Because the issue is on UNIX.

I can move it out of Crystal::System if we believe it brings value to IO::FileDescriptor and Socket on every target.

@straight-shoota
Member

It seems useful to share the same implementation across platforms. Even if it's not strictly necessary on Windows, it's easier to maintain if we only have to worry about one mechanism.
That's assuming there are no grave downsides to using this on Windows? I presume there might be some performance implications, but closing doesn't seem like a very contested operation.

@ysbaddaden
Collaborator Author

Close doesn't create a contention point. The problem is concurrent access to the same stdio, file or socket, because every operation must atomically increment the reference count. Many fibers frantically writing to STDOUT will see an impact.

The next step, having a single reader and a single writer (#16209), could be useful on Windows to replace the custom thread communication used to read asynchronously from the console: we could merely detach the current thread (#15871) while making sure only one thread is blocked. We could use the same approach on UNIX to replace the TTY hack (#16353).

@straight-shoota
Member

> Many fibers frantically writing to STDOUT will see an impact.

That probably produces a big jumble anyway, so it doesn't seem like a very relevant use case.

@ysbaddaden
Collaborator Author

If you're careful to buffer your message and keep it within PIPE_BUF, then writing to an stdio that is a pipe is atomic (a POSIX requirement). In practice it also appears to be fine for files.

The tracing feature heavily relies on this.
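The sketch below shows this buffering discipline in Python. It assumes the stdio is a pipe, with an `os.pipe()` standing in for a piped STDOUT. Each record is formatted in full, checked against `select.PIPE_BUF`, and emitted with a single `write(2)`, so concurrent writers never interleave inside a record. `write_record` and `collect` are hypothetical helpers.

```python
import os
import select
import threading


def write_record(fd, message):
    """Buffer a whole record, then emit it with exactly one write(2).

    POSIX requires writes of at most PIPE_BUF bytes to a pipe to be atomic,
    i.e. never interleaved with data from other writers.
    """
    data = (message + "\n").encode()
    if len(data) > select.PIPE_BUF:
        raise ValueError("record larger than PIPE_BUF: no atomicity guarantee")
    if os.write(fd, data) != len(data):
        # Not expected for a blocking pipe write of <= PIPE_BUF bytes.
        raise OSError("short write")


def collect(writers=4, records=100):
    """Run concurrent writers on one pipe and return the lines read back."""
    r, w = os.pipe()
    out = bytearray()

    def reader():
        # Drain continuously so writers never block on a full pipe buffer.
        while chunk := os.read(r, 65536):
            out.extend(chunk)

    def writer(n):
        for i in range(records):
            write_record(w, f"writer={n} seq={i} " + "x" * 64)

    reader_thread = threading.Thread(target=reader)
    reader_thread.start()
    writer_threads = [threading.Thread(target=writer, args=(n,)) for n in range(writers)]
    for t in writer_threads:
        t.start()
    for t in writer_threads:
        t.join()
    os.close(w)  # EOF lets the reader loop terminate
    reader_thread.join()
    os.close(r)
    return out.decode().splitlines()
```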

In practice you rarely need to write as frantically as printing every malloc or writing something every few microseconds, and using a channel + fiber (as Log does) completely removes the contention.
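The following sketch illustrates the channel + fiber pattern. A Python `queue.Queue` stands in for the channel and one thread stands in for the fiber. `AsyncWriter` is a hypothetical name and does not reflect Crystal's Log API. Producers only enqueue messages, and the single writer thread is the only code that touches the fd, so the fd's reference count is never contended. The producers contend on the queue instead.

```python
import os
import queue
import threading


class AsyncWriter:
    """Hypothetical single-writer sink: producers enqueue, one thread writes."""

    _STOP = object()  # sentinel telling the writer thread to exit

    def __init__(self, fd):
        self._fd = fd
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def log(self, message):
        # Producers never touch the fd directly.
        self._queue.put(message)

    def _run(self):
        while (item := self._queue.get()) is not self._STOP:
            data = (item + "\n").encode()
            while data:  # handle partial writes
                n = os.write(self._fd, data)
                data = data[n:]

    def close(self):
        # Flush everything queued so far, then stop the writer thread.
        self._queue.put(self._STOP)
        self._thread.join()
```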

@ysbaddaden
Collaborator Author

ysbaddaden commented Nov 17, 2025

Anyway: I'll move @fd_lock out of Crystal::System 👍

@ysbaddaden
Collaborator Author

ysbaddaden commented Nov 20, 2025

I started moving @fd_lock out of Crystal::System and I don't like it 😢

The explicit relationship between the lock and the fd, for example `@fd_lock.reference { LibC.fsync(fd) }`, is replaced with a blind lock, because the wrapped method may reference fd only implicitly, for example `@fd_lock.reference { system_fsync }`.

That looks bad and feels brittle.

@ysbaddaden
Collaborator Author

ysbaddaden commented Nov 20, 2025

I'd prefer to duplicate the behavior in Crystal::System for Windows to protect the handle, and that could come as a follow-up.

@straight-shoota
Member

I'm already concerned about a number of indirect references where the locked block delegates to the event loop.
For example, the wrappers at the end of unix/socket.cr.

The complexity of delegation between the public API, system implementations and event loop is already quite high.
It would be great if we could find a way to simplify that.

This is totally not a stopper, though. Maybe we figure out something later (probably not, though 🤷).

@ysbaddaden
Collaborator Author

ysbaddaden commented Nov 21, 2025

Tried again, and from the point of view of "protecting the system_ methods" it feels better.

I hit a blocker, though: we must implement Crystal::EventLoop::IOCP#shutdown, otherwise the refcount won't be decremented and, at worst, files would never be closed and fibers would get stuck.

It's easy for Socket, but IO::FileDescriptor is another story: we must memorize the pending overlapped operations for every file and actively cancel them (they may be in any IOCP instance, possibly several). We must also be careful with the STDIN console hack, as well as the blocking read/write calls: can they be canceled?
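The following toy Python model illustrates why `shutdown` is a blocker. All names are hypothetical. An operation keeps its reference while parked waiting for readiness. A `close` that only waits for the refcount to drain therefore never finishes unless something first resumes the parked waiters with an error. In the patch, the event loop's `shutdown` plays that role.

```python
import threading


class Closed(Exception):
    pass


class PendingFd:
    """Toy model: a parked operation holds a reference until it is resumed."""

    def __init__(self):
        self._cond = threading.Condition()
        self._refs = 0
        self._closing = False
        self._cancelled = False
        self._ready = False

    def wait_readable(self):
        with self._cond:
            if self._closing:
                raise Closed("closed stream")
            self._refs += 1
            try:
                # Park until the fd is ready or a shutdown cancels the wait.
                while not self._ready and not self._cancelled:
                    self._cond.wait()
                if self._cancelled:
                    raise Closed("closed while waiting")
                return True
            finally:
                self._refs -= 1
                self._cond.notify_all()

    def mark_ready(self):
        with self._cond:
            self._ready = True
            self._cond.notify_all()

    def close(self, shutdown=True, timeout=None):
        """Return True once every reference is returned, False on timeout."""
        with self._cond:
            self._closing = True  # refuse new references
            if shutdown:
                self._cancelled = True  # resume parked waiters with an error
                self._cond.notify_all()
            return self._cond.wait_for(lambda: self._refs == 0, timeout)
```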

Like the io_uring event loop, I believe we'll want to wait for the follow-up that serializes reads and writes so there can be at most one reader and one writer.

@straight-shoota
Member

> As for the io_uring event loop, I believe we'll want to wait for the follow-up that serializes reads and writes so there can be only one reader and one writer at most.

Would that make it simpler for IOCP as well?

@ysbaddaden
Collaborator Author

Yes, this is what I meant.

@ysbaddaden moved this from Review to Approved in Multi-threading Nov 24, 2025
@ysbaddaden added this to the 1.19.0 milestone Nov 24, 2025
@straight-shoota merged commit bf90884 into crystal-lang:master Nov 25, 2025
49 checks passed
@github-project-automation bot moved this from Approved to Done in Multi-threading Nov 25, 2025
straight-shoota pushed a commit that referenced this pull request Nov 25, 2025
@ysbaddaden deleted the feature/add-crystal-fd-lock branch November 26, 2025 13:59
straight-shoota pushed a commit that referenced this pull request Dec 1, 2025
This patch extends the fdlock to **serialize reads and writes** by adding a read lock and a write lock to the reference-counted lock, so taking a reference and locking is a single operation instead of two (1. acquire/release the lock; 2. take/return a reference). This avoids a race condition in the polling event loops:

- Fiber 1 then Fiber 2 try to read from `fd`;
- Since `fd` isn't ready, both fibers start waiting;
- When `fd` becomes ready then Fiber 1 is resumed;
- Fiber 1 doesn't read everything and _returns_;
- Since events are edge-triggered, Fiber 2 won't be resumed!!!

With the read lock, Fiber 2 waits on the lock and is resumed by Fiber 1 when it returns. A concrete example of the race is multiple fibers waiting to accept on a socket: Fiber 1 keeps handling connections while Fiber 2 sits idle.
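Below is a Python sketch of the combined lock. It assumes a single condition variable guards the reference count and the read and write slots. Go's internal/poll fdMutex instead packs the closed flag, the reference count and the lock bits into one atomic word. `FdRWLock` is a hypothetical name. A second reader queues on the lock itself and is handed over when the first one releases, so it no longer depends on an edge-triggered notification that the first reader may already have consumed.

```python
import threading


class FdRWLock:
    """Hypothetical sketch: reference count plus one read slot and one write
    slot, updated under one condition variable, so that taking a reference
    and acquiring the read (or write) lock happen as a single step."""

    def __init__(self):
        self._cond = threading.Condition()
        self._refs = 0
        self._closed = False
        self._busy = {"read": False, "write": False}

    def _lock(self, kind):
        with self._cond:
            # Wait for the current reader (or writer) to hand over the slot.
            while self._busy[kind] and not self._closed:
                self._cond.wait()
            if self._closed:
                raise OSError("Closed stream")
            self._busy[kind] = True
            self._refs += 1

    def _unlock(self, kind):
        with self._cond:
            self._busy[kind] = False
            self._refs -= 1
            self._cond.notify_all()  # resume the next queued caller or close()

    def read(self, op):
        self._lock("read")
        try:
            return op()
        finally:
            self._unlock("read")

    def write(self, op):
        self._lock("write")
        try:
            return op()
        finally:
            self._unlock("write")

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()  # queued callers fail with "Closed stream"
            self._cond.wait_for(lambda: self._refs == 0)
```

Reads and writes use separate slots, so a pending read never blocks a write on the same fd.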

The other benefit is that it helps simplify the evloops, which now only deal with a single reader and a single writer per `IO`, and it is required for the io_uring evloop (specifically, its MT version).

**NOTE**: While this patch only serializes reads/writes on UNIX at the `Crystal::System` level, which is where the bugs are, we will move it into stdlib for all targets in a follow-up. See #16289 (comment)
@crysbot
Collaborator

crysbot commented Jan 18, 2026

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/there-is-a-way-to-optimize-this-program/6947/64


Labels

kind:bug, platform:unix, topic:multithreading, topic:stdlib:runtime

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

- Closing fd is thread unsafe on UNIX targets
- BUG: transfering fd=3 to another evloop with pending reader/writer fibers (RuntimeError) Since

4 participants