-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing global semaphore deadlock for *NIX platforms #1603
Conversation
If the program crashed or behaved unexpectedly while a global semaphore was locked, that semaphore would stay locked until the machine was restarted. Unlinking the semaphore upon creation will cause the semaphore to be disposed of once the last process using that semaphore is no longer running.
FYI - the 'sem_init' in the alternate path isn't just deprecated on macOS, but it's outright non functional. |
@@ -106,6 +106,8 @@ hsGlobalSemaphore::hsGlobalSemaphore(int initialValue, const ST::string& name) | |||
|
|||
/* Named semaphore shared between processes */ | |||
fPSema = sem_open(semName.c_str(), O_CREAT, 0666, initialValue); | |||
// Unlink it immediately so it will be freed if we unexpectedly leave it locked | |||
sem_unlink(semName.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct. This will cause other processes to get a new semaphore that isn't connected to this semaphore, breaking the process shared nature of the global semaphore.
From sem_unlink documentation:
The named semaphore named name is removed. If the semaphore is in use by
other processes, then name is immediately disassociated with the semaphore,
but the semaphore itself will not be removed until all references
to it have been closed. Subsequent calls to sem_open() using name will
refer to or create a new semaphore named name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That may actually be the desired behavior here. The Windows version, when used with logging, creates a process local semaphore. Not a machine local semaphore.
However:
- That may not have been what was intended on Windows.
- The Windows version can still create a global semaphore if the semaphore name is prepended with 'global' (have not tested- that's what the docs claim.)
Certainly open to suggestions here. Not seeing a great way to close the semaphore automatically on an unexpected exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I have mixed thoughts on this right now... on one hand, this is the easy solution that solves the common case, and I don't think we currently have any cases where we need multi-process semaphores, but this does potentially introduce a limitation where semaphores can't actually be used across multiple processes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empirical evidence suggests that Win32 semaphores are created in either the global or session namespace without the local namespace prefix. That is basically the point of the hsGlobalSemaphore
- to be able to wait and signal across processes. The big difference is that Win32 semaphores automatically destroy themselves when the last process holding a HANDLE
to the semaphore terminates. I'm not really sure what to do here, unfortunately. We may need to rethink how we synchronize access to log files when multiple clients are running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Just to be clear here - not a shot in the dark. When @dpogue was describing the didn't-launch-til-restart behavior I knew exactly what it was because I've seen it happen before.
It shouldn't happen in normal operation but every once in a while if the client crashes it just gets stuck. Especially worrisome - if the client locked up due to a deadlock it will stay deadlocked when it relaunches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't doubt that this fixes the deadlock on launch at all. The problem is that this fixes it by breaking the global semaphore mechanism.
I guess this issue probably exists on Linux too, but is it worth looking into Mach semaphores for Darwin-based systems? Supposedly this is the primitive on which Darwin implemented named POSIX semaphores: https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/KernelProgramming/synchronization/synchronization.html |
I had started to look at Mach semaphores, but didn't see anything directly analogous to a named semaphore in my cursory glance. There was some reference about passing semaphores across process boundaries, but I didn't see anything that directly mapped to named. Let me know if you find anything. Another suggestion I found - that would work well here but completely upend the interface - is to establish a file based lock on the log file. File system locks - supposedly - are released when the process dies. Technically - you may even be able to rebuild named locks over file system locks. I have not looked too deeply at this path though. |
Looking into file locks as a possible replacement - especially in since the guard here is specifically intended to gate file access. On Windows this function would be LockFileEx:
In POSIX the function would be I would have to check the behavior on macOS. |
I think the other PR is likely the direction to go - so I'll close this one. |
If the program crashed or behaved unexpectedly while a global semaphore was locked, that semaphore would stay locked until the machine was restarted. Unlinking the semaphore upon creation will cause the semaphore to be disposed of once the last process using that semaphore is no longer running.