Fixing global semaphore deadlock for *NIX platforms #1603

Closed

Sources/Plasma/CoreLib/hsThread_Unix.cpp (2 additions, 0 deletions)

@@ -106,6 +106,8 @@ hsGlobalSemaphore::hsGlobalSemaphore(int initialValue, const ST::string& name)

     /* Named semaphore shared between processes */
     fPSema = sem_open(semName.c_str(), O_CREAT, 0666, initialValue);
+    // Unlink it immediately so it will be freed if we unexpectedly leave it locked
+    sem_unlink(semName.c_str());

@Hoikas (Member), Jul 13, 2024

I don't think this is correct. This will cause other processes to get a new semaphore that isn't connected to this one, breaking the process-shared nature of the global semaphore.

From the sem_unlink documentation:

     The named semaphore named name is removed.  If the semaphore is in use by
     other processes, then name is immediately disassociated with the semaphore,
     but the semaphore itself will not be removed until all references
     to it have been closed.  Subsequent calls to sem_open() using name will
     refer to or create a new semaphore named name.
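A minimal sketch of the behavior quoted above, assuming a POSIX system with named semaphores (for example Linux/glibc). It opens a named semaphore twice, optionally calling sem_unlink right after the first open as the patch does, then posts through the first handle and checks whether the second handle sees the post. The helper sameSemaphoreAfterReopen and the semaphore name are hypothetical, and the second sem_open in the same process stands in for another client process opening the semaphore by name.

```cpp
#include <fcntl.h>      // O_CREAT
#include <semaphore.h>  // sem_open, sem_unlink, sem_post, sem_trywait, sem_close
#include <unistd.h>     // getpid
#include <string>

// Hypothetical helper: returns true if a second sem_open() of the same name
// reaches the same semaphore as the first one. When unlinkFirst is true, the
// name is unlinked right after the first open, mirroring the patch.
bool sameSemaphoreAfterReopen(bool unlinkFirst) {
    // Hypothetical demo name; POSIX semaphore names start with a single '/'.
    const std::string name = "/hs_unlink_demo_" + std::to_string(getpid());
    sem_unlink(name.c_str());  // ignore ENOENT; clears leftovers from old runs

    sem_t* first = sem_open(name.c_str(), O_CREAT, 0666, 0);
    if (first == SEM_FAILED)
        return false;
    if (unlinkFirst)
        sem_unlink(name.c_str());  // the name no longer refers to `first`

    // Stands in for a second process opening the "global" semaphore by name.
    sem_t* second = sem_open(name.c_str(), O_CREAT, 0666, 0);
    if (second == SEM_FAILED) {
        sem_close(first);
        return false;
    }

    sem_post(first);                               // signal via the first handle
    const bool shared = sem_trywait(second) == 0;  // visible via the second?

    sem_close(second);
    sem_close(first);
    sem_unlink(name.c_str());  // remove whichever semaphore still owns the name
    return shared;
}
```

Where named semaphores work, sameSemaphoreAfterReopen(false) returns true and sameSemaphoreAfterReopen(true) returns false: after the unlink, the second opener silently gets an unrelated semaphore.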

Contributor Author

That may actually be the desired behavior here. When used for logging, the Windows version creates a process-local semaphore, not a machine-local one.

However:

  • That may not have been what was intended on Windows.
  • The Windows version can still create a global semaphore if the semaphore name is prefixed with 'Global\' (I haven't tested this; that's what the docs claim).

I'm certainly open to suggestions here. I don't see a great way to close the semaphore automatically on an unexpected exit.

Member

Yeah, I have mixed thoughts on this right now. On one hand, this is the easy solution that solves the common case, and I don't think we currently have any cases where we need multi-process semaphores. On the other hand, it potentially introduces a limitation: semaphores can't actually be used across multiple processes.

Member

Empirical evidence suggests that, without the 'Local\' namespace prefix, Win32 semaphores are created in either the global or the session namespace. That is basically the point of hsGlobalSemaphore: to be able to wait and signal across processes. The big difference is that Win32 semaphores automatically destroy themselves when the last process holding a HANDLE to the semaphore terminates. I'm not really sure what to do here, unfortunately. We may need to rethink how we synchronize access to log files when multiple clients are running.
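For comparison, one POSIX mechanism with the Win32-like property described above (the kernel cleans up when the holder's process goes away) is an advisory file lock taken with flock(2), available on Linux and the BSDs, including macOS. This is not what the PR does; it is only a minimal sketch, assuming a hypothetical lock-file path and helper name, showing that a lock held by a process that exits without unlocking is released automatically.

```cpp
#include <fcntl.h>     // open, O_RDWR, O_CREAT
#include <sys/file.h>  // flock, LOCK_EX, LOCK_NB, LOCK_UN
#include <sys/wait.h>  // waitpid, WIFEXITED, WEXITSTATUS
#include <unistd.h>    // fork, _exit, close, unlink, getpid
#include <string>

// Hypothetical helper: a child takes an exclusive flock() and "crashes"
// (exits without unlocking). Returns true if the parent can then take it.
bool lockReleasedAfterHolderExits() {
    // Hypothetical lock-file path, computed before fork() so both sides share it.
    const std::string path = "/tmp/hs_log_lock_demo_" + std::to_string(getpid());

    pid_t child = fork();
    if (child < 0)
        return false;
    if (child == 0) {
        int fd = open(path.c_str(), O_RDWR | O_CREAT, 0666);
        if (fd < 0 || flock(fd, LOCK_EX) != 0)
            _exit(1);
        _exit(0);  // no LOCK_UN, no close(): the kernel drops the lock on exit
    }

    int status = 0;
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return false;

    int fd = open(path.c_str(), O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return false;
    // Non-blocking attempt: would fail with EWOULDBLOCK if the lock were still held.
    const bool acquired = flock(fd, LOCK_EX | LOCK_NB) == 0;
    if (acquired)
        flock(fd, LOCK_UN);
    close(fd);
    unlink(path.c_str());
    return acquired;
}
```

flock locks are advisory: they only coordinate processes that all take the same lock on the same file.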

Contributor Author

Yeah. Just to be clear, this isn't a shot in the dark. When @dpogue was describing the didn't-launch-until-restart behavior, I knew exactly what it was because I've seen it happen before.

It shouldn't happen in normal operation, but every once in a while, if the client crashes while holding the semaphore, it just gets stuck. Especially worrisome: if the client locked up due to a deadlock, it will stay deadlocked when it relaunches.
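A minimal sketch of the failure mode described above, assuming a POSIX system with named semaphores; the semaphore name and the helper semaphoreStuckAfterCrash are hypothetical. A child process creates a named semaphore with value 1, takes it, and exits without posting, simulating a crash. Without sem_unlink, the "relaunched" opener finds the count still at 0, so a blocking sem_wait would hang; sem_trywait is used here so the demo reports the state instead of hanging.

```cpp
#include <fcntl.h>      // O_CREAT
#include <semaphore.h>  // sem_open, sem_wait, sem_trywait, sem_post, sem_close, sem_unlink
#include <sys/wait.h>   // waitpid, WIFEXITED, WEXITSTATUS
#include <unistd.h>     // fork, _exit, getpid
#include <string>

// Hypothetical helper: returns true if the named semaphore is still "locked"
// (value 0) after the process holding it exited without calling sem_post().
bool semaphoreStuckAfterCrash() {
    // Hypothetical demo name, computed before fork() so both sides share it.
    const std::string name = "/hs_crash_demo_" + std::to_string(getpid());
    sem_unlink(name.c_str());  // start from a clean slate

    pid_t child = fork();
    if (child < 0)
        return false;
    if (child == 0) {
        sem_t* s = sem_open(name.c_str(), O_CREAT, 0666, 1);
        if (s == SEM_FAILED || sem_wait(s) != 0)
            _exit(1);
        _exit(0);  // "crash": exit while still holding the semaphore
    }

    int status = 0;
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return false;

    // The "relaunched" client opens the same name; the initial value is
    // ignored because the semaphore already exists.
    sem_t* s = sem_open(name.c_str(), O_CREAT, 0666, 1);
    if (s == SEM_FAILED)
        return false;
    const bool stuck = sem_trywait(s) != 0;  // EAGAIN: nobody will ever post
    if (!stuck)
        sem_post(s);
    sem_close(s);
    sem_unlink(name.c_str());
    return stuck;
}
```

If the child also called sem_unlink right after sem_open, as the patch does, the relaunched opener would create a fresh semaphore with value 1 and the trywait would succeed. That is why the patch clears the hang, and also why concurrently running clients stop sharing the semaphore.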

@Hoikas (Member), Jul 13, 2024

I don't doubt that this fixes the deadlock on launch at all. The problem is that this fixes it by breaking the global semaphore mechanism.

     if (fPSema == SEM_FAILED)
     {
         hsAssert(0, "hsOSException");