Description of the feature
System V semaphores (aka Sys-V semaphores) are a primitive for inter-process synchronization. They are not to be confused with newer POSIX semaphores.
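For readers who haven't used the API, here is a minimal single-process sketch of a typical lifecycle (create, initialize, lock, unlock, remove) on plain Linux. The function name sysv_sem_roundtrip() is made up for illustration, and semtimedop() is left out because it is semop() with an additional timeout argument.

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* On Linux the caller has to define union semun itself (see the semctl man page). */
union semun {
    int val;
    struct semid_ds* buf;
    unsigned short* array;
};

/* Hypothetical helper: creates a private set with one semaphore, uses it once as a
 * binary lock and removes it. Returns 0 on success, -1 if any step fails. */
int sysv_sem_roundtrip(void) {
    struct sembuf lock   = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    struct sembuf unlock = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };
    union semun arg = { .val = 1 };
    int ret = -1;

    /* semget(): IPC_PRIVATE always creates a new set that no key refers to */
    int semid = semget(IPC_PRIVATE, /*nsems=*/1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    /* semctl(SETVAL): set the initial value to 1, i.e. "unlocked" */
    if (semctl(semid, /*semnum=*/0, SETVAL, arg) < 0)
        goto out;

    /* semop(): decrement to take the lock, increment to release it */
    if (semop(semid, &lock, 1) < 0)
        goto out;
    /* ... critical section would go here ... */
    if (semop(semid, &unlock, 1) < 0)
        goto out;

    /* semctl(GETVAL): the value should be back at 1 */
    ret = semctl(semid, 0, GETVAL) == 1 ? 0 : -1;
out:
    /* Sys-V semaphores outlive the process, so the set must be removed explicitly */
    semctl(semid, 0, IPC_RMID);
    return ret;
}
```

Calling sysv_sem_roundtrip() from main() and checking for 0 would be a reasonable first smoke test once the syscalls exist.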
Sys-V semaphores use four system calls:
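semget(), currently unimplemented in Gramine
semop(), currently unimplemented in Gramine
semctl(), currently unimplemented in Gramine
semtimedop(), currently unimplemented in Gramine
[ Just for the record, POSIX semaphores are implemented in user-space. ]
Glibc implements the POSIX semaphore API: sem_open(), sem_wait(), sem_post(), etc. See the man page for details.
POSIX semaphores open a shared-memory file under /dev/shm -- that's why POSIX semaphores are hard/impossible to implement in Gramine-SGX.
POSIX semaphores use futexes on shared-memory locations, which is why they are typically faster than Sys-V semaphores.
Also note that Sys-V semaphores are system-wide (any process can connect to any semaphore, if it has sufficient permissions), whereas POSIX semaphores are specific to a group of processes. That's why Sys-V semaphores have a problem of system-wide semaphore leaks: if some group of processes terminates without removing its semaphores, the semaphores remain available until the next Linux reboot (or manual cleanup of these orphaned semaphores).
I will not describe how Sys-V semaphores work in this issue. Here are the links I found useful: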
Here are some links to the Linux source code on Sys-V semaphores:
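Gramine needs to implement
semget(), semop(), semctl(), semtimedop()
/proc/sys/kernel/sem -- read-only, with hard-coded limits for semaphores:
see the /proc man page for details on these limits;
hard-code the same default values as in Linux sources.
/proc/sysvipc/sem -- read-only, shows all semaphores in the system:
we should not implement this file for now, as it is probably not used by real-world apps,
but implementation should be simple: ask the leader process about all semaphores, the leader process sends back the results, print these results in a format similar to Linux.
Proposed simplification 1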
Ignore permissions and their checks in all 4 syscalls; we should set sem_perm.cuid, sem_perm.uid, sem_perm.cgid, sem_perm.gid, sem_perm.mode, but for simplicity we can skip verifying them during syscalls.
Well, it may be trivial to implement such checks. If it is, let's implement them immediately. But if it would require some changes in other parts of Gramine, I would leave it as future work.
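To make it concrete which fields this is about: an application can read them back with semctl(IPC_STAT). Below is a minimal sketch against plain Linux; query_sem_perm() is a hypothetical helper name, not Gramine code. Under this simplification Gramine would fill in these fields but would not enforce them.

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <unistd.h>

/* On Linux the caller has to define union semun itself (see the semctl man page). */
union semun {
    int val;
    struct semid_ds* buf;
    unsigned short* array;
};

/* Hypothetical helper: creates a private one-semaphore set with mode 0640, copies its
 * metadata (including sem_perm) into *out and removes the set again.
 * Returns 0 on success, -1 on failure. */
int query_sem_perm(struct semid_ds* out) {
    int semid = semget(IPC_PRIVATE, /*nsems=*/1, IPC_CREAT | 0640);
    if (semid < 0)
        return -1;

    /* IPC_STAT needs read permission on the set, which mode 0640 grants to the owner */
    union semun arg = { .buf = out };
    int ret = semctl(semid, /*semnum=*/0, IPC_STAT, arg);

    semctl(semid, 0, IPC_RMID);
    return ret < 0 ? -1 : 0;
}
```

On Linux, out->sem_perm.uid and out->sem_perm.cuid are the effective UID of the creator, and the low 9 bits of out->sem_perm.mode are the permissions passed to semget().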
Proposed simplification 2
The man page for semop() says this: "The calling thread catches a signal: the value of semzcnt is decremented and semop() fails, with errno set to EINTR."
In other words, if there is an interrupt/signal during a blocking semop(), then the syscall must mark this process as "not waiting anymore" and fail with EINTR. This first part is problematic: in the distributed logic of Gramine, this would require sending a special "not waiting anymore" message from this process to the leader process, then the leader process must decrement sem::semzcnt and send the acknowledgement message back... This problem is non-trivial, and it is similar to this issue: #12
So, we should not implement this logic for now. This essentially renders all semaphore operations non-interruptible.
Also, this limitation will most probably affect the semtimedop() syscall -- the timeout will probably be useless, because we won't be able to "undo" the operations if the timeout is triggered.
Proposed simplification 3
The semantics of the SEM_UNDO flag are complicated, especially because SEM_UNDO metadata (semaphore adjustments) are per-process and are kept on execve() and clone(CLONE_SYSVSEM). This would necessitate a separate LibOS handle for each semaphore, and corresponding checkpoint-restore code in Gramine.
Even though the adjustments logic itself is pretty simple, I think we can silently ignore SEM_UNDO for now.
By the way, The Linux Programming Interface book also mentions that SEM_UNDO is not as useful as it may seem, and applications shouldn't really rely on this flag (see the "Limitations of SEM_UNDO" section).
UPDATE 27. March 2023: Apache APR uses SEM_UNDO. Looks like we need to implement SEM_UNDO, but probably we can get away with just silently ignoring it.
Random notes
References to Sys-V semaphores are shared, because the semaphores don't belong to any process. So they are preserved across fork and execve -- in the sense that semid can be accessed by any process. IIUC, Sys-V semaphores do not have a state in the process (other than SEM_UNDO adjustments, which we don't plan to support yet), so there is nothing to checkpoint/restore in Gramine. Also note that semaphore IDs (semid integers) are not file descriptors, so they can't be used in e.g. poll().
The standards say: "Where multiple processes are trying to decrease a semaphore by the same amount, it is indeterminate which process will actually be permitted to perform the operation first." Gramine can rely on this to simplify iterating over waiting processes when some semaphore-set becomes available. It is the responsibility of the app to prevent starvation scenarios, not Gramine's.
The Sys-V semaphore API is clumsy and overcomplicated. For example, almost all applications use simpler binary semaphores: the semaphore set is reduced to the size of 1, and the semaphore values can be only 0 and 1. I don't think we should do any simplifications based on this, but this is an interesting remark. (Also note that semaphore sets were designed like this to have atomic guarantees -- either all operations of one semop() call on the set are applied, or none of them.)
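To illustrate the last two notes, here is a rough sketch of what the leader-side logic could look like: an all-or-nothing check of one request against the set, and a loop that grants waiting requests in arbitrary order. All names (sem_set, sem_req_op, waiter, try_apply(), wake_waiters(), MAX_SEMS) are hypothetical and not existing Gramine code; the sketch also ignores SEMVMX, IPC_NOWAIT and SEM_UNDO.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SEMS 8 /* hypothetical upper bound, only to keep the sketch simple */

struct sem_set {
    size_t nsems;
    int vals[MAX_SEMS];
};

struct sem_req_op { /* like struct sembuf, without flags */
    unsigned short num;
    short op;
};

struct waiter {
    const struct sem_req_op* ops;
    size_t nops;
    bool granted;
};

/* Applies all operations of one request or none of them. Returns true if applied. */
bool try_apply(struct sem_set* set, const struct sem_req_op* ops, size_t nops) {
    int tmp[MAX_SEMS];
    for (size_t i = 0; i < set->nsems; i++)
        tmp[i] = set->vals[i];

    for (size_t i = 0; i < nops; i++) {
        if (ops[i].num >= set->nsems)
            return false; /* real code would fail the syscall with EFBIG instead */
        if (ops[i].op == 0) {
            if (tmp[ops[i].num] != 0)
                return false; /* wait-for-zero not satisfied */
        } else {
            if (tmp[ops[i].num] + ops[i].op < 0)
                return false; /* decrement would make the value negative */
            tmp[ops[i].num] += ops[i].op;
        }
    }

    /* every operation can proceed: commit the working copy */
    for (size_t i = 0; i < set->nsems; i++)
        set->vals[i] = tmp[i];
    return true;
}

/* Grants as many pending requests as possible, simply in array order. The pass is
 * repeated because granting one request may unblock another. Returns the number of
 * newly granted requests. */
size_t wake_waiters(struct sem_set* set, struct waiter* waiters, size_t nwaiters) {
    size_t granted = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (size_t i = 0; i < nwaiters; i++) {
            if (waiters[i].granted)
                continue;
            if (try_apply(set, waiters[i].ops, waiters[i].nops)) {
                waiters[i].granted = true;
                granted++;
                progress = true;
            }
        }
    }
    return granted;
}
```

Working on a copy and committing at the end is one simple way to get the atomicity; since the order among waiters is indeterminate by the standard, no fairness bookkeeping is needed in this sketch.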
A long time ago, Gramine had an implementation of Sys-V semaphores that was very buggy and limited. See this commit: 356ae6e
Notes on implementation
The implementation must closely follow the one for POSIX file locks: gramineproject/graphene#2481
In particular, the following files should be modified/created:
libos/include/libos_sysv_sem.h -- new file with structs, enums, constants for Sys-V semaphores
libos/include/libos_ipc.h -- add new functions for Sys-V operations via IPC messages
libos/src/ipc/libos_ipc_worker.c -- add Sys-V specific callbacks
libos/src/ipc/libos_ipc_sysv_sem.c -- new file with glue code to transform from high-level ipc_sysv_sem_xxx() operations to Gramine IPC functionality like ipc_send_msg_and_get_response()
libos/src/sys/libos_sysv_sem.c -- new file with syscall implementations
libos/src/bookkeep/libos_sysv_sem.c -- new file with logic of the leader process (how it iterates through semaphores in the set and decides whether to allow or block requests)
Also tests should be added:
libos/test/ltp -- enable as many Sys-V semaphore tests as possible
libos/test/regression -- add one or two Sys-V semaphore tests: single-process and multi-process, testing:
IPC_PRIVATE
IPC_CREAT
IPC_EXCL
sem_otime (because of the synchronization trick between semget and semctl, we must implement this)
IPC_SET (at least for the trick with sem_otime above)
IPC_RMID
IPC_INFO
GET.../SET... operations in semctl()
IPC_NOWAIT
EIDRM error code
SEM_UNDO (if we implement it)
Random note: maybe the Sync Engine could be useful for Sys-V semaphores (but I doubt): 0e75cad#diff-53c705e096c216c76af82a3affd8a17766e8539c58d1f8b1243ae44010cc74da
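As an illustration of the sem_otime item in the test list above, here is a sketch of how a multi-process regression test could exercise the trick. It assumes the common variant of the trick: the creator initializes the set and then performs one semop(), which makes sem_otime non-zero; everybody else polls IPC_STAT until they see that. The function names, the key choice and the polling limits are made up for this sketch.

```c
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* On Linux the caller has to define union semun itself (see the semctl man page). */
union semun {
    int val;
    struct semid_ds* buf;
    unsigned short* array;
};

/* Hypothetical helper: returns the ID of a one-semaphore set that is guaranteed to be
 * initialized to `initval`, no matter which process created it. Returns -1 on failure. */
int open_initialized_sem(key_t key, short initval) {
    struct semid_ds ds;
    union semun arg;

    int semid = semget(key, /*nsems=*/1, IPC_CREAT | IPC_EXCL | 0600);
    if (semid >= 0) {
        /* creator: start from 0, then raise the value via semop() so that sem_otime
         * becomes non-zero (with initval == 0 this is a wait-for-zero that returns at once) */
        struct sembuf init = { .sem_num = 0, .sem_op = initval, .sem_flg = 0 };
        arg.val = 0;
        if (semctl(semid, 0, SETVAL, arg) < 0 || semop(semid, &init, 1) < 0)
            return -1;
        return semid;
    }
    if (errno != EEXIST)
        return -1;

    /* not the creator: wait until the creator has finished its first semop() */
    semid = semget(key, 1, 0);
    if (semid < 0)
        return -1;
    arg.buf = &ds;
    for (int tries = 0; tries < 1000; tries++) {
        if (semctl(semid, 0, IPC_STAT, arg) < 0)
            return -1;
        if (ds.sem_otime != 0)
            return semid;
        usleep(1000);
    }
    return -1;
}

/* Hypothetical test body: parent and child race to create the same set, then both use
 * it as a lock once. Returns 0 on success, -1 on failure. */
int sem_otime_trick_test(void) {
    struct sembuf lock   = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    struct sembuf unlock = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };
    key_t key = (key_t)(0x53560000 | (getpid() & 0xffff)); /* arbitrary non-zero key */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    int semid = open_initialized_sem(key, 1);
    int ok = semid >= 0 && semop(semid, &lock, 1) == 0 && semop(semid, &unlock, 1) == 0;
    if (pid == 0)
        _exit(ok ? 0 : 1);

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        ok = 0;
    if (semid >= 0)
        semctl(semid, 0, IPC_RMID);
    return ok ? 0 : -1;
}
```

If sem_otime were never updated, the non-creator would time out in the polling loop, which is why the test depends on this field being implemented.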
Why Gramine should implement it?
The main user is Apache web-server and its derivatives like Apache proxy.
Apache web-server and its plugins use Sys-V semaphores (e.g. google for APR_USE_SYSVSEM_SERIALIZE). In particular:
Apache httpd example (that we had for Gramine) uses them
Apache Beam with Flink uses them
On the other hand, Python's multiprocessing package unfortunately does not use Sys-V semaphores but instead uses POSIX semaphores. That's unfortunate, because implementing POSIX semaphores in Gramine/SGX would require allowing untrusted shared memory (/dev/shm), which will probably never happen...