-
Notifications
You must be signed in to change notification settings - Fork 260
[LibOS] Make POSIX locks interruptible #2522
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r1.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @pwmarcz)
a discussion (no related file):
Can you amend your commit to have the text from the PR description? Sounds very helpful in understanding what this commit resolves.
a discussion (no related file):
I found this change very readable. I definitely like this Option 4.
LibOS/shim/src/fs/shim_fs_lock.c, line 517 at r1 (raw file):
* -EINTR. Either way, we delete the request. */ ret = req->resolved ? req->result : -EINTR;
But ret
could be something other than "interrupted" (if DkEventWait
failed in a weird way). I would expect to return this error then, or have die_or_inf_loop()
after the message "unexpected error from DkEventWait".
LibOS/shim/src/ipc/shim_ipc_fs_lock.c, line 54 at r1 (raw file):
void* data; int ret = ipc_send_msg_and_get_response_with_cancel( g_process_ipc_ids.leader_vmid, msg, &cancel_posix_lock, msg, &data);
I first wanted to suggest that only F_SETLKW
uses this with_cancel
version, but then I noticed that the man page allows EINTR
even for F_SETLK
/F_GETLK
. So you are correct here.
LibOS/shim/test/regression/fcntl_lock.c, line 458 at r1 (raw file):
if (pid == 0) { lock(F_RDLCK, 0, 100);
Why not F_WRLCK
? Just to cover more cases?
This makes it possible for POSIX locks (`F_SETLKW`) to return `-EINTR` when interrupted, e.g. after receiving a signal from `alarm()`. Signed-off-by: Paweł Marczewski <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 9 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Can you amend your commit to have the text from the PR description? Sounds very helpful in understanding what this commit resolves.
Done.
LibOS/shim/src/fs/shim_fs_lock.c, line 517 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
But
ret
could be something other than "interrupted" (ifDkEventWait
failed in a weird way). I would expect to return this error then, or havedie_or_inf_loop()
after the message "unexpected error from DkEventWait".
This is a race: if DkEventWait
failed in a weird way, the lock operation still might have finished with success, in which case returning something else than 0 would be very confusing for the user. That's why I didn't want to return the error from DkEventWait
.
Do you think die_or_inf_loop()
is a good solution here? I did not want to do that either, because a failed wait shouldn't have any impact on correctness of the lock operation, so I don't think it should be a fatal error.
Another solution would be to return the error from DkEventWait
, but only if the operation has not finished yet (i.e. ret = req->resolved ? req->result : ret_from_wait
). I'm not sure if that's useful, though.
LibOS/shim/src/ipc/shim_ipc_fs_lock.c, line 54 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I first wanted to suggest that only
F_SETLKW
uses thiswith_cancel
version, but then I noticed that the man page allowsEINTR
even forF_SETLK
/F_GETLK
. So you are correct here.
Graphene won't fail with EINTR
on these, though. We will send the cancel request, but it will not find any pending operation, so the original requests will proceed. The same will happen for F_SETLKW
for a lock that can be granted immediately.
LibOS/shim/test/regression/fcntl_lock.c, line 458 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why not
F_WRLCK
? Just to cover more cases?
Oh, I can do F_WRLCK
, same as in the other "eitr" test. Changed.
I'm not very consistent with these, though; some other tests check F_RDLCK
vs. F_WRLCK
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required)
LibOS/shim/src/fs/shim_fs_lock.c, line 517 at r1 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
This is a race: if
DkEventWait
failed in a weird way, the lock operation still might have finished with success, in which case returning something else than 0 would be very confusing for the user. That's why I didn't want to return the error fromDkEventWait
.Do you think
die_or_inf_loop()
is a good solution here? I did not want to do that either, because a failed wait shouldn't have any impact on correctness of the lock operation, so I don't think it should be a fatal error.Another solution would be to return the error from
DkEventWait
, but only if the operation has not finished yet (i.e.ret = req->resolved ? req->result : ret_from_wait
). I'm not sure if that's useful, though.
OK, understood your point. Given that it shouldn't happen in practice, and we have a log warning for this case, I'm resolving.
Signed-off-by: Paweł Marczewski <[email protected]>
Closing as @boryspoplawski (https://github.com/oscarlab/graphene/issues/2517#issuecomment-877881232) wants not to use IPC responses for this. I'll be trying to rewrite this subsystem (and think how it ties with the sync engine). |
Description of the changes
This makes it possible for POSIX locks (
F_SETLKW
) to return-EINTR
when interrupted, e.g. after receiving a signal fromalarm()
. See #2517 for design discussion.Fixes #2510.
How to test this PR?
fcntl_lock
stress-ng
lockf test should no longer hang (see Stress-ng test lockf hangs with Graphene #2510)fcntl16
test is reenabled (the test depends on being able to interrupt a lock request, unfortunately it's racy and the interrupt is not always necessary)This change is