Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

[LibOS] Make POSIX locks interruptible #2522

Closed
wants to merge 2 commits into from
Closed

Conversation

pwmarcz
Copy link
Contributor

@pwmarcz pwmarcz commented Jul 8, 2021

Description of the changes

This makes it possible for POSIX locks (F_SETLKW) to return -EINTR when interrupted, e.g. after receiving a signal from alarm(). See #2517 for design discussion.

Fixes #2510.

How to test this PR?

  • I added some test cases to fcntl_lock
  • The stress-ng lockf test should no longer hang (see Stress-ng test lockf hangs with Graphene #2510)
  • LTP fcntl16 test is reenabled (the test depends on being able to interrupt a lock request, unfortunately it's racy and the interrupt is not always necessary)

This change is Reviewable

@pwmarcz pwmarcz marked this pull request as ready for review July 8, 2021 13:11
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r1.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @pwmarcz)

a discussion (no related file):
Can you amend your commit to have the text from the PR description? Sounds very helpful in understanding what this commit resolves.


a discussion (no related file):
I found this change very readable. I definitely like this Option 4.



LibOS/shim/src/fs/shim_fs_lock.c, line 517 at r1 (raw file):

         * -EINTR. Either way, we delete the request.
         */
        ret = req->resolved ? req->result : -EINTR;

But ret could be something other than "interrupted" (if DkEventWait failed in a weird way). I would expect to return this error then, or have die_or_inf_loop() after the message "unexpected error from DkEventWait".


LibOS/shim/src/ipc/shim_ipc_fs_lock.c, line 54 at r1 (raw file):

    void* data;
    int ret = ipc_send_msg_and_get_response_with_cancel(
        g_process_ipc_ids.leader_vmid, msg, &cancel_posix_lock, msg, &data);

I first wanted to suggest that only F_SETLKW uses this with_cancel version, but then I noticed that the man page allows EINTR even for F_SETLK/F_GETLK. So you are correct here.


LibOS/shim/test/regression/fcntl_lock.c, line 458 at r1 (raw file):

    if (pid == 0) {
        lock(F_RDLCK, 0, 100);

Why not F_WRLCK? Just to cover more cases?

This makes it possible for POSIX locks (`F_SETLKW`) to return `-EINTR`
when interrupted, e.g. after receiving a signal from `alarm()`.

Signed-off-by: Paweł Marczewski <[email protected]>
Copy link
Contributor Author

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 9 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can you amend your commit to have the text from the PR description? Sounds very helpful in understanding what this commit resolves.

Done.



LibOS/shim/src/fs/shim_fs_lock.c, line 517 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

But ret could be something other than "interrupted" (if DkEventWait failed in a weird way). I would expect to return this error then, or have die_or_inf_loop() after the message "unexpected error from DkEventWait".

This is a race: if DkEventWait failed in a weird way, the lock operation still might have finished with success, in which case returning something else than 0 would be very confusing for the user. That's why I didn't want to return the error from DkEventWait.

Do you think die_or_inf_loop() is a good solution here? I did not want to do that either, because a failed wait shouldn't have any impact on correctness of the lock operation, so I don't think it should be a fatal error.

Another solution would be to return the error from DkEventWait, but only if the operation has not finished yet (i.e. ret = req->resolved ? req->result : ret_from_wait). I'm not sure if that's useful, though.


LibOS/shim/src/ipc/shim_ipc_fs_lock.c, line 54 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I first wanted to suggest that only F_SETLKW uses this with_cancel version, but then I noticed that the man page allows EINTR even for F_SETLK/F_GETLK. So you are correct here.

Graphene won't fail with EINTR on these, though. We will send the cancel request, but it will not find any pending operation, so the original requests will proceed. The same will happen for F_SETLKW for a lock that can be granted immediately.


LibOS/shim/test/regression/fcntl_lock.c, line 458 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why not F_WRLCK? Just to cover more cases?

Oh, I can do F_WRLCK, same as in the other "eitr" test. Changed.

I'm not very consistent with these, though; some other tests check F_RDLCK vs. F_WRLCK.

dimakuv
dimakuv previously approved these changes Jul 9, 2021
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required)


LibOS/shim/src/fs/shim_fs_lock.c, line 517 at r1 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

This is a race: if DkEventWait failed in a weird way, the lock operation still might have finished with success, in which case returning something else than 0 would be very confusing for the user. That's why I didn't want to return the error from DkEventWait.

Do you think die_or_inf_loop() is a good solution here? I did not want to do that either, because a failed wait shouldn't have any impact on correctness of the lock operation, so I don't think it should be a fatal error.

Another solution would be to return the error from DkEventWait, but only if the operation has not finished yet (i.e. ret = req->resolved ? req->result : ret_from_wait). I'm not sure if that's useful, though.

OK, understood your point. Given that it shouldn't happen in practice, and we have a log warning for this case, I'm resolving.

@pwmarcz
Copy link
Contributor Author

pwmarcz commented Jul 19, 2021

Closing as @boryspoplawski (https://github.com/oscarlab/graphene/issues/2517#issuecomment-877881232) wants not to use IPC responses for this. I'll be trying to rewrite this subsystem (and think how it ties with the sync engine).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stress-ng test lockf hangs with Graphene
2 participants