-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LibOS] Make get_new_id
immediately move ID ownership if needed
#109
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @boryspoplawski)
a discussion (no related file):
For my own understanding, is this a scenario for the previous bug:
[P1 is the IPC leader]
P1 forks P2
P2 spawns new thread T3
P2 forks P4 simulteneously T3 exits
[P2 now removes from owned P4 and T3, which triggers "release range 3-4" IPC msg]
P1 receives "release range 3-4" and marks these two as unused
P4 starts and performs `ipc_change_id_owner(4, <vmid>)`
P1 receives ipc_change_id_owner(4) and panics -- ID 4 unknown!
And now your PR fixes it like this:
[P1 is the IPC leader]
P1 forks P2
P2 spawns new thread T3
P2 forks P4 simulteneously T3 exits
P2 removes from owned T3
P2 performs `ipc_change_id_owner(4, <child-vmid>)`
P1 never receives "release range 4" with this new PR, and ID 4 is always marked as used
Is my understanding correct?
LibOS/shim/src/ipc/shim_ipc_pid.c, line 207 at r1 (raw file):
if (range->start != start || range->end != end) { BUG(); }
Looks like a remnant from your gdb session? Not blocking though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
For my own understanding, is this a scenario for the previous bug:
[P1 is the IPC leader] P1 forks P2 P2 spawns new thread T3 P2 forks P4 simulteneously T3 exits [P2 now removes from owned P4 and T3, which triggers "release range 3-4" IPC msg] P1 receives "release range 3-4" and marks these two as unused P4 starts and performs `ipc_change_id_owner(4, <vmid>)` P1 receives ipc_change_id_owner(4) and panics -- ID 4 unknown!
And now your PR fixes it like this:
[P1 is the IPC leader] P1 forks P2 P2 spawns new thread T3 P2 forks P4 simulteneously T3 exits P2 removes from owned T3 P2 performs `ipc_change_id_owner(4, <child-vmid>)` P1 never receives "release range 4" with this new PR, and ID 4 is always marked as used
Is my understanding correct?
Almost. There is a difference between releasing and removing from owned IDs.
[P2 now removes from owned P4 and T3, which triggers "release range 3-4" IPC msg]
- P4 was only removed from owned, while T3
was marked as no longer used (hence releasable). This caused the range to be split locally into three ranges: 3-3
, 4-4
and 5-X
The message that went to IPC leader was release range 3-3
, but the leader only knew about range 3-X
, and due to how range lookups currently work, it found this range and deleted it anyway.
Also note these steps happen under a lock:
P2 removes from owned T3
P2 performs `ipc_change_id_owner(4, <child-vmid>)`
LibOS/shim/src/ipc/shim_ipc_pid.c, line 207 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Looks like a remnant from your gdb session? Not blocking though.
I've actually debugged this purely by reading code :)
No, this change is intended. It's just like a release-build assert. I wanted to add this, since the culprit of the original issue was desynchronized state and this was the only place it could be detected (we had an assert, but seems bug reporters ran this in release build).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required)
a discussion (no related file):
but the leader only knew about range 3-X, and due to how range lookups currently work, it found this range and deleted it anyway.
Oooh! Ok. That's not a bug? Anyway, resolving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
but the leader only knew about range 3-X, and due to how range lookups currently work, it found this range and deleted it anyway.
Oooh! Ok. That's not a bug? Anyway, resolving.
What do you mean? There is nothing sensible we can do in case of a desync, so I've opted for a loud BUG()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required)
a discussion (no related file):
Previously, boryspoplawski (Borys Popławski) wrote…
What do you mean? There is nothing sensible we can do in case of a desync, so I've opted for a loud
BUG()
.
Well, yes, true. Nevermind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required) (waiting on @boryspoplawski)
a issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required) (waiting on @mkow)
Previously, mkow (Michał Kowalczyk) wrote…
a issues
TODO: at rebase
addd1a5
to
91ac1a8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @mkow)
Previously, boryspoplawski (Borys Popławski) wrote…
TODO: at rebase
Done.
Previously allocating ID for a new process did not immediately change the ownership of this ID (it happened later in the child process). This could cause issues when one thread does a `fork` syscall and another exits, releasing its own thread ID - IPC leader might get notified to release an ID range, which it thinks is a part of a bigger one, since it was not yet notified about the ID ownership change. This commit causes allocating a new ID to also immediately notify IPC leader about ownership change, atomically w.r.t. other threads in the current process. Signed-off-by: Borys Popławski <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL)
Jenkins, retest Jenkins-Direct-Sanitizers please (sendfile09 from LTP timed out, known issue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
Description of the changes
Previously allocating ID for a new process did not immediately change the ownership of this ID (it happened later in the child process). This could cause a issues when one thread issues a
fork
syscall and another exits, releasing its own thread ID - IPC leader might get notified to release an ID range, which it thinks is a part of a bigger one, since it was not yet notified about the ID ownership change.This commit causes allocating a new ID to also immediately notify IPC leader about ownership change, atomically w.r.t. other threads in the current process.
This change is