Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

LibOS/ipc, async helper: keep refcount to helper thread on exit #321

Merged
merged 1 commit into from
Feb 5, 2019

Conversation

yamahata
Copy link
Contributor

@yamahata yamahata commented Nov 26, 2018

put_thread(self) to free resources by itself is problematic because
the exiting thread is still using them.
Actually put_thread(self) logic in ipc/async helper may cause SEGV
depending on who is the last to put_thread().
If the thread is the last one to free thread area, lock/unlock after
freeing thread area causes segv due to debug message.

Signed-off-by: Isaku Yamahata [email protected]


This change is Reviewable

@oscar-bot
Copy link

Can one of the admins verify this patch?

2 similar comments
@oscar-bot
Copy link

Can one of the admins verify this patch?

@oscar-bot
Copy link

Can one of the admins verify this patch?

@yamahata yamahata closed this Dec 5, 2018
@yamahata yamahata deleted the libos-async-helper-exit branch December 5, 2018 19:06
@yamahata yamahata restored the libos-async-helper-exit branch December 5, 2018 19:07
@yamahata yamahata reopened this Dec 5, 2018
Copy link
Contributor

@donporter donporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify the design: The idea here is that the IPC helper should never be the last one to exit. Rather, the IPC helper thread, when terminated, passes its thread object back to the last "real" thread, which frees the resources?

Reviewable status: 0 of 5 files reviewed, all discussions resolved

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on @yamahata)

a discussion (no related file):
+1 to @donporter's question.
Also, could you please write in commit messages what particular commit does and why, not just what's bug? This way it's easier for reviewers to understand it and more consistent with our repository.


Copy link
Contributor Author

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on @mkow)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

+1 to @donporter's question.
Also, could you please write in commit messages what particular commit does and why, not just what's bug? This way it's easier for reviewers to understand it and more consistent with our repository.

sure. How about adding the following.

So the solution is to keep reference count of those two helper threads by a (usually master) thread so that the (master) thread will be the last thread to release the reference count. So self destruction can be avoid.


Copy link
Contributor

@donporter donporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on @mkow)

a discussion (no related file):

Previously, yamahata wrote…

sure. How about adding the following.

So the solution is to keep reference count of those two helper threads by a (usually master) thread so that the (master) thread will be the last thread to release the reference count. So self destruction can be avoid.

This works for me. Feel free to add some of the verbiage from my question if you like. I think some comments are definitely in order here. Thank you for the fix!


@yamahata yamahata force-pushed the libos-async-helper-exit branch from 8e2de78 to cc4b1c5 Compare January 2, 2019 23:11
Copy link
Contributor

@donporter donporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 5 files at r2.
Reviewable status: 2 of 5 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required) (waiting on @mkow and @yamahata)


LibOS/shim/src/ipc/shim_ipc_helper.c, line 1085 at r2 (raw file):

 * on success, the reference to the helper thread is returned with
 * reference count incremented.
 * It's caller the responsibility to wait for its exit and release the

Typo (and remove pronoun) - please rephrase as "The caller is responsible to wait for the IPC helper thread to exit ......"


LibOS/shim/src/sys/shim_exit.c, line 147 at r2 (raw file):

    struct shim_thread * async_thread = terminate_async_helper();
    if (async_thread)
        /* TODO: wait for the thread to exit in host */

This seems like an important to-do. Perhaps at least open an issue tracker for this one? It is probably no worse than it is now, but it seems to re-introduce the same bug?


LibOS/shim/src/sys/shim_exit.c, line 153 at r2 (raw file):

    int ret = exit_with_ipc_helper(true, &ipc_thread);
    if (ipc_thread)
        /* TODO: wait for the thread to exit in host */

Same comment.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r2.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required) (waiting on @yamahata)

a discussion (no related file):

Previously, donporter (Don Porter) wrote…

This works for me. Feel free to add some of the verbiage from my question if you like. I think some comments are definitely in order here. Thank you for the fix!

Looks good now :)



LibOS/shim/src/shim_async.c, line 322 at r2 (raw file):

/*
 * On succes, the reference to the thread of async helper is returned with

succes -> success

Copy link
Contributor Author

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ) (waiting on @donporter and @mkow)


LibOS/shim/src/shim_async.c, line 322 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

succes -> success

Done.


LibOS/shim/src/ipc/shim_ipc_helper.c, line 1085 at r2 (raw file):

Previously, donporter (Don Porter) wrote…

Typo (and remove pronoun) - please rephrase as "The caller is responsible to wait for the IPC helper thread to exit ......"

Done.


LibOS/shim/src/sys/shim_exit.c, line 147 at r2 (raw file):

Previously, donporter (Don Porter) wrote…

This seems like an important to-do. Perhaps at least open an issue tracker for this one? It is probably no worse than it is now, but it seems to re-introduce the same bug?

Totally agreed.
#440


LibOS/shim/src/sys/shim_exit.c, line 153 at r2 (raw file):

Previously, donporter (Don Porter) wrote…

Same comment.

Done.

Copy link
Contributor Author

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ) (waiting on @donporter and @mkow)


LibOS/shim/src/sys/shim_exit.c, line 147 at r2 (raw file):

Previously, yamahata wrote…

Totally agreed.
#440

Done.

donporter
donporter previously approved these changes Jan 28, 2019
Copy link
Contributor

@donporter donporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ) (waiting on @mkow)

mkow
mkow previously approved these changes Jan 28, 2019
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required)

@mkow
Copy link
Member

mkow commented Jan 28, 2019

Retest this please

@donporter
Copy link
Contributor

@yamahata Please go ahead and squash + rebase to current master. Thanks!

@yamahata yamahata dismissed stale reviews from mkow and donporter via 0585959 February 5, 2019 08:08
@yamahata yamahata force-pushed the libos-async-helper-exit branch from f497750 to 0585959 Compare February 5, 2019 08:08
mkow
mkow previously approved these changes Feb 5, 2019
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL)

donporter
donporter previously approved these changes Feb 5, 2019
Copy link
Contributor

@donporter donporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 5 files at r2.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@donporter
Copy link
Contributor

Ok to test

Using put_thread(self) to free resources by itself is problematic because the exiting thread is still using them. Actually, put_thread(self) logic in ipc/async helper may cause SEGV depending on who is the last to put_thread(). If the thread is the last one to free thread area, lock/unlock after freeing thread area causes SEGV due to debug message.

The solution is to keep reference count of those two helper threads by a (usually master) thread so that the (master) thread will be the last thread to release the reference count. This way self destruction can be avoided.

To completely fix this issue, helper threads need to be waited to exit.
The issue is tracked by gramineproject#440.

Signed-off-by: Isaku Yamahata <[email protected]>
@mkow mkow dismissed stale reviews from donporter and themself via e6212cc February 5, 2019 14:28
@mkow mkow force-pushed the libos-async-helper-exit branch from 0585959 to e6212cc Compare February 5, 2019 14:28
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL)

@mkow
Copy link
Member

mkow commented Feb 5, 2019

Retest this please

Copy link
Contributor

@donporter donporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@mkow mkow merged commit e6212cc into gramineproject:master Feb 5, 2019
@yamahata yamahata deleted the libos-async-helper-exit branch February 28, 2019 20:07
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants