@wenyongh
Contributor

Fix an issue where a thread could be joined after it had been detached,
which might cause the join to hang, by adding wait_count to indicate whether
a thread needs to detach itself when it exits.
Also add checks on the input exec_env for the cluster's join/detach/cancel thread operations.
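
To illustrate the wait_count idea described above, here is a minimal sketch using plain pthreads. It is not the code from this PR: demo_env, demo_start, demo_join and the per-thread lock are hypothetical names chosen for illustration. The assumption is that a joiner registers itself in wait_count before blocking, so an exiting thread detaches itself only when nobody is waiting for it.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread's exec_env; not WAMR's real struct. */
typedef struct demo_env {
    pthread_t handle;
    pthread_mutex_t lock; /* assumption: the real code may use a cluster lock */
    unsigned wait_count;  /* number of threads joining this one */
    bool detached;        /* set once the thread has been detached */
} demo_env;

static void *
demo_thread_main(void *arg)
{
    demo_env *env = arg;

    pthread_mutex_lock(&env->lock);
    /* Detach only if no joiner has registered itself in wait_count. */
    if (env->wait_count == 0 && !env->detached) {
        pthread_detach(pthread_self());
        env->detached = true;
    }
    pthread_mutex_unlock(&env->lock);
    return NULL;
}

/* Initialize the bookkeeping and start the thread; returns 0 on success. */
static int
demo_start(demo_env *env)
{
    env->wait_count = 0;
    env->detached = false;
    if (pthread_mutex_init(&env->lock, NULL) != 0)
        return -1;
    return pthread_create(&env->handle, NULL, demo_thread_main, env);
}

/* Join the thread unless it has already been detached. */
static int
demo_join(demo_env *env, void **retval)
{
    pthread_t handle;

    pthread_mutex_lock(&env->lock);
    if (env->detached) {
        /* Joining a detached thread is invalid, so skip the join. */
        pthread_mutex_unlock(&env->lock);
        return 0;
    }
    env->wait_count++; /* tell the thread not to detach itself on exit */
    handle = env->handle;
    pthread_mutex_unlock(&env->lock);

    return pthread_join(handle, retval);
}
```

In this sketch, whichever side takes the lock first decides the outcome: either the thread detaches itself and demo_join returns without joining, or the joiner is counted and the thread stays joinable until pthread_join collects it.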

node = bh_list_elem_next(node);
}

cluster = bh_list_elem_next(cluster);
Collaborator

Should we search the whole cluster list?

It seems we should only search the current cluster; otherwise we may get a thread that belongs to another wasm instance.

Contributor Author

But how do we find the current cluster? It seems we can only get it from exec_env->cluster, right? But if the exec_env has already been removed from the cluster and destroyed, accessing exec_env will cause a crash.

How about this: when we find a cluster that contains the exec_env, we check again whether exec_env->cluster == cluster, to ensure that the exec_env belongs to that cluster?

Collaborator

> How about this: when we find a cluster that contains the exec_env, we check again whether exec_env->cluster == cluster, to ensure that the exec_env belongs to that cluster?

Sounds reasonable. I think an assert is enough because this should never happen; if it does happen, it means the whole system is in an unstable state.
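
The lookup discussed in this thread could be sketched roughly as follows. The types and the function demo_exec_env_exists are hypothetical, simplified stand-ins (plain singly linked lists instead of bh_list), and the sketch assumes the caller already holds whatever lock protects the cluster list. The point it tries to show is that exec_env is only compared by address until it has been found in a live list, and only then is exec_env->cluster checked with an assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for WAMR's cluster and exec_env. */
struct demo_cluster;

typedef struct demo_exec_env {
    struct demo_exec_env *next;   /* next exec_env in the same cluster */
    struct demo_cluster *cluster; /* back-pointer to the owning cluster */
} demo_exec_env;

typedef struct demo_cluster {
    struct demo_cluster *next; /* next cluster in the global list */
    demo_exec_env *envs;       /* exec_envs registered in this cluster */
} demo_cluster;

/*
 * Return true if exec_env is still registered in some cluster.
 * Assumption: the caller holds the lock that protects cluster_list.
 */
static bool
demo_exec_env_exists(const demo_cluster *cluster_list,
                     const demo_exec_env *exec_env)
{
    const demo_cluster *cluster;
    const demo_exec_env *node;

    for (cluster = cluster_list; cluster; cluster = cluster->next) {
        for (node = cluster->envs; node; node = node->next) {
            if (node == exec_env) {
                /* Found in a live list, so it can be dereferenced now.
                   A mismatch would mean the bookkeeping is corrupted. */
                assert(exec_env->cluster == cluster);
                return true;
            }
        }
    }
    /* Not found: the exec_env may have been destroyed; never touch it. */
    return false;
}
```

Note that a standard assert is compiled out when NDEBUG is defined, so in such builds the ownership check in this sketch disappears and only the address search remains.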

Collaborator

@xujuntwt95329 left a comment

LGTM

@wenyongh wenyongh merged commit 092efbf into bytecodealliance:main Jan 17, 2022
vickiegpt pushed a commit to vickiegpt/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024
Fix the issue that joining a detached thread might cause the join to hang;
resolve it by adding wait_count to a thread's exec_env to indicate
whether the thread needs to detach itself when it exits.

Also add checks on the input exec_env for the cluster's join/detach/cancel thread operations.