prov/tcp: attempt to fix multithreaded progress #6

Closed
ooststep wants to merge 7 commits from tcp-cq-progress

@ooststep ooststep commented Feb 7, 2024

No description provided.

@ooststep ooststep force-pushed the tcp-cq-progress branch 2 times, most recently from bd279ff to 62184c6 Compare February 8, 2024 18:22

j-xiong commented Feb 9, 2024

The progress of CQs and counters can't be separated. They are often the same when bound to the same endpoint(s). We should consider the progress as a "completion object" that is associated with one or more CQs and/or counters and their bound endpoints. That makes it more complicated to assign progress to CQs and counters.

One way to handle it is to assign progress to CQs or counters when they are bound to the endpoint, or when the endpoint is enabled. If a conflict occurs (more than one progress object is related to one CQ, counter, or endpoint), we have to revert to using a single progress object in the domain (can this be done safely?).

The domain would maintain a list of progress objects so one can make progress for the entire domain.
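
A minimal sketch of the bind-time assignment described above, not libfabric code. Every name here (`struct progress`, `struct comp_obj`, `bind_progress`, `progress_resolve`) is hypothetical. It assumes a conflict can be resolved by redirecting the conflicting progress objects to one shared object owned by the domain, and it ignores locking entirely, so it does not answer whether the fallback can be done safely while other threads are making progress.

```c
#include <stddef.h>

/* Hypothetical types for illustration; none of these exist in libfabric. */
struct progress {
	struct progress *redirect; /* set once this object is folded into another */
	struct progress *next;     /* link in the domain's list of progress objects */
};

struct domain {
	struct progress shared;    /* fallback progress object used on conflict */
	struct progress *all;      /* every progress object known to the domain */
};

struct comp_obj {              /* stands in for a CQ, a counter, or an endpoint */
	struct progress *progress; /* NULL until a progress object is assigned */
};

static void domain_init(struct domain *dom)
{
	dom->shared.redirect = NULL;
	dom->shared.next = NULL;
	dom->all = &dom->shared;
}

/* Follow redirects to the progress object that is actually in use. */
static struct progress *progress_resolve(struct progress *p)
{
	while (p->redirect)
		p = p->redirect;
	return p;
}

/*
 * Called when an endpoint is bound to a CQ or counter. 'fresh' is only used
 * (and only added to the domain list) if neither side has progress assigned.
 */
static struct progress *bind_progress(struct domain *dom, struct comp_obj *ep,
				      struct comp_obj *comp, struct progress *fresh)
{
	struct progress *a, *b;

	if (!ep->progress && !comp->progress) {
		fresh->redirect = NULL;
		fresh->next = dom->all;
		dom->all = fresh;
		ep->progress = comp->progress = fresh;
		return fresh;
	}
	if (!ep->progress) {
		ep->progress = comp->progress;
		return progress_resolve(ep->progress);
	}
	if (!comp->progress) {
		comp->progress = ep->progress;
		return progress_resolve(comp->progress);
	}

	a = progress_resolve(ep->progress);
	b = progress_resolve(comp->progress);
	if (a == b)
		return a;

	/* Conflict: fold both into the domain's single shared progress object. */
	if (a != &dom->shared)
		a->redirect = &dom->shared;
	if (b != &dom->shared)
		b->redirect = &dom->shared;
	return &dom->shared;
}

/* Walk the domain list, e.g. to make progress for the entire domain. */
static size_t domain_progress_count(const struct domain *dom)
{
	const struct progress *p;
	size_t n = 0;

	for (p = dom->all; p; p = p->next)
		n++;
	return n;
}
```

The redirect pointer is one possible design choice: objects that already hold a pointer to a folded progress object pick up the shared one the next time they resolve it, without the domain having to visit each CQ, counter, and endpoint.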

@ooststep ooststep changed the title from "prov/tcp: move srx progress ownership to cq" to "prov/tcp: attempt to fix multithreaded progress" on Feb 27, 2024

Having a single domain progress engine forces multithreaded
applications to synchronize on the domain's progress lock.

Instead, move progress to be owned by the ep (or srx).

We need to store the progress, or have some way to reference
back to the progress that the saved message belongs to.
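
The two points above (progress owned by the ep, and a saved message that can find its progress again) can be sketched roughly as follows. This is an illustration under assumptions, not the xnet provider's data layout: `struct ep_progress`, `struct saved_msg`, `struct endpoint`, and the functions are hypothetical names, and a plain pthread mutex stands in for whatever lock type the provider uses.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-endpoint progress state; not a libfabric structure. */
struct ep_progress {
	pthread_mutex_t lock;      /* protects only this endpoint's progress */
	unsigned long events;      /* amount of work progressed so far */
};

struct saved_msg {
	struct ep_progress *prog;  /* back-reference to the owning progress */
	struct saved_msg *next;
};

struct endpoint {
	struct ep_progress prog;   /* owned by the ep instead of the domain */
	struct saved_msg *saved;   /* messages saved for later processing */
};

static int ep_init(struct endpoint *ep)
{
	ep->saved = NULL;
	ep->prog.events = 0;
	return pthread_mutex_init(&ep->prog.lock, NULL);
}

/* Save a message and record which progress object it belongs to. */
static void ep_save_msg(struct endpoint *ep, struct saved_msg *msg)
{
	pthread_mutex_lock(&ep->prog.lock);
	msg->prog = &ep->prog;
	msg->next = ep->saved;
	ep->saved = msg;
	pthread_mutex_unlock(&ep->prog.lock);
}

/*
 * Progress a saved message starting from the message alone: the
 * back-reference says which lock to take, so no domain-wide lock is needed.
 */
static void saved_msg_progress(struct saved_msg *msg)
{
	struct ep_progress *prog = msg->prog;

	pthread_mutex_lock(&prog->lock);
	prog->events++;
	pthread_mutex_unlock(&prog->lock);
}
```

With this layout, threads that drive different endpoints take different locks and only contend when they touch the same endpoint.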

ofi_cq_progress and xnet_progress both attempt to lock
the ep_list lock, so we need to override ofi_ep_bind_cq to
use fid_list_insert with a NULL lock to avoid the deadlock.

When closing an ep, we're already holding the ep list lock.
To avoid attempting to lock it again when unbinding from the cq,
perform the unbind up front without the lock, so that it is not attempted further down the call stack.
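
The locking pattern behind the last two notes can be sketched in isolation. This does not reproduce `fid_list_insert` or `ofi_ep_bind_cq`; the types and functions below are hypothetical, and the sketch only assumes the convention mentioned in the commit text, where passing a NULL lock means the caller already holds the list lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list of endpoints bound to a CQ; not libfabric's fid list. */
struct ep_node {
	struct ep_node *next;
};

struct ep_list {
	struct ep_node *head;
	pthread_mutex_t lock;      /* non-recursive: locking it twice would hang */
};

/* Insert a node. A NULL lock means the caller already holds the list lock. */
static void ep_list_insert(struct ep_list *list, pthread_mutex_t *lock,
			   struct ep_node *node)
{
	if (lock)
		pthread_mutex_lock(lock);
	node->next = list->head;
	list->head = node;
	if (lock)
		pthread_mutex_unlock(lock);
}

/* Remove a node if present. Same NULL-lock convention as the insert. */
static int ep_list_remove(struct ep_list *list, pthread_mutex_t *lock,
			  struct ep_node *node)
{
	struct ep_node **pp;
	int found = 0;

	if (lock)
		pthread_mutex_lock(lock);
	for (pp = &list->head; *pp; pp = &(*pp)->next) {
		if (*pp == node) {
			*pp = node->next;
			node->next = NULL;
			found = 1;
			break;
		}
	}
	if (lock)
		pthread_mutex_unlock(lock);
	return found;
}

/*
 * Close path: the list lock is taken once here, and the unbind is done
 * immediately with a NULL lock so nothing deeper tries to lock it again.
 */
static int ep_close_and_unbind(struct ep_list *list, struct ep_node *node)
{
	int found;

	pthread_mutex_lock(&list->lock);
	found = ep_list_remove(list, NULL, node);
	/* ... remaining teardown would run here with the lock still held ... */
	pthread_mutex_unlock(&list->lock);
	return found;
}
```

A bind from a path that does not hold the lock would call `ep_list_insert(&list, &list.lock, &node)`, while a bind from inside the progress path would pass NULL instead.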