Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closing the last program fd will detach a link #4135

Open
lmb opened this issue Jan 13, 2025 · 4 comments
Open

Closing the last program fd will detach a link #4135

lmb opened this issue Jan 13, 2025 · 4 comments
Assignees
Labels
bug Something isn't working P2
Milestone

Comments

@lmb
Copy link
Collaborator

lmb commented Jan 13, 2025

Describe the bug

Closing the last reference to a program acts like an implicit link detach. See

REQUIRE(hook.attach_link(program_fd, nullptr, 0, &link) == EBPF_SUCCESS);
// Call bpf_object__close() which will close the program fd. That should
// detach the program from the hook and unload the program.
bpf_object__close(unique_object.release());

This is the opposite of what happens on Linux: a live link (pinned or otherwise) will keep the program alive until either the last reference is closed or the link is explicitly detached.

According to @saxena-anurag the ownership in the runtime is currently from program to link, while it should probably be the link holding on to a refcount on the program.

OS information

No response

Steps taken to reproduce bug

  1. Load a program
  2. Attach it via a link
  3. Close the program

Expected behavior

The attachment should persist until the link is closed / detached / unpinned.

Actual outcome

The attachment is torn down.

Additional details

No response

@lmb lmb added the bug Something isn't working label Jan 13, 2025
@shankarseal shankarseal added triaged Discussed in a triage meeting P2 labels Jan 13, 2025
@shankarseal shankarseal added this to the 2502 milestone Jan 13, 2025
@shankarseal shankarseal modified the milestones: 2502, 2504, 2503, Backlog, 2501 Jan 22, 2025
@shankarseal shankarseal modified the milestones: 2502, 2503, 2504, 2505 Jan 31, 2025
zodiacon added a commit to zodiacon/ebpf-for-windows that referenced this issue Feb 20, 2025
@lmb lmb assigned zodiacon and unassigned nigriMSFT Feb 21, 2025
@lmb
Copy link
Collaborator Author

lmb commented Mar 4, 2025

There is an additional wrinkle here that I just discovered: BPF_MAP_TYPE_PROG_ARRAY does not hold on to a refcount of the program either. On Linux this is kind of crucial to be able to build tail call chains.

prog1 -> prog_array -> prog2
                                     -> prog3

If userspace doesnt hold on to prog2 and prog3 somehow the entries are removed from the map and the tail call fails.

The semantics on Linux are that holding on to a prog_array will keep the refcount on programs alive. Losing the last user refcount on the prog_array triggers dropping the refcount on prog2 and prog3. To put it another way, you need to hold on to prog1 and prog_array in user space (or via pinning) for things to work.

Not sure we want to replicate this 100% but user space shouldn't have to hold onto prog2 and prog3 imo.

@lmb lmb removed the triaged Discussed in a triage meeting label Mar 4, 2025
@saxena-anurag
Copy link
Contributor

There is an additional wrinkle here that I just discovered: BPF_MAP_TYPE_PROG_ARRAY does not hold on to a refcount of the program either. On Linux this is kind of crucial to be able to build tail call chains.

prog1 -> prog_array -> prog2
                                     -> prog3

If userspace doesnt hold on to prog2 and prog3 somehow the entries are removed from the map and the tail call fails.

The semantics on Linux are that holding on to a prog_array will keep the refcount on programs alive. Losing the last user refcount on the prog_array triggers dropping the refcount on prog2 and prog3. To put it another way, you need to hold on to prog1 and prog_array in user space (or via pinning) for things to work.

Not sure we want to replicate this 100% but user space shouldn't have to hold onto prog2 and prog3 imo.

So, we had this design earlier where prog_array map takes a reference on the program object when the prog is inserted in the prog array map, but that causes issue of circular references. Hence the design was changed later to have "weak references" from prog_array_map --> prog.

Example of circular reference:
prog1 --> prog_array --> prog2 --> prog_array --> prog3.
Prog2 in the above case will have a reference on prog_array map as it accesses it, but prog_array map will also have a reference on prog2 as prog2 is an entry in prog_array map, causing circular references.

Can you confirm once if Linux also has the issue of circular references?

@lmb
Copy link
Collaborator Author

lmb commented Mar 5, 2025

@saxena-anurag Linux has the same issue, but chose to fix it differently. For program arrays, user references and kernel references are tracked individually. Once the last uref goes away the map is cleared. Details are in the commit message: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c9da161c6517ba12154059d3b965c2cbaf16f90f

@saxena-anurag
Copy link
Contributor

@saxena-anurag Linux has the same issue, but chose to fix it differently. For program arrays, user references and kernel references are tracked individually. Once the last uref goes away the map is cleared. Details are in the commit message: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c9da161c6517ba12154059d3b965c2cbaf16f90f

Just to confirm -- so the user app needs to either keep the FD to the prog array map open or pin the prog array map?
And does pinning the prog array map count as uref?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P2
Projects
None yet
Development

No branches or pull requests

5 participants