Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ebpf: replace the existing program on update #352

Merged
merged 2 commits into from
Jun 11, 2020

Conversation

giuseppe
Copy link
Member

@giuseppe giuseppe commented May 5, 2020

on a container update, make sure the existing eBPF program is
completely replaced.

Signed-off-by: Giuseppe Scrivano [email protected]

}

memset (&attr, 0, sizeof (attr));
attr.attach_type = BPF_CGROUP_DEVICE;
attr.target_fd = dirfd;
attr.attach_bpf_fd = fd;
#ifdef BPF_F_REPLACE

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add kernel requirement (v5.6) in the code comment
torvalds/linux@7dd68b3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I am going to rework the patch and handle better concurrent updates

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, I'll have to move the check at runtime, not build time

@giuseppe giuseppe marked this pull request as draft May 7, 2020 07:25
@giuseppe giuseppe force-pushed the replace-device branch 3 times, most recently from e00aba6 to 84949c9 Compare May 10, 2020 20:08
@giuseppe giuseppe marked this pull request as ready for review May 10, 2020 20:09
@rhatdan
Copy link
Member

rhatdan commented May 11, 2020

LGTM
@AkihiroSuda PTAL

const int MAX_ATTEMPTS = 20;
int attempt;

for (attempt = 0;; attempt++)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe attempt = 0; attempt < MAX_ATTEMPTS; attempt++ for readability

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to rewrite using the condition inside of the for loop but there is some extra logic that decides whether to retry or not.

@AkihiroSuda
Copy link

cc @cyphar PTAL

@AkihiroSuda
Copy link

@cyphar LGTY?

@giuseppe
Copy link
Member Author

fine to merge?

Copy link

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is okay as a first draft, but I think we should rethink how we're generating these programs. It might be much nicer in the long term to store the device rules in an eBPF map (which we can atomically swap out if we use a map-of-maps and BPF_F_LOCK). That way we would never have to replace the program, and could just update the eBPF map whenever we needed to. It would also allow us to read the current rule-set from userspace.

{
#ifdef BPF_F_REPLACE
memset (&attr, 0, sizeof (attr));
attr.prog_id = progs[0];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic (replacing whatever program is attached) and the corresponding limitations (not being able to replace a program if there is more than one) seems a bit fishy. I think a much better solution would be to either pin the program, or to store the id and load-time into some state file (which I assume crun has) and then to use that to fetch the original program.

To be fair, it's unlikely anybody else will be attaching BPF_CGROUP_DEVICE programs to a container but this seems a bit fishy to me.

return crun_make_error (err, errno, "cannot open existing eBPF program");
}
#else
return crun_make_error (err, 0, "eBPF program already configured");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If BPF_F_REPLACE isn't supported -- which is the case for most kernels -- you should insert the new program then remove the old one. It won't be atomic, but it'll be much better than this.

@cyphar
Copy link

cyphar commented May 21, 2020

Also you mentioned you'd do a BPF_F_REPLACE check at runtime -- did you want to include that in this PR?

@giuseppe
Copy link
Member Author

This is okay as a first draft, but I think we should rethink how we're generating these programs. It might be much nicer in the long term to store the device rules in an eBPF map (which we can atomically swap out if we use a map-of-maps and BPF_F_LOCK). That way we would never have to replace the program, and could just update the eBPF map whenever we needed to. It would also allow us to read the current rule-set from userspace.

Thanks for the review I had initially tried to implement the devices controller using an eBPF map, but I had issues dealing with wildcards and I am not sure how we could handle them without enumeration or using a more complex map of maps kind of data structure. Do you have any suggestions for it?

In the current implementation, I've assumed that there is a single writer/owner for the cgroup, so there should never been two programs installed. That is the reason for the weird check.

Multiple writers were somehow supported with pinning in the previous versions of this PR but there were also some potential race conditions with multiple crun update running at the same time.

In this version I've tried to deal with concurrent updates and avoid any potential of race conditions.

Currently there is no fallback when BPF_F_REPLACE is missing, just a nicer error message.

Would the fallback for BPF_F_REPLACE look like the following?

  1. get a file lock
  2. read old ID from status or pinned path
  3. write new program
  4. delete old program by ID
  5. if the delete fails: try to delete the new installed program and goto 7
  6. store the new ID/pin it
  7. release the lock

What I don't like is that if something goes wrong we can end up with multiple eBPF programs installed that we lose track of.

@giuseppe giuseppe force-pushed the replace-device branch 3 times, most recently from b31711d to 42adc81 Compare May 23, 2020 21:06
@giuseppe
Copy link
Member Author

I've changed the implementation and pushed a new version.

It still assumes that crun is the only writer to the cgroup, any other eBPF program is removed on an update.

BPF_F_REPLACE is used only when there is a single program already installed for the cgroup.

In any other case, now crun first stores what programs are already installed for the cgroup, attaches the new program, then it removes all the programs previously installed. Multiple crun update racing with each other are still safe since there is always at least one program attached to the cgroup.

@AkihiroSuda
Copy link

@cyphar LGTY?

@AkihiroSuda
Copy link

@cyphar PTAL

Signed-off-by: Giuseppe Scrivano <[email protected]>
on a container update, make sure the existing eBPF programs are
completely replaced.

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe
Copy link
Member Author

@rhatdan this is ready to go, if there are other comments we can address them later.

@rhatdan
Copy link
Member

rhatdan commented Jun 11, 2020

LGTM

@rhatdan rhatdan merged commit f09475b into containers:master Jun 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants