Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpfman-operator: Support Load/Attach Split #347

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

anfredette
Copy link
Contributor

@anfredette anfredette commented Dec 9, 2024

This commit introduces support for bpfman’s load-attach split from bpfman PR
#1354
and includes major
refactoring and CRD changes to align with this new model.

Key Changes:

  • Remove "*Program" CRDs and related code in favor of the BpfApplication CRD,
    which can contain multiple eBPF programs.
  • Replace BpfProgram CRD with BpfApplicationState, which stores per-node state
    information for all eBPF programs in a BpfApplication.
  • Rename CRDs for better alignment with Kubernetes conventions:
    • BpfApplication => ClusterBpfApplication (Cluster-scoped)
    • BpfNsApplication => BpfApplication (Namespace-scoped)
  • The BpfApplication CRDs have been modified to include an optional list of
    attach points for each program, enabling pre-loading programs before
    attachment and supporting dynamic attachment updates.
  • Redesign the bpfman-agent controller to natively support BpfApplications.

@anfredette anfredette force-pushed the load-attach-split branch 2 times, most recently from e352df4 to 0531a5f Compare December 10, 2024 22:15
Copy link

codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 0% with 740 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@adda074). Learn more about missing BASE report.
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
apis/v1alpha1/zz_generated.deepcopy.go 0.00% 651 Missing ⚠️
apis/v1alpha1/shared_types.go 0.00% 34 Missing ⚠️
cmd/bpfman-operator/main.go 0.00% 16 Missing ⚠️
apis/v1alpha1/bpf_application_state_types.go 0.00% 14 Missing ⚠️
...is/v1alpha1/cluster_bpf_application_state_types.go 0.00% 14 Missing ⚠️
apis/v1alpha1/zz_generated.register.go 0.00% 6 Missing ⚠️
cmd/bpfman-agent/main.go 0.00% 5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #347   +/-   ##
=======================================
  Coverage        ?   31.27%           
=======================================
  Files           ?       66           
  Lines           ?     7630           
  Branches        ?        0           
=======================================
  Hits            ?     2386           
  Misses          ?     5028           
  Partials        ?      216           
Flag Coverage Δ
unittests 31.27% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@anfredette anfredette force-pushed the load-attach-split branch 2 times, most recently from 113089c to 60d0115 Compare December 10, 2024 22:31
@anfredette
Copy link
Contributor Author

anfredette commented Dec 10, 2024

All of the examples will need to be updated with the new CRD format before the Kubernetes integration tests will pass.

Copy link
Contributor

mergify bot commented Dec 18, 2024

@anfredette, this pull request is now in conflict and requires a rebase.

Copy link
Contributor

mergify bot commented Jan 9, 2025

@anfredette, this pull request is now in conflict and requires a rebase.

@mergify mergify bot added the needs-rebase label Jan 9, 2025
@mergify mergify bot removed the needs-rebase label Jan 9, 2025
@anfredette anfredette force-pushed the load-attach-split branch 2 times, most recently from 10366ab to 0c7f942 Compare January 9, 2025 14:57
Copy link
Contributor

mergify bot commented Jan 22, 2025

@anfredette, this pull request is now in conflict and requires a rebase.

@mergify mergify bot added the needs-rebase label Jan 22, 2025
@anfredette anfredette changed the title WIP: Initial pass at BpfApplication CRD update for load/attach split WIP: New bpfman-operator design and CRDs to support the load/attach split Jan 22, 2025
@anfredette
Copy link
Contributor Author

This is still a WIP, but it's working for cluster-scoped XDP programs.

Here's a sample of the kubectl output from installing and deleting an XDP program:
https://gist.github.com/anfredette/4433aaa58518db5d4d14ed2bf4218f55

If you're looking at the code, the new operator and agent code is in the app-operator and app-agent directories, while the old code is still in bpfman-operator and bpfman-agent. I plan to eventually delete the old directories and rename the new ones, but I'm keeping the old code around for comparison and to pull from as needed.

@anfredette anfredette force-pushed the load-attach-split branch 2 times, most recently from 2799666 to d68ab2a Compare January 22, 2025 17:19
@mergify mergify bot removed the needs-rebase label Jan 22, 2025
@anfredette anfredette changed the title WIP: New bpfman-operator design and CRDs to support the load/attach split Initial bpfman-operator support for Load/Attach Split Jan 23, 2025
@anfredette anfredette force-pushed the load-attach-split branch 3 times, most recently from 69a1b95 to 9e5cb54 Compare January 24, 2025 22:51
Copy link
Contributor

mergify bot commented Feb 11, 2025

@anfredette, this pull request is now in conflict and requires a rebase.

Copy link
Contributor

mergify bot commented Feb 11, 2025

@anfredette, this pull request is now in conflict and requires a rebase.

@mergify mergify bot added the needs-rebase label Feb 11, 2025
@anfredette anfredette force-pushed the load-attach-split branch 2 times, most recently from cd9642e to fe2ed08 Compare February 19, 2025 18:59
@mergify mergify bot removed the needs-rebase label Feb 19, 2025
Copy link
Contributor

@dave-tucker dave-tucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some feedback on the API. The naming comments we can probably handle in a follow up as this API change isn't going to be backwards compatible anyway.
We should however fix the fentry/fexit and preferably drop type: Xdp before merge IMHO

PROJECT Outdated
controller: true
domain: bpfman.io
kind: TcxNsProgram
kind: BpfNsApplication
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed this would be BpfApplication and ClusterBpfApplication
But IIRC we wanted to get Billy's change in first before we changed the naming since this API change is the breaking one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

PROJECT Outdated
controller: true
domain: bpfman.io
kind: BpfNsApplication
kind: BpfNsApplicationState
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this would then be ClusterBpfApplicationState

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

programs:
- type: Kprobe
- bpffunctionname: kprobe_test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like bpffunctioname. It should either be bpfFunctionName or just functionName since we're in a Bpf CRD.. or hey, even just function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We made it "bpffunctionname" after much discussion a while back when we started using the bpf function name instead of a name defined in the section declaration. We also wanted to disambiguate this function name from the name of the function that some bpf programs attach to.

That said, I don't like it either :-). I changed "bpfunctionname" to just "name", and the "func_name" and "function_name" to just "function".

programs:
- type: Kprobe
- bpffunctionname: kprobe_test
type: Kprobe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, is case significant here? It seems to me that kprobe or KPROBE or Kprobe should probably work. I'd prefer to use the lowercase version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. Left unchecked, I don't think Kubernetes would care, but it is case sensitive in that kubernetes passes these values through to the controller code in the case that the user enters them. However, we also have validation logic that only allows one case, so in our case (no pun intended) it is case sensitive. If we wanted to support case insensitivity, we'd have to handle it in our code.

I made everything lower case. We can disuss whether to use camelCase instead.

offset: 0
retprobe: false
- bpffunctionname: kretprobe_test
type: Kprobe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to expose type here?
It can be inferred based on which of the optional fields are set so it seems redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is implemented this as union, and Type is the "unionDiscriminator". This tells us which of the "unionMembers" will be populated, and validation logic ensures that if the Type is X, the program info for X is populated. @msherif1234 used this pattern for the original BpfApplication CRD. There may be another way to implement the API, but this works.

priority: 55
direction: ingress
- type: TCX
attach_points:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be links as we're expressing desired state.
The verb might be attach, but we're expecting links in the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

offset: 0
retprobe: false
- type: Tracepoint
attach_points:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀 inconsistent rules re: casing. k8s CRDs are supposed to be camelCase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the JSON representation supposed to be camelCase?

retprobe: false
- type: Tracepoint
attach_points:
- func_name: try_to_wake_up
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above: funcName

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's now just function

direction: ingress
- type: TCX
attach_points:
- interfaceselector:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interfaceSelector. I'm not going to comment on this anymore.
We seriously need to take a pass through the API to fix the inconsistencies.
We can do that in this PR, or later

Comment on lines 112 to 118
- bpffunctionname: fentry_test
type: Fentry
fentry:
function_name: do_unlinkat
attach: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- bpffunctionname: fentry_test
type: Fentry
fentry:
function_name: do_unlinkat
attach: true
- function: fentry_test
fentry:
links:
- function_name: do_unlinkat

This shouldn't deviate from the pattern in the other program types.
The fact we need the fn_name for load() and we can only attach it to one place are implementation details we shouldn't be exposing in the API.

Copy link
Contributor Author

@anfredette anfredette Mar 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fentry and Fexit now follow the same pattern (mostly).

Copy link
Contributor

mergify bot commented Feb 20, 2025

@anfredette, this pull request is now in conflict and requires a rebase.

@mergify mergify bot added the needs-rebase label Feb 20, 2025
@anfredette anfredette force-pushed the load-attach-split branch 4 times, most recently from c919dac to abaee3e Compare February 25, 2025 13:37
@mergify mergify bot removed the needs-rebase label Feb 25, 2025
@anfredette anfredette force-pushed the load-attach-split branch 2 times, most recently from d28f8a9 to 8e72399 Compare February 25, 2025 23:27
Copy link
Contributor

@msherif1234 msherif1234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about renaming the agent files with cl is that for clusterscope ?

@@ -42,15 +42,9 @@ const (
// ProgTypeKprobe refers to the Kprobe program type.
ProgTypeKprobe EBPFProgType = "Kprobe"

// ProgTypeKretprobe refers to the Kprobe program type.
ProgTypeKretprobe EBPFProgType = "Kretprobe"

// ProgTypeUprobe refers to the Uprobe program type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why u deleted the kproberet and uproberet program types ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They weren't being used.

// MapOwnerSelector is used to select the loaded eBPF program this eBPF program
// will share a map with. The value is a label applied to the BpfProgram to select.
// The selector must resolve to exactly one instance of a BpfProgram on a given node
// or the eBPF program will not load.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we no longer have an object called BpfProgram so we should avoid using it everywhere in code and comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got rid of a lot of it, but there's still more work to do.

return nil, fmt.Errorf("failed to get container pids: %v", err)
}

if containerInfo != nil && len(*containerInfo) != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit no need to check the len range will handle that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@anfredette
Copy link
Contributor Author

not sure about renaming the agent files with cl is that for clusterscope ?

Yes. I was planning to rename things as follows:
cluster-scoped related files: cl_*
namespace-scoped related files: ns_*

@anfredette anfredette force-pushed the load-attach-split branch 2 times, most recently from 020133b to ca275d1 Compare March 2, 2025 22:42
@anfredette anfredette changed the title Initial bpfman-operator support for Load/Attach Split bpfman-operator: Support Load/Attach Split Mar 2, 2025
This commit introduces support for bpfman’s load-attach split from bpfman PR
refactoring and CRD changes to align with this new model.

Key Changes:

- Remove "*Program" CRDs and related code in favor of the BpfApplication CRD,
  which can contain multiple eBPF programs.
- Replace BpfProgram CRD with BpfApplicationState, which stores per-node state
  information for all eBPF programs in a BpfApplication.
- Rename CRDs for better alignment with Kubernetes conventions:
  - BpfApplication => ClusterBpfApplication (Cluster-scoped)
  - BpfNsApplication => BpfApplication (Namespace-scoped)
- The BpfApplication CRDs have been modified to include an optional list of
  attach points for each program, enabling pre-loading programs before
  attachment and supporting dynamic attachment updates.
- Redesign the bpfman-agent controller to natively support BpfApplications.

Signed-off-by: Andre Fredette <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants