backends/ebpf: Track header offset in bytes rather than bits. #4327

Merged on Feb 6, 2024 (3 commits)

thomascalvert-xlnx
Member

@thomascalvert-xlnx thomascalvert-xlnx commented Jan 10, 2024

In summary, this is an optimisation which replaces the parser/deparser's unsigned ebpf_packetOffsetInBits variable (which tracks the current field's offset from the start of the packet) with a u8* ebpf_headerStart pointer (which points to the start of the current header). There are a couple of advantages to this approach:

  1. Generates fewer instructions, since the pointer is advanced only once per header instead of once per field. This matters not only for performance, but also for fitting within the kernel verifier's instruction count limit.
  2. Compilers which consume the generated code can assume that headers are byte-aligned. This particularly matters for hardware targets, where additional barrel shifting could be quite expensive.

Note that restricting headers to be byte-aligned is not new; in fact, it's already enforced by the eBPF backend.
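
As a rough illustration of the first point, the sketch below contrasts the two bookkeeping styles on a made-up 4-byte header. The struct, the function names, and the local BYTES macro are assumptions for this example only; this is not the code p4c actually generates.

```c
#include <stdint.h>

/* Local helper for this sketch: convert a bit count to whole bytes. */
#define BYTES(w) ((w) / 8)

/* Hypothetical 4-byte header, used only for illustration. */
struct demo_hdr {
    uint8_t  a;
    uint8_t  b;
    uint16_t c; /* big-endian on the wire */
};

/* Bit-offset style: the offset is relative to the start of the packet
 * and is advanced after every field. Returns the updated bit offset. */
static unsigned parse_with_bit_offset(const uint8_t *pkt, unsigned offset_bits,
                                      struct demo_hdr *h)
{
    h->a = pkt[BYTES(offset_bits)];
    offset_bits += 8;
    h->b = pkt[BYTES(offset_bits)];
    offset_bits += 8;
    h->c = (uint16_t)((pkt[BYTES(offset_bits)] << 8) | pkt[BYTES(offset_bits) + 1]);
    offset_bits += 16;
    return offset_bits;
}

/* Header-start style: fields are read at constant offsets from a byte
 * pointer, which is advanced once per header. Returns the next header start. */
static const uint8_t *parse_with_header_start(const uint8_t *hdr_start,
                                              struct demo_hdr *h)
{
    h->a = hdr_start[0];
    h->b = hdr_start[1];
    h->c = (uint16_t)((hdr_start[2] << 8) | hdr_start[3]);
    return hdr_start + 4;
}
```

Both functions extract the same field values; the second performs a single pointer update per header rather than one offset update per field.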

The eBPF backend code is leveraged by other backends: uBPF & TC. For uBPF, a similar conversion was not possible because it supports advance() calls with arbitrary bit values; I'm not sure whether it would be acceptable to restrict this too. TC looks like it would be amenable to these changes; however, since it is in active development, I wanted to avoid the potential for conflicts. If we do want to port these changes to those backends, it should be fairly mechanical since the code looks almost the same.

@fruffy fruffy added the ebpf Topics related to the eBPF back end label Jan 10, 2024
@fruffy fruffy requested review from osinstom and tatry January 10, 2024 15:30
@fruffy
Collaborator

fruffy commented Jan 10, 2024

> Finally, this PR does start with 2 commits which are technically a follow-up to #4160 so it's arguably a bit cheeky to bundle them in here. Please let me know if having a separate PR is preferable and I will gladly split this one.

Can you point out the concrete follow-ups?

@tatry @osinstom tagging you for this PR since there is no dedicated maintainer for the eBPF core. Let me know if this code is not part of your responsibility.

@thomascalvert-xlnx
Member Author

> > PR does start with 2 commits which are technically a follow-up to #4160
>
> Can you point out the concrete follow-ups?

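@fruffy Sorry if it wasn't clear. I'm referring to these two commits specifically:

* [backends/ebpf: Fix errors tests failing when run with -j](https://github.com/p4lang/p4c/pull/4327/commits/a41707f15a6c09e122a14befe4e38d22fd2a5926)

* [backends/ebpf: Improve docs to reflect recent XDP model addition.](https://github.com/p4lang/p4c/pull/4327/commits/9817ee2820b9e1ebb00629eddc5ec2da2c528c94)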

@fruffy
Collaborator

fruffy commented Jan 10, 2024

> > > PR does start with 2 commits which are technically a follow-up to #4160
> >
> > Can you point out the concrete follow-ups?
>
> @fruffy Sorry if it wasn't clear. I'm referring to these two commits specifically:
>
> * [backends/ebpf: Fix errors tests failing when run with -j](https://github.com/p4lang/p4c/pull/4327/commits/a41707f15a6c09e122a14befe4e38d22fd2a5926)
>
> * [backends/ebpf: Improve docs to reflect recent XDP model addition.](https://github.com/p4lang/p4c/pull/4327/commits/9817ee2820b9e1ebb00629eddc5ec2da2c528c94)

I can give these a quick review if you factor them out.

@thomascalvert-xlnx
Member Author

With the latest update, I believe that all issues in the code have been fixed now. The only remaining CI failure is a transient network error which is unrelated to the changes here.

W: Failed to fetch http://download.opensuse.org/repositories/home:/p4lang/xUbuntu_20.04/InRelease 503 Service Unavailable [IP: 195.135.223.226 80]

@jafingerhut
Contributor

@thomascalvert-xlnx I poked GitHub CI to force it to re-run the failed job. Hopefully it will work this time.

@fruffy fruffy requested a review from usha1830 January 26, 2024 20:19
@fruffy
Collaborator

fruffy commented Jan 26, 2024

Also pinging @komaljai and @Sosutha on this. It may concern you.

@usha1830
Contributor

usha1830 commented Feb 5, 2024

@vbnogueira Could you please check whether this impacts the TC tests?

Contributor

@usha1830 usha1830 left a comment

Code changes look good to me.
I would prefer to wait for @vbnogueira to confirm that TC tests work fine before we merge this PR.

@usha1830 usha1830 added the p4tc Topics related to the P4-TC back end label Feb 5, 2024
Contributor

@usha1830 usha1830 left a comment

.template output from the TC backend is empty in many cases. It is not expected to be empty. Can you please look into this, @thomascalvert-xlnx?

@thomascalvert-xlnx
Member Author

@usha1830 could you please provide a reproducer cmdline? In my p4tc test runs, all the template files I see are non-empty.

@usha1830
Contributor

usha1830 commented Feb 5, 2024

> @usha1830 could you please provide a reproducer cmdline? In my p4tc test runs, all the template files I see are non-empty.

I see empty files in this PR only.
For example, testdata/p4tc_samples_outputs/default_action_example.template:
[two screenshots]

@thomascalvert-xlnx
Member Author

thomascalvert-xlnx commented Feb 5, 2024

@usha1830 good spot! In fact, the files aren't empty. The issue is that, for some reason, the commit sets their executable bit (without modifying their contents), and that is GitHub's way of showing this permissions change.

I'm not sure why this happened, sorry. I have edited the last commit to clear the executable bit on all template files. Note that there are still a handful of .template files which show up as empty because their executable bit is now being cleared; if you click the 'View file' option, you can see that they are not truly empty.

@usha1830
Contributor

usha1830 commented Feb 5, 2024

> @usha1830 good spot! In fact, the files aren't empty. The issue is that, for some reason, the commit sets their executable bit (without modifying their contents), and that is GitHub's way of showing this permissions change.
>
> I'm not sure why this happened, sorry. I have edited the last commit to clear the executable bit on all template files. Note that there are still a handful of .template files which show up as empty because their executable bit is now being cleared; if you click the 'View file' option, you can see that they are not truly empty.

Yes, I did 'View file' and saw that those files had some content, but wasn't sure why they were showing as empty. Thanks for clarifying!

@thomascalvert-xlnx
Member Author

Thanks for looking Usha.

Rebased to top of main branch. There were a couple of minor merge conflicts around the dynamic_cast removal, but no other changes.

@vbnogueira
Contributor

> @vbnogueira Could you please check whether this impacts the TC tests?

I'll run the tests and check here.

@thomascalvert-xlnx
Member Author

There are some failures in the latest test runs. These are new p4tc tests added since the rebase; I can easily update their outputs too.

However, when doing so I noticed the template permissions issue cropping up again, and traced it down to this bit of code, which explicitly sets their executable bit:

https://github.com/p4lang/p4c/blob/main/backends/tc/backend.cpp#L128-L132

So I guess it would be correct to update all the expected files to have the executable bit set. This happens as a matter of course with the new P4TEST_REPLACE invocation, so it is easy enough to effect.

This patch also sets the executable bit on all .template files. This is
done by the TC backend when they are created, however for some reason
never made it to the reference outputs, and the checker script doesn't
look at permissions.
auto-merge was automatically disabled February 6, 2024 10:53

Head branch was pushed to by a user without write access

@thomascalvert-xlnx
Member Author

Rebased to top of tree.

@fruffy fruffy merged commit c5c38a7 into p4lang:main Feb 6, 2024
16 checks passed
@thomascalvert-xlnx thomascalvert-xlnx deleted the ebpf-offsets branch February 6, 2024 13:26
@jhsmt

jhsmt commented Feb 6, 2024

From an overview, I don't think this change would affect the functionality of P4TC, but we wanted to do some quick testing before the merge and look at the assembler diff. No harm done; we can respond if issues surface.
I have a question about the process: @vbnogueira stated we were going to review, but the code was still merged. What is the process?

@fruffy
Collaborator

fruffy commented Feb 6, 2024

I apologize, I misread vbnogueira's comment as approval. I can revert this PR and reopen it. The process is as you would expect: code can only get merged once every stakeholder approves.

We can add a CODEOWNERS file to actually enforce this: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners

But this can lead to a lot of email spam.

@jhsmt

jhsmt commented Feb 6, 2024

> I apologize, I misread vbnogueira's comment as approval. I can revert this PR and reopen it.

No need. If we notice anything unusual we will open another PR. So far it does look from the assembler that the change removes about 1 instruction and should have no impact on functionality. But we will test.

> The process is as you would expect: code can only get merged once every stakeholder approves.
>
> We can add a CODEOWNERS file to actually enforce this: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners
>
> But this can lead to a lot of email spam.

True. Maybe just tag the stakeholders in the issue and they will get email notifications?

@fruffy
Collaborator

fruffy commented Feb 7, 2024

> True. Maybe just tag the stakeholders in the issue and they will get email notifications?

Who should we tag on PRs concerning P4TC and the eBPF back end?

@jhsmt

jhsmt commented Feb 7, 2024

Off the top of my head: @komaljai @Sosutha @usha1830 @vbnogueira @jhsmt @tammela

@fruffy
Collaborator

fruffy commented Feb 7, 2024

Okay, we could add code owners for the respective folders but without requiring reviews.

@jhsmt

jhsmt commented Feb 7, 2024

> Okay, we could add code owners for the respective folders but without requiring reviews.

@komaljai @Sosutha @usha1830 would be more appropriate as "code owners", since we interact with them.
We just wanted to be in sync with any changes affecting P4TC. That should probably be resolved when we have sufficient testing in place. Not sure how to resolve this ;->

@jhsmt

jhsmt commented Feb 27, 2024

> From an overview, I don't think this change would affect the functionality of P4TC, but we wanted to do some quick testing before the merge and look at the assembler diff. No harm done; we can respond if issues surface. I have a question about the process: @vbnogueira stated we were going to review, but the code was still merged. What is the process?

Argh, we are now finding that this broke us. Sorry, we were too busy with other things to catch this sooner. The better solution is probably to restrict this change to just the ebpf backend (and leave the tc side alone). @vbnogueira can we try reverting this on the tc side and see if that resolves our issue? @usha1830 @komaljai the end result in most tests is that the change chops off the payload and only emits the headers. We generate the parser as a separate ebpf program, whereas the ebpf backend has everything in one ebpf program; perhaps that's why it wasn't obvious earlier.
We really need to integrate p4testgen to catch these things instead of (currently) depending on humans (in this case, inspection via tcpdump).

@fruffy
Collaborator

fruffy commented Feb 27, 2024

@thomascalvert-xlnx there is a regression unfortunately.

We can revert this commit and try to patch things separately. Or we could try to remove the parts affecting the TC back end.

> We really need to integrate p4testgen to catch these things instead of (currently) depending on humans (in this case, inspection via tcpdump).

I can help with getting this bootstrapped. :)

@jhsmt

jhsmt commented Feb 27, 2024

> @thomascalvert-xlnx there is a regression unfortunately.
>
> We can revert this commit and try to patch things separately. Or we could try to remove the parts affecting the TC back end.
>
> > We really need to integrate p4testgen to catch these things instead of (currently) depending on humans (in this case, inspection via tcpdump).
>
> I can help with getting this bootstrapped. :)

Appreciated!
There is some effort from @usha1830 @komaljai and company. Given that you are the author of this, your help would get us there sooner for sure.

@thomascalvert-xlnx
Member Author

> the end result in most tests is that the change chops off the payload and only emits the headers

@jhsmt it's not obvious to me how this change could cause the symptom you describe. Could you please provide more details and ideally a reproducer?

> restrict this change to just the ebpf backend (and leave the tc side alone)

I don't think that's easy, because the TC backend calls the eBPF backend directly for the code generation. If you look at the changes in this PR, only a couple of lines in backends/tc are actually changed.

In my opinion the best solution would be to find and fix the cause of the aforementioned regression, and ideally even add an (automated!) test for it. I am happy to help but would need a reproducer in order to investigate.

@vbnogueira
Contributor

> > the end result in most tests is that the change chops off the payload and only emits the headers
>
> @jhsmt it's not obvious to me how this change could cause the symptom you describe. Could you please provide more details and ideally a reproducer?

If you take a look at, for example, the simple_exact_example program (https://github.com/p4lang/p4c/pull/4327/files#diff-5bf9741635b560c009974b716fb1f045b92ce83161bb98e591779c7f40950f1dR120), you'll see that it's using the packet size itself to calculate outHeaderOffset, and not ebpf_packetOffsetInBits, which tells how many header bits the parser processed.
So outHeaderOffset will also account for the payload in the difference, which will lead it to chop the payload.
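
The arithmetic behind this symptom can be sketched as follows. The helper names and the sizes used in the usage note (34 bytes of headers, 100 bytes of payload) are hypothetical, and the generated deparser may compute the adjustment differently; the sketch only shows why subtracting the whole packet size instead of the parsed header length shrinks the packet by exactly the payload size.

```c
/* Bytes by which the packet head must grow (> 0) or shrink (< 0) so that
 * the emitted headers fit. consumed_len is what the deparser believes the
 * parser consumed, in bytes. */
static int out_header_offset(int out_header_len, int consumed_len)
{
    return out_header_len - consumed_len;
}

/* Packet length after applying the head adjustment. */
static int adjusted_packet_len(int pkt_len, int offset)
{
    return pkt_len + offset;
}
```

With 34 header bytes and a 134-byte packet, passing the parsed header length (34) gives an offset of 0 and the packet keeps its 134 bytes. Passing the packet size (134) instead gives an offset of -100, leaving 34 bytes: headers only, payload gone.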

@vbnogueira
Contributor

We are also having issues loading the generated calculator parser program after this patch (https://github.com/p4lang/p4c/blob/main/testdata/p4tc_samples_outputs/calculator_parser.c).

Reproducing it is simple: just compile calculator_parser.c and try to load the resulting object file.
For example:

tc actions add action bpf obj generated/calculator_parser.o section p4tc/parse

Here is a sample of the verifier output:

48: (71) r1 = *(u8 *)(r7 +14)         ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) R7=pkt(r=30)
; select_0 = (((((u32)(((u16)tmp_0.p << 8) | ((u16)tmp_2.four & 0xff)) << 8) & ((1 << 24) - 1)) | (((u32)tmp_4.ver & 0xff) & ((1 << 24) - 1))) & ((1 << 24) - 1));
49: (67) r1 <<= 16                    ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=0xff0000,var_off=(0x0; 0xff0000))
; tmp_2.four = (u8)((load_byte(pkt, BYTES(ebpf_packetOffsetInBits))));
50: (71) r2 = *(u8 *)(r7 +31)
invalid access to packet, off=31 size=1, R7(id=0,off=31,r=30)
R7 offset is outside of the packet
processed 48 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1
-- END PROG LOAD LOG --
libbpf: prog 'tc_parse_func': failed to load: -13
libbpf: failed to load object 'generated/calculator_parser.o'
bad action parsing
parse_action: bad value (5:bpf)!
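
In the log above, the verifier has established only 30 readable bytes behind R7 (r=30), while the rejected load touches offset 31. The general rule is that every packet access must be preceded by a bounds check covering it. The sketch below shows that pattern in plain C; load_byte_checked is a hypothetical helper written for this example, not the load_byte macro used by the generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads one byte at offset `off` only if it lies inside [pkt, pkt_end).
 * Returns 0 on success and -1 if the access would be out of bounds. */
static int load_byte_checked(const uint8_t *pkt, const uint8_t *pkt_end,
                             size_t off, uint8_t *out)
{
    if (off >= (size_t)(pkt_end - pkt))
        return -1;
    *out = pkt[off];
    return 0;
}
```

With a verified range of 30 bytes, an access at offset 29 succeeds and an access at offset 31 is refused, which mirrors the verifier's complaint.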

@jhsmt

jhsmt commented Feb 27, 2024

> > the end result in most tests is that the change chops off the payload and only emits the headers
>
> @jhsmt it's not obvious to me how this change could cause the symptom you describe. Could you please provide more details and ideally a reproducer?
>
> > restrict this change to just the ebpf backend (and leave the tc side alone)
>
> I don't think that's easy, because the TC backend calls the eBPF backend directly for the code generation. If you look at the changes in this PR, only a couple of lines in backends/tc are actually changed.
>
> In my opinion the best solution would be to find and fix the cause of the aforementioned regression, and ideally even add an (automated!) test for it. I am happy to help but would need a reproducer in order to investigate.

@vbnogueira explained the issue. The simple_exact_example prog diff may show why the payload chopping happens. The parser example is showing a different symptom.
We do have vagrant machines which use the examples from https://github.com/p4tc-dev/P4TC-examples-pub, but unfortunately p4c still has some bugs for p4tc which we are working on, so if you generate it you will have to hand-edit a few things beyond your issue. We could do the fixing for you and give you access to a vagrant machine if that would help.

@jhsmt

jhsmt commented Feb 27, 2024

BTW, neither @vbnogueira nor I mentioned this earlier, but reverting the patch resolves the issues for us.

@thomascalvert-xlnx
Member Author

thomascalvert-xlnx commented Feb 28, 2024

> You'll see that it's using the packet size itself to calculate outHeaderOffset, and not ebpf_packetOffsetInBits, which tells how many header bits the parser processed.

@vbnogueira thank you for the explanation. Does the following patch help solve the issue?

--- a/backends/tc/ebpfCodeGen.cpp
+++ b/backends/tc/ebpfCodeGen.cpp
@@ -268,6 +268,9 @@ void TCIngressPipelinePNA::emit(EBPF::CodeBuilder *builder) {
         builder->newline();
         builder->emitIndent();
         builder->appendFormat("unsigned %s = hdrMd->%s;", offsetVar.c_str(), offsetVar.c_str());
+        builder->newline();
+        builder->emitIndent();
+        builder->appendFormat("%s = %s + BYTES(%s);", headerStartVar.c_str(), packetStartVar.c_str(), offsetVar.c_str());
     }
     builder->newline();
     emitHeadersFromCPUMAP(builder);

It results in adding one line near the start of the process function:

    unsigned ebpf_packetOffsetInBits = hdrMd->ebpf_packetOffsetInBits;
    hdr_start = pkt + BYTES(ebpf_packetOffsetInBits);                   /* <--- THIS LINE ADDED */
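
To make the effect of the added line concrete, here is a small self-contained sketch of restoring the header pointer from the bit offset saved in metadata. The struct name, the function name, and the local BYTES macro are assumptions for this example; only the expression pkt + BYTES(ebpf_packetOffsetInBits) is taken from the patch above.

```c
#include <stdint.h>

/* Local helper for this sketch: convert a bit count to whole bytes. */
#define BYTES(w) ((w) / 8)

/* Hypothetical stand-in for the metadata handed from the parser program
 * to the pipeline program. */
struct hdr_md_sketch {
    unsigned ebpf_packetOffsetInBits;
};

/* Recomputes the header-start pointer from the saved bit offset, so that
 * later code sees how far the parser actually advanced. */
static const uint8_t *restore_hdr_start(const uint8_t *pkt,
                                        const struct hdr_md_sketch *md)
{
    unsigned ebpf_packetOffsetInBits = md->ebpf_packetOffsetInBits;
    return pkt + BYTES(ebpf_packetOffsetInBits);
}
```

For example, a saved offset of 272 bits places the restored pointer 34 bytes past the start of the packet.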

@vbnogueira
Contributor

> > You'll see that it's using the packet size itself to calculate outHeaderOffset, and not ebpf_packetOffsetInBits, which tells how many header bits the parser processed.
>
> @vbnogueira thank you for the explanation. Does the following patch help solve the issue?

Yes, it does.
Thank you.
Just tested it.

@thomascalvert-xlnx
Member Author

> Yes, it does.
> Thank you.
> Just tested it.

Excellent - thanks for testing.

I believe that I have a fix for the calculator verifier failure too, just trying to validate it now. Will post a PR with both patches once done.

thomascalvert-xlnx added a commit to thomascalvert-xlnx/p4c that referenced this pull request Feb 28, 2024
thomascalvert-xlnx added a commit to thomascalvert-xlnx/p4c that referenced this pull request Feb 28, 2024
thomascalvert-xlnx added a commit to thomascalvert-xlnx/p4c that referenced this pull request Feb 29, 2024
thomascalvert-xlnx added a commit to thomascalvert-xlnx/p4c that referenced this pull request Feb 29, 2024
github-merge-queue bot pushed a commit that referenced this pull request Feb 29, 2024
* tc: Fix ingress pipeline not restoring header pointer from metadata.

Issue reported in #4327.

* tc: Fix lookahead not restoring offsetVar.

Issue reported in #4327.

* Update p4tc golden outputs.
Labels
ebpf Topics related to the eBPF back end p4tc Topics related to the P4-TC back end