Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dead code in dash_pipeline.p4 PNA version #399

Closed
fruffy opened this issue Jul 11, 2023 · 24 comments
Closed

Dead code in dash_pipeline.p4 PNA version #399

fruffy opened this issue Jul 11, 2023 · 24 comments
Assignees

Comments

@fruffy
Copy link

fruffy commented Jul 11, 2023

I ran some coverage benchmarks on the dash_pipeline.p4 program using P4Testgen and I noticed that there is some dead code in the dash PNA pipeline.
In the nrve_encap action there is this check:
https://github.com/sonic-net/DASH/blob/main/dash-pipeline/bmv2/dash_nvgre.p4#L48

However, this code can never be executed. nrve_encap is only called when route_service_tunnel was previously executed. And route_service_tunnel always sets the IPv4 header invalid here when it calls service_tunnel_encode.

Is this because dash_pipeline.p4 is not complete yet? What is the intended behavior?

@chrispsommers
Copy link
Collaborator

@marian-pritsak please take a look, thanks

@chrispsommers
Copy link
Collaborator

@fruffy Thanks for reporting this. I would like to understand how you ran this test and if it might be something we can incorporate into the DASH CI/CD pipeline? In DASH we use a p4c docker container which is a slimmed-down version of p4lang/p4c derived by taking only the one bmv2 backend (to keep the image smaller). It would not be too hard to add in the p4testgen backend and call it upon every build, if it would catch dead code like this. Can you provide instructions on how you ran p4testgen to expose this dead code? Thanks!

@fruffy
Copy link
Author

fruffy commented Jul 11, 2023

@fruffy Thanks for reporting this. I would like to understand how you ran this test and if it might be something we can incorporate into the DASH CI/CD pipeline? In DASH we use a p4c docker container which is a slimmed-down version of p4lang/p4c derived by taking only the one bmv2 backend (to keep the image smaller). It would not be too hard to add in the p4testgen backend and call it upon every build, if it would catch dead code like this. Can you provide instructions on how you ran p4testgen to expose this dead code? Thanks!

P4Testgen supports monitoring of statement coverage (and other P4 nodes) when generating tests. With that we can also implement different types of test-case-generation strategies to produce more targeted tests. For example,

./p4testgen --target dpdk --arch pna --std p4-16 -I/p4/p4c/build/p4include --test-backend METADATA --seed 1 --max-tests 0 --out-dir p4c/build/results/dash-pipeline-pna --path-selection GREEDY_STATEMENT_SEARCH --stop-metric MAX_STATEMENT_COVERAGE --track-coverage STATEMENTS dash-pipeline/bmv2/dash_pipeline.p4  -DTARGET_DPDK_PNA --print-coverage

will try to cover all statements in the program and then stop once it has achieved that. For this particular program achieving full coverage was not possible and test-case-generation did not terminate.

We could use P4Testgen to check that we can cover everything but that is subject to a bunch of limitations:

  • The statement coverage algorithm is a heuristic and may not always find the path to cover a particular statement early.
  • If code is dead P4Testgen will simply not terminate. So we have to use some form of bound.

@fruffy
Copy link
Author

fruffy commented Jul 11, 2023

In parallel, I have been working on a tool that can provably detect this type of dead-code (and other missing optimizations). https://github.com/fruffy/flay
This should work better for CI, but it is not ready yet.

@chrispsommers
Copy link
Collaborator

If code is dead P4Testgen will simply not terminate. So we have to use some form of bound.
Could you elaborate? Does the program run forever and manual intervention (e.g. a timeout in the calling process) need to be imposed? I guess this could be achieved in some Python test wrapper.

@fruffy
Copy link
Author

fruffy commented Jul 11, 2023

Could you elaborate? Does the program run forever and manual intervention (e.g. a timeout in the calling process) need to be imposed? I guess this could be achieved in some Python test wrapper.

P4Testgen will generate tests until there are no more paths. For programs such as dash_pipeline this can mean upwards of 200-300k tests. Likely even many million if the program grows in complexity.

The best solution is probably to expect full statement coverage within a reasonable amount of tests (maybe a thousand?). But since the algorithm is still a heuristic that could also fail if the program changes. That is something we would have to try out.

@chrispsommers
Copy link
Collaborator

Thanks @fruffy Based on this, I think it's a bit early to add routine CI tests on DASH using P4testgen, but let's keep up the experimenting. Also I'll try to monitor progress on flay, thanks for pointing it out.

@jnfoster
Copy link

Just wanted to chime in to say that this is awesome!

@KrisNey-MSFT
Copy link
Collaborator

Any HW can use this tool to exercise the paths - thank you!

@fruffy
Copy link
Author

fruffy commented Jul 14, 2023

Any HW can use this tool to exercise the paths - thank you!

Not sure if I understand correctly. Are you asking about ways to generate tests that could exercise the paths we try to cover? Unfortunately, the DPDK PNA target does not yet have a testing pipeline, but we are working on one.

@KrisNey-MSFT
Copy link
Collaborator

hi @fruffy - we were just looking and this (and grateful for the contribution) in the DASH Community Call :)
We weren't asking...I think we understand that this is is a work in progress for the future - and were making a statement that in the future the goal would be to use the tool to exercise the paths. And thank you!

@fruffy
Copy link
Author

fruffy commented Aug 11, 2023

Has this been confirmed or is there an easy fix to this? I am maintaining a PR on P4C which contributes a snapshot of the dash program as a compiler test. I would like to contribute the version without dead code. Mostly for selfish reasons, we are planning to run testing benchmarks on this particular version.

@chrispsommers
Copy link
Collaborator

@marian-pritsak Have you had a chance to look at this? Thanks.

@chrispsommers
Copy link
Collaborator

@jfingerh I took a closer look at this and I have a suspicion this is actually an artifact of translating

hdr.ipv4.total_len = hdr.inner_ipv4.total_len*(bit<16>)(bit<1>)hdr.inner_ipv4.isValid() + \
, which contains multiple conditional expressions, and https://github.com/sonic-net/DASH/blob/main/dash-pipeline/bmv2/dash_nvgre.p4#L48, which is the p4-dpdk compatible equivalent with the expression terms "unrolled" into individual ifs and expressions which are later added up in
hdr.ipv4.total_len = (ETHER_HDR_SIZE + IPV4_HDR_SIZE + UDP_HDR_SIZE +
. This then exposed the part of the original combined expression which is never valid and p4testgen found it!

So, we could change the code to remove all the IPv4 components because per @fruffy 's observations, this is never called for IPv4 case. I'd want to leave a comment in the code mentioning it would need to be revised to handle IPv4. Thoughts?

@jafingerhut
Copy link
Contributor

So my attempt in writing the version of the code with if statements was to faithfully duplicate the behavior of the original code, which uses multiplications by (bit<1>) hdr.ipv4.isValid() and similar expressions for other header valid bits.

Any tool that checks "line coverage" will get 100% coverage for the original, but not my modified version. A tool that somehow decided to check that all possible values of sub-expressions like (bit<1>) hdr.ipv4.isValid() in the original code would have caught this in the original code, too. However, deciding which kinds of case coverage one wants in such expressions is a bit of a difficult thing to define, I believe.

If there is dead code in my translated version, I believe there are also "dead sub-expressions" in the original code that could be removed.

@chrispsommers
Copy link
Collaborator

If there is dead code in my translated version, I believe there are also "dead sub-expressions" in the original code that could be removed.

Agreed. @marian-pritsak could you please comment? Thanks.

@jafingerhut
Copy link
Contributor

I would also guess that in future versions of the DASH P4 code, the branches that are currently dead code could easily become live code again, if additional packet handling cases are added in other parts of the program executed before this code is executed.

@fruffy
Copy link
Author

fruffy commented Sep 13, 2023

So my attempt in writing the version of the code with if statements was to faithfully duplicate the behavior of the original code, which uses multiplications by (bit<1>) hdr.ipv4.isValid() and similar expressions for other header valid bits.

Any tool that checks "line coverage" will get 100% coverage for the original, but not my modified version. A tool that somehow decided to check that all possible values of sub-expressions like (bit<1>) hdr.ipv4.isValid() in the original code would have caught this in the original code, too. However, deciding which kinds of case coverage one wants in such expressions is a bit of a difficult thing to define, I believe.

If there is dead code in my translated version, I believe there are also "dead sub-expressions" in the original code that could be removed.

Is this version local or part of a PR?

@jafingerhut
Copy link
Contributor

Is this version local or part of a PR?

It is a few lines earlier than the link to a line of code you gave in the original issue, here: https://github.com/sonic-net/DASH/blob/main/dash-pipeline/bmv2/dash_nvgre.p4#L35-L42

That code is only compiled if you #define the symbol TARGET_BMV2_V1MODEL, whereas the code with the if statement version is only compiled if you #define the symbol TARGET_DPDK_PNA.

The reason for the #ifdef is that as of 2023-Apr when I created the TARGET_DPDK_PNA version, p4c-dpdk did not support multiplication of 2 run-time variables (and probably still does not). The p4c BMv2 back end does not fully support if statements within action bodies.

If you want to experiment with the TARGET_BMV2_V1MODEL version of the code with p4testgen, I would guess that you need to define that symbol, and NOT define TARGET_DPDK_PNA, and then you will get a different version of the P4 program.

@fruffy
Copy link
Author

fruffy commented Sep 13, 2023

Oh, I see what you mean. Yes, for the v1model version we cover all statements, but not for the pna version because of the way the code is structured. In some case that exposed a dormant issue.

@jafingerhut
Copy link
Contributor

Right. There are I think Verilog coverage tools that go more detailed than line coverage, and check for something like "subexpression coverage", that might expose the "dead cases" in the TARGET_BMV2_V1MODEL version of the code. There is probably a widely-accepted name for what I am calling "subexpression coverage", but if so, I do not know that name.

I asked one hardware verification person for names of other kinds of code coverage, and he mentioned these:

  • branch
  • FSM
  • toggle
  • condition
  • and maybe expression depending on simulator

I do not have definitions of all of those, but the Wikipedia page on code coverage describes a few others besides function and line coverage with examples: https://en.wikipedia.org/wiki/Code_coverage

I mention these not because I think p4testgen ought to do all of these things, by the way. Mainly just for me looking for the right widely-used terms (if any) for the more detailed kinds of coverage that have been implemented for other programming languages.

@chrispsommers
Copy link
Collaborator

@fruffy We discussed at length in today's WG meeting (myself, @jafingerhut, @marian-pritsak) and the consensus was:

  • it is true that given today's P4 code and choice of tunneling options, the particular path of code you identified in Dead code in dash_pipeline.p4 PNA version #399 (comment) will never be executed
  • this may not hold true forever if other features/tunneling options are added in the future
  • It is not a "bug," it is future-proofing the code in case other options arise which would make this code path live.

All this means for you is you can decide how to handle the dead-code detection in your use of DASH code as a test case. Perhaps you want to "expect" this and not "hard-fail." We hope you do decide to use DASH code as a test case. In fact at some point we could add p4testgen to our CI pipeline. Let us know how the effort described in #399 (comment) proceeds.

We appreciate your bringing this to our attention, it resulted in better awareness and demonstrates the power of your tool. I am closing this issue, but feel free to add to the conversation thread here.

@fruffy
Copy link
Author

fruffy commented Oct 3, 2023

Thanks for keeping me posted on this! I will merge a patched version to the p4c repository.

@KrisNey-MSFT
Copy link
Collaborator

@r12f for visibility

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants