Map written LocationSets to program locations (loc_t) instead of IR::Expression*s #4797

Merged
merged 5 commits into from
Jul 19, 2024


kfcripps
Contributor

Adds a modified version of loc_t from the midend def_use pass to the frontend one. Please see discussions in #4500, #4507, #4548 for more context. I did not add any test cases as @mihaibudiu's PRs #4539 and #4727 mask the root problem, but I did verify that these changes also fix the test cases from the below issues with an older version of the compiler that did not have Mihai's fixes yet.

Fixes #4500
Fixes #4507
Fixes #4548

Comment on lines 490 to 564
/// For each expression the location set it writes
hvec_map<const IR::Expression *, const LocationSet *> writes;
/// For each program location the location set it writes
ordered_map<loc_t, const LocationSet *> writes;
Contributor Author

@kfcripps kfcripps Jul 12, 2024

@asl Let me know if there is a way to use hvec_map here instead of ordered_map. I reverted to ordered_map because I ran into a bunch of errors when changing the key to loc_t.
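For context on the kind of errors a struct key can trigger: hash-based maps need both an equality operator and a hash for the key type, whereas an ordered map only needs an ordering. A minimal sketch using the standard library only; `demo::Loc` and `lookupDemo` are hypothetical stand-ins, not the p4c `loc_t` or `hvec_map`:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>

namespace demo {
// Hypothetical simplified location: a node id plus a pointer to the parent location.
struct Loc {
    int id;
    const Loc *parent;
    // Hash containers compare keys with operator== after the hashes match.
    bool operator==(const Loc &o) const { return id == o.id && parent == o.parent; }
};
}  // namespace demo

namespace std {
// Specializing std::hash lets demo::Loc be used as a key without naming a hasher.
template <>
struct hash<demo::Loc> {
    size_t operator()(const demo::Loc &l) const {
        return hash<int>{}(l.id) * 31 + hash<const demo::Loc *>{}(l.parent);
    }
};
}  // namespace std

// Stores two entries keyed by location and looks one of them up again.
inline int lookupDemo() {
    std::unordered_map<demo::Loc, int> writes;
    demo::Loc root{1, nullptr};
    demo::Loc child{2, &root};
    writes[root] = 10;
    writes[child] = 20;
    assert(writes.size() == 2);
    return writes.at(demo::Loc{2, &root});
}
```

Whether `hvec_map` picks up the hash the same way is an assumption here; the sketch only shows the two requirements a hashed key type has to meet.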

Contributor

@asl asl Jul 12, 2024

ordered_map is terribly expensive, and this is one of the hottest places. I would suggest that every PR that changes use-def in this way include runtimes of:

test/gtestp4c --gtest_filter=P4CParserUnroll.switch_20160512

and

ctest -R "p14_to_16/testdata/p4_14_samples/switch_20160512/switch.p4"

They are more or less the same source code, just running a slightly different set of passes.

Contributor Author

Results of running the above commands on this branch:

test/gtestp4c --gtest_filter=P4CParserUnroll.switch_20160512

7930 ms
7918 ms
7949 ms

ctest -R "p14_to_16/testdata/p4_14_samples/switch_20160512/switch.p4"

23.14 s
23.19 s
23.00 s

Main branch:

test/gtestp4c --gtest_filter=P4CParserUnroll.switch_20160512

7858 ms
7854 ms
7873 ms

ctest -R "p14_to_16/testdata/p4_14_samples/switch_20160512/switch.p4"

22.78 s
22.94 s
23.19 s

Contributor Author

ordered_map is terribly expensive, and this is one of the hottest places. I would suggest that every PR that changes use-def in this way include runtimes of:

@asl changed to hvec_map

Contributor

@mihaibudiu mihaibudiu left a comment

The way to test this is to remove the passes that were introduced to tree-ify the DAG before def-use (e.g., Cloner in simplifyDefUse.h). Unfortunately not all of them can be removed from the compiler, since other passes may require trees as well.

// A location in the program. Includes the context from the visitor, which needs to
// be copied out of the Visitor::Context objects, as they are allocated on the stack and
// will become invalid as the IR traversal continues
struct loc_t {
Contributor

this may be pretty expensive. def_use is already expensive in memory.
We had closed #151, but I am not sure the fundamental problem is really solved.
Maybe it's not a problem in practice.

Contributor Author

@kfcripps kfcripps Jul 12, 2024

@asl can probably comment on the current memory usage of def_use, but I'm not sure that this PR makes it noticeably worse.

Contributor

We had closed #151, but I am not sure the fundamental problem is really solved.

It is not solved. The overhead is just a bit smaller. Still:

  • Lots of malloc traffic (e.g. joinDefinitions, getPoints, writes, etc.; they all allocate like crazy)
  • Lots of maps everywhere and lots of map lookups.

In many cases use-def is the single pass that accounts for a significant part of the whole frontend's running time.

frontends/p4/def_use.h (outdated)
}
private:
// TODO: Make absl::flat_hash_set instead?
std::unordered_set<loc_t> &cached_locs;
Contributor

or use an hvec_set

Contributor Author

I don't think there is any hvec_set in the tree

Contributor

You need to ensure stable addresses of values around insertions. So, flat_hash_set is not an option here. You might want to give node_hash_set a try:

Suggested change
std::unordered_set<loc_t> &cached_locs;
absl::node_hash_set<loc_t, Util::Hash> &cached_locs;

Contributor Author

@kfcripps kfcripps Jul 17, 2024

We are not doing any iteration of cached_locs, so stable insertion value ordering does not matter here. If you think absl::node_hash_set would be better (e.g. for performance reasons), feel free to push your changes to this branch as I ran into some errors when trying your suggestion and I'd rather not spend the time debugging this unless there is a good reason to do so.

Contributor

It is not about stable iteration order. The issue is much more subtle, but is very important: you are doing (though I would prefer ->first here):

    return &*cached_locs.insert(tmp).first;

Note that you are returning the address of hash map entry. Therefore you need to ensure this address will not change during insertions / deletions. STL unordered containers guarantee this (as each bucket is essentially a list). flat_hash_set does not guarantee this, node_hash_set does. See https://abseil.io/docs/cpp/guides/container for more information
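The address-stability point can be checked with the standard library alone. The following is a minimal sketch, not p4c code: `intern` and `pointerSurvivesRehash` are hypothetical helpers, and `std::string` stands in for `loc_t`. It relies on the standard guarantee that rehashing a `std::unordered_set` invalidates iterators but not pointers or references to elements.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Returns the address of the element stored in the set, inserting it if absent.
// Same pattern as `&*cached_locs.insert(tmp).first` in the PR.
inline const std::string *intern(std::unordered_set<std::string> &pool, const std::string &s) {
    return &*pool.insert(s).first;
}

// Returns true if a pointer taken before many insertions still refers to the
// same stored element afterwards.
inline bool pointerSurvivesRehash() {
    std::unordered_set<std::string> pool;
    const std::string *first = intern(pool, "root");
    // Grow the set far past its initial bucket count so it rehashes several times.
    for (int i = 0; i < 10000; ++i) intern(pool, "n" + std::to_string(i));
    // Node-based container: the element itself was never moved.
    assert(*first == "root");
    const std::string *again = intern(pool, "root");
    return first == again;
}
```

With a flat (open-addressing) container the same pattern would be unsafe, because elements are moved when the table grows; that is the reason `node_hash_set` rather than `flat_hash_set` is suggested above.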

Contributor Author

Note that you are returning the address of hash map entry. Therefore you need to ensure this address will not change during insertions / deletions.

Ok, I see what you mean now.

I'm not convinced that using std::unordered_set is a problem in the first place, so I will not spend the time trying to get node_hash_set to work, but feel free to push such changes to this branch if you know how to do it and think that it will improve the performance.

(but I will at least update the comment if you don't want to make these changes)

@kfcripps
Contributor Author

The way to test this is to remove the passes that were introduced to tree-ify the DAG before def-use (e.g., Cloner in simplifyDefUse.h). Unfortunately not all of them can be removed from the compiler, since other passes may require trees as well.

This causes some tests to fail. I will investigate this later.

@kfcripps kfcripps marked this pull request as draft July 12, 2024 22:13
@kfcripps
Contributor Author

The way to test this is to remove the passes that were introduced to tree-ify the DAG before def-use (e.g., Cloner in simplifyDefUse.h). Unfortunately not all of them can be removed from the compiler, since other passes may require trees as well.

This causes some tests to fail. I will investigate this later.

@mihaibudiu This doesn't work because of a problem I mentioned earlier in #4548 (comment). If you disable the Cloner and RemoveHidden passes in my branch and build test testdata/p4_16_samples/issue3650.p4, you'll notice that we have a SwitchStatement that has multiple equal SwitchCase children, so the ProgramPoints for each child SwitchCase's statement are considered to be equal, and we'll encounter the "Overwriting definitions" assertion. To fix this we can consider creating a unique id for each child that belongs to a given parent IR::Node during traversal, and use this additional piece of info to help uniquely identify each ProgramPoint and loc_t. This is a slightly different problem so I'd prefer that we fix it in a different future PR instead of this one.
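A rough sketch of the child-id idea, purely for illustration: `Loc`, `childIndex`, and `sharedChildrenAreDistinct` are hypothetical names, and this is not what this PR or any follow-up actually implements. It only shows why adding a per-parent child index separates two visits to the same shared node.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical location that also records which child slot of the parent was visited.
struct Loc {
    int nodeId;         // stands in for node->id
    int childIndex;     // position of this node among its parent's children
    const Loc *parent;  // enclosing location, nullptr at the root
    Loc(int nodeId, int childIndex, const Loc *parent)
        : nodeId(nodeId), childIndex(childIndex), parent(parent) {}
    bool operator==(const Loc &o) const {
        return std::tie(nodeId, childIndex, parent) ==
               std::tie(o.nodeId, o.childIndex, o.parent);
    }
};

// Returns true if the same node reached through two child slots yields distinct locations.
inline bool sharedChildrenAreDistinct() {
    Loc sw(10, 0, nullptr);  // the SwitchStatement
    // Without a distinguishing index, both visits to node 42 compare equal.
    Loc sameA(42, 0, &sw), sameB(42, 0, &sw);
    assert(sameA == sameB);
    // With the child index recorded, the two SwitchCase visits differ.
    Loc case0(42, 0, &sw), case1(42, 1, &sw);
    return !(case0 == case1);
}
```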

Disabling Cloner and RemoveHidden on this branch causes the following tests to fail:

        300 - p4/testdata/p4_16_samples/header-stack-ops-bmv2.p4 (Failed)
        355 - p4/testdata/p4_16_samples/issue1127-bmv2.p4 (Failed)
        650 - p4/testdata/p4_16_samples/issue3650.p4 (Failed)
        660 - p4/testdata/p4_16_samples/issue3884.p4 (Failed)
        1229 - p4/testdata/p4_16_samples/dash/dash-pipeline-pna-dpdk.p4 (Failed)
        1230 - p4/testdata/p4_16_samples/dash/dash-pipeline-v1model-bmv2.p4 (Failed)

and on main branch, it causes the following tests to fail:

          9 - p4/testdata/p4_16_samples/psa-example-parser-checksum.p4 (Failed)
         10 - p4/testdata/p4_16_samples/action-bind.p4 (Failed)
         44 - p4/testdata/p4_16_samples/arith2-inline-bmv2.p4 (Failed)
        103 - p4/testdata/p4_16_samples/control-hs-index-test5.p4 (Failed)
        132 - p4/testdata/p4_16_samples/drop-bmv2.p4 (Failed)
        141 - p4/testdata/p4_16_samples/enum-bmv2.p4 (Failed)
        187 - p4/testdata/p4_16_samples/gauntlet_action_mux-bmv2.p4 (Failed)
        188 - p4/testdata/p4_16_samples/gauntlet_action_return-bmv2.p4 (Failed)
        205 - p4/testdata/p4_16_samples/gauntlet_exit_combination_18-bmv2.p4 (Failed)
        212 - p4/testdata/p4_16_samples/gauntlet_exit_combination_3-bmv2.p4 (Failed)
        223 - p4/testdata/p4_16_samples/gauntlet_function_return-bmv2.p4 (Failed)
        227 - p4/testdata/p4_16_samples/gauntlet_hdr_function_cast-bmv2.p4 (Failed)
        235 - p4/testdata/p4_16_samples/gauntlet_index_4-bmv2.p4 (Failed)
        236 - p4/testdata/p4_16_samples/gauntlet_index_5-bmv2.p4 (Failed)
        237 - p4/testdata/p4_16_samples/gauntlet_index_6-bmv2.p4 (Failed)
        238 - p4/testdata/p4_16_samples/gauntlet_index_7-bmv2.p4 (Failed)
        241 - p4/testdata/p4_16_samples/gauntlet_indirect_hdr_assign_1-bmv2.p4 (Failed)
        255 - p4/testdata/p4_16_samples/gauntlet_mux_validity-bmv2.p4 (Failed)
        258 - p4/testdata/p4_16_samples/gauntlet_nested_switch-bmv2.p4 (Failed)
        272 - p4/testdata/p4_16_samples/gauntlet_side_effect_order_4-bmv2.p4 (Failed)
        292 - p4/testdata/p4_16_samples/hash-extern-bmv2.p4 (Failed)
        298 - p4/testdata/p4_16_samples/header-bmv2.p4 (Failed)
        300 - p4/testdata/p4_16_samples/header-stack-ops-bmv2.p4 (Failed)
        324 - p4/testdata/p4_16_samples/internet_checksum1-bmv2.p4 (Failed)
        327 - p4/testdata/p4_16_samples/invalid-hdr-warnings2.p4 (Failed)
        355 - p4/testdata/p4_16_samples/issue1127-bmv2.p4 (Failed)
        378 - p4/testdata/p4_16_samples/issue1470-bmv2.p4 (Failed)
        384 - p4/testdata/p4_16_samples/issue1538.p4 (Failed)
        387 - p4/testdata/p4_16_samples/issue1544-1-bmv2.p4 (Failed)
        388 - p4/testdata/p4_16_samples/issue1544-2-bmv2.p4 (Failed)
        390 - p4/testdata/p4_16_samples/issue1544-bmv2.p4 (Failed)
        392 - p4/testdata/p4_16_samples/issue1566-bmv2.p4 (Failed)
        393 - p4/testdata/p4_16_samples/issue1566.p4 (Failed)
        408 - p4/testdata/p4_16_samples/issue1739-bmv2.p4 (Failed)
        436 - p4/testdata/p4_16_samples/issue1955.p4 (Failed)
        449 - p4/testdata/p4_16_samples/issue2104-1.p4 (Failed)
        468 - p4/testdata/p4_16_samples/issue2175.p4 (Failed)
        472 - p4/testdata/p4_16_samples/issue2205-1-bmv2.p4 (Failed)
        478 - p4/testdata/p4_16_samples/issue2221-bmv2.p4 (Failed)
        479 - p4/testdata/p4_16_samples/issue2225-bmv2.p4 (Failed)
        495 - p4/testdata/p4_16_samples/issue2287-bmv2.p4 (Failed)
        497 - p4/testdata/p4_16_samples/issue2288-2.p4 (Failed)
        498 - p4/testdata/p4_16_samples/issue2288.p4 (Failed)
        502 - p4/testdata/p4_16_samples/issue2314.p4 (Failed)
        504 - p4/testdata/p4_16_samples/issue2321.p4 (Failed)
        505 - p4/testdata/p4_16_samples/issue2330-1.p4 (Failed)
        506 - p4/testdata/p4_16_samples/issue2330.p4 (Failed)
        509 - p4/testdata/p4_16_samples/issue2343-bmv2.p4 (Failed)
        512 - p4/testdata/p4_16_samples/issue2345-2.p4 (Failed)
        515 - p4/testdata/p4_16_samples/issue2345.p4 (Failed)
        518 - p4/testdata/p4_16_samples/issue2359.p4 (Failed)
        534 - p4/testdata/p4_16_samples/issue2488-bmv2.p4 (Failed)
        578 - p4/testdata/p4_16_samples/issue2844-enum.p4 (Failed)
        593 - p4/testdata/p4_16_samples/issue304.p4 (Failed)
        650 - p4/testdata/p4_16_samples/issue3650.p4 (Failed)
        660 - p4/testdata/p4_16_samples/issue3884.p4 (Failed)
        681 - p4/testdata/p4_16_samples/issue461-bmv2.p4 (Failed)
        705 - p4/testdata/p4_16_samples/issue561-bmv2.p4 (Failed)
        741 - p4/testdata/p4_16_samples/issue982.p4 (Failed)
        768 - p4/testdata/p4_16_samples/logging-bmv2.p4 (Failed)
        773 - p4/testdata/p4_16_samples/m_psa-dpdk-non-zero-arg-default-action-08.p4 (Failed)
        800 - p4/testdata/p4_16_samples/newtype2.p4 (Failed)
        843 - p4/testdata/p4_16_samples/pna-dpdk-add_on_miss1.p4 (Failed)
        855 - p4/testdata/p4_16_samples/pna-dpdk-parser-state-err.p4 (Failed)
        869 - p4/testdata/p4_16_samples/pna-elim-hdr-copy-dpdk.p4 (Failed)
        899 - p4/testdata/p4_16_samples/pna-mux-dismantle.p4 (Failed)
        900 - p4/testdata/p4_16_samples/pna-subparser.p4 (Failed)
        901 - p4/testdata/p4_16_samples/pna-too-big-label-name-dpdk.p4 (Failed)
        913 - p4/testdata/p4_16_samples/predication_issue.p4 (Failed)
        914 - p4/testdata/p4_16_samples/predication_issue_1.p4 (Failed)
        918 - p4/testdata/p4_16_samples/proliferation1.p4 (Failed)
        928 - p4/testdata/p4_16_samples/psa-basic-counter-bmv2.p4 (Failed)
        958 - p4/testdata/p4_16_samples/psa-dpdk-non-zero-arg-default-action-08.p4 (Failed)
        989 - p4/testdata/p4_16_samples/psa-e2e-cloning-basic-bmv2.p4 (Failed)
        991 - p4/testdata/p4_16_samples/psa-example-counters-bmv2.p4 (Failed)
        992 - p4/testdata/p4_16_samples/psa-example-digest-bmv2.p4 (Failed)
        1009 - p4/testdata/p4_16_samples/psa-example-dpdk-varbit-bmv2.p4 (Failed)
        1034 - p4/testdata/p4_16_samples/psa-i2e-cloning-basic-bmv2.p4 (Failed)
        1041 - p4/testdata/p4_16_samples/psa-meter7-bmv2.p4 (Failed)
        1042 - p4/testdata/p4_16_samples/psa-multicast-basic-2-bmv2.p4 (Failed)
        1043 - p4/testdata/p4_16_samples/psa-multicast-basic-bmv2.p4 (Failed)
        1044 - p4/testdata/p4_16_samples/psa-multicast-basic-corrected-bmv2.p4 (Failed)
        1045 - p4/testdata/p4_16_samples/psa-parser-error-test-bmv2.p4 (Failed)
        1047 - p4/testdata/p4_16_samples/psa-recirculate-no-meta-bmv2.p4 (Failed)
        1048 - p4/testdata/p4_16_samples/psa-register-complex-bmv2.p4 (Failed)
        1049 - p4/testdata/p4_16_samples/psa-register-read-write-2-bmv2.p4 (Failed)
        1050 - p4/testdata/p4_16_samples/psa-register-read-write-bmv2.p4 (Failed)
        1055 - p4/testdata/p4_16_samples/psa-resubmit-bmv2.p4 (Failed)
        1061 - p4/testdata/p4_16_samples/psa-unicast-or-drop-bmv2.p4 (Failed)
        1062 - p4/testdata/p4_16_samples/psa-unicast-or-drop-corrected-bmv2.p4 (Failed)
        1071 - p4/testdata/p4_16_samples/redundant_parsers_dangling_unused_parser_decl.p4 (Failed)
        1126 - p4/testdata/p4_16_samples/stack-bmv2.p4 (Failed)
        1127 - p4/testdata/p4_16_samples/stack-bvec-bmv2.p4 (Failed)
        1135 - p4/testdata/p4_16_samples/std_meta_inlining.p4 (Failed)
        1153 - p4/testdata/p4_16_samples/subparser-with-header-stack-bmv2.p4 (Failed)
        1182 - p4/testdata/p4_16_samples/two-functions.p4 (Failed)
        1184 - p4/testdata/p4_16_samples/two_ebpf.p4 (Failed)
        1210 - p4/testdata/p4_16_samples/v1model-special-ops-bmv2.p4 (Failed)
        1229 - p4/testdata/p4_16_samples/dash/dash-pipeline-pna-dpdk.p4 (Failed)
        1230 - p4/testdata/p4_16_samples/dash/dash-pipeline-v1model-bmv2.p4 (Failed)
        1231 - p4/testdata/p4_16_samples/fabric_20190420/fabric.p4 (Failed)
        1232 - p4/testdata/p4_16_samples/omec/up4.p4 (Failed)
        1233 - p4/testdata/p4_16_samples/pins/pins_fabric.p4 (Failed)
        1234 - p4/testdata/p4_16_samples/pins/pins_middleblock.p4 (Failed)
        1235 - p4/testdata/p4_16_samples/pins/pins_wbb.p4 (Failed)
        1250 - p4/testdata/p4_16_samples/parser-inline/parser-inline-test1.p4 (Failed)
        1252 - p4/testdata/p4_16_samples/parser-inline/parser-inline-test11.p4 (Failed)
        1253 - p4/testdata/p4_16_samples/parser-inline/parser-inline-test12.p4 (Failed)
        1255 - p4/testdata/p4_16_samples/parser-inline/parser-inline-test2.p4 (Failed)
        1256 - p4/testdata/p4_16_samples/parser-inline/parser-inline-test3.p4 (Failed)
        1257 - p4/testdata/p4_16_samples/parser-inline/parser-inline-test4.p4 (Failed)
        1258 - p4/testdata/p4_16_samples/parser-inline/parser-inline-test5.p4 (Failed)
        1259 - p4/testdata/p4_16_samples/parser-inline/parser-inline-test6.p4 (Failed)

so these changes at least seem to be accomplishing their intended purpose.

@mihaibudiu
Contributor

If I understand what you are saying, this means that the context representation you are using is incomplete. You need to keep the child number as well.

@kfcripps
Contributor Author

kfcripps commented Jul 13, 2024

If I understand what you are saying, this means that the context representation you are using is incomplete. You need to keep the child number as well.

It may be incomplete, but it is still an improvement over the current implementation, so additional improvements can be made separately. And the future improvements will need to affect both loc_t and ProgramPoint, not just loc_t.

@asl
Contributor

asl commented Jul 14, 2024

It may be incomplete, but it is still an improvement over the current implementation, so additional improvements can be made separately. And the future improvements will need to affect both loc_t and ProgramPoint, not just loc_t.

Maybe we can use some well-known and proven approaches? E.g. some flavour of value numbering?

@kfcripps
Contributor Author

It may be incomplete, but it is still an improvement over the current implementation, so additional improvements can be made separately. And the future improvements will need to affect both loc_t and ProgramPoint, not just loc_t.

Maybe we can use some well-known and proven approaches? E.g. some flavour of value numbering?

That would probably be even better. We can consider doing this in a future PR.

@kfcripps
Contributor Author

This PR should be ready now. Please see #4810 for the implementation of "some flavour of value numbering" suggested by @asl.

@kfcripps kfcripps marked this pull request as ready for review July 16, 2024 01:36
@kfcripps kfcripps requested review from asl and mihaibudiu July 16, 2024 01:36
@fruffy fruffy added the core Topics concerning the core segments of the compiler (frontend, midend, parser) label Jul 16, 2024
Contributor

@grg grg left a comment

LGTM

@asl
Contributor

asl commented Jul 17, 2024

@kfcripps Are there updated benchmarking results? Or are the ones above still valid and there were only correctness changes?

std::size_t P4::loc_t::hash() const {
if (!parent) return Util::Hash{}(node->id);

return Util::Hash{}(node->id, parent->hash());
Contributor

@asl asl Jul 17, 2024

Do we have an idea how "deep" the recursion here could be? Maybe it would be worth memoizing the hash?

Contributor Author

It is as deep as the deepest IR::Node in an IR::Node DAG. In many of the P4-16 programs that I have analyzed it does not get egregiously deep.

Contributor

Well... this is quite a lot actually, and it could be very deep, especially after inlining. I would probably check the effects of memoization. We can use a zero hash value as a tombstone; in the very rare case where the computed hash is itself zero, well, we'd have a single level of recursion.
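For illustration, the zero-as-tombstone memoization could look roughly like the sketch below. `Loc`, `cachedHash`, and the hash-combine formula are assumptions made for this example; the real `loc_t` uses `Util::Hash` and may be laid out differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical chained location; a simplified stand-in, not the p4c loc_t.
struct Loc {
    int id;                              // stands in for node->id
    const Loc *parent;                   // enclosing location, nullptr at the root
    mutable std::size_t cachedHash = 0;  // 0 means "not computed yet"

    Loc(int id, const Loc *parent) : id(id), parent(parent) {}

    std::size_t hash() const {
        if (cachedHash != 0) return cachedHash;  // memoized: no walk up the chain
        std::size_t h = std::hash<int>{}(id);
        if (parent) {
            // Simple boost-style combine, used here only as a placeholder.
            h ^= parent->hash() + 0x9e3779b9u + (h << 6) + (h >> 2);
        }
        // If h happens to be 0 it is recomputed next time, which costs one
        // level of recursion because the parent's hash is already cached.
        cachedHash = h;
        return h;
    }
};

// Returns true if repeated calls agree and the result was cached.
inline bool memoizedHashIsStable() {
    Loc root(1, nullptr), mid(2, &root), leaf(3, &mid);
    std::size_t first = leaf.hash();
    assert(leaf.cachedHash == first);
    return first == leaf.hash() && mid.cachedHash == mid.hash();
}
```

The cache field is `mutable` so that `hash()` can stay `const`, which matters when the object lives inside a set and is only reachable through const references.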

Contributor Author

I tried unsuccessfully to memoize the hashes: kfcripps@04b7a4e

Let me know if you have any suggestions on how to accomplish this

Contributor Author

Your suggestion worked. Thanks!

// In this case parentLoc is the loc of n's direct parent.
const P4::loc_t *ComputeWriteSet::getLoc(const IR::Node *n, const loc_t *parentLoc) {
loc_t tmp{n, parentLoc};
return &*cached_locs.insert(tmp).first;
Contributor

Suggested change
return &*cached_locs.insert(tmp).first;
return &*cached_locs.emplace(n, parentLoc).first;

Contributor Author

This was basically copied from the midend's version of loc_t. Maybe tmp was left for debugging purposes?

Contributor Author

Also, this doesn't seem to work:

/local/kfcripps/repos/p4c/frontends/p4/def_use.cpp:776:46:   required from here
/usr/include/c++/9/ext/new_allocator.h:146:4: error: new initializer expression list treated as compound expression [-fpermissive]
  146 |  { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
      |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/9/ext/new_allocator.h:146:4: error: no matching function for call to 'P4::loc_t::loc_t(const P4::loc_t*&)'
In file included from /local/kfcripps/repos/p4c/frontends/p4/def_use.cpp:17:
/local/kfcripps/repos/p4c/frontends/p4/def_use.h:39:8: note: candidate: 'P4::loc_t::loc_t()'
   39 | struct loc_t {
      |        ^~~~~
/local/kfcripps/repos/p4c/frontends/p4/def_use.h:39:8: note:   candidate expects 0 arguments, 1 provided
/local/kfcripps/repos/p4c/frontends/p4/def_use.h:39:8: note: candidate: 'constexpr P4::loc_t::loc_t(const P4::loc_t&)'
/local/kfcripps/repos/p4c/frontends/p4/def_use.h:39:8: note:   no known conversion for argument 1 from 'const P4::loc_t*' to 'const P4::loc_t&'
/local/kfcripps/repos/p4c/frontends/p4/def_use.h:39:8: note: candidate: 'constexpr P4::loc_t::loc_t(P4::loc_t&&)'
/local/kfcripps/repos/p4c/frontends/p4/def_use.h:39:8: note:   no known conversion for argument 1 from 'const P4::loc_t*' to 'P4::loc_t&&'

Contributor

Well, you'd likely need to implement a proper constructor ;)
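The compiler error above is what one would expect when the element type is a plain aggregate: `emplace` constructs the element as `T(args...)` with parentheses, and before C++20 that form does not perform aggregate initialization. A minimal sketch of the constructor-based fix, using hypothetical `Loc`, `LocHash`, and `getLoc` stand-ins rather than the p4c definitions:

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical simplified location type; not the p4c loc_t.
struct Loc {
    int id;
    const Loc *parent;
    // Explicit constructor so that emplace(id, parent) has something to call.
    Loc(int id, const Loc *parent) : id(id), parent(parent) {}
    bool operator==(const Loc &o) const { return id == o.id && parent == o.parent; }
};

struct LocHash {
    std::size_t operator()(const Loc &l) const {
        return std::hash<int>{}(l.id) ^ (std::hash<const Loc *>{}(l.parent) << 1);
    }
};

// Interns a location and returns the address of the stored element.
inline const Loc *getLoc(std::unordered_set<Loc, LocHash> &cache, int id, const Loc *parent) {
    return &*cache.emplace(id, parent).first;
}
```

Note that `emplace` may still construct a temporary node and discard it when an equal element already exists, so the gain over `insert(tmp)` is mostly stylistic.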

Contributor Author

I was unsuccessful yet again: kfcripps@2275e49

Let me know if you have any suggestions on how to achieve this.

const P4::loc_t *ComputeWriteSet::getLoc(const Visitor::Context *ctxt) {
if (!ctxt) return nullptr;
loc_t tmp{ctxt->node, getLoc(ctxt->parent)};
return &*cached_locs.insert(tmp).first;
Contributor

Suggested change
return &*cached_locs.insert(tmp).first;
return &*cached_locs.emplace(ctxt->node, getLoc(ctxt->parent)).first;

if (p->node == n) return getLoc(p);
auto rv = getLoc(ctxt);
loc_t tmp{n, rv};
return &*cached_locs.insert(tmp).first;
Contributor

Suggested change
return &*cached_locs.insert(tmp).first;
return &*cached_locs.emplace(n, getLoc(ctxt)).first;


@kfcripps
Contributor Author

@kfcripps Are there updated benchmarking results? Or are the ones above still valid and there were only correctness changes?

@asl

New results on this branch (as of 50c0f9f):

test/gtestp4c --gtest_filter=P4CParserUnroll.switch_20160512

7799 ms
7774 ms
7799 ms

ctest -R "p14_to_16/testdata/p4_14_samples/switch_20160512/switch.p4"

23.16 s
22.72 s
22.76 s

@kfcripps kfcripps requested a review from asl July 17, 2024 23:00
@asl asl force-pushed the simplify-def-use-context branch from 50c0f9f to 9d1fb35 Compare July 18, 2024 01:14
Collaborator

@fruffy fruffy left a comment

A test case for #4507 seems to be missing here.
Should we revert #4727?

@fruffy
Collaborator

fruffy commented Jul 18, 2024

What about #4385?

@asl
Contributor

asl commented Jul 18, 2024

Re-ran the benchmark in a less noisy setup, 10 iterations + warmup:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `test/gtestp4c-main --gtest_filter=P4CParserUnroll.switch_20160512` | 5.113 ± 0.161 | 4.860 | 5.378 | 1.01 ± 0.04 |
| `test/gtestp4c --gtest_filter=P4CParserUnroll.switch_20160512` | 5.063 ± 0.126 | 4.883 | 5.203 | 1.00 |

I would say no impact

@kfcripps
Contributor Author

What about #4385?

@fruffy Although the problem described in #4385 was introduced by the same offending PR as the other mentioned issues, it does not necessarily mean that the root cause is the same. This PR does not fix #4385.

@kfcripps
Contributor Author

A test case for #4507 seems to be missing here.

See #4797 (comment). Mihai's PRs masked the root problem, so I do not know of any test cases that pass with this PR but not on main branch. This PR addresses a theoretical problem (described in #4548 (comment)) that probably affects real P4 programs, but I do not currently have any real examples that aren't already working on main branch to test. I believe that #4539 actually fixed the crashes of the exact P4 programs pasted in #4500 and #4507 (so if they weren't added in #4359, they should have been added there, not in this PR).

Should we revert #4727?

We can probably revert it, but based on the changes to test outputs in https://github.com/p4lang/p4c/pull/4727/files, there may be other benefits to leaving the added constant folding pass? Also, to be safe, I'd rather not revert any of Mihai's fixes until all related problems with SimplifyDefUse have been fixed (I opened #4810 and #4811 to address the remaining issues).

@fruffy
Collaborator

fruffy commented Jul 18, 2024

See #4797 (comment). Mihai's PRs masked the root problem, so I do not know of any test cases that pass with this PR but not on main branch. This PR addresses a theoretical problem (described in #4548 (comment)) that probably affects real P4 programs, but I do not currently have any real examples that aren't already working on main branch to test. I believe that #4539 actually fixed the crashes of the exact P4 programs pasted in #4500 and #4507 (so if they weren't added in #4359, they should have been added there, not in this PR).

My main issue is that we are closing #4507 but it looks like we are not adding the test case reported in it. For the other two issues we have added the tests already. We can add it here or in a separate PR.

@kfcripps kfcripps requested review from asl and removed request for asl July 18, 2024 14:14
class StorageFactory;
class LocationSet;

// A location in the program. Includes the context from the visitor, which needs to
Collaborator

Nit: Three slashes. Same goes for the other comments.

…_set instead of std::set for cached_locs

Signed-off-by: Kyle Cripps <[email protected]>
Signed-off-by: Kyle Cripps <[email protected]>
Signed-off-by: Kyle Cripps <[email protected]>
@kfcripps kfcripps added this pull request to the merge queue Jul 19, 2024
Merged via the queue into p4lang:main with commit 18467ee Jul 19, 2024
17 checks passed
Labels
core Topics concerning the core segments of the compiler (frontend, midend, parser)
6 participants