-
Notifications
You must be signed in to change notification settings - Fork 245
Adding task-based simulator for PCGs #1365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lockshaw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Stylistically it matches the existing code, which is much appreciated 🙂
Obviously tests need to get finished off and there are probably some minor edge cases to be handled, but as an initial draft this looks really good
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/compiler/src/machine_mapping.cc line 123 at r1 (raw file):
std::unordered_set<UpwardOpenMultiDiEdge> incoming_edges = get_incoming_edges(g, node); std::vector<ParallelTensorShape> inputs = transform(
If the ordering here doesn't matter why is this a vector?
lib/compiler/src/machine_mapping.cc line 124 at r1 (raw file):
get_incoming_edges(g, node); std::vector<ParallelTensorShape> inputs = transform( as_vector(incoming_edges), [&](UpwardOpenMultiDiEdge const &input_edge) {
Suggestion:
as_vector(get_incoming_edges(g, node)), [&](UpwardOpenMultiDiEdge const &input_edge) {lib/compiler/src/machine_mapping.cc line 161 at r1 (raw file):
// Filling the frontier for (auto &[edge, _] : frontier_machine_views) {
Suggestion:
for (auto const &[edge, _] : frontier_machine_views) {lib/compiler/src/machine_mapping.cc line 161 at r1 (raw file):
// Filling the frontier for (auto &[edge, _] : frontier_machine_views) {
Maybe cleaner to just get the set of nodes with no incoming closed edges?
lib/compiler/src/machine_mapping.cc line 176 at r1 (raw file):
if (std::all_of(devices.begin(), devices.end(), [&occupied](int d) { return occupied[d] == false })) {
See all_of in containers.h
Code quote:
if (std::all_of(devices.begin(), devices.end(), [&occupied](int d) {
return occupied[d] == false
})) {lib/compiler/test/src/test_parallel_cost_estimator.cpp line 10 at r1 (raw file):
TEST_CASE("parallel_estimate_cost") { // Test graph structure // /-- 1 --\
Add a test case that starts with the simpler case of a straight line? Might be an easier starting point and I think should have a clean analytical answer you can check against
lib/compiler/test/src/test_parallel_cost_estimator.cpp line 11 at r1 (raw file):
// Test graph structure // /-- 1 --\ // 0 - - 3
Suggestion:
// 0 - - 3lib/utils/src/graph/algorithms.cc line 380 at r1 (raw file):
std::unordered_map<Node, std::unordered_set<Node>> get_successors(DiGraphView const &g,
Alternatively, implement this as get_predecessors on the flipped graph
wmdi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 4 of 5 files reviewed, 8 unresolved discussions (waiting on @lockshaw and @Marsella8)
… and related objects
Marsella8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 7 files reviewed, 8 unresolved discussions (waiting on @lockshaw)
lib/compiler/src/machine_mapping.cc line 124 at r1 (raw file):
get_incoming_edges(g, node); std::vector<ParallelTensorShape> inputs = transform( as_vector(incoming_edges), [&](UpwardOpenMultiDiEdge const &input_edge) {
Done.
lib/compiler/src/machine_mapping.cc line 161 at r1 (raw file):
// Filling the frontier for (auto &[edge, _] : frontier_machine_views) {
Done.
lib/compiler/src/machine_mapping.cc line 161 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Maybe cleaner to just get the set of nodes with no incoming closed edges?
Done.
lib/compiler/src/machine_mapping.cc line 176 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
See
all_ofincontainers.h
Done.
lib/compiler/test/src/test_parallel_cost_estimator.cpp line 10 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Add a test case that starts with the simpler case of a straight line? Might be an easier starting point and I think should have a clean analytical answer you can check against
Done.
lib/compiler/test/src/test_parallel_cost_estimator.cpp line 11 at r1 (raw file):
// Test graph structure // /-- 1 --\ // 0 - - 3
Done.
wmdi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 4 files at r3, 8 of 8 files at r4, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @lockshaw and @Marsella8)
Marsella8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 13 files reviewed, 8 unresolved discussions (waiting on @lockshaw, @Marsella8, and @wmdi)
lib/compiler/src/machine_mapping.cc line 123 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
If the ordering here doesn't matter why is this a
vector?
Done.
lib/utils/src/graph/algorithms.cc line 380 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Alternatively, implement this as
get_predecessorson the flipped graph
Done.
lockshaw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 8 files at r4, 5 of 9 files at r5, all commit messages.
Reviewable status: 9 of 13 files reviewed, 6 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/compiler/include/compiler/cost_estimator.h line 12 at r5 (raw file):
using SubParallelComputationGraphView = OutputLabelledOpenMultiDiGraphView<Operator, ParallelTensor>;
Prefer dtgen-wrapped type (see current SubParallelComputationGraph, which honestly should probably be a View anyway so feel free to just change it to be a View)
lib/compiler/src/cost_estimator.cc line 29 at r5 (raw file):
} struct TimedNode { // Node and associated finishing time
Prefer dtgen over visitable
lib/compiler/src/cost_estimator.cc line 68 at r5 (raw file):
std::unordered_set<Node> copy(frontier); for (Node const &node : copy) { auto views = device_mapping.machine_views.at(node);
Prefer actual type over auto
Code quote:
autolib/compiler/src/cost_estimator.cc line 73 at r5 (raw file):
devices.begin(), devices.end(), [&occupied](device_id_t d) { return occupied[d] == false; })) {
Prefer functions from containers.h
Code quote:
if (std::all_of(
devices.begin(), devices.end(), [&occupied](device_id_t d) {
return occupied[d] == false;
})) {lib/compiler/src/cost_estimator.cc line 75 at r5 (raw file):
})) { float cost = node_estimate_cost(node, g, estimator, device_mapping); processing.push({node, current_time + cost});
Pulling some of this out into separate lambdas might improve readability (say, a lambda named start_node_processing)
lib/pcg/src/device_id.cc line 19 at r5 (raw file):
// Most likely not the best way to do it. device_id_t operator+(device_id_t device, size_t increment) { if (std::holds_alternative<gpu_id_t>(device)) {
Refactor to use dtgen (but otherwise this isn't bad, just replace the static_casts with actual field accesses)
Is this an operation we actually want to support for device_id_t, or should this just be done in the context of other functions?
lib/utils/src/graph/algorithms.cc line 380 at r1 (raw file):
Previously, Marsella8 wrote…
Done.
Not seeing it done? (though it doesn't necessarily need to be, just saying I don't see a change)
Marsella8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 5 files at r1, 2 of 8 files at r4, 9 of 9 files at r5.
Reviewable status: 11 of 14 files reviewed, 6 unresolved discussions (waiting on @lockshaw)
lib/compiler/include/compiler/cost_estimator.h line 12 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer dtgen-wrapped type (see current
SubParallelComputationGraph, which honestly should probably be aViewanyway so feel free to just change it to be aView)
Tried to sync my branch so as to dtgen and wasn't able to get it working, best if i get this PR merged and then finish this off in a separate branch (e.g. the machine-related one)?
lib/compiler/src/cost_estimator.cc line 29 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer
dtgenovervisitable
See comment above
Also, we will always use dtgen going forward for all structs, including the ones that are only for private use / limited to a single module right?
lib/compiler/src/cost_estimator.cc line 68 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer actual type over
auto
Done.
lib/compiler/src/cost_estimator.cc line 73 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer functions from
containers.h
Done.
lib/compiler/src/cost_estimator.cc line 75 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Pulling some of this out into separate lambdas might improve readability (say, a lambda named
start_node_processing)
Done, let me know if this is ok.
As a general question, when do we use lambdas inside the function vs just having a separate function do the same thing? Only when the function is only applicable tto the function we are trying to write?
lib/pcg/src/device_id.cc line 19 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Refactor to use
dtgen(but otherwise this isn't bad, just replace thestatic_casts with actual field accesses)Is this an operation we actually want to support for
device_id_t, or should this just be done in the context of other functions?
Due to problems above with dtgen, will clean up everything into the machine-related branch.
I don't think we'll ever intend for the user to directly increment device_id_t types, but it is useful for the functions that are part of machine_views.cc and associated modules (e.g. if we want to iterate over the range of IDs to generate all possible IDs)
lib/utils/src/graph/algorithms.cc line 380 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Not seeing it done? (though it doesn't necessarily need to be, just saying I don't see a change)
Now it's changed.
lockshaw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 9 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @Marsella8)
lib/compiler/include/compiler/cost_estimator.h line 12 at r5 (raw file):
Previously, Marsella8 wrote…
Tried to sync my branch so as to
dtgenand wasn't able to get it working, best if i get this PR merged and then finish this off in a separate branch (e.g. themachine-relatedone)?
You're going to need to merge with repo-refactor anyway, so if you can't get dtgen to work I'm guessing the branch won't be mergeable. That said, if you want I'm already making the change to this specific struct in a different PR, so you can skip this one (though it would still be good to update the other structs such as TimedNode in this PR)
lib/compiler/src/cost_estimator.cc line 29 at r5 (raw file):
Previously, Marsella8 wrote…
See comment above
Also, we will always use
dtgengoing forward for all structs, including the ones that are only for private use / limited to a single module right?
I'd say use it when you can--it only captures a limited subset of what C++ structs can do (e.g., no methods, etc.), but it also removes a lot of difficulty of making your structs support expected behaviors (printability, equality, hashing, etc.) yourself, so if you struct fits in its semantic model I'd say use it
lib/compiler/src/cost_estimator.cc line 75 at r5 (raw file):
Previously, Marsella8 wrote…
Done, let me know if this is ok.
As a general question, when do we use lambdas inside the function vs just having a separate function do the same thing? Only when the function is only applicable tto the function we are trying to write?
We use lambdas for functions that aren't reusable across the codebase. They're especially useful when you want to manipulate local variables, like occupied here, as the lambdas can manipulate them directly. When in doubt I'd say start with lambdas
lib/compiler/src/cost_estimator.cc line 41 at r6 (raw file):
}; bool predecessors_have_been_processed(
This should be a lambda
lib/compiler/src/cost_estimator.cc line 44 at r6 (raw file):
std::unordered_set<Node> const &predecessors, std::unordered_set<TimedNode> processed) { std::unordered_set<Node> simple_processed =
processed_nodes?
Code quote:
simple_processedlib/compiler/src/cost_estimator.cc line 47 at r6 (raw file):
transform(processed, [](TimedNode const &tn) { return tn.node; }); return all_of(predecessors, [&simple_processed](Node p) {
Suggestion:
return all_of(predecessors, [&](Node p) {lib/compiler/src/cost_estimator.cc line 48 at r6 (raw file):
return all_of(predecessors, [&simple_processed](Node p) { return simple_processed.find(p) != simple_processed.end();
contains from utils/containers.h
lib/compiler/src/cost_estimator.cc line 55 at r6 (raw file):
MachineMapping const &device_mapping) { return device_mapping.machine_views.at(node).device_ids(); }
Should be moved to the corresponding header for MachineMapping
Code quote:
std::vector<device_id_t> get_devices(Node const &node,
MachineMapping const &device_mapping) {
return device_mapping.machine_views.at(node).device_ids();
}lib/compiler/src/cost_estimator.cc line 106 at r6 (raw file):
std::vector<device_id_t> devices = get_devices(node, device_mapping); if (all_of(devices, [&occupied](device_id_t d) { return occupied[d] == false; })) {
Don't worry about explicitly managing lambda references unless there's a good reason (e.g., they're going to live outside of local scope)
Suggestion:
[&](device_id_t d) { return occupied[d] == false; })) {lib/compiler/src/cost_estimator.cc line 117 at r6 (raw file):
finish_node_processing(finished); // Adding candidates to the frontier
FYI these comments aren't really adding a huge amount here and can probably be removed. With the lambdas added the code is readable enough that I don't think they're contributing much over the code itself
lib/compiler/src/cost_estimator.cc line 121 at r6 (raw file):
std::unordered_set<Node> predecessors = get_predecessors(g, successor); if (predecessors_have_been_processed(predecessors, processed)) {
Why not just have predecessors_have_been_processed call get_predecessors directly?
Code quote:
predecessors_have_been_processed(predecessors, processed)lib/compiler/test/CMakeLists.txt line 13 at r6 (raw file):
doctest utils-test-common pcg
Shouldn't be necessary, pcg is already pulled in by compiler (but this isn't a huge deal)
lib/pcg/src/device_id.cc line 19 at r5 (raw file):
Previously, Marsella8 wrote…
Due to problems above with
dtgen, will clean up everything into themachine-relatedbranch.I don't think we'll ever intend for the user to directly increment
device_id_ttypes, but it is useful for the functions that are part ofmachine_views.ccand associated modules (e.g. if we want to iterate over the range of IDs to generate all possible IDs)
In that case I'd just do the unwrapping manually in machine_views.cc, etc. Only expose an add operator if it's actually a valid operation to add device ids in general, which I'm not sure is the case (though I could be convinced of)
lib/pcg/src/device_id.cc line 17 at r6 (raw file):
} // Most likely not the best way to do it.
?
lib/pcg/src/machine_view.cc line 30 at r6 (raw file):
StridedRectangleSide side = this->rect.at(ff_dim_t{0}); for (device_id_t id = this->start; id < this->start + side.get_num_points(); id = id + 1) {
The stride of the side doesn't appear to be used?
lib/pcg/src/machine_view.cc line 30 at r6 (raw file):
StridedRectangleSide side = this->rect.at(ff_dim_t{0}); for (device_id_t id = this->start; id < this->start + side.get_num_points(); id = id + 1) {
Suggestion:
id++) {lib/pcg/src/machine_view.cc line 33 at r6 (raw file):
ids.push_back(id); } return ids;
Either move the declaration of ids in a scope or the return out a scope, having them in different scopes is weird and confusing
lib/pcg/src/strided_rectangle.cc line 21 at r6 (raw file):
StridedRectangleSide side = this->sides.at(dim); return side; }
Suggestion:
StridedRectangleSide StridedRectangle::at(ff_dim_t const &dim) const {
return this->sides.at(dim);
}|
Closing in favor of #1565. |
Description of changes:
Related Issues:
Linked Issues:
Issues closed by this PR:
This change is