Use cycle aware algorithm for propagating markers #4645

bluss · 2024-06-29T09:14:23Z

This is my attempt at this - I don't exactly have much prior experience with
Uv's codebase and w.r.t graph algorithms I'm pretty rusty.
Let's look at this critically and try to find out if it does the right thing.

The strongly connected components algorithm both finds all cyclic
subgraphs, and a topological order of them. This seems to be a good
way to replace Topo with something cycle-aware.

The new algorithm uses the fact that in a strongly connected component,
all vertices reach each other through some edge. That means we can
combine markers of all internal edges in a component.

As an optimization, only outgoing edges pointing out of the current
component are updated. I didn't look outside this function, so don't
know if that's a reasonable choice.

There are useful pictures in this article:
https://en.wikipedia.org/wiki/Strongly_connected_component

Test Plan

Using existing tests

petgraph::Graph is a delicate datastructure where node, edge identifiers are only stable across certain operations. Edges have to be removed carefully.

The strongly connected components algorithm both finds all cyclic subgraphs, and a topological order of them. This seems to be a good way to replace Topo with something cycle-aware. The new algorithm uses the fact that in a strongly connected component, all vertices reach each other through some edge. That means we can combine markers of all internal edges in a component. As an optimization, only outgoing edges pointing out of the current component are updated. I didn't look outside this function, so don't know if that's a reasonable choice.

Peiffap · 2024-06-29T17:02:51Z

Any reason in particular you chose Kosaraju's algorithm over Tarjan's? I thought the latter was faster.

charliermarsh

Thank you for looking at this -- I really appreciate it! I'm trying to reason through this with examples... Given this graph:

I think we want the marker on the blue node to be A and B and E, but with this strategy, would it be... (A or B or C or D) and E? Since we'd OR the edges into the scc along with all the edges within the scc?

bluss · 2024-06-29T17:16:53Z

Oops, I haven't given and/or any thought at all, just moved existing code around, so you're probably right.

If we take that picture, but you had two edge at the start, A and A', both with the same target.

Then you want to start with A or A'?
Then you update with the cycle (A or A') and B and C and D
Then you update E to the following: (A or A') and B and C and D and E?
Is this what you mean? Then I'd again need to make a difference between edge incoming to scc and edge inside the scc, which is easy to do.

charliermarsh · 2024-06-29T17:21:32Z

Same picture but more colors:

Each edge is a marker condition, and we want each node to have the complete marker condition required for it to be included in a a resolution... So I think the desired end state here is:

Yellow: A
Red: A and B
Green: A and B and C
Blue: A and B and E

Yellow could also be written as: A or (A and B and C and D), but that just simplifies to A. In general, what I was trying to do was: edges into a node are OR'd, edges out of a node are AND'd.

bluss · 2024-06-29T17:27:17Z

@Peiffap the reason was that this implementation is non-recursive so I know it don't have any stack issues. Don't know if it matters, though 🙂

bluss · 2024-06-29T17:32:36Z

Thanks. So I'm trying to find the rule for the cycle.

Before we propagate, what's the node weight for Yellow, it's already A? And for Red, already B?

charliermarsh · 2024-06-29T17:34:33Z

It starts off as None actually. Coming into the propagation, all the nodes have a None marker -- the markers only exist on the edges, but we want to compute the "full" marker for each node.

bluss · 2024-06-29T21:23:05Z

There is no nice property for each component/cyclic part as a unit. I think the current code (feedback arc set) seems ok so far then. I don't know how it selects edges to remove, so if it makes the wrong choice the result could be incorrect?

Based on two cases I've looked at, reverse postorder seems to work well, it would skip the edges we want to skip? (No definition - probably doesn't hold up) I think your graph has exactly one initial node (one root), which makes this easier.

Peiffap · 2024-06-29T22:42:51Z

I'm trying to follow along and understand what you are trying to do here (both with this particular implementation and the general propagation problem you are attempting to solve).

If I understand correctly, you are trying to find, for each node N in the graph, all possible paths starting from a root node (i.e. a node with no incoming edges) and ending in N, and then you want to combine the markers found along the various paths. Is that right?

charliermarsh · 2024-06-29T23:23:24Z

If I understand correctly, you are trying to find, for each node N in the graph, all possible paths starting from a root node (i.e. a node with no incoming edges) and ending in N, and then you want to combine the markers found along the various paths. Is that right?

Yes, that's right.

Peiffap · 2024-06-29T23:41:14Z

Thanks for confirming.
Okay. What's stopping us from simply doing the following?

DFS from the root nodes, propagating markers along as long as there is no cycle; when there is a cycle, you simply backtrack without propagating (because the propagation would lead to something like (markers to A) or ((markers to A) and (markers in cycle)), which simplifies to (markers to A)).

bluss · 2024-06-30T08:22:10Z

I was looking at this graph. It had to be a bit complicated to show that the simple approaches I was thinking of didn't work. (You could hope this graph is not realistic but I think it's a possible graph?)

Finding all paths without cycles in the path sounds right. For example, to the node 3 I think all these are valid:

A, D
B, C, F, G
B, C, F, H, J, D

You could imagine the two last paths being realistic and relevant when the marker A is inactive?

It's probably an optimization to run the all paths search in each SCC separately. I think the cycle-free thing means that we can ask for that no path has a prefix that is another complete path (to that node). It's only necessary to find paths starting with the incident edges to the SCC (we should always have some edge incident, because the root is never in a cycle, hopefully?)

@Peiffap So if we run dfs from both node 2 and 4, then we find all the required paths from 2 and 4 to 3. I think that the dfs from the root (0) would not find all the required paths to 3.

graph in Rust

    let mut gr = petgraph::Graph::new();
    let n0 = gr.add_node("0");
    let n1 = gr.add_node("1");
    let n2 = gr.add_node("2");
    let n3 = gr.add_node("3");
    let n4 = gr.add_node("4");
    let n5 = gr.add_node("5");
    let n6 = gr.add_node("6");
    let n7 = gr.add_node("7");
    let n8 = gr.add_node("8");
    gr.add_edge(n0, n2, "A");
    gr.add_edge(n0, n1, "B");
    gr.add_edge(n2, n3, "D");
    gr.add_edge(n3, n6, "I");
    gr.add_edge(n3, n4, "E");
    gr.add_edge(n1, n4, "C");
    gr.add_edge(n4, n5, "F");
    gr.add_edge(n5, n6, "H");
    gr.add_edge(n6, n2, "J");
    gr.add_edge(n5, n7, "K");
    gr.add_edge(n6, n8, "L");
    gr.add_edge(n5, n3, "G");

Peiffap · 2024-06-30T11:14:08Z

(You could hope this graph is not realistic but I think it's a possible graph?)

I'm not sure, Charlie should have a clearer picture of the kinds of graphs we can run into here.

Finding all paths without cycles in the path sounds right. For example, to the node 3 I think all these are valid:

A, D

B, C, F, G

B, C, F, H, J, D

You could imagine the two last paths being realistic and relevant when the marker A is inactive?

I agree, they should all be considered I think.

It's probably an optimization to run the all paths search in each SCC separately. I think the cycle-free thing means that we can ask for that no path has a prefix that is another complete path (to that node).

Yes, that's how I'm interpreting cycle-free too. Knowing an SCC basically allows us to say that node N in the SCC can be reached through ORing all possible paths to any of the nodes in the SCC, where each path represents an ANDing of the edges in it, then from that node to N.
For example (SCC with nodes A, B, N): if we know all the paths to A and B and we know that there is an SCC, we can already add

((path 1 to `A`) or (path 2 to `A`) ... (path n_a to `A`) and `A -> N`) or 
((path 1 to `B`) or (path 2 to `B`) ... (path n_b to `B`) and `B -> N`)

to N.
The thing I'm not sure about is that you need to do this -- as I see it, a proper DFS search (more on that later) should take care of discovering this already.

It's only necessary to find paths starting with the incident edges to the SCC (we should always have some edge incident, because the root is never in a cycle, hopefully?)

As I understand it, a node is a root node if and only if it has no incident edges. You shouldn't have any root nodes in cycles.

So if we run dfs from both node 2 and 4, then we find all the required paths from 2 and 4 to 3. I think that the dfs from the root (0) would not find all the required paths to 3.

Thanks for making the example. I meant to use a modified version of DFS that only backtracks when a node is visited twice in the current path. I believe that would work, i.e. it would eventually get the right answer, but I'm not sure what implications it has for complexity (we are effectively looking at all possible simple paths which is O(n!) for complete graphs... but that's kind of what we need to do?).

bluss · 2024-06-30T16:24:17Z

I looked at computing all non repeating paths here: petgraph/petgraph#650 It can be used on the whole graph or a subgraph (intention: used on one strongly connected component at a time.) I'm not sure if it does the useful thing we need, but it might.

I wanted to write the algorithm separately from the uv logic, I think it's better that way and easier to understand the code. (Not saying the code has to live in petgraph.)
In this case, paths are useful as Vec<EdgeId> (does this have a different name than path?) but in some other graph algorithms, paths are Vec<NodeId>.

As I understand it, a node is a root node if and only if it has no incident edges. You shouldn't have any root nodes in cycles.

I meant the root of the graph in uv's code - the white node in @charliermarsh's picture and node 0 in my picture. IIUC, this is always present and it means any nontrivial SCC (has more than one node) does not contain the root and always has edges incident to the SCC (from the root or from other predecessor scc). This is useful in my example - we know that there are incident edges, we only need to compute paths from 2 and 4 in the scc?

charliermarsh · 2024-06-30T20:51:38Z

@bluss - did you determine that doing what we did previously but with reverse-postorder rather than topo doesn't work?

Peiffap · 2024-06-30T23:10:19Z

That PR looks nice! I'll give it a more detailed look later, but it makes a lot of sense to me to focus on trails (i.e. no repeated edges) rather than paths (no repeated nodes).

I think you can make the assumption that the starting node never has incident edges; I believe it just represents a starting state, i.e. with no propagated markers. If that isn't the case yet for some reason, you should still be able to extend the graph by adding a new "root" node with no incident edges?

I think you're right about incident edges; for the example graph, 2 and 4 are the only possible entry points to the SCC, so you could limit yourself to computing the trails to those and then computing the trails starting in those to the rest of the SCC. Perhaps it also makes sense to only compute trails to nodes that have outgoing edges from the SCC (i.e. 5 and 6), since all the "internal" nodes (only 3 in your example) in the SCC will be covered by the trails from (2 or 4) to (5 or 6)?

charliermarsh · 2024-06-30T23:11:55Z

(One slight hitch is that we want to remove markers if you have an inbound edge that doesn't have any markers. Right now that's represented by None, but I'm considering using MarkerTree::And(vec![]) (the true marker) so that I can differentiate between "nodes we haven't visited yet" and "nodes that have the true marker).)

Peiffap · 2024-06-30T23:27:31Z

if you have an inbound edge that doesn't have any markers

Do you mean an inbound edge from the root node, or any inbound edge? If it's the former and you apply the change you're discussing, wouldn't you end up with true or ... (where the ellipsis is all the other trails to that node) which should get simplified to just true?

charliermarsh · 2024-06-30T23:34:00Z

Do you mean an inbound edge from the root node, or any inbound edge? If it's the former and you apply the change you're discussing, wouldn't you end up with true or ... (where the ellipsis is all the other trails to that node) which should get simplified to just true?

Yes that's right -- inbound edge from the root node (propagated outward), so we could simplify at the end.

bluss · 2024-07-01T17:11:16Z

@charliermarsh yes I think with my example graph it doesn't work to visit in that order.

bluss force-pushed the propagate-markers branch 2 times, most recently from b104dcc to f139e6b Compare June 29, 2024 09:28

bluss added 2 commits June 29, 2024 12:56

Remove edges from Graph in reverse order

8e5a293

petgraph::Graph is a delicate datastructure where node, edge identifiers are only stable across certain operations. Edges have to be removed carefully.

bluss force-pushed the propagate-markers branch from f139e6b to 7e09dd9 Compare June 29, 2024 10:56

charliermarsh reviewed Jun 29, 2024

View reviewed changes

charliermarsh mentioned this pull request Jun 29, 2024

Sort indexes during graph edge removal #4649

Merged

bluss mentioned this pull request Jun 30, 2024

New algorithm all_non_repeating_paths_from petgraph/petgraph#650

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use cycle aware algorithm for propagating markers #4645

Use cycle aware algorithm for propagating markers #4645

bluss commented Jun 29, 2024 •

edited

Loading

Peiffap commented Jun 29, 2024

charliermarsh left a comment

bluss commented Jun 29, 2024 •

edited

Loading

charliermarsh commented Jun 29, 2024

bluss commented Jun 29, 2024

bluss commented Jun 29, 2024

charliermarsh commented Jun 29, 2024

bluss commented Jun 29, 2024 •

edited

Loading

Peiffap commented Jun 29, 2024

charliermarsh commented Jun 29, 2024

Peiffap commented Jun 29, 2024

bluss commented Jun 30, 2024

Peiffap commented Jun 30, 2024

bluss commented Jun 30, 2024 •

edited

Loading

charliermarsh commented Jun 30, 2024

Peiffap commented Jun 30, 2024

charliermarsh commented Jun 30, 2024

Peiffap commented Jun 30, 2024

charliermarsh commented Jun 30, 2024

bluss commented Jul 1, 2024

Use cycle aware algorithm for propagating markers #4645

Are you sure you want to change the base?

Use cycle aware algorithm for propagating markers #4645

Conversation

bluss commented Jun 29, 2024 • edited Loading

Test Plan

Peiffap commented Jun 29, 2024

charliermarsh left a comment

Choose a reason for hiding this comment

bluss commented Jun 29, 2024 • edited Loading

charliermarsh commented Jun 29, 2024

bluss commented Jun 29, 2024

bluss commented Jun 29, 2024

charliermarsh commented Jun 29, 2024

bluss commented Jun 29, 2024 • edited Loading

Peiffap commented Jun 29, 2024

charliermarsh commented Jun 29, 2024

Peiffap commented Jun 29, 2024

bluss commented Jun 30, 2024

Peiffap commented Jun 30, 2024

bluss commented Jun 30, 2024 • edited Loading

charliermarsh commented Jun 30, 2024

Peiffap commented Jun 30, 2024

charliermarsh commented Jun 30, 2024

Peiffap commented Jun 30, 2024

charliermarsh commented Jun 30, 2024

bluss commented Jul 1, 2024

bluss commented Jun 29, 2024 •

edited

Loading

bluss commented Jun 29, 2024 •

edited

Loading

bluss commented Jun 29, 2024 •

edited

Loading

bluss commented Jun 30, 2024 •

edited

Loading