Use matrices in FloydWarshallState #38

jiahao · 2015-04-04T14:33:52Z

The internal data are now stored as dense matrices instead of Vector{Vector}.
No longer makes the unnecessary transformation from the former to the latter in the computation of floyd_warshall_shortest_paths.

Closes #37

coveralls · 2015-04-04T14:36:57Z

Coverage decreased (-0.01%) to 94.92% when pulling 74386dc on jiahao:cjh/floydmatrix into 19be9c7 on JuliaGraphs:master.

jiahao · 2015-04-04T14:39:43Z

I think you may want to adjust your failure threshold on the Coveralls test.

sbromberger · 2015-04-04T14:53:48Z

src/floyd-warshall.jl

-    dists = fill(convert(Float64,Inf), (n_v,n_v))
-    parents = zeros(Int, (n_v,n_v))
+    S = typeof(one(T) + one(T))
+    dists = fill(typemax(S), n_v, n_v)


This will break things down the line. I'd prefer to keep edge weights as Float64 since we have an isinf test for it. Can we get rid of the type parameterization here and keep it Float64?

I did think about it, but the only problem I could see is when the longest path length reaches typemax(T), which is only a problem for small integers like UInt8. The advantage of doing T<:Real is that Floyd-Warshall will work transparently on other types like BigFloats and Rationals.

The issue is that this will all break horribly if someone decides to use UInt8, and we have no "max test" like isinf for these other types (except for constructing something with typemax and kludging around that way).

I've avoided type parameterization in LightGraphs precisely because of the complexity it introduced with Graphs.jl. LightGraphs is designed to be quick, easy to understand, but not as flexible as Graphs.

@sbromberger does the failure in the case of UInt8 outweigh the benefit of being able to run with Rational edge weights?

I genuinely do not see how the type parameter use in this function leads to any additional user-facing complexity, loss of code readability or loss of performance. Appealing to the complexity of Graph.jl seems like a slippery slope argument to me - the complexity there comes from forcing users to assign types to everything, even to initialize an empty graph for metadata the user doesn't necessarily care about; here, the eltype is very naturally expressed as a tag for the user specified data, which is the entire point of Julia.

For each type, we need a value that represents "no edge exists". For floats, it's Inf, and we have Base.isinf as a test.

If we can do that with a single (preferably Base) function across all these parameterized types, I think there's an argument for this sort of flexibility.

Keep in mind that this will be the first type parameterization in LightGraphs. I am very worried about a slippery slope here. (It might be unfounded.)

Keep in mind that this will be the first type parameterization in LightGraphs.

Well, it turns out that this statement is false. There are already polymorphic constructors for Graph and Digraph. Furthermore, the type parameters in those functions probably should not be T<:Number, but rather T<:Real. The entries of adjacency matrices are counts of edges (possibly signed due to directedness), and I do not know of any nonintegral or nonreal generalizations of the notion of adjacency matrix. Distances must be real. So in either case the matrix must have real entries, as opposed to complex or something even more exotic.

Well, it turns out that this statement is false.

Yes, it is - my apologies. This was a recent commit to LightGraphs for @jpfairbanks' work with spectral graph theory. The reason they're Number rather than Integer (and this makes the case unequivocally for the proposed change we're discussing here) is because we wanted to be able to pass distance matrices - as well as adjacency matrices - in to create the graph.

Just out of interest, why can't one have complex distances?

It is not that the typemax should be a different type from "the number in question", rather, it is that the type of "the number in question" can change under addition

OK, but if we had the proposed isconnected function, then wouldn't we run into trouble if we looked at the typemax of the sum? That is, assume we wanted a distance matrix of Uint8 (humor me for a second). What value would we pick for "no edge between these vectors"? If we used the sum, it would have to be typemax(Uint64), which would be unintuitive to the user and would mean that he couldn't pass in a matrix of Uint8 for any incomplete graph.

For your isconnected function, are you think that this will based on dist[i,j] != inf? We could build isconnected on the output of running connected components and then having the output of connected components support a query of connectedness

g = Graph(somedata) state = connectedcomponents(g) for vtx1,vtx2 in someiterator if isconnected(state, vtx1, vtx2) f(vtx1,vtx2) end end

We should probably support this api regardless of what we do on F-W
Connected components should be much faster to compute than all pairs shortest paths. But once you have computed APSP you have all the information to answer connected components.

Also I agree with @jiahao on the type stuff. I would use rational arithmetic for making an illustrative example in a paper or when giving a class. I think people prefer to do fraction arithmetic over floating point arithmetic in their head.

isconnected is only for determining whether or not a given distance matrix element represents a valid distance when we've determined that values in a distance matrix need to be taken into account when performing shortest-paths calculations. For everything else we have a very fast has_edge (or, if you need something more bare-metal, finclist and binclist, but those I think might change to be adjacency lists instead of incidence lists if I can get the performance a touch better).

Keep in mind that distance matrices are deliberately outside the scope of LightGraphs, and the more code we write to support them the less clean the implementation becomes.

(Ah - and if we're talking about testing whether components (subgraphs) are connected, that's a different issue and one that I'd prefer to keep in a separate discussion thread. We should probably have a way to do this, but we need a "view" into the graph itself that describes subgraphs, and that's pretty complicated - and will break memory and speed efficiencies.)

sbromberger · 2015-04-04T15:01:50Z

@jiahao how do I change the sensitivity of coveralls? This has been a pain in the rear end for most other commits and I'd love to squelch the minor failures.

jiahao · 2015-04-04T15:02:40Z

There is a setting buried deep in Coveralls where you can adjust the threshold that defines a test failure. There is also another setting somewhere to turn off the comments.

sbromberger · 2015-04-04T15:04:58Z

Found it and changed so it won't comment. Decrease threshold is now 2% for failure.

jiahao · 2015-04-06T06:13:06Z

Would something like this work?
isconnected{T}(x::T) = (x < typemax(one(T)))

~~Not quite. You don't need the {T} in this function signature.~~ (edit: wrong, you do need it, since T is used in the function body.) Also, you can make it even faster by not creating an instance of one(T):

isconnected(x::T) = x < typemax(T)

If there is any performance degradation with this change, you can play with @inline at the call site to adjust the inlining behavior.

jiahao · 2015-04-06T06:15:58Z

(Should we just overload isinf?)

Definitely not. Generic functions in Julia should never be overloaded with use-case-specific definitions of convenience. If you do, you will propagate the newly defined method out to all code that uses isinf and quite possibly break other people's code the moment they do using LightGraphs.

sbromberger · 2015-04-06T08:13:26Z

I'm missing something here:

julia> isconnected(x::T) = x < typemax(T)
ERROR: UndefVarError: T not defined

Did you mean, perhaps, isconnected(x::Number) = x < typemax(x)? (Which breaks on BigInt.)

sbromberger · 2015-04-06T08:31:10Z

src/floyd-warshall.jl

    g::AbstractGraph;
-    edge_dists::AbstractArray{Float64, 2} = Array(Float64,(0,0))
+    edge_dists::AbstractArray{T,2} = zeros(0,0)
 )

    use_dists = issparse(edge_dists)? nnz(edge_dists > 0) : !isempty(edge_dists)


Am I seeing a bug here? This should certainly be
use_dists = issparse(edge_dists)? nnz(edge_dists) > 0 : !isempty(edge_dists), no? I'll push a hotfix now.

We need to include tests that contains sparse graphs.

ETA: this was a bug. Fixed.

Fixed via #42.

jiahao · 2015-04-06T15:11:31Z

Oops, I meant isconnected{T}(x::T) = x < typemax(T) (forgot that T is used inside this function). You can qualify the type parameter with T<:Real but it is not necessary (T<:Number is too general because that includes complex numbers and other such things that cannot be ordered).

jiahao · 2015-04-06T15:14:37Z

I think we have to leave BigInts out. They don't have a maximal element, so Floyd-Warshall as normally stated cannot work on them. What one could do is set the sentinel value to nv + 1 if there are no distances specified, but then one wouldn't need to deal with BigInts anymore since everything internally is an Int.

sbromberger · 2015-04-06T15:27:13Z

What one could do is set the sentinel value to nv + 1 if there are no distances specified

If there are no distances specified then we don't use sentinel values or the distance matrix at all - we default in F-W to Array(Float64,(0,0)) which causes has_distances to return false, and then we just assume every edge has a distance of 1.0.

The other thing we might consider is using zero() as the sentinel value with one exception: if countnz() == 0, then we default to edge distances of 1.0. What do you think about that change? It would allow us to use any type that has a zero() and + defined.

jiahao · 2015-04-06T15:45:06Z

If there are no distances specified then we don't use sentinel values or the distance matrix at all

Yeah, that was what I was trying to say, sorry if that was unclear.

I'm not keen on the idea of zero() as a sentinel value. Why would

if countnz() == 0, then we default to edge distances of 1.0

work?

sbromberger · 2015-04-06T15:56:32Z

I'm not keen on the idea of zero() as a sentinel value. Why would

if countnz() == 0, then we default to edge distances of 1.0

work?

Because we'd explicitly test for it. That is, if you pass in an array of zeros (dense or sparse, doesn't matter), we treat that as "use the default distance for all edges", which is 1. The reason we can do this is because LightGraphs doesn't (currently) support zero-weighted edges, so an edge distance of 0 is really undefined right now.

sbromberger · 2015-04-15T18:11:19Z

@jiahao - any objection to moving forward with this, and using countnz() == 0 as the test as to whether we're using default edge distances? I'd like to incorporate your changes.

sbromberger · 2015-04-17T17:09:19Z

src/floyd-warshall.jl

    g::AbstractGraph;
-    edge_dists::AbstractArray{Float64, 2} = Array(Float64,(0,0))
+    edge_dists::AbstractArray{T,2} = zeros(0,0)


Should this be zeros(T,0,0) instead?

No. A method definition with a default argument is exactly equivalent to two separate method definitions, one of which sets the default value.

f{T}(x::T=0) = x

is exactly equivalent to

f{T}(x::T) = x f() = f(0)

Therefore, the default argument must already specify a concrete type and cannot refer to a typevar. (You cannot have a method signature f{T}().)

Were you to change the method definition to f{T}(x::T=zero(T)), you would get a runtime error because T is not defined at runtime in the scope of the method that runs zero(T).

sbromberger · 2015-05-29T16:22:26Z

@jiahao are you interested in re-engaging in discussion? I'd like to move towards adopting this if you're still up for it. There are a couple things to work out, and then I think we're set....

jiahao · 2015-05-30T11:12:23Z

Yes I'd like to, but my time is very sparse until after JuliaCon.

jiahao · 2015-05-30T11:23:02Z

Regarding countnz() == 0, I think we can give it a try.

jiahao · 2015-05-30T11:23:33Z

Are there any other outstanding issues?

sbromberger · 2015-05-30T18:50:55Z

I'm trying a different approach per https://groups.google.com/d/msg/julia-users/K-RJznFeHHM/HWvwk9EdwoYJ:

# we use this type to represent an array of default distances - that is,
# this type simulates an array of a specified type but returns the one()
# of that type.
type DefaultDistance{T<:Number} <: AbstractArray{T,2}
end

@inline getindex{T<:Number}(::DefaultDistance{T}, ::Int, ::Int) = one(T)

This should allow us to use different types for edge distances.

sbromberger · 2015-05-30T20:18:09Z

... and my new code is twice as slow in Dijkstra as using Float64 for distances. :( Gonna see where I can optimize, but it looks like it's in the MutableBinaryHeap code, which I can't change.

jiahao · 2015-05-31T01:39:52Z

If you're using a binary heap, won't that mean that indexing into it will cost O(log N)?

I don't see the benefit of using anything other than a dense array, because the way FW works, the only structure you can take advantage of is if you have a disconnected forest - in every other case, you end up with a dense distance matrix. This is a fundamental property of FW which cannot be changed without using much fancier techniques such as fast multipole.

sbromberger · 2015-05-31T01:41:37Z

Oh - sorry for the confusion. I'm talking about Dijkstra, not F-W. The test was for the new typed distance measurements, and it slowed down Dijkstra by 100%. I think I've got it nailed now, though.

jiahao · 2015-05-31T01:53:46Z

Ok, let me know if there's anything left to discuss in this PR (which is FW only).

sbromberger · 2015-05-31T02:40:26Z

Well, if we can get the typed distance working, then that addresses one of the big issues above. Let me see what I can do and I'll report back.

sbromberger · 2015-05-31T17:34:46Z

Alright - I think I got it. Can you please check out https://github.com/JuliaGraphs/LightGraphs.jl/tree/newdijkstra2 ?

(The branch name is a bit of a misnomer - I did the F-W fixes as well - now typed!)

sbromberger · 2015-06-05T04:44:28Z

I'll close this out since I took care of several things in the latest commit, two of which were introduced here:

F-W now uses dense matrices; and
edge distances can be type-specific (no more Float64 only).

Use matrices in FloydWarshallState

74386dc

sbromberger reviewed Apr 4, 2015
View reviewed changes

sbromberger reviewed Apr 6, 2015
View reviewed changes

This was referenced Apr 6, 2015

Redesign floyd_warshall_shortest_paths? #37

Closed

fix bug in shortest path algorithms with use_dists; abstract that functi... #42

Merged

sbromberger reviewed Apr 17, 2015
View reviewed changes

sbromberger closed this Jun 5, 2015

Use matrices in FloydWarshallState #38

Use matrices in FloydWarshallState #38

Conversation

jiahao commented Apr 4, 2015

coveralls commented Apr 4, 2015

jiahao commented Apr 4, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sbromberger commented Apr 4, 2015

jiahao commented Apr 4, 2015

sbromberger commented Apr 4, 2015

jiahao commented Apr 6, 2015

jiahao commented Apr 6, 2015

sbromberger commented Apr 6, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jiahao commented Apr 6, 2015

jiahao commented Apr 6, 2015

sbromberger commented Apr 6, 2015

jiahao commented Apr 6, 2015

sbromberger commented Apr 6, 2015

sbromberger commented Apr 15, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sbromberger commented May 29, 2015

jiahao commented May 30, 2015

jiahao commented May 30, 2015

jiahao commented May 30, 2015

sbromberger commented May 30, 2015

sbromberger commented May 30, 2015

jiahao commented May 31, 2015

sbromberger commented May 31, 2015

jiahao commented May 31, 2015

sbromberger commented May 31, 2015

sbromberger commented May 31, 2015

sbromberger commented Jun 5, 2015