Skip to content
This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Use matrices in FloydWarshallState #38

Closed
wants to merge 1 commit into from

Conversation

jiahao
Copy link
Contributor

@jiahao jiahao commented Apr 4, 2015

The internal data are now stored as dense matrices instead of Vector{Vector}.
No longer makes the unnecessary transformation from the former to the latter in the computation of floyd_warshall_shortest_paths.

Closes #37

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 94.92% when pulling 74386dc on jiahao:cjh/floydmatrix into 19be9c7 on JuliaGraphs:master.

@jiahao
Copy link
Contributor Author

jiahao commented Apr 4, 2015

I think you may want to adjust your failure threshold on the Coveralls test.

dists = fill(convert(Float64,Inf), (n_v,n_v))
parents = zeros(Int, (n_v,n_v))
S = typeof(one(T) + one(T))
dists = fill(typemax(S), n_v, n_v)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break things down the line. I'd prefer to keep edge weights as Float64 since we have an isinf test for it. Can we get rid of the type parameterization here and keep it Float64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did think about it, but the only problem I could see is when the longest path length reaches typemax(T), which is only a problem for small integers like UInt8. The advantage of doing T<:Real is that Floyd-Warshall will work transparently on other types like BigFloats and Rationals.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that this will all break horribly if someone decides to use UInt8, and we have no "max test" like isinf for these other types (except for constructing something with typemax and kludging around that way).

I've avoided type parameterization in LightGraphs precisely because of the complexity it introduced with Graphs.jl. LightGraphs is designed to be quick, easy to understand, but not as flexible as Graphs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbromberger does the failure in the case of UInt8 outweigh the benefit of being able to run with Rational edge weights?

I genuinely do not see how the type parameter use in this function leads to any additional user-facing complexity, loss of code readability or loss of performance. Appealing to the complexity of Graph.jl seems like a slippery slope argument to me - the complexity there comes from forcing users to assign types to everything, even to initialize an empty graph for metadata the user doesn't necessarily care about; here, the eltype is very naturally expressed as a tag for the user specified data, which is the entire point of Julia.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each type, we need a value that represents "no edge exists". For floats, it's Inf, and we have Base.isinf as a test.

If we can do that with a single (preferably Base) function across all these parameterized types, I think there's an argument for this sort of flexibility.

Keep in mind that this will be the first type parameterization in LightGraphs. I am very worried about a slippery slope here. (It might be unfounded.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that this will be the first type parameterization in LightGraphs.

Well, it turns out that this statement is false. There are already polymorphic constructors for Graph and Digraph. Furthermore, the type parameters in those functions probably should not be T<:Number, but rather T<:Real. The entries of adjacency matrices are counts of edges (possibly signed due to directedness), and I do not know of any nonintegral or nonreal generalizations of the notion of adjacency matrix. Distances must be real. So in either case the matrix must have real entries, as opposed to complex or something even more exotic.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it turns out that this statement is false.

Yes, it is - my apologies. This was a recent commit to LightGraphs for @jpfairbanks' work with spectral graph theory. The reason they're Number rather than Integer (and this makes the case unequivocally for the proposed change we're discussing here) is because we wanted to be able to pass distance matrices - as well as adjacency matrices - in to create the graph.

Just out of interest, why can't one have complex distances?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not that the typemax should be a different type from "the number in question", rather, it is that the type of "the number in question" can change under addition

OK, but if we had the proposed isconnected function, then wouldn't we run into trouble if we looked at the typemax of the sum? That is, assume we wanted a distance matrix of Uint8 (humor me for a second). What value would we pick for "no edge between these vectors"? If we used the sum, it would have to be typemax(Uint64), which would be unintuitive to the user and would mean that he couldn't pass in a matrix of Uint8 for any incomplete graph.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For your isconnected function, are you think that this will based on dist[i,j] != inf? We could build isconnected on the output of running connected components and then having the output of connected components support a query of connectedness

g = Graph(somedata)
state = connectedcomponents(g)
for vtx1,vtx2 in someiterator
    if isconnected(state, vtx1, vtx2)
         f(vtx1,vtx2)
    end
end

We should probably support this api regardless of what we do on F-W
Connected components should be much faster to compute than all pairs shortest paths. But once you have computed APSP you have all the information to answer connected components.

Also I agree with @jiahao on the type stuff. I would use rational arithmetic for making an illustrative example in a paper or when giving a class. I think people prefer to do fraction arithmetic over floating point arithmetic in their head.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isconnected is only for determining whether or not a given distance matrix element represents a valid distance when we've determined that values in a distance matrix need to be taken into account when performing shortest-paths calculations. For everything else we have a very fast has_edge (or, if you need something more bare-metal, finclist and binclist, but those I think might change to be adjacency lists instead of incidence lists if I can get the performance a touch better).

Keep in mind that distance matrices are deliberately outside the scope of LightGraphs, and the more code we write to support them the less clean the implementation becomes.

(Ah - and if we're talking about testing whether components (subgraphs) are connected, that's a different issue and one that I'd prefer to keep in a separate discussion thread. We should probably have a way to do this, but we need a "view" into the graph itself that describes subgraphs, and that's pretty complicated - and will break memory and speed efficiencies.)

@sbromberger
Copy link
Owner

@jiahao how do I change the sensitivity of coveralls? This has been a pain in the rear end for most other commits and I'd love to squelch the minor failures.

@jiahao
Copy link
Contributor Author

jiahao commented Apr 4, 2015

There is a setting buried deep in Coveralls where you can adjust the threshold that defines a test failure. There is also another setting somewhere to turn off the comments.

@sbromberger
Copy link
Owner

Found it and changed so it won't comment. Decrease threshold is now 2% for failure.

@jiahao
Copy link
Contributor Author

jiahao commented Apr 6, 2015

Would something like this work?
isconnected{T}(x::T) = (x < typemax(one(T)))

Not quite. You don't need the {T} in this function signature. (edit: wrong, you do need it, since T is used in the function body.) Also, you can make it even faster by not creating an instance of one(T):

isconnected(x::T) = x < typemax(T)

If there is any performance degradation with this change, you can play with @inline at the call site to adjust the inlining behavior.

@jiahao
Copy link
Contributor Author

jiahao commented Apr 6, 2015

(Should we just overload isinf?)

Definitely not. Generic functions in Julia should never be overloaded with use-case-specific definitions of convenience. If you do, you will propagate the newly defined method out to all code that uses isinf and quite possibly break other people's code the moment they do using LightGraphs.

@sbromberger
Copy link
Owner

I'm missing something here:

julia> isconnected(x::T) = x < typemax(T)
ERROR: UndefVarError: T not defined

Did you mean, perhaps, isconnected(x::Number) = x < typemax(x)? (Which breaks on BigInt.)

g::AbstractGraph;
edge_dists::AbstractArray{Float64, 2} = Array(Float64,(0,0))
edge_dists::AbstractArray{T,2} = zeros(0,0)
)

use_dists = issparse(edge_dists)? nnz(edge_dists > 0) : !isempty(edge_dists)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I seeing a bug here? This should certainly be
use_dists = issparse(edge_dists)? nnz(edge_dists) > 0 : !isempty(edge_dists), no? I'll push a hotfix now.

We need to include tests that contains sparse graphs.

ETA: this was a bug. Fixed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed via #42.

@jiahao
Copy link
Contributor Author

jiahao commented Apr 6, 2015

Oops, I meant isconnected{T}(x::T) = x < typemax(T) (forgot that T is used inside this function). You can qualify the type parameter with T<:Real but it is not necessary (T<:Number is too general because that includes complex numbers and other such things that cannot be ordered).

@jiahao
Copy link
Contributor Author

jiahao commented Apr 6, 2015

I think we have to leave BigInts out. They don't have a maximal element, so Floyd-Warshall as normally stated cannot work on them. What one could do is set the sentinel value to nv + 1 if there are no distances specified, but then one wouldn't need to deal with BigInts anymore since everything internally is an Int.

@sbromberger
Copy link
Owner

What one could do is set the sentinel value to nv + 1 if there are no distances specified

If there are no distances specified then we don't use sentinel values or the distance matrix at all - we default in F-W to Array(Float64,(0,0)) which causes has_distances to return false, and then we just assume every edge has a distance of 1.0.

The other thing we might consider is using zero() as the sentinel value with one exception: if countnz() == 0, then we default to edge distances of 1.0. What do you think about that change? It would allow us to use any type that has a zero() and + defined.

@jiahao
Copy link
Contributor Author

jiahao commented Apr 6, 2015

If there are no distances specified then we don't use sentinel values or the distance matrix at all

Yeah, that was what I was trying to say, sorry if that was unclear.

I'm not keen on the idea of zero() as a sentinel value. Why would

if countnz() == 0, then we default to edge distances of 1.0

work?

@sbromberger
Copy link
Owner

I'm not keen on the idea of zero() as a sentinel value. Why would

if countnz() == 0, then we default to edge distances of 1.0

work?

Because we'd explicitly test for it. That is, if you pass in an array of zeros (dense or sparse, doesn't matter), we treat that as "use the default distance for all edges", which is 1. The reason we can do this is because LightGraphs doesn't (currently) support zero-weighted edges, so an edge distance of 0 is really undefined right now.

@sbromberger
Copy link
Owner

@jiahao - any objection to moving forward with this, and using countnz() == 0 as the test as to whether we're using default edge distances? I'd like to incorporate your changes.

g::AbstractGraph;
edge_dists::AbstractArray{Float64, 2} = Array(Float64,(0,0))
edge_dists::AbstractArray{T,2} = zeros(0,0)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be zeros(T,0,0) instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. A method definition with a default argument is exactly equivalent to two separate method definitions, one of which sets the default value.

f{T}(x::T=0) = x

is exactly equivalent to

f{T}(x::T) = x
f() = f(0)

Therefore, the default argument must already specify a concrete type and cannot refer to a typevar. (You cannot have a method signature f{T}().)

Were you to change the method definition to f{T}(x::T=zero(T)), you would get a runtime error because T is not defined at runtime in the scope of the method that runs zero(T).

@sbromberger
Copy link
Owner

@jiahao are you interested in re-engaging in discussion? I'd like to move towards adopting this if you're still up for it. There are a couple things to work out, and then I think we're set....

@jiahao
Copy link
Contributor Author

jiahao commented May 30, 2015

Yes I'd like to, but my time is very sparse until after JuliaCon.

@jiahao
Copy link
Contributor Author

jiahao commented May 30, 2015

Regarding countnz() == 0, I think we can give it a try.

@jiahao
Copy link
Contributor Author

jiahao commented May 30, 2015

Are there any other outstanding issues?

@sbromberger
Copy link
Owner

I'm trying a different approach per https://groups.google.com/d/msg/julia-users/K-RJznFeHHM/HWvwk9EdwoYJ:

# we use this type to represent an array of default distances - that is,
# this type simulates an array of a specified type but returns the one()
# of that type.
type DefaultDistance{T<:Number} <: AbstractArray{T,2}
end

@inline getindex{T<:Number}(::DefaultDistance{T}, ::Int, ::Int) = one(T)

This should allow us to use different types for edge distances.

@sbromberger
Copy link
Owner

... and my new code is twice as slow in Dijkstra as using Float64 for distances. :( Gonna see where I can optimize, but it looks like it's in the MutableBinaryHeap code, which I can't change.

@jiahao
Copy link
Contributor Author

jiahao commented May 31, 2015

If you're using a binary heap, won't that mean that indexing into it will cost O(log N)?

I don't see the benefit of using anything other than a dense array, because the way FW works, the only structure you can take advantage of is if you have a disconnected forest - in every other case, you end up with a dense distance matrix. This is a fundamental property of FW which cannot be changed without using much fancier techniques such as fast multipole.

@sbromberger
Copy link
Owner

Oh - sorry for the confusion. I'm talking about Dijkstra, not F-W. The test was for the new typed distance measurements, and it slowed down Dijkstra by 100%. I think I've got it nailed now, though.

@jiahao
Copy link
Contributor Author

jiahao commented May 31, 2015

Ok, let me know if there's anything left to discuss in this PR (which is FW only).

@sbromberger
Copy link
Owner

Well, if we can get the typed distance working, then that addresses one of the big issues above. Let me see what I can do and I'll report back.

@sbromberger
Copy link
Owner

Alright - I think I got it. Can you please check out https://github.com/JuliaGraphs/LightGraphs.jl/tree/newdijkstra2 ?

(The branch name is a bit of a misnomer - I did the F-W fixes as well - now typed!)

@sbromberger
Copy link
Owner

I'll close this out since I took care of several things in the latest commit, two of which were introduced here:

  1. F-W now uses dense matrices; and
  2. edge distances can be type-specific (no more Float64 only).

@sbromberger sbromberger closed this Jun 5, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Redesign floyd_warshall_shortest_paths?
4 participants