-
Notifications
You must be signed in to change notification settings - Fork 185
Conversation
I think you may want to adjust your failure threshold on the Coveralls test. |
dists = fill(convert(Float64,Inf), (n_v,n_v)) | ||
parents = zeros(Int, (n_v,n_v)) | ||
S = typeof(one(T) + one(T)) | ||
dists = fill(typemax(S), n_v, n_v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will break things down the line. I'd prefer to keep edge weights as Float64 since we have an isinf
test for it. Can we get rid of the type parameterization here and keep it Float64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did think about it, but the only problem I could see is when the longest path length reaches typemax(T)
, which is only a problem for small integers like UInt8
. The advantage of doing T<:Real
is that Floyd-Warshall will work transparently on other types like BigFloat
s and Rational
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that this will all break horribly if someone decides to use UInt8, and we have no "max test" like isinf for these other types (except for constructing something with typemax
and kludging around that way).
I've avoided type parameterization in LightGraphs precisely because of the complexity it introduced with Graphs.jl. LightGraphs is designed to be quick, easy to understand, but not as flexible as Graphs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbromberger does the failure in the case of UInt8
outweigh the benefit of being able to run with Rational
edge weights?
I genuinely do not see how the type parameter use in this function leads to any additional user-facing complexity, loss of code readability or loss of performance. Appealing to the complexity of Graph.jl seems like a slippery slope argument to me - the complexity there comes from forcing users to assign types to everything, even to initialize an empty graph for metadata the user doesn't necessarily care about; here, the eltype is very naturally expressed as a tag for the user specified data, which is the entire point of Julia.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For each type, we need a value that represents "no edge exists". For floats, it's Inf
, and we have Base.isinf
as a test.
If we can do that with a single (preferably Base) function across all these parameterized types, I think there's an argument for this sort of flexibility.
Keep in mind that this will be the first type parameterization in LightGraphs. I am very worried about a slippery slope here. (It might be unfounded.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep in mind that this will be the first type parameterization in LightGraphs.
Well, it turns out that this statement is false. There are already polymorphic constructors for Graph and Digraph. Furthermore, the type parameters in those functions probably should not be T<:Number
, but rather T<:Real
. The entries of adjacency matrices are counts of edges (possibly signed due to directedness), and I do not know of any nonintegral or nonreal generalizations of the notion of adjacency matrix. Distances must be real. So in either case the matrix must have real entries, as opposed to complex or something even more exotic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it turns out that this statement is false.
Yes, it is - my apologies. This was a recent commit to LightGraphs for @jpfairbanks' work with spectral graph theory. The reason they're Number
rather than Integer
(and this makes the case unequivocally for the proposed change we're discussing here) is because we wanted to be able to pass distance matrices - as well as adjacency matrices - in to create the graph.
Just out of interest, why can't one have complex distances?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not that the typemax should be a different type from "the number in question", rather, it is that the type of "the number in question" can change under addition
OK, but if we had the proposed isconnected
function, then wouldn't we run into trouble if we looked at the typemax
of the sum? That is, assume we wanted a distance matrix of Uint8
(humor me for a second). What value would we pick for "no edge between these vectors"? If we used the sum, it would have to be typemax(Uint64)
, which would be unintuitive to the user and would mean that he couldn't pass in a matrix of Uint8
for any incomplete graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For your isconnected
function, are you think that this will based on dist[i,j] != inf
? We could build isconnected
on the output of running connected components and then having the output of connected components support a query of connectedness
g = Graph(somedata)
state = connectedcomponents(g)
for vtx1,vtx2 in someiterator
if isconnected(state, vtx1, vtx2)
f(vtx1,vtx2)
end
end
We should probably support this api regardless of what we do on F-W
Connected components should be much faster to compute than all pairs shortest paths. But once you have computed APSP you have all the information to answer connected components.
Also I agree with @jiahao on the type stuff. I would use rational arithmetic for making an illustrative example in a paper or when giving a class. I think people prefer to do fraction arithmetic over floating point arithmetic in their head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isconnected
is only for determining whether or not a given distance matrix element represents a valid distance when we've determined that values in a distance matrix need to be taken into account when performing shortest-paths calculations. For everything else we have a very fast has_edge
(or, if you need something more bare-metal, finclist and binclist, but those I think might change to be adjacency lists instead of incidence lists if I can get the performance a touch better).
Keep in mind that distance matrices are deliberately outside the scope of LightGraphs, and the more code we write to support them the less clean the implementation becomes.
(Ah - and if we're talking about testing whether components (subgraphs) are connected, that's a different issue and one that I'd prefer to keep in a separate discussion thread. We should probably have a way to do this, but we need a "view" into the graph itself that describes subgraphs, and that's pretty complicated - and will break memory and speed efficiencies.)
@jiahao how do I change the sensitivity of coveralls? This has been a pain in the rear end for most other commits and I'd love to squelch the minor failures. |
There is a setting buried deep in Coveralls where you can adjust the threshold that defines a test failure. There is also another setting somewhere to turn off the comments. |
Found it and changed so it won't comment. Decrease threshold is now 2% for failure. |
isconnected(x::T) = x < typemax(T) If there is any performance degradation with this change, you can play with |
Definitely not. Generic functions in Julia should never be overloaded with use-case-specific definitions of convenience. If you do, you will propagate the newly defined method out to all code that uses |
I'm missing something here:
Did you mean, perhaps, |
g::AbstractGraph; | ||
edge_dists::AbstractArray{Float64, 2} = Array(Float64,(0,0)) | ||
edge_dists::AbstractArray{T,2} = zeros(0,0) | ||
) | ||
|
||
use_dists = issparse(edge_dists)? nnz(edge_dists > 0) : !isempty(edge_dists) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I seeing a bug here? This should certainly be
use_dists = issparse(edge_dists)? nnz(edge_dists) > 0 : !isempty(edge_dists)
, no? I'll push a hotfix now.
We need to include tests that contains sparse graphs.
ETA: this was a bug. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed via #42.
Oops, I meant |
I think we have to leave |
If there are no distances specified then we don't use sentinel values or the distance matrix at all - we default in F-W to The other thing we might consider is using |
Yeah, that was what I was trying to say, sorry if that was unclear. I'm not keen on the idea of
work? |
Because we'd explicitly test for it. That is, if you pass in an array of zeros (dense or sparse, doesn't matter), we treat that as "use the default distance for all edges", which is 1. The reason we can do this is because LightGraphs doesn't (currently) support zero-weighted edges, so an edge distance of 0 is really undefined right now. |
@jiahao - any objection to moving forward with this, and using |
g::AbstractGraph; | ||
edge_dists::AbstractArray{Float64, 2} = Array(Float64,(0,0)) | ||
edge_dists::AbstractArray{T,2} = zeros(0,0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be zeros(T,0,0)
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. A method definition with a default argument is exactly equivalent to two separate method definitions, one of which sets the default value.
f{T}(x::T=0) = x
is exactly equivalent to
f{T}(x::T) = x
f() = f(0)
Therefore, the default argument must already specify a concrete type and cannot refer to a typevar. (You cannot have a method signature f{T}()
.)
Were you to change the method definition to f{T}(x::T=zero(T))
, you would get a runtime error because T
is not defined at runtime in the scope of the method that runs zero(T)
.
@jiahao are you interested in re-engaging in discussion? I'd like to move towards adopting this if you're still up for it. There are a couple things to work out, and then I think we're set.... |
Yes I'd like to, but my time is very sparse until after JuliaCon. |
Regarding |
Are there any other outstanding issues? |
I'm trying a different approach per https://groups.google.com/d/msg/julia-users/K-RJznFeHHM/HWvwk9EdwoYJ:
This should allow us to use different types for edge distances. |
... and my new code is twice as slow in Dijkstra as using Float64 for distances. :( Gonna see where I can optimize, but it looks like it's in the MutableBinaryHeap code, which I can't change. |
If you're using a binary heap, won't that mean that indexing into it will cost I don't see the benefit of using anything other than a dense array, because the way FW works, the only structure you can take advantage of is if you have a disconnected forest - in every other case, you end up with a dense distance matrix. This is a fundamental property of FW which cannot be changed without using much fancier techniques such as fast multipole. |
Oh - sorry for the confusion. I'm talking about Dijkstra, not F-W. The test was for the new typed distance measurements, and it slowed down Dijkstra by 100%. I think I've got it nailed now, though. |
Ok, let me know if there's anything left to discuss in this PR (which is FW only). |
Well, if we can get the typed distance working, then that addresses one of the big issues above. Let me see what I can do and I'll report back. |
Alright - I think I got it. Can you please check out https://github.com/JuliaGraphs/LightGraphs.jl/tree/newdijkstra2 ? (The branch name is a bit of a misnomer - I did the F-W fixes as well - now typed!) |
I'll close this out since I took care of several things in the latest commit, two of which were introduced here:
|
The internal data are now stored as dense matrices instead of Vector{Vector}.
No longer makes the unnecessary transformation from the former to the latter in the computation of floyd_warshall_shortest_paths.
Closes #37