Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remerging into DataStructures #46

Open
oxinabox opened this issue May 25, 2020 · 8 comments
Open

Remerging into DataStructures #46

oxinabox opened this issue May 25, 2020 · 8 comments

Comments

@oxinabox
Copy link
Member

I have been thinking that at some point we should merge OrderedCollections back into DataStructures.jl
The reason to seperate it out was to decrease size of the dependencies that Revise.jl has so it loads faster.
But now Revise has a JuliaInterpreter dependency, which is much much bigger than DataStructures ever will be.

We should work out what the requirements to do that is.
The only one I can think of is the DataStructures.jl needs to hit 1.0.
which is basically constrained by checking the API has gotten rid of stuff that is hanging on from Julia 0.3.

Having them in sperate packages causes problems like in #39

@timholy
Copy link
Member

timholy commented Jun 16, 2020

Let's compare load times, but your plan sounds sensible.

@timholy
Copy link
Member

timholy commented Jun 16, 2020

On Julia 1.6 with the revise3 branch:

tim@diva:~/.julia/dev/ImageFiltering/test$ julia-master --startup-file=no -q
julia> @time using OrderedCollections
  0.047111 seconds (49.33 k allocations: 3.746 MiB)

julia> ^D
tim@diva:~/.julia/dev/ImageFiltering/test$ julia-master --startup-file=no -q
julia> @time using DataStructures
  0.211489 seconds (245.43 k allocations: 14.640 MiB)

julia> ^D
tim@diva:~/.julia/dev/ImageFiltering/test$ julia-master --startup-file=no -q
julia> @time using Revise
  0.555985 seconds (775.29 k allocations: 50.752 MiB, 2.27% gc time)

It's not a trivial part of the startup time. But perhaps integrated into Revise the times would be different?

@timholy
Copy link
Member

timholy commented Jun 16, 2020

I could also consider building a "private copy" of OrderedSet. I've been meaning to try to understand whether there's a workaround that would make setindex! precompilable.

@oxinabox
Copy link
Member Author

On julia 1.6 I get similar timings to you for how long using OrderedCollections and using DataStructures takes.

The more intereting thing is the between Julia1.3 and Julia 1.6-Dev
the time taken for using JuliaInterpretter has halved, but the time taken for using DataStructures has stayed the same.
So where as before the time taken for loading JuliaInterpretter made the time taken for loading DataStructures pale in comparison, it no longer does.
Its pretty impressive that JuliaInterpretter loads so fast, I wonder if we are doing something bad in DataStructures that makes it so slow? and why it doesn't get the speedups JuliaInterpretter sees on load times.

@timholy
Copy link
Member

timholy commented Jun 17, 2020

It's likely to be because of my invalidation-fixing work. I'm starting to see pretty striking improvements for packages that were formerly causing trouble. JuliaInterpreter was slowed down because it (and CodeTracking.jl) invalidated some of Base's & Pkg's package-loading code. That's not happening anymore on recent Julia 1.6-DEV and recent releases of JuliaInterpreter.

Unfortunately for DataStructures, I don't expect to be able to deliver the same benefits, because it's already in pretty good shape with respect to invalidations:

julia> using SnoopCompile

julia> trees = invalidation_trees(@snoopr using DataStructures)
2-element Array{SnoopCompile.MethodInvalidations,1}:
 inserting eltype(::Type{CircularBuffer{T}}) where T in DataStructures at /home/tim/.julia/packages/DataStructures/GvsTk/src/circular_buffer.jl:159 invalidated:
   backedges: 1: superseding eltype(::Type{var"#s91"} where var"#s91"<:(AbstractArray{E,N} where N)) where E in Base at abstractarray.jl:152 with MethodInstance for eltype(::Type{A} where A<:AbstractArray{Bool,1}) (5 children) more specific
   32 mt_cache

 inserting iterate(v::Base.ValueIterator{T}, i::Int64) where T<:DataStructures.DefaultDictBase in DataStructures at /home/tim/.julia/packages/DataStructures/GvsTk/src/default_dict.jl:64 invalidated:
   backedges: 1: superseding iterate(v::Union{Base.KeySet, Base.ValueIterator}, state...) in Base at abstractdict.jl:59 with MethodInstance for iterate(::Base.ValueIterator, ::Any) (2 children) more specific
              2: superseding iterate(v::T, i::Int64) where T<:Union{Base.KeySet{var"#s88",var"#s87"} where var"#s87"<:Dict where var"#s88", Base.ValueIterator{var"#s86"} where var"#s86"<:Dict} in Base at dict.jl:678 with MethodInstance for iterate(::Base.ValueIterator{var"#s89"} where var"#s89"<:Dict, ::Int64) (8 children) ambiguous
   14 mt_cache

That's actually a bit of a lie due to timholy/SnoopCompile.jl#95, but the reality is not much worse (you can see for yourself by doing the "staged" @mysnoopr as illustrated in that issue). See SnoopCompile's docs to understand what all this means.

That doesn't mean that DataStructure's loading time can't be made better, but if there are gains to be had they will have to come from some other source.

@KristofferC will be happy to hear that JuliaInterpreter is starting to not be a giant anchor anymore.

@timholy
Copy link
Member

timholy commented Sep 9, 2020

Julia 1.6 has gotten even better, so it's time for an update:

julia> @time using OrderedCollections
  0.027468 seconds (25.76 k allocations: 1.843 MiB)

julia> 
tim@diva:~$ julia-master -q --startup-file=no
julia> @time using DataStructures 
  0.078993 seconds (100.80 k allocations: 7.651 MiB)

julia> 
tim@diva:~$ julia-master -q --startup-file=no
julia> @time using Revise
  0.418124 seconds (712.14 k allocations: 48.093 MiB)

While it would be slightly painful given how much I've worked to reduce latency, I'd be willing to put up with another 50ms. So I don't think Revise needs to be a blocker for this reunification.

@timholy
Copy link
Member

timholy commented Sep 9, 2020

(Thanks for the interesting blog post, @eulerkochy!)

@PallHaraldsson
Copy link
Contributor

What blog post is that? I was aware of his V language, but not his involvement with Julia. And V seems very interesting while the opposite of Julia... seems similar to Zig, not sure which is the better replacement for C.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants