Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move out SparseArrays and SuiteSparse from the sysimage #44247

Merged
merged 3 commits into from
Jun 21, 2022

Conversation

KristofferC
Copy link
Member

@KristofferC KristofferC commented Feb 18, 2022

cc @ViralBShah

julia> @time using SparseArrays
  1.821477 seconds (2.71 M allocations: 300.647 MiB, 0.32% compilation time)

julia> @time using SuiteSparse
  0.242231 seconds (228.92 k allocations: 18.472 MiB, 40.21% compilation time)

julia> @time using Statistics
  0.002257 seconds (4.08 k allocations: 428.594 KiB)

@KristofferC
Copy link
Member Author

@nanosoldier runtests(ALL, vs = ":master")

@ViralBShah
Copy link
Member

ViralBShah commented Feb 18, 2022

Those times look quite ok. And probably can receive a bit of love on improving compile time once we move out of sysimage. Does it improve Julia startup time slightly?

@ViralBShah ViralBShah added the excision Removal of code from Base or the repository label Feb 18, 2022
@ViralBShah ViralBShah marked this pull request as draft February 18, 2022 23:17
@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@fredrikekre
Copy link
Member

A common failure mode seems to be MethodError for missing factorizations on SparseMatrixCSC.

@KristofferC
Copy link
Member Author

Ugh, this further shows that SparseArrays and SuiteSparse should not have been separated into different packages. SuiteSparse does type piracy between \ and SparseArrayCSC.

@ViralBShah
Copy link
Member

The reason we split them was so that they could be moved out one at a time - and possibly across releases. I am ok with folding back SuiteSparse.jl into SparseArrays.jl, now that they are all out into separate repos, and would be happier if we can move them out of stdlibs altogether.

@ViralBShah
Copy link
Member

ViralBShah commented Mar 11, 2022

Can we do this early in the 1.9 release? I suggest moving them out of stdlibs altogether. Perhaps worthy of a triage discussion?

@ViralBShah ViralBShah added the triage This should be discussed on a triage call label Mar 11, 2022
@DilumAluthge
Copy link
Member

The earlier we move it out, the longer we have to test it in the wild before the 1.9 release.

I also think we need to advertise this change broadly across multiple fora, including (but not limited to) Discourse, Slack, and Zulip. And again, the sooner we make the announcement, the more time people have to fix their code.

@KristofferC
Copy link
Member Author

KristofferC commented Mar 12, 2022

Well, the fact that SuiteSparse does type piracy makes moving them out difficult. I guess we could start with Statistics.

@ViralBShah
Copy link
Member

Why is the type piracy a problem? Can't we just merge suitesparse into the sparsearrays package if that is a concern?

@JeffBezanson
Copy link
Member

From triage: ok to merge SuiteSparse into SparseArrays (or make SparseArrays depend on SuiteSparse?). Move 'em out! 😄

@ViralBShah
Copy link
Member

ViralBShah commented Mar 17, 2022

If making SparseArrays depend on SuiteSparse solves it, that would be the preferred solution. @KristofferC I'm always not sure about these stdlib things - is this something you can do or suggest what needs doing if it is out of the ordinary?

cc @Wimmerer

@rayegun
Copy link
Member

rayegun commented Mar 30, 2022

What remains to be done here? To solve dependency between SuiteSparse and SparseArrays? The obvious way to me is SuiteSparse depending on SA, even if that's not ideal for other sparse packages. E: This is already the case, the issue is using SA won't pull in SuiteSparse for solves.

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Mar 31, 2022
@ViralBShah
Copy link
Member

ViralBShah commented Apr 1, 2022

What we need to do is to ensure that SuiteSparse.jl is always installed when SparseArrays.jl is and ensure that this PR passes.

This dependency can be removed in the future, but since it is a part of Julia 1.0, we have no choice but to enforce that dependency if we want to move things out of stdlib.

@ViralBShah
Copy link
Member

@KristofferC Is the simplest solution here to make SparseArrays.jl depend upon SuiteSparse.jl - to ensure that both are always installed together?

@ViralBShah
Copy link
Member

For Statistics, we do want to get it out of the system image and off in to its own package, but it will be odd for mean and std not in base. So perhaps there's some refactoring to be done there. Maybe we should separate out the PR for Statistics?

@brenhinkeller
Copy link
Contributor

brenhinkeller commented Jun 6, 2022

For Statistics, we do want to get it out of the system image and off in to its own package, but it will be odd for mean and std not in base.

Well actually, while I'm not sure I really like the trend of moving everything out of base and stdlibs all the time, this actually might be a silver lining.

My main "elegance" gripe re the last time things got spun out (1.0) was that the statistics functionality kinda got fractured, with some things in Statistics and some things in StatsBase.

So if the overall trend is going to continue (and we can't put StatsBase back into Statistics) then maybe the better option actually is to rip off the bandaid, move everything out including mean and std, spin off Statistics as a separate package and then re-merge that with StatsBase.

@nalimilan
Copy link
Member

@brenhinkeller This is essentially what we'd like to do. AFAIK the only remaining debate is whether mean and std should be moved to Base when moving Statistics out to a separate package. I'd also prefer keeping them in Statistics, as they have a lot in common with other stats functions (notably the presence of methods taking weights vectors; std and var cannot be separated anyway).

@ViralBShah
Copy link
Member

The last time we tried to move mean and std out of base, the crowd came at us with pitchforks. I would not want to face that crowd again! Can't we just have mean and std for basic arrays in base with all the fancy versions in Statistics? If we are moving them out of base, it would be worth taking a pulse on discourse.

@nalimilan
Copy link
Member

Having the basic version in Base and weighted versions in Statistics would certainly be possible (currently they are split between Statistics and StatsBase). It would be a bit more tricky if we want to use a keyword argument (though possible using dispatch under the hood). But I would have hoped this change would be the occasion simplifying the organization of stats functions.

Also std is implemented by calling var, so if we put the former in Base we should probably also move the latter.

Anyway yes, asking for feedback on Discourse is a good idea.

@ViralBShah
Copy link
Member

ViralBShah commented Jun 18, 2022

Continuing discussion in JuliaSparse/SparseArrays.jl#147

@ViralBShah
Copy link
Member

The SparseArrays and SuiteSparse bumps are in (SS is now incorporated in SA and is an empty package).

@ViralBShah ViralBShah closed this Jun 20, 2022
@ViralBShah ViralBShah reopened this Jun 20, 2022
@rayegun
Copy link
Member

rayegun commented Jun 20, 2022

Are these failures related to us? That error message doesn't seem related to me.

@ViralBShah
Copy link
Member

Don't seem related, but what is the actual error? I'm missing it - I thought it is getting stuck in the building system image phase and timing out on 32-bit win and linux.

@rayegun
Copy link
Member

rayegun commented Jun 20, 2022

Well I can't actually tell either, the linux32 one was failing a minute ago I thought (that seemed like a codegen error or something). For win32 I don't actually see any errors other than exit code 1.

@ViralBShah
Copy link
Member

I restarted the linux32 buildbot.

@ViralBShah
Copy link
Member

ViralBShah commented Jun 20, 2022

At line 1232 in https://build.julialang.org/#/builders/29/builds/7561/steps/7/logs/stdio:

Generating REPL precompile statements... 37/40
Generating REPL precompile statements... 38/40
Generating REPL precompile statements... 39/40
Generating REPL precompile statements... 40/40

command interrupted, attempting to kill

Executing precompile statements... 1/1952
Executing precompile statements... 2/1952
Executing precompile statements... 3/1952
Executing precompile statements... 4/1952
Executing precompile statements... 5/1952
Executing precompile statements... 6/1952

@rayegun
Copy link
Member

rayegun commented Jun 21, 2022

I see a timeout this time. Will splitting the tests into individual files help with that? Or was that just for memory purposes. It's failing on win64 this time rather than win32 like last time...

@ViralBShah
Copy link
Member

ViralBShah commented Jun 21, 2022

But this one is actually in running the tests rather than system image building. In fact the sparse tests have all completed. I doubt there are memory issues on 64-bit.

@ViralBShah
Copy link
Member

Rerunning the win64 tester.

@ViralBShah ViralBShah merged commit 18c7e37 into master Jun 21, 2022
@ViralBShah ViralBShah deleted the kc/sparse_sysiamge branch June 21, 2022 12:53
@ViralBShah
Copy link
Member

Let's see how this goes.

@KristofferC
Copy link
Member Author

Would maybe have been good to run PkgEval again.

@ViralBShah
Copy link
Member

ViralBShah commented Jun 21, 2022

I thought "What could possibly go wrong" since we had already run it earlier. I am sure famous last words.

@PallHaraldsson
Copy link
Contributor

PallHaraldsson commented Jun 25, 2022

Since SuiteSparse (which is GPL) is moved out, it means the sysimage is now non-GPL, i.e. MIT only? And thus "Julia"? It's still a stdlib I believe, for now, and I'm not sure it changes the question if Julia should now be classified as non-GPL. I believe the plan is it will not even be a stdlib, like DelimitedFiles isn't by now.

This is a reminder to update GPL status, if I'm actually right, and not too soon, if it was forgotten.

@nalimilan
Copy link
Member

AFAICT SuiteSparse was the only GPL library we link to (see julia.spdx.json), though gmp and mpfr are still LGPL, so the compiled Julia will be LGPL once we move SuiteSparse out of the stdlib.

@johnnychen94
Copy link
Member

It looks like moving out SparseArrays from sysimage significantly increases the package loading time because many small root dependencies use Statistics and SparseArrays with the assumption that they are loaded very fast.

package v1.8.0-rc1 v1.9-dev.843
FixedPointNumbers 0.05s 1.68s
ArrayInterface 0.04s 1.67s
Distances 0.008s 1.65s
ImageCore 0.9s 2.9s

I kindly request some efforts into JuliaLang/Pkg.jl#1285 so that people don't need to break packages into smaller pieces (say, ArrayInterfaceCoreSparseArrays) or use Requires.

@rayegun
Copy link
Member

rayegun commented Jul 2, 2022

That is a gigantic slowdown... I had planned to work on conditional deps /glue deps at some point but became sidetracked.

After the 11th I can work towards that again, but that's a pretty large feature at this stage of 1.9 I think.

@ViralBShah
Copy link
Member

Maybe precompile statements for these packages once they become regular packages will greatly help?

@KristofferC
Copy link
Member Author

This is loading time so that won't help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
excision Removal of code from Base or the repository sparse Sparse arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.