Add option to limit number of concurrent calls to linker when building with -j #1529

Open
byorgey opened this issue Oct 3, 2013 · 16 comments

Comments

@byorgey
Member

byorgey commented Oct 3, 2013

On my eight-core machine, running cabal install -j8 renders the machine virtually unusable, presumably due to trying to run many invocations of the linker in parallel (?). It would be nice to be able to do something like cabal install -j8 --max-linkers=3 so I can compile, download, etc. up to 8 packages in parallel but only have 3 linking phases running at a time.

However, it's also possible that I have misdiagnosed the problem. The real issue, of course, is that I don't want cabal install -j8 to make my machine grind to a halt.

@tibbe
Member

tibbe commented Oct 3, 2013

I'd like to first make sure that the linking is actually the problem. Why wouldn't running 8 linkers in parallel on an 8-core machine work?

@23Skidoo
Member

23Skidoo commented Oct 3, 2013

Can't you just use -j7, for example?

@byorgey
Member Author

byorgey commented Oct 3, 2013

I got this idea from the shake documentation here: http://hackage.haskell.org/package/shake-0.10.7/docs/Development-Shake.html#v:newResource . To quote, "calls to compilers are usually CPU bound but calls to linkers are usually disk bound. Running 8 linkers will often cause an 8 CPU system to grid to a halt." Though I can't say that I fully understand why running 8 disk bound processes would cause the system to grind to a halt.

Re: just using -j7, empirically anything above -j3 or -j4 is just as bad as -j8. The difference between 7 and 8 would not be that bad, but only getting to use 3 of my 8 cores when building packages makes me a sad panda.

@23Skidoo
Member

23Skidoo commented Oct 3, 2013

OK, it should be possible to implement --max-linkers using a semaphore.
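
A minimal sketch of the semaphore idea, written in Python for illustration only (it is not cabal's implementation). The names `build_all`, `compile_step`, and `link_step` are hypothetical, and the two steps are stand-ins that just sleep. Up to `jobs` workers run at once, but a worker must hold one of `max_linkers` semaphore slots while it links:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def compile_step(pkg):
    """Hypothetical stand-in for compiling a package."""
    time.sleep(0.01)


def link_step(pkg):
    """Hypothetical stand-in for linking a package."""
    time.sleep(0.01)


def build_all(packages, jobs, max_linkers):
    """Build with up to `jobs` workers, at most `max_linkers` linking at once.

    Returns the built packages and the peak number of concurrent links seen.
    """
    link_slots = threading.BoundedSemaphore(max_linkers)
    lock = threading.Lock()  # protects the two counters below
    linking_now = 0
    peak_linking = 0

    def build(pkg):
        nonlocal linking_now, peak_linking
        compile_step(pkg)   # not limited by the semaphore
        with link_slots:    # blocks while max_linkers links are in progress
            with lock:
                linking_now += 1
                peak_linking = max(peak_linking, linking_now)
            link_step(pkg)
            with lock:
                linking_now -= 1
        return pkg

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        built = list(pool.map(build, packages))
    return built, peak_linking
```

With `jobs=8` and `max_linkers=3`, all eight packages can compile in parallel while the reported peak never exceeds three. Note that a worker waiting for a slot sits idle in this version.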

@byorgey
Member Author

byorgey commented Oct 3, 2013

I will take a shot at implementing it (just wanted to get some feedback before I attempted it), and see if it helps. Is there a canonical semaphore library/abstraction that we should use nowadays?

@23Skidoo
Member

23Skidoo commented Oct 3, 2013

@byorgey

I have an initial implementation of a minimal semaphore module on this branch: https://github.com/23Skidoo/cabal/commits/ghc-parmake
Doesn't work on Windows yet.

@byorgey
Member Author

byorgey commented Oct 3, 2013

Ah, cool. So I should wait for that to get merged in?

@23Skidoo
Member

23Skidoo commented Oct 4, 2013

You can use System.Posix.Semaphore in the meanwhile.

@bennofs
Collaborator

bennofs commented Oct 6, 2013

Maybe the reason that running 8 linkers in parallel causes the system to crash is that linking often requires a lot of RAM when using GHC? So if that's the case (out of memory + lots of swapping?), it would be better to add an option to not start the linker if there is not much free RAM left?

@byorgey
Member Author

byorgey commented Oct 6, 2013

Yes, that could certainly be the case. I will watch the memory usage next time to see if that's what is happening. However, I am not sure your suggestion would work very well --- it seems one could easily get in a situation where a bunch of linkers fire up all at once (because RAM usage is not too high) but then once they are running they exhaust the RAM.

@byorgey
Member Author

byorgey commented Nov 23, 2013

Incidentally, after more experience and investigation, I'm pretty sure my inherent problem is not linkers per se but running out of memory, causing my system to start swapping. But if running the linker uses a lot of memory (?) this could still help.

@23Skidoo
Member

@byorgey Have you tried my patches (#1572)?

@byorgey
Member Author

byorgey commented Nov 24, 2013

@23Skidoo not yet. I'll give them a try soon.

@rrnewton
Member

rrnewton commented Feb 5, 2014

I've got a question about the semaphore strategy --

When a worker grabs the semaphore to go into a linking phase, if there are no linker resources available, does the worker block (and thus go idle)? When blocking on one scarce resource it would be nice to continue to work on other available tasks -- compiling, building docs, etc.

But of course we don't want to accomplish that just by oversubscription (e.g. -N(4*P)), so it's nice to keep the # workers at one per core but replace them when they go down. (This is what we did in meta-par for blocking on GPU tasks to complete.)
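
One way to picture the non-blocking variant, as a rough Python sketch under stated assumptions (this is not how cabal or meta-par does it): the worker count stays fixed, and a worker that finds all linker slots taken skips ahead to the next runnable task instead of blocking. The name `run_tasks` and the `(name, kind)` task shape are hypothetical, task dependencies are ignored, and `max_linkers` is assumed to be at least 1.

```python
import threading
import time
from collections import deque


def run_tasks(tasks, workers, max_linkers):
    """Run (name, kind) tasks, kind being "compile" or "link".

    Workers never wait on a linker slot while other work is runnable.
    Returns the completion order and the peak number of concurrent links.
    """
    pending = deque(tasks)
    cond = threading.Condition()
    state = {"linking": 0, "peak": 0}
    done = []

    def take():
        # Caller holds `cond`. Pick the first task that may run right now.
        for i, (name, kind) in enumerate(pending):
            if kind != "link" or state["linking"] < max_linkers:
                del pending[i]
                if kind == "link":
                    state["linking"] += 1
                    state["peak"] = max(state["peak"], state["linking"])
                return name, kind
        return None  # nothing pending, or only links with all slots busy

    def worker():
        while True:
            with cond:
                task = take()
                while task is None:
                    if not pending:
                        return       # no work left for this worker
                    cond.wait()      # only blocked links remain; wait for a slot
                    task = take()
            name, kind = task
            time.sleep(0.01)         # stand-in for the real compile or link
            with cond:
                if kind == "link":
                    state["linking"] -= 1
                done.append(name)
                cond.notify_all()    # a slot may have been freed

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, state["peak"]
```

A worker goes idle here only when every remaining task is a link and all slots are busy, so no oversubscription is needed to keep the cores occupied.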

@ttuegel ttuegel modified the milestones: cabal-install-1.24, cabal-install-1.22 Apr 23, 2015
@23Skidoo 23Skidoo modified the milestones: cabal-install 1.24, cabal-install 1.26 Feb 21, 2016
@23Skidoo 23Skidoo removed their assignment Jul 27, 2016
@ezyang ezyang modified the milestone: cabal-install 2.0 Sep 6, 2016
@hvr
Member

hvr commented Dec 1, 2017

NB: We already have plans (see #976 (comment)) to extend the syntax of -j to something like -j n[:m] where n and m denote the cabal process parallelism and GHC's internal parallelism respectively. So we should figure out a syntax that allows us to incorporate the linker parallelism limit as well.
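
For concreteness, a small Python sketch of how an argument of the proposed form `n[:m]` could be split. This is purely illustrative: the syntax is only planned, `parse_jobs` is a hypothetical name, and how a linker limit would fit into the syntax is exactly the open question, so it is not modelled here.

```python
def parse_jobs(arg):
    """Parse a hypothetical `-j` argument of the form n[:m].

    n is the cabal process parallelism and m is GHC's internal
    parallelism; m is None when the `:m` part is omitted.
    """
    n_text, sep, m_text = arg.partition(":")
    n = int(n_text)                    # raises ValueError on bad input
    m = int(m_text) if sep else None   # "8:" fails here, since int("") raises
    if n < 1 or (m is not None and m < 1):
        raise ValueError(f"job counts must be positive: {arg!r}")
    return n, m
```

For example, `parse_jobs("8")` returns `(8, None)` and `parse_jobs("8:2")` returns `(8, 2)`.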

@lspitzner
Collaborator

To add to the motivation: when a project has more than one test suite, and one changes a non-inlined function in the library and rebuilds all test suites, cabal does the following in sequence: 1) recompiles the changed library parts, 2) starts relinking all the test suites at the same instant. In this scenario, I can deterministically observe a rather annoying burst of memory usage.

The steps of a build form a DAG, right? Is this DAG currently explicit in the cabal implementation? I had a look at #1572 and it seems like the "execution" aspects for this DAG are mixed with its construction. But perhaps the PR is outdated anyways (?)
