Conversation

@sim642 sim642 commented Feb 16, 2022

Implements the first half of #589.

Separately calling Unix.open_process and Unix.close_process was extremely simple, but trying it out on OpenSSL revealed a problem: it crashed with the Unix error EMFILE (too many open files). Apparently starting 1009 child processes at the same time isn't a good idea.

Of course ulimit -n could've worked around the problem, but I thought I'd be nice and implement a ProcessPool module, which runs at most jobs subprocesses in parallel.

By default (-j 0), the number of parallel jobs is derived from the number of CPU cores. Otherwise the -j command line argument (which is really the jobs option) controls the limit.
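For intuition, here is a minimal sketch of what such a pool has to do. This is not the PR's actual ProcessPool code; run_in_pool and the task representation (a function returning an exit status) are assumptions for illustration:

```ocaml
(* Sketch only (hypothetical names, not the PR's implementation):
   keep at most [jobs] children alive, reaping one before spawning the next. *)
let run_in_pool ~jobs (tasks: (unit -> int) list) =
  let running = ref 0 in
  let reap () =
    let _pid, _status = Unix.wait () in  (* block until some child exits *)
    decr running
  in
  List.iter (fun task ->
      if !running >= jobs then reap ();  (* free a slot before forking again *)
      match Unix.fork () with
      | 0 -> exit (task ())              (* child: run the task, exit with its status *)
      | _pid -> incr running             (* parent: one more child in flight *)
    ) tasks;
  while !running > 0 do reap () done     (* drain the remaining children *)
```

However the real module is structured, the key invariant is the same: there are never more than jobs children (and hence their pipes and file descriptors) open at once, which is exactly what avoids EMFILE.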

OpenSSL

Trying this on OpenSSL (just preprocessing, no parsing or merging), with -j 1 it takes:

real	0m37,266s
user	0m28,811s
sys	0m8,312s

Whereas with -j 16 it takes:

real	0m5,856s
user	1m4,144s
sys	0m16,108s

So here preprocessing takes about 6.4× less wall time (37.3 s vs. 5.9 s)!

@sim642 sim642 added the performance Analysis time, memory usage label Feb 16, 2022

sim642 commented Feb 16, 2022

With all this work happening in parallel in child processes, and given the distinction between CPU and wall time, measuring it can be problematic.
For example, to get the above numbers I used time from the shell instead of wrapping Stats.time around preprocess_files, because the Stats module measures only CPU time and only in the OCaml process itself, so in the stats output it always appears as if preprocessing takes almost no time (even before this PR).
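For illustration, a wall-clock wrapper only needs a gettimeofday difference; time_wall here is a hypothetical helper, not something in the codebase:

```ocaml
(* Hypothetical helper: wall time keeps ticking while child processes run,
   unlike Stats.time, which only accounts this process's own CPU time. *)
let time_wall name f =
  let start = Unix.gettimeofday () in
  let result = f () in
  Printf.eprintf "%s: %.3fs wall\n" name (Unix.gettimeofday () -. start);
  result

(* e.g. time_wall "preprocess" (fun () -> preprocess_files ()) *)
```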

These notions might also screw with some of our crappy benchmarking scripts; I'm not sure how they all do their measurements.
I just know that at least BenchExec is very rigorous about this and correctly accounts for the time used by child processes.


sim642 commented Feb 16, 2022

The change that basic_preprocess and CompilationDatabase.load_and_preprocess no longer execute any processes themselves, but only return a ProcessPool.task to be executed later, should also make it easier to implement incremental preprocessing: it would then be possible to simply filter the returned (file, task) pairs to keep only those where some dependency of the file changed (see the sketch below).
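Roughly, such a filter could look like this; changed_dependency is a hypothetical predicate standing in for whatever the incremental machinery would provide:

```ocaml
(* Sketch: keep only tasks for files with a changed dependency.
   changed_dependency is hypothetical; the task type stays abstract. *)
let filter_tasks ~(changed_dependency: string -> bool) pairs =
  List.filter (fun (file, _task) -> changed_dependency file) pairs
```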

@sim642 sim642 mentioned this pull request Feb 16, 2022

sim642 commented Feb 21, 2022

I'm now thinking that it might actually be safer to have -j 1 as the default, especially if we want to use multiple jobs for more exotic tricks like #598 as well. I don't think other programs like make implicitly use all cores either; they require the user to explicitly opt in.

It would also be nicer for regression testing, because right now that script itself starts Goblint instances on all cores, each of which also wants to run multiple preprocessors, at which point the CPU is overloaded anyway.

@sim642 sim642 merged commit b11a60c into master Feb 21, 2022
@sim642 sim642 deleted the parallel-preprocess branch February 21, 2022 15:12
@sim642 sim642 added the parallel Parallel Goblint label Mar 16, 2022
@sim642 sim642 added this to the v2.0.0 milestone Aug 12, 2022