Conversation

@sim642 sim642 commented Feb 16, 2022

Implements the first half of #589.

Separately calling Unix.open_process and Unix.close_process was extremely simple, but trying it out on OpenSSL revealed a problem: it crashed with the Unix error EMFILE (too many open files). Apparently starting 1009 child processes at the same time isn't a good idea.

Of course ulimit -n could've worked around the problem, but I thought I'd be nice and implement a ProcessPool module, which runs at most jobs subprocesses in parallel.

By default (-j 0), the number of parallel jobs is derived from the number of CPU cores. Otherwise the -j command line argument (which is really the jobs option) controls the limit.
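For intuition, here is a minimal sketch of what such a pool has to do. This is not the PR's actual ProcessPool code; run_in_pool and the task representation (a function returning an exit status) are assumptions for illustration:

```ocaml
(* Sketch only (hypothetical names, not the PR's implementation):
   keep at most [jobs] children alive, reaping one before spawning the next. *)
let run_in_pool ~jobs (tasks: (unit -> int) list) =
  let running = ref 0 in
  let reap () =
    let _pid, _status = Unix.wait () in  (* block until some child exits *)
    decr running
  in
  List.iter (fun task ->
      if !running >= jobs then reap ();  (* free a slot before forking again *)
      match Unix.fork () with
      | 0 -> exit (task ())              (* child: run the task, exit with its status *)
      | _pid -> incr running             (* parent: one more child in flight *)
    ) tasks;
  while !running > 0 do reap () done     (* drain the remaining children *)
```

However the real module is structured, the key invariant is the same: there are never more than jobs children (and hence their pipes and file descriptors) open at once, which is exactly what avoids EMFILE.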

OpenSSL

Trying this on OpenSSL (just preprocessing, no parsing or merging), with -j 1 it takes:

real	0m37,266s
user	0m28,811s
sys	0m8,312s

Whereas with -j 16 it takes:

real	0m5,856s
user	1m4,144s
sys	0m16,108s

So here preprocessing takes about 6.4× less wall time (37.3 s vs. 5.9 s)!

@sim642 sim642 added the performance Analysis time, memory usage label Feb 16, 2022

sim642 commented Feb 16, 2022

With all this work happening in parallel in child processes, and given the distinction between CPU and wall time, measuring it can be problematic.
For example, to get the above numbers I used time from the shell instead of wrapping Stats.time around preprocess_files, because the Stats module measures only CPU time and only in the OCaml process itself, so in the stats output it always appears as if preprocessing takes almost no time (even before this PR).
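For illustration, a wall-clock wrapper only needs a gettimeofday difference; time_wall here is a hypothetical helper, not something in the codebase:

```ocaml
(* Hypothetical helper: wall time keeps ticking while child processes run,
   unlike Stats.time, which only accounts this process's own CPU time. *)
let time_wall name f =
  let start = Unix.gettimeofday () in
  let result = f () in
  Printf.eprintf "%s: %.3fs wall\n" name (Unix.gettimeofday () -. start);
  result

(* e.g. time_wall "preprocess" (fun () -> preprocess_files ()) *)
```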

These notions might also screw with some of our crappy benchmarking scripts; I'm not sure how they all do their measurements.
I just know that at least BenchExec is very rigorous about this and correctly accounts for the time used by child processes.


sim642 commented Feb 16, 2022

The change that basic_preprocess and CompilationDatabase.load_and_preprocess no longer execute any processes themselves, but only return a ProcessPool.task to be executed later, should also make it easier to implement incremental preprocessing: it would then be possible to simply filter the returned (file, task) pairs to keep only those where some dependency of the file changed (see the sketch below).
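Roughly, such a filter could look like this; changed_dependency is a hypothetical predicate standing in for whatever the incremental machinery would provide:

```ocaml
(* Sketch: keep only tasks for files with a changed dependency.
   changed_dependency is hypothetical; the task type stays abstract. *)
let filter_tasks ~(changed_dependency: string -> bool) pairs =
  List.filter (fun (file, _task) -> changed_dependency file) pairs
```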

@sim642 sim642 mentioned this pull request Feb 16, 2022

sim642 commented Feb 21, 2022

I'm now thinking that it might actually be safer to have -j 1 as the default, especially if we want to use multiple jobs for more exotic tricks like #598 as well. I don't think other programs like make implicitly use all cores either; they require the user to explicitly opt in.

It would also be nicer for regression testing, because right now that script itself starts Goblint instances on all cores, each of which also wants to run multiple preprocessors, at which point the CPU is overloaded anyway.

@sim642 sim642 merged commit b11a60c into master Feb 21, 2022
@sim642 sim642 deleted the parallel-preprocess branch February 21, 2022 15:12
@sim642 sim642 added the parallel Parallel Goblint label Mar 16, 2022
@sim642 sim642 added this to the v2.0.0 milestone Aug 12, 2022