
Set --no-pre-install-wheels and PEX_MAX_INSTALL_JOBS for faster builds of internal pexes #20670

Merged · 3 commits · Apr 15, 2024

Conversation

@huonw (Contributor) commented Mar 14, 2024

This has all internal PEXes be built with two settings that improve performance: `--no-pre-install-wheels` and `PEX_MAX_INSTALL_JOBS`.

This is designed to be a performance improvement for any processing where Pants synthesises a PEX internally, like pants run path/to/script.py or pants test .... pex-tool/pex#2292 has benchmarks for the PEX tool itself.
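As a rough sketch of the shape of the change (a hypothetical helper, not the literal diff), an internal-only PEX build simply gains the extra flag:

```python
def internal_pex_extra_args(internal_only: bool) -> list[str]:
    """Hypothetical helper: extra `pex` CLI args for Pants-internal PEXes."""
    if not internal_only:
        return []
    # Internal PEXes never reach end users, so skip pre-installing wheels at build time;
    # they get installed (potentially in parallel) on first boot instead.
    return ["--no-pre-install-wheels"]


print(internal_pex_extra_args(internal_only=True))  # ['--no-pre-install-wheels']
```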

For benchmarks, I did some more purposeful ones with TensorFlow (PyTorch seems a bit awkward to set up and TensorFlow is still huge), using https://gist.github.com/huonw/0560f5aaa34630b68bfb7e0995e99285 . I did 3 runs each of two goals, with 2.21.0.dev4 and with PANTS_SOURCE pointing to this PR, and pulled the numbers out by finding the relevant log lines:

  • pants --no-local-cache --no-pantsd --named-caches-dir=$(mktemp -d) test example_test.py. This involves building 4 separate PEXes, partially in parallel and partially sequentially: requirements.pex, local_dists.pex, pytest.pex, and then pytest_runner.pex. The first and last are the interesting ones for this test.
  • pants --no-local-cache --no-pantsd --named-caches-dir=$(mktemp -d) run script.py. This just builds the requirements into script.pex.

(NB. these are potentially unrealistic in that they're running with all caching turned off or cleared, so they are truly a worst case. This means they're downloading the tensorflow wheels and all the others each time, which takes about 30s on my 100Mbit/s connection. Faster connections will thus see a higher ratio of benefit.)
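For reference, a rough Python rendering of the benchmark loop described above (an assumed sketch, not the linked gist; the reported numbers came from the relevant log lines rather than wall-clock timing):

```python
import subprocess
import tempfile
import time

GOALS = [["test", "example_test.py"], ["run", "script.py"]]

for goal in GOALS:
    for run in range(3):  # 3 runs each
        caches = tempfile.mkdtemp()  # fresh named caches, i.e. worst case
        start = time.monotonic()
        subprocess.run(
            ["pants", "--no-local-cache", "--no-pantsd", f"--named-caches-dir={caches}", *goal],
            check=True,
        )
        print(goal, run, f"{time.monotonic() - start:.1f}s")
```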

| goal | period | before (s) | after (s) |
| --- | --- | --- | --- |
| `run script.py` | building requirements | 74-82 | 49-52 |
| `test some_test.py` | building requirements | 67-71 | 30-36 |
| `test some_test.py` | building pytest runner | 8-9 | 17-18 |
| `test some_test.py` | total to start running tests | 76-80 | 53-58 |

I also did more ad hoc ones on a real-world work repo of mine, which doesn't use any of the big ML libraries, just running some basic goals once.

| goal | period | before (s) | after (s) |
| --- | --- | --- | --- |
| `pants export` on largest resolve | building requirements | 66 | 35 |
| `pants export` on largest resolve | total | 82 | 54 |
| "random" `pants test path/to/file.py` (1 attempt) | building requirements and pytest runner | 49 | 38 |

Two explicit questions for review:

  1. The --no-pre-install-wheels flag behaviour isn't explicitly tested... but maybe it should be, i.e. validate we're passing this flag for internal PEXes. Thoughts?
  2. Setting PEX_MAX_INSTALL_JOBS as an env var may result in fewer cache hits, e.g. pants test path/to/file.py may not be able to reuse caches from pants test :: (and vice versa) because the available concurrency is different, and thus this may be better done differently (see the toy fingerprint sketch after this list). Thoughts?
    • for instance, leave it as it was (the default 1)
    • set to 0 to have each PEX process do its own concurrency, e.g. based on number of CPU cores and wheels. Multiple processes in parallel risk overloading the machine since they don't coordinate, though.
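To illustrate the cache-hit concern in question 2, here is a toy fingerprint (purely illustrative, not the engine's actual hashing) showing why differing concurrency values split the cache:

```python
import hashlib
import json


def process_fingerprint(argv: list[str], env: dict[str, str]) -> str:
    """Toy stand-in for a process cache key that covers both argv and env."""
    payload = json.dumps([argv, sorted(env.items())]).encode()
    return hashlib.sha256(payload).hexdigest()


# The same pytest-runner invocation, with different available concurrency, hashes to
# different keys, so e.g. `pants test path/to/file.py` and `pants test ::` may not share
# cache entries.
key_low = process_fingerprint(["./pytest_runner.pex"], {"PEX_MAX_INSTALL_JOBS": "4"})
key_high = process_fingerprint(["./pytest_runner.pex"], {"PEX_MAX_INSTALL_JOBS": "16"})
assert key_low != key_high
```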

Fixes #15062


@huonw huonw changed the title Set --no-pre-install-wheels and PEX_MAX_INSTALL_JOBS for pants pexes Set --no-pre-install-wheels and PEX_MAX_INSTALL_JOBS for internal pexes Apr 3, 2024
@huonw (Contributor, Author) commented Apr 3, 2024

Splitting the local dists change out to #20746, since that seems like it might cause independent problems, so it's handy to have it separate for bisection.

huonw added a commit that referenced this pull request Apr 3, 2024
This marks all local dist PEXes as internal-only, removing the ability
for them to be anything but internal. This is almost true already,
except for PEXes built via `PexFromTargetsRequest`, where the local
dists PEX used for building the "real" PEX has the same internal status
as that real PEX. In this case, the local dists PEX still isn't surfaced
to users, so it's appropriate for that one to be internal too.

This will probably be slightly faster in isolation (building a
`pex_binary` that uses in-repo `python_distribution`s will be able to
just copy them around with less zip-file manipulation, more often, by
creating packed-layout PEXes). However, the real motivation is
unblocking #20670, where having this PEX built with
`--no-pre-install-wheels` (as internal-only PEXes will, by default) is
required to support downstream PEXes using that argument, at least until
pex-tool/pex#2299 is fixed.

NB. there's still a separate consideration of changing how local dists
are incorporated, which isn't changed or considered here:
pex-tool/pex#2392 (comment)
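As a minimal self-contained sketch of that change (hypothetical names, not the actual Pants API), the local-dists request stops inheriting the parent's internal status and is always internal:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PexRequestSketch:
    output_filename: str
    internal_only: bool


real_pex = PexRequestSketch("app.pex", internal_only=False)  # user-facing pex_binary
local_dists = replace(real_pex, output_filename="local_dists.pex", internal_only=True)
assert local_dists.internal_only  # always internal, regardless of the "real" PEX's status
```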
@huonw huonw marked this pull request as ready for review April 5, 2024 06:28
@huonw huonw requested review from cburroughs, kaos and benjyw April 5, 2024 06:28
@benjyw (Sponsor, Contributor) left a comment:

Nice wins!

@huonw huonw changed the title Set --no-pre-install-wheels and PEX_MAX_INSTALL_JOBS for internal pexes Set --no-pre-install-wheels and PEX_MAX_INSTALL_JOBS for faster builds of internal pexes Apr 6, 2024
@huonw huonw merged commit 00f5d69 into main Apr 15, 2024
24 checks passed
@huonw huonw deleted the huonw/15062-pre-install branch April 15, 2024 02:51
@@ -1157,6 +1163,9 @@ async def setup_pex_process(request: PexProcess, pex_environment: PexEnvironment
    complete_pex_env = pex_environment.in_sandbox(working_directory=request.working_directory)
    argv = complete_pex_env.create_argv(pex.name, *request.argv)
    env = {
        # Set this in case this PEX was built with --no-pre-install-wheels, and thus parallelising
        # the install on cold boot is handy.
        "PEX_MAX_INSTALL_JOBS": str(request.concurrency_available),
A reviewer (Contributor) commented on this diff:

I think this is what you were getting at in your notes, but as a rough estimate if I have 16 cores this will potentially thrash into 16 different cache entries, right?

@huonw (Contributor, Author) replied:

Yeah, in theory... depending on what else is running at the same time.

However, on closer examination prompted by your comment: I think I did this wrong:

  • concurrency_available is configuration telling the command runner how much concurrency would be useful for the process (e.g. "I'm processing 34 files, and could do so in parallel" => concurrency available = 34), not necessarily how much concurrency is actually allocated.
  • That allocation is communicated to the process by replacing the {pants_concurrency} template in its argv (to be used like ["some-process", ..., "--jobs={pants_concurrency}"] or similar).
  • This replacement currently only seems to work on the argv, not env vars (a toy rendering of the substitution follows this list).
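Before the actual engine code below, here is a toy Python rendering of that argv-only substitution (names assumed apart from the CONCURRENCY_TEMPLATE_RE pattern, which mirrors the Rust):

```python
import re

CONCURRENCY_TEMPLATE_RE = re.compile(r"\{pants_concurrency\}")


def apply_allocated_concurrency(argv: list[str], allocated: int) -> list[str]:
    """Replace the {pants_concurrency} placeholder in each argument with the allocation."""
    return [CONCURRENCY_TEMPLATE_RE.sub(str(allocated), arg) for arg in argv]


print(apply_allocated_concurrency(["some-process", "--jobs={pants_concurrency}"], 8))
# ['some-process', '--jobs=8']
```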

For my own reference, here's the relevant code (and its surroundings) showing the concurrency_available request flowing through to the replacement:

let semaphore_acquisition = self.sema.acquire(process.concurrency_available);
let permit = in_workunit!(
    "acquire_command_runner_slot",
    // TODO: The UI uses the presence of a blocked workunit below a parent as an indication that
    // the parent is blocked. If this workunit is filtered out, parents nodes which are waiting
    // for the semaphore will render, even though they are effectively idle.
    //
    // https://github.com/pantsbuild/pants/issues/14680 will likely allow for a more principled
    // solution to this problem, such as removing the mutable `blocking` flag, and then never
    // filtering blocked workunits at creation time, regardless of level.
    Level::Debug,
    |workunit| async move {
        let _blocking_token = workunit.blocking();
        semaphore_acquisition.await
    }
)
.await;

loop {
    let mut process = process.clone();
    let concurrency_available = permit.concurrency();
    log::debug!(
        "Running {} under semaphore with concurrency id: {}, and concurrency: {}",
        process.description,
        permit.concurrency_slot(),
        concurrency_available,
    );

    // TODO: Both of these templating cases should be implemented at the lowest possible level:
    // they might currently be applied above a cache.
    if let Some(ref execution_slot_env_var) = process.execution_slot_variable {
        process.env.insert(
            execution_slot_env_var.clone(),
            format!("{}", permit.concurrency_slot()),
        );
    }
    if process.concurrency_available > 0 {
        let concurrency = format!("{}", permit.concurrency());
        let mut matched = false;
        process.argv = std::mem::take(&mut process.argv)
            .into_iter()
            .map(
                |arg| match CONCURRENCY_TEMPLATE_RE.replace_all(&arg, &concurrency) {

So, I'm gonna revert this part of the change. Thanks.

#20795

huonw added a commit that referenced this pull request Apr 17, 2024
This fixes/removes the use of `PEX_MAX_INSTALL_JOBS` when running a PEX.
This was added in #20670 in an attempt to sync with
`--no-pre-install-wheels` and do more work in parallel when booting
"internal" PEXes... but it was implemented incorrectly. This would need
to be set to `PEX_MAX_INSTALL_JOBS={pants_concurrency}` or similar, and:

- that substitution isn't currently supported (only argv substitutions)
- it's somewhat unclear if we even want to do that at all, as it'll
result in more cache misses

Noticed in
#20670 (comment)
Successfully merging this pull request may close these issues:

  • Benchmark various defaults for PEX compression for internal PEXes