install.sh: Performance: Use more shell builtins #106

Open
wants to merge 3 commits into
base: master

mbargull

Replace echo/grep/cut/dirname/basename by variable substitutions and
case pattern matching to reduce the amount of subprocesses called for
every copied file.

The current echo | grep, echo | cut, dirname, and basename invocations slow down installation, especially for components with many files (such as rust-docs).
This pull request replaces them with shell builtin substitutions to avoid the overhead of spawning a process for each call.
On my machine, these small changes reduced the install time for rust-docs alone from around 20 minutes to around 3 minutes.
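
As a rough illustration of the kind of substitution described above (a minimal sketch, not the PR's actual code; the helper names `split_path_slow`, `split_path_fast`, and `has_prefix` are hypothetical):

```shell
#!/bin/sh
# Hypothetical helpers illustrating the idea; these names are not from install.sh.

# Subprocess version: spawns dirname and basename on every call.
split_path_slow() {
    printf '%s %s\n' "$(dirname "$1")" "$(basename "$1")"
}

# Builtin version: parameter expansion only, no extra processes.
# Assumes a well-formed path with at least one slash and no trailing slash.
split_path_fast() {
    printf '%s %s\n' "${1%/*}" "${1##*/}"
}

# Builtin stand-in for: echo "$1" | grep "^$2" > /dev/null
# Note: the prefix is matched literally here, not as a regular expression.
has_prefix() {
    case "$1" in
        "$2"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For a path such as `share/doc/rust/index.html`, both `split_path_*` helpers should print the same directory and file name; only the number of processes spawned differs.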

Replace echo/grep/cut/dirname/basename by variable substitutions and
case pattern matching to reduce the amount of subprocesses called for
every copied file.
eval $v=$val
fi
case "${arg}" in
"--${op}="* )
Author

FYI, this is technically a behavioral change because the original code grepped for "--$op" instead of "^--$op" -- testing for the prefix is probably intended, though. Same for the --$option case below.
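
The difference between the unanchored grep and the prefix-matching case pattern can be sketched as follows. This is an illustration only: `matches_old` and `matches_new` are hypothetical names, and the grep call is modeled on the description in this comment rather than copied from the script.

```shell
#!/bin/sh
# Illustrative only: the function names and argument values are made up for this demo.

# Old-style test: an unanchored grep matches "--$op" anywhere in the argument.
matches_old() {
    echo "$1" | grep -- "--$2" > /dev/null
}

# New-style test: the case pattern matches only if the argument starts with "--$op=".
matches_new() {
    case "$1" in
        "--$2="*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Both helpers accept `--prefix=/usr` for the option name `prefix`, but only `matches_old` accepts an argument like `x--prefix=/usr`, where the option text appears in the middle.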

Since the manifests have sorted file lists, consecutive invocations of
abs_path (make_dir_recursive) likely do redundant work. Avoiding these
redundancies reduces the amount of subprocessing per file further.
Comment on lines +310 to +311
local file_path_dirname="${file_path%/*}"
local file_path_basename="${file_path##*/}"
Author

NB: These dirname/basename substitutions assume more well-formed paths, e.g., no trailing slashes etc.
Let me know if you aren't comfortable with adding these constraints and if I should revert to dirname/basename.
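
To make the constraint concrete, here is a small sketch comparing the external tools with the parameter expansions on a well-formed path, a path with a trailing slash, and a bare file name. `show_split` is a hypothetical helper used only for this demonstration.

```shell
#!/bin/sh
# Compare dirname/basename with the parameter expansions on three inputs.
# show_split is a hypothetical helper for this demo only.
show_split() {
    printf 'tools: dir=%s base=%s | expansion: dir=%s base=%s\n' \
        "$(dirname "$1")" "$(basename "$1")" "${1%/*}" "${1##*/}"
}

show_split "share/doc/index.html"   # both agree: dir=share/doc base=index.html
show_split "share/doc/"             # tools: share, doc; expansion: share/doc, (empty)
show_split "README"                 # tools: ., README; expansion: README, README
```

The results only diverge for inputs with a trailing slash or without any slash, which is the class of paths the comment above says the substitutions do not handle.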

@pietroalbini
Member

Thanks for the PR! It might take a bit for me to review the PR, but I'll eventually get it done :)

@fweimer

fweimer commented Mar 21, 2021

Does this address the apparent hang after install: installing component 'rust-docs' with the 1.50.0 installer? strace shows many grep invocations with ^etc/, ^bin/, ^lib/ etc.

@mbargull
Author

@fweimer, these changes replace all those grep calls with a case construct. So yes, it greatly reduces the observed "apparent hang".

Comment on lines 148 to 151
# Skip if the last invocation of make_dir_recursive had the same argument
if ! [ "$_dir" = "${_make_dir_recursive_cached_key:-}" ]
then
_make_dir_recursive_cached_key="$_dir"
Member

This seems like kind of an odd optimization, it only works if the very last call was the same directory. Have you compared the times with and without it?

Author

Yes, see the commit message of 53d557b: the input lists are sorted by path and directories usually contain more than one file, so this optimization is hit frequently.
I did compare times, and the difference was significant enough to warrant the added complexity. However, I did not record any numbers.
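
The "remember the last argument" idea can be sketched like this. It is a simplified stand-in, not the PR's `make_dir_recursive`; `ensure_dir`, `_ensure_dir_last`, and `_mkdir_calls` are hypothetical names, and the counter exists only to make the effect visible.

```shell
#!/bin/sh
# Sketch of single-entry caching; all names here are hypothetical.
_ensure_dir_last=""
_mkdir_calls=0

ensure_dir() {
    # Skip the mkdir subprocess if the previous call had the same directory.
    if [ "$1" != "$_ensure_dir_last" ]; then
        mkdir -p "$1" || return 1
        _mkdir_calls=$((_mkdir_calls + 1))
        _ensure_dir_last="$1"
    fi
}
```

With a sorted file list such as `a/1 a/2 a/3 b/1 b/2`, calling `ensure_dir` on each file's directory runs `mkdir` twice rather than five times, which is why a cache of size one is enough when consecutive entries share a directory.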

Author

This also significantly reduced the log file size.

Author

Went AFK for a bit to let this run:

# pwd              
/tmp/rust-1.51.0-x86_64-unknown-linux-gnu
# stat -fc%T /tmp                      
tmpfs
# dst=/tmp/destdir                                                                                                                                                                                                
# for c in 53d557b 3d7ed69 5254dbf ; do curl -sLO "https://raw.githubusercontent.com/mbargull/rust-installer/${c}/install-template.sh" ; rm -rf "${dst}" ; printf %s\\n "${c}" ; time bash install-template.sh --destdir="${dst}" >/dev/null ; find "${dst}" -name \*.log -exec stat -c'%n %s' {} \; ; done
53d557b
install: WARNING: failed to run ldconfig. this may happen when not installing as root. run with --verbose to see the error
real 44.19
user 35.54
sys 11.26
/tmp/destdir/usr/local/lib/%%TEMPLATE_REL_MANIFEST_DIR%%/install.log 11361126
3d7ed69
install: WARNING: failed to run ldconfig. this may happen when not installing as root. run with --verbose to see the error
real 105.68
user 82.46
sys 30.43
/tmp/destdir/usr/local/lib/%%TEMPLATE_REL_MANIFEST_DIR%%/install.log 12935429
5254dbf
install: WARNING: failed to run ldconfig. this may happen when not installing as root. run with --verbose to see the error
real 688.47
user 570.21
sys 207.40
/tmp/destdir/usr/local/lib/%%TEMPLATE_REL_MANIFEST_DIR%%/install.log 12935429

So for me that commit cuts about 60 % of the time on a tmpfs.
(install.log size difference is not that big in this case due to short prefix/destdir. At https://github.com/conda-forge/rust-feedstock we have longer (~ 255 characters) install prefixes, resulting in a much larger log file.)

Member

@jyn514, Apr 7, 2021

Hmm, I wonder if you could get almost as much of a speedup by removing the logging instead. It sounds like it takes a while and isn't super useful.

Author

No, the logging just uses the echo builtin, which should offer high throughput.
One thing to try is whether we can simply use test -d DIR in make_dir_recursive instead of the memoized logic. (For the abs_path function I don't see an alternative that performs similarly.)

Author

Yep, test -d gives the same run time. I've added two cleanup commits.
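
A minimal sketch of the `test -d` variant, assuming the goal is only to avoid calling `mkdir` for directories that already exist (`ensure_dir_exists` is a hypothetical name, not the function from the script):

```shell
#!/bin/sh
# Hypothetical helper: check with the test builtin first, and only
# fall back to the external mkdir when the directory is missing.
ensure_dir_exists() {
    test -d "$1" || mkdir -p "$1"
}
```

Compared with caching the last argument, this needs no extra state and also avoids the subprocess when the same directory comes up again non-consecutively.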

Author

(Never mind the 2nd commit, which I force-pushed away. I had just forgotten why I split those functions: abs path from file vs. dir.)

@@ -625,7 +626,7 @@ install_components() {

maybe_backup_path "$_file_install_path"

-if echo "$_file" | grep "^bin/" > /dev/null || test -x "$_src_dir/$_component/$_file"
+if test -z "${_file##bin/*}" || test -x "$_src_dir/$_component/$_file"

test -z "${_file##bin/*}" unintentionally accepts the empty string; [ "${_file#bin/}" != "$_file" ] would be a slightly more faithful translation. (Presumably this doesn’t really matter.)

Or there’s [[ "$_file" = bin/* ]] since this is #!/bin/bash, although the script seems to otherwise avoid bashisms so I guess that avoidance should be preserved.

Author

${_file} is never empty here because of https://github.com/rust-lang/rust-installer/pull/106/files#diff-ec68db39ae4ea5bfe559a47a8a880d5d383dfa81ba067652d2adfcc3c0cd2a17R571 .
So,

  1. if test -z "${_file##bin/*}" || ...
  2. if [ "${_file#bin/*}" != "${_file}" ] || ...
  3. if case "${_file}" in bin/*) ;; *) false ; esac || ... (the POSIX-y equivalent of [[ "${_file}" = bin/* ]])

should all be equivalent here.
If someone has a strong preference for any of those options (I don't), then I'm happy to change it accordingly!
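
The three variants listed above can be wrapped in small functions to check where they agree. This is a sketch for comparison only; the `is_bin_v1` to `is_bin_v3` names are hypothetical.

```shell
#!/bin/sh
# The three prefix tests from the list above, wrapped as hypothetical helpers.

# Variant 1: strip the longest match of bin/*; an empty remainder means a match.
is_bin_v1() { test -z "${1##bin/*}"; }

# Variant 2: strip a leading match of bin/*; a changed value means a match.
is_bin_v2() { [ "${1#bin/*}" != "$1" ]; }

# Variant 3: plain case pattern matching.
is_bin_v3() { case "$1" in bin/*) ;; *) false ;; esac; }
```

All three agree on non-empty inputs such as `bin/rustc` (match) and `lib/libstd.so` (no match). They differ only for the empty string, which variant 1 accepts and the other two reject; as noted above, the script never passes an empty `_file` at this point.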


5 participants