Skip to content

compressDrv and compressDrvWeb follow up#332752

Merged
philiptaron merged 10 commits intoNixOS:masterfrom
SuperSandro2000:compressDrv
Aug 19, 2024
Merged

compressDrv and compressDrvWeb follow up#332752
philiptaron merged 10 commits intoNixOS:masterfrom
SuperSandro2000:compressDrv

Conversation

@SuperSandro2000
Copy link
Member

Description of changes

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presence of extraArgs means we will never be able to replace find with something else, as this broadens up the interface up to the point where we cannot reason how it works.

AFAIK from your comment earlier, would you like to exclude some extensions? If yes, how about excludePatterns or so?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to exclude some paths but good point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what I am using right now

package = pkgs.compressDrvWeb pkgs.nextcloud29 {
  excludePatterns = [ ".*(\/apps\/.*\/l10n\/).*" ];
};

@motiejus
Copy link
Contributor

motiejus commented Aug 6, 2024

I am a bit unsure about including zstd as default:

  • compression level is unclear. zopfi and brotli default to max, but with zstd it's different -- should we use 3? 6? 19? 22? Ultra? Higher decompression levels require more memory.
  • I have read somewhere (can't find links handy) that for web server case, zstd may not be as useful as brotli due to longer decompression speed, but I may be wrong here.

@ofborg ofborg bot requested review from erikarvstedt, gador, leona-ya and lukegb August 6, 2024 18:06
@ofborg ofborg bot added 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. labels Aug 6, 2024
@motiejus
Copy link
Contributor

motiejus commented Aug 7, 2024

I have read somewhere (can't find links handy) that for web server case, zstd may not be as useful as brotli due to longer decompression speed, but I may be wrong here.

I spent some time looking into this, it is pure FUD. I should know better than that before writing such statements. Sorry.

I will happily accept zstd.

@SuperSandro2000
Copy link
Member Author

  • compression level is unclear. zopfi and brotli default to max, but with zstd it's different -- should we use 3? 6? 19? 22? Ultra? Higher decompression levels require more memory.

We don't set any levels on the other compression tools, just stick to the default which should be good enough.

I have read somewhere (can't find links handy) that for web server case, zstd may not be as useful as brotli due to longer decompression speed, but I may be wrong here.

That is already a problem with fzip/brotli. Sometimes one over the other is smaller/faster to decompress. I am not sure if nginx has some options to choose one over the other. It is all controlled by the Accept-Encoding header.

@SuperSandro2000 SuperSandro2000 marked this pull request as ready for review August 7, 2024 14:47
@SuperSandro2000 SuperSandro2000 force-pushed the compressDrv branch 3 times, most recently from 596d1c6 to c63eaac Compare August 7, 2024 17:14
@motiejus
Copy link
Contributor

motiejus commented Aug 8, 2024

  • compression level is unclear. zopfi and brotli default to max, but with zstd it's different -- should we use 3? 6? 19? 22? Ultra? Higher decompression levels require more memory.

We don't set any levels on the other compression tools, just stick to the default which should be good enough.

zopfli and brotli default to max compression level, because they are designed for fast decompression. zstd is balancing both, so it's default is very low.

Here is brotli vs zstd -19 from my last experiment (compression ratio is only 4% different):

Command Mean [ms] Min [ms] Max [ms] Relative
./zstd-19 10.7 ± 0.5 9.7 14.6 1.00
./brotli 15.1 ± 0.5 14.2 18.2 1.41 ± 0.08
Benchmark 1 (460 runs): ./zstd-19
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.8ms ±  465us    9.94ms … 14.0ms         19 ( 4%)        0%
  peak_rss           12.5MB ± 86.9KB    12.2MB … 12.7MB          1 ( 0%)        0%
  cpu_cycles         30.6M  ±  919K     29.7M  … 36.3M          41 ( 9%)        0%
  instructions       88.4M  ± 12.6K     88.4M  … 88.5M          11 ( 2%)        0%
  cache_references   1.80M  ± 22.2K     1.76M  … 2.01M          26 ( 6%)        0%
  cache_misses        188K  ± 3.66K      179K  …  222K           7 ( 2%)        0%
  branch_misses       346K  ± 1.02K      344K  …  352K          24 ( 5%)        0%
Benchmark 2 (328 runs): ./brotli
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          15.2ms ±  488us    14.3ms … 18.9ms         22 ( 7%)        💩+ 40.5% ±  0.6%
  peak_rss           12.0MB ± 91.4KB    11.8MB … 12.2MB          0 ( 0%)        ⚡-  3.7% ±  0.1%
  cpu_cycles         52.6M  ±  575K     51.6M  … 58.9M          11 ( 3%)        💩+ 71.7% ±  0.4%
  instructions        101M  ± 13.1K      101M  …  101M          10 ( 3%)        💩+ 14.5% ±  0.0%
  cache_references   1.95M  ± 75.4K     1.92M  … 3.25M          10 ( 3%)        💩+  8.3% ±  0.4%
  cache_misses        161K  ± 2.37K      153K  …  180K           5 ( 2%)        ⚡- 14.7% ±  0.2%
  branch_misses       898K  ±  869       895K  …  902K           8 ( 2%)        💩+159.6% ±  0.0%

As far as API goes, I think I may reverse the direction: the possible incarnations of find are unbounded, so wrapping them all can be futile. Please give me some time to think/experiment about it. I would like to see how a find-centric file filtering approach throughout would look like.

@motiejus
Copy link
Contributor

motiejus commented Aug 9, 2024

So I did some prototyping. Adding extra parameters for all possible find permutations is indeed bad and unsustainable: we won't be able to support all of them. For example, the current excludePatterns approach implicitly assumes case-insensitivity, but the name of the parameter does not.

Let's roll back to https://github.com/NixOS/nixpkgs/blob/994006d4896ba78acc2c36c2e88921b9a576f625/pkgs/build-support/compress-drv/default.nix#L57, rename extraArgs to extraFindOperands (so it's clear what the args are for) and proceed from there. Sorry for the back-end-forth.

Also, let's switch to zstd -19: we will be inline with brotli for compression ratio, but significantly less resource intensive.

@philiptaron
Copy link
Contributor

I don't understand the need for extraFindOperands given the current usecase.

@motiejus
Copy link
Contributor

I don't understand the need for extraFindOperands given the current usecase.

#239198

Tldr: it needs to exclude vendor directory.

Copy link
Contributor

@philiptaron philiptaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only not approving because nix-build -A gitea.data-compressed fails for me. It also fails on master!

motiejus added a commit to motiejus/nixpkgs that referenced this pull request Aug 18, 2024
Fixes two bugs:
- pass a forgotten `{}` to `compressDrv`.
- remove incorrect usage of `lndir` in `compressDrv`. I added a brief
  comment on why, see [this comment][1] for more details.

Tested with:

```
$ nix build .#legacyPackages.x86_64-linux.gitea.passthru.data-compressed
$ ls -lh result/public/assets/licenses.txt*
lrwxrwxrwx 1 root root  90 Jan  1  1970 result/public/assets/licenses.txt -> /nix/store/p21irsr57hckd3x3ym18aa0cr9zmm3an-gitea-1.22.1-data/./public/assets/licenses.txt
-r--r--r-- 1 root root 30K Jan  1  1970 result/public/assets/licenses.txt.br
-r--r--r-- 1 root root 82K Jan  1  1970 result/public/assets/licenses.txt.gz
```

[1]: NixOS#332752 (comment)
@philiptaron
Copy link
Contributor

Needs a rebase now that #335670 is merged. @SuperSandro2000

@SuperSandro2000
Copy link
Member Author

I don't understand the need for extraFindOperands given the current usecase.

see #332752 (comment)

This is useful for eg. nextcloud to prevent compressing thousands of
later unused files which are actually not used by the web server.
This is required to overwrite to use a compressed version of nextcloud in services.nextcloud.package because the module accesses version
Before compressDrvWeb would follow the frontend symlink into the nix
store and couldn't write it's file into the directories below the
symlink. lndir creates all dirs and then symlinks the actual files into
them which allows compressDrvWeb to place the compressed variants next
to them.
# which we do not need; we only need to symlink files.
(cd ${drv}; find -L -type d -exec mkdir -p $out/{} ';')
(cd ${drv}; find -L -type f -exec ln -s ${drv}/{} $out/{} ';')
# cannot use lndir here, because it stop recursing at symlinks that point to directories
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

| xargs -0 -P$NIX_BUILD_CORES -I{} ${prog}
'';
formatsPipe = builtins.concatStringsSep "|" formats;
formatsPipe = lib.concatStringsSep "|" formats;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious, why use lib over builtin?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it makes really no difference but some lib functions contain backwards compatibility code like

closePropagation = if builtins ? genericClosure
then closePropagationFast
else closePropagationSlow;
and it is easier to just use lib for everything.

Copy link
Contributor

@philiptaron philiptaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Built, tested, approved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments