Skip to content

Conversation

@tueda
Copy link
Collaborator

@tueda tueda commented Jun 19, 2024

This PR implements #540.

@tueda tueda linked an issue Jun 19, 2024 that may be closed by this pull request
@coveralls
Copy link

Coverage Status

coverage: 49.369% (-0.6%) from 49.999%
when pulling 930d085 on tueda:feat-zstd
into 83e3d41 on vermaseren:master.

@coveralls
Copy link

Coverage Status

coverage: 49.374% (-0.6%) from 49.999%
when pulling e9bde71 on tueda:feat-zstd
into 83e3d41 on vermaseren:master.

@coveralls
Copy link

Coverage Status

coverage: 49.474% (-0.5%) from 49.999%
when pulling 8290f9a on tueda:feat-zstd
into 83e3d41 on vermaseren:master.

@jodavies jodavies added this to the v5 milestone Nov 6, 2024
@jodavies
Copy link
Collaborator

What is the status on this one @tueda ? You have several experimental zstd branches.

@tueda
Copy link
Collaborator Author

tueda commented Nov 20, 2024

If I remember correctly, it should work. But I don't have a good benchmark showing the performance advantages of zstd compression. (This may just be because all my machines currently use SSDs, not HDDs.)

Do you have a good benchmark results for this? If you are happy with this implementation, I think it can be merged (but the "`Fix previous" commit could be squashed into the previous one).

I'll rebase it.

@coveralls
Copy link

coveralls commented Nov 20, 2024

Coverage Status

coverage: 47.921% (-2.8%) from 50.74%
when pulling 39b43b7 on tueda:feat-zstd
into 1d5e40b on vermaseren:master.

@jodavies
Copy link
Collaborator

I will take another look at some benchmarks. At least in #540 I claimed some improvements.

@tueda
Copy link
Collaborator Author

tueda commented Nov 20, 2024

zlibWrapper uses the quote form of the #include directive for zstd.h:

https://github.com/facebook/zstd/blob/794ea1b0afca0f020f4e57b6732332231fb23c70/zlibWrapper/zstd_zlibwrapper.c#L28

For some reason, with the macOS 14 image, CPATH doesn't work for the quote form while CPPFLAGS -I does. I'm confused because the GCC documentation says they have the same effect...

Edit: OK, GCC is Clang on Mac, and the Clang manual says CPATH, C_INCLUDE_PATH etc. are for the default system include path list. The meaning of empty components in the environment variable is also different (GCC: -I., Clang: ignored).

@jodavies
Copy link
Collaborator

This also needs adding to the manual. Additionally, the current manual claims "On compress;is the default, but ratherOn compress,gzip;` is the actual default.

@jodavies
Copy link
Collaborator

jodavies commented Nov 21, 2024

This is a good benchmark I think. It makes a large expression (but it fits in ScratchSize) and then pointlessly reads and writes it a bunch of times. So a large % of the time is spent in the compression routines.

#-

* Keep expressions in memory. We want to test only the sort files.
#: ScratchSize 3G
* Write out patches more regularly during sorting
#: TermsInSmall 10K

Off statistics;
Off threadstats;
`COMPR'

#define NTERMS "{2^20}"
#define BLOCKS "16"
#define ITER "10"

Symbol n,m,x,y;
CFunction f,g,h,sum;

Local test =
	#do b = 1,`BLOCKS'
		+ f(`b') * sum(x,1,{`NTERMS'/`BLOCKS'},g(x))
	#enddo
	;
.sort
Identify sum(?a) = sum_(?a);
* Make the terms larger.
Identify f(n?)*g(m?) = f(n)*g(m) * h((n*x+m*y)^25);

* Pointlessly read and sort and write the terms
#do i = 1,`ITER'
	#message i = `i' / `ITER'
	.sort
#enddo

.end

I run it with FORMTMP on a tmpfs, so the speed of the storage is taken out of the equation. Then I see:

Benchmark 1: Master branch, default ( = On compress,gzip;)
  Time (mean ± σ):     121.793 s ±  3.743 s    [User: 567.935 s, System: 7.577 s]
  Range (min … max):   115.948 s … 126.755 s    10 runs

Benchmark 2: Master branch, Off compress;
  Time (mean ± σ):     45.177 s ±  0.200 s    [User: 209.576 s, System: 22.742 s]
  Range (min … max):   44.950 s … 45.688 s    10 runs

Benchmark 3: zstd branch, default (= On compress,zstd;)
  Time (mean ± σ):     112.262 s ±  1.657 s    [User: 501.364 s, System: 7.715 s]
  Range (min … max):   110.187 s … 114.901 s    10 runs

Benchmark 4: zstd branch, On compress,gzip;)
  Time (mean ± σ):     119.869 s ±  2.645 s    [User: 567.030 s, System: 7.585 s]
  Range (min … max):   115.460 s … 124.314 s    10 runs

The peak size of the sort files 741M, 2.5G, 701M and 741M, respectively. That Benchmark 1 == Benchmark 4 means that using gzip via the wrapper does not incur a penalty.

Benchmark 2 is very fast since we do no compression at all, but if FORMTMP is on a real storage device this slows down a lot.

So Zstd is both a bit faster, and compresses a bit better, in this test.

Could you rebase the fix commit in your branch, and here is a suggestion for the manual which you can cherry-pick if you are happy: jodavies@7071cce

@tueda
Copy link
Collaborator Author

tueda commented Nov 22, 2024

Thanks! The results look nice.

Maybe it would be useful to put this benchmark in the repository? We could simply put it into some subdirectory bench/ or benchmarks/. Alternatively, we could use the test suite framework, putting it to check/bench/ with an additional test fold and simple assertions, then ./check/check.rb /path/to/form -C bench thisbench.frm should run it.

I have cherry-picked the patch for the manual and squashed the "Fix previous" commit into the previous one.

@jodavies
Copy link
Collaborator

I added the benchmark in check/benchmarks, and also a "mini" version in the usual tests. The idea is to ensure there is a test in there which creates sort files (very likely there is, but this guarantees it...).

Under valgrind tform the test is surprisingly slow, it suffers from the same load balancing issue as the color test. I am not sure about the parform failure.

@tueda tueda mentioned this pull request Nov 29, 2024
tueda and others added 3 commits February 6, 2025 15:39
- Add zstd as a submodule at extern/zstd. Use only zstd/zlibWrapper.
  Clone during configuration if not already cloned.
- Add a new configure option --with-zstd (default: check). Define
  WITHZSTD and build zstd/zlibWrapper.
- Use the subdir-objects Automake option if appropriate.
The default is to use zstd, if FORM is compiled with the wrapper.
Specifying "On compress,gzip;" will still use zlib, via the zstd
wrapper.
Also fix "dubious ownership" in containers for Git operations.
@tueda
Copy link
Collaborator Author

tueda commented Feb 6, 2025

Rebased (conflicts resolved).

@jodavies
Copy link
Collaborator

One thing: this works fine when loading tablebases which have been created with "zlib form" since the wrapper detects zlib data and uses the right decompression function. But not the other way around: tablebases created with zstd enabled do not work in "zlib form".

The easiest solution would be to disable and renable the wrapper around calls to compress in minos.c.

jodavies added 2 commits March 7, 2025 13:55
Maintain portability of tablebase files between FORM binaries compiled
with and without zstd support.
@jodavies jodavies merged commit f2df9bf into form-dev:master May 23, 2025
58 checks passed
@tueda tueda deleted the feat-zstd branch September 30, 2025 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Zstd compression for sort files

3 participants