Support szip (freely) #1132

Closed
PallHaraldsson opened this issue Nov 19, 2023 · 5 comments

Comments


PallHaraldsson commented Nov 19, 2023

I see szip in the code, but I'm not sure whether non-proprietary code is used to compress and decompress. Please close this issue if a free implementation is already in use. I found a free drop-in replacement here:

https://gitlab.dkrz.de/k202009/

The algorithm was patented, but the patents have likely expired since, given that I found that free code. This project links to information that mentions only non-commercial use, implying that the code currently used is not fully free/open source:

https://support.hdfgroup.org/doc_resource/SZIP/

EDIT:

https://www.hdfgroup.org/2017/05/hdf5-data-compression-demystified-2-performance-tuning/

The HDF5 Library comes with two predefined compression methods, GNUzip (Gzip) and Szip and has the capability of using third-party compression methods as well.

The "third-party" link leads to a file-not-found page, but I'm curious which other methods are supported by the underlying library or by this package, e.g. zstd? And is Szip definitely supported freely?

I see now it's zstd plus likely at least these (any more of interest?):

H5Zblosc = "c8ec2601-a99c-407f-b158-e79c03c2f5f7"
H5Zbzip2 = "094576f2-1e46-4c84-8e32-c46c042eaaa2"
H5Zlz4 = "eb20ec05-5464-47b5-ba41-098e3c1068a3"
H5Zzstd

zstd is a good standard, or at least a fast one, and Szip had the best compression, at least at the time, but perhaps no longer? Is some other method now considered best for scientific data, in size and/or speed, and if so, which?

Member

mkitti commented Nov 20, 2023

SZIP should be installed by default and enabled.

julia> using HDF5

julia> HDF5.Filters.isencoderenabled(HDF5.API.H5Z_FILTER_SZIP)
true

julia> HDF5.API.h5z_filter_avail(HDF5.API.H5Z_FILTER_SZIP)
true
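
The two REPL checks above can be wrapped into a small script, for example to verify an installation in CI. This is only a sketch: `filter_status` is a hypothetical helper name, not part of HDF5.jl, and it uses only the two calls shown in the REPL session above.

```julia
using HDF5

# Hypothetical helper (not part of HDF5.jl): run both checks from the
# REPL session above for a single filter ID and return the results together.
function filter_status(filter_id)
    return (
        available = HDF5.API.h5z_filter_avail(filter_id),
        encoder = HDF5.Filters.isencoderenabled(filter_id),
    )
end

status = filter_status(HDF5.API.H5Z_FILTER_SZIP)
println("SZIP available: ", status.available, ", encoder enabled: ", status.encoder)
```

If either value is `false`, the HDF5 library in use was presumably built without an SZIP-compatible encoder.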

Member

mkitti commented Nov 20, 2023

HDF5_jll is one of two packages that depend on libaec_jll:

https://juliahub.com/ui/Packages/General/libaec_jll

Member

mkitti commented Nov 20, 2023

Member

mkitti commented Nov 20, 2023

For good measure, this should be disambiguated from https://github.com/szcompressor/SZ

Author

PallHaraldsson commented Nov 20, 2023

Good to see libaec_jll has HDF5_jll as a dependent, and thus HDF5.jl. That's what I wanted to see, and I had actually looked at:

https://juliahub.com/ui/Packages/General/HDF5_jll

and it's not listed as a dependency there, or I would not have opened this issue. I realize it's cached information, and likely updated rarely, if ever. I've noticed missing packages there before. I suppose libaec_jll was added later, perhaps even recently.

I think I'll be closing the issue. Regarding SZ: if you're saying we should support it, then yes, provided it's widely used and needed to read such files, or rather just the later variant linked from there (which seems very intriguing):

Note: SZ3 has been released here. SZ3 has much higher compression ratios than SZ2 in many cases, with comparable throughput (suffering slightly degraded throughput though). Details can be found in our ICDE21 paper.

SZ3: Kai Zhao, Sheng Di, Maxim Dmitriev, Thierry-Laurent D. Tonellot, Zizhong Chen, and Franck Cappello. "Optimizing Error-Bounded Lossy Compression for Scientific Data by Dynamic Spline Interpolation", Proceeding of the 37th IEEE International Conference on Data Engineering (ICDE 21), Chania, Crete, Greece, Apr 19 - 22, 2021.

SZauto: Kai Zhao, Sheng Di, Xin Liang, Sihuan Li, Dingwen Tao, Zizhong Chen, and Franck Cappello. "Significantly Improving Lossy Compression for HPC Datasets with Second-Order Prediction and Parameter Optimization", Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 20), Stockholm, Sweden, 2020. (code: https://github.com/szcompressor/SZauto/)
