Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to decode streams compressed with "zstd long=31" #390

Closed
pushshift opened this issue Jun 8, 2021 · 5 comments · Fixed by #394
Closed

Ability to decode streams compressed with "zstd long=31" #390

pushshift opened this issue Jun 8, 2021 · 5 comments · Fixed by #394

Comments

@pushshift
Copy link

I use very large window sizes when compressing data (up to the maximum of 2 << 31). I noticed that the package has a MaxWindowSize of 2 << 29.

How would I go about decoding streams compressed with long=31? Was support for larger window sizes purposely left out?

Thanks!

@pushshift
Copy link
Author

pushshift commented Jun 8, 2021

FYI -- I monkey patched the const located in framedec.go and was able to stream in a zstd stream encoded with long=31 without any issue. My recommendation would be to increase MaxWindowSize to the actual maximum allowed by zstandard itself (when using the long flag, this is 2 << 31). When trying to decode a compressed file with zstd, it will generally complain when the file was encoded with a window size of 2 << 31 and request that you provide the --long=31 flag for decompression. This generally requires two gigs of memory, but I don't see how increasing the MaxWindowSize would cause any incompatibility with existing scripts that use this module.

const (
    // The minimum Window_Size is 1 KB.
    MinWindowSize = 1 << 10
    MaxWindowSize = 1 << 31
)

@pushshift
Copy link
Author

pushshift commented Jun 8, 2021

Linking to pull request to change MaxWindowSize constant: #392

klauspost added a commit that referenced this issue Jun 8, 2021
Also reduce default memory allocs.

Fixes #390
Replaces #392
@klauspost
Copy link
Owner

2 << 31 is 4GB. Either way, that is just as arbitrary a limit as the current.

I made the maximum configurable and clarified the documentation. See #394

@pushshift
Copy link
Author

pushshift commented Jun 8, 2021

Thank you! That's great. Also, is 2 << 31 4GB for signed int? The warning zstd gives looks like:

-06-07_00.ndjson.zst : Decoding error (36) : Frame requires too much memory for decoding 
-06-07_00.ndjson.zst : Window size larger than maximum : 2147483648 > 134217728 
-06-07_00.ndjson.zst : Use --long=31 or --memory=2048MB

I see what you mean, though. Either way, that appears to be the maximum that can be used for zstd, but I really like the idea of making the window configurable. Awesome project by the way -- really impressive work!

klauspost added a commit that referenced this issue Jun 8, 2021
Also reduce default memory allocs.

Fixes #390
Replaces #392
@klauspost
Copy link
Owner

@pushshift 2 << 31 <- you are shifting a 2, making it 1 << 32.

Thanks for the contribution and the kind words. I have merged the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants