Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gzip: Correctly handle multiple members #795

Merged
merged 1 commit into from
Oct 24, 2023
Merged

Conversation

wader
Copy link
Owner

@wader wader commented Oct 23, 2023

A gzip file consists of one or more "members" that are concatenated on uncompress.

Introduce a members array with each member and uncompressed is now the concatenation.

Fixes #794

@TomiBelan
Copy link

Looks good.
But double check the testdata. Some files got suspiciously shorter. format/text/testdata looks bad. And I don't see why this PR affects format/zip/testdata.

@TomiBelan
Copy link

Are you sure the other testdata changes are ok?

In format/gzip/testdata/compressed_size.fqtest and format/json/testdata/tofromjson.fqtest, is it intentional that the test doesn't show what's inside the gzip anymore?

In format/zip/testdata/test0.fqtest etc, why does the local_files entry for test/ (a directory, whose content is an empty string) change from uncompressed: raw bits 0x3f-0x3f (0) to uncompressed[0:1]: (gzip) 0x3f-0x3f (0)?

A gzip file consists of one or more "members" that are concatenated on uncompress.

Introduce a members array with each member and uncompressed is now the concatenation.

Fixes #794
@wader
Copy link
Owner Author

wader commented Oct 24, 2023

Are you sure the other testdata changes are ok?

Nope :) I thought so but i was a bit too quick, fixed. Thanks! some weirdness in the output i think was caused by that the gzip decoder succeeded for an empty input.

Now i've remodelled so that the root is a struct again and then there is a "members" array and the root "uncompressed" is the concat of the member uncompressed. You can still access each member uncompress as .members[].uncompress

What do you think? i'm still not very happy with the uncompress related code in fq, is a bit of a mess, but have not found good abstraction yet to make it more shareable between compression formats.

@TomiBelan
Copy link

I haven't reviewed the go code, but the output looks good. 👍

@wader
Copy link
Owner Author

wader commented Oct 24, 2023

Ok, thanks! i'm still not very happy with the most of the archive decoders in fq but that is for a future PR. Ex user facing-wise it would be nice if all of them had common uncompress and decode_uncompressed/files etc.

@wader wader merged commit 0cebaed into master Oct 24, 2023
5 checks passed
@wader wader deleted the gzip-multi-memebers branch October 24, 2023 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gzip files can contain multiple concatenated gzips
2 participants