Is there a proper way to send zlib partial files for unzipping? #4030

constellates · 2015-11-25T22:38:07Z

I saw that a bug was fixed in v5.0.0 so that zlib now throws an error when it reaches the end of a truncated input (#2595). Is it still possible to use zlib on partial files?

I'm working on a project that needs to be able to quickly read through a large package of gzipped files (20+ files with a combined size of 2-5GB). All the content I care about is in the header of each file (the first 500 bytes). I had this working previously by using fs.read() with options to read only the first 500 bytes, then using zlib.gunzip() to decompress the contents before parsing the header from the binary data.

As of v5.0.0 this throws an "unexpected end of file" error. Is there another way to accomplish this, or is zlib going to throw errors for this process regardless of what I do?

I've tried using streams. The chunks in the .on('data') event are properly decompressed and parsed, but I'm not confident that a chunk will always contain the full header. I also still have to handle the error, which breaks the pipe before it reaches an "end" or "close" event.

var readStream = fs.createReadStream(file.path, {start: 0, end: 500});
var gunzip = zlib.createGunzip();

readStream.pipe(gunzip)
    .on('data', function(chunk) {
        console.log(parseBinaryHeader(chunk));
        console.log('got %d bytes of data', chunk.length);
    })
    .on('error', function (err) {
        console.log(err);
    })
    .on('end', function() {
        console.log('end');
    });


kyriosli · 2015-11-27T02:32:19Z

I think you should not use pipe here, because it ends the destination stream when the source stream reaches its end. You can instead listen for the source's "data" event and write each chunk to the gunzip stream yourself, then open the next file for reading when the current one ends.

jhamhader · 2015-11-27T09:20:33Z

Regarding the partial data behavior after #2595: decompression is done the same way as before. The only difference is that once you call .end() on the zlib stream, if it didn't complete the decompression (a truncated input) it will emit an error.

You can tell .pipe() not to end the writable side of the pipe by passing its end option.
For example:
readStream.pipe(gunzip, {end: false})

Fishrock123 · 2015-11-30T15:05:58Z

cc @indutny and @trevnorris

indutny · 2015-11-30T17:59:48Z

Yeah, I think you may want to write partial data and flush, should work just fine.

constellates · 2015-11-30T19:02:59Z

> Yeah, I think you may want to write partial data and flush, should work just fine.

@indutny

Can you explain what you mean by "flush"? I'm relatively new to node streams.

indutny · 2015-11-30T19:07:08Z

Sorry, I was referring to https://nodejs.org/api/zlib.html#zlib_zlib_flush_kind_callback
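
To make the "write partial data and flush" suggestion concrete, here is a minimal sketch that combines it with the `{end: false}` option mentioned earlier in the thread. `gunzipPrefix` is a hypothetical helper name, not part of Node's API, and the sketch assumes the flush callback only runs after the queued writes have been processed (the zlib docs describe flush() as being queued behind pending writes).

```javascript
const fs = require('fs');
const zlib = require('zlib');

// Hypothetical helper: decompress whatever can be decoded from the first
// `byteCount` compressed bytes of a gzip file.
function gunzipPrefix(filePath, byteCount) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    const gunzip = zlib.createGunzip();
    gunzip.on('data', (chunk) => chunks.push(chunk));
    gunzip.on('error', reject);

    // `end` is inclusive, so this reads exactly `byteCount` bytes.
    const readStream = fs.createReadStream(filePath, { start: 0, end: byteCount - 1 });
    readStream.on('error', reject);
    readStream.on('end', () => {
      // flush() is queued behind the writes made by pipe(). Because end() is
      // never called, zlib is never told that the input is complete.
      gunzip.flush(zlib.constants.Z_SYNC_FLUSH, () => {
        resolve(Buffer.concat(chunks));
        gunzip.destroy(); // release the zlib handle
      });
    });

    // end: false stops pipe() from calling gunzip.end() when the read ends.
    readStream.pipe(gunzip, { end: false });
  });
}
```

Something like `gunzipPrefix(file.path, 500).then(parseHeader)` would then hand the decoded bytes to the parser. Other gunzip errors, such as a bad header check, still reject the promise.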

constellates · 2015-12-03T17:24:48Z

Thanks for all the suggestions and help. Here's what ended up working for me.

1. Set the chunk size to the full header size.
2. Write the single chunk to the decompress stream and immediately pause the stream.
3. Handle the decompressed chunk.

Example:

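var fs = require('fs');
var zlib = require('zlib');

var bytesRead = 500;
var decompressStream = zlib.createGunzip()
    .on('data', function (chunk) {
        parseHeader(chunk);
        decompressStream.pause();
    }).on('error', function (err) {
        // `chunk` is not in scope here, so only the error and the file are passed
        handleGunzipError(err, file);
    });

// `end` is inclusive, so this reads bytesRead + 1 bytes. createReadStream has no
// chunkSize option; highWaterMark is what sets the size of each chunk.
fs.createReadStream(file.path, {start: 0, end: bytesRead, highWaterMark: bytesRead + 1})
    .on('data', function (chunk) {
        // write() without end() never tells zlib that the input is complete
        decompressStream.write(chunk);
    });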

This has been working so far and also allows me to keep handling all other gunzip errors as the pause() prevents the decompress stream from throwing the "unexpected end of file" error. Let me know if there are any consequences of this strategy I might not be aware of.

evanlucas · 2016-02-02T11:52:51Z

Closing as this seems to have been answered. Thanks!

justinsg · 2016-06-23T18:26:11Z

I know this has already been closed, but PR #6069 makes it possible to do this synchronously too.

The synchronous version of @constellates's answer above is:

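var fs = require('fs');
var zlib = require('zlib');

var bufferSize = 500; // you'll need to account for the header size
var buffer = Buffer.alloc(bufferSize); // new Buffer(size) is deprecated

var fd = fs.openSync(file.path, 'r');
fs.readSync(fd, buffer, 0, bufferSize, 0);
fs.closeSync(fd); // release the file descriptor

// With Z_SYNC_FLUSH as the finish flush, zlib returns what it could decode
// instead of throwing on truncated input
var outBuffer = zlib.unzipSync(buffer, {finishFlush: zlib.constants.Z_SYNC_FLUSH});

console.log(outBuffer.toString());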

I use it for reading the start of gzipped log files to parse the date of the first line (oldest entry).

Source:
API reference for zlib
https://github.com/nodejs/node/blob/master/doc/api/zlib.md#compressing-http-requests-and-responses
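
To see the effect of `finishFlush` without a real file, a small in-memory round trip can help. This is only a sketch: the payload, the 500-byte cut and the variable names are made up for illustration.

```javascript
const crypto = require('crypto');
const zlib = require('zlib');

// Illustrative payload: random hex text compresses to well over 500 bytes.
const original = 'HEADER ' + crypto.randomBytes(4096).toString('hex');

// Keep only the first 500 compressed bytes, as if the file had been cut short.
const truncated = zlib.gzipSync(original).subarray(0, 500);

// With the default finish flush (Z_FINISH), truncated input is an error.
let defaultError = null;
try {
  zlib.gunzipSync(truncated);
} catch (err) {
  defaultError = err;
}

// With Z_SYNC_FLUSH, zlib returns whatever it managed to decode.
const partial = zlib.gunzipSync(truncated, {
  finishFlush: zlib.constants.Z_SYNC_FLUSH,
});

console.log('default threw:', defaultError !== null);
console.log('partial is a prefix of the original:',
  original.startsWith(partial.toString()));
```

The decoded prefix is usually longer than the compressed slice, but its exact length depends on how well the data compresses, so the compressed byte count has to be chosen with some margin.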

constellates · 2016-06-23T18:30:44Z

Thanks @justinsg

I appreciate hearing about the update.

LRagji · 2022-12-11T09:31:47Z

@justinsg How do you account for the header size? As I understand it, there are some bookkeeping bytes at the start. Are those fixed, or is there a formula for this? I know my application's header size.
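
For reference, and assuming the bookkeeping bytes meant here are the gzip container header: RFC 1952 defines a fixed 10-byte header followed by optional fields that are announced in a flags byte. The sketch below measures that header for a given buffer; `gzipHeaderLength` and `skipZeroTerminated` are hypothetical helper names. It does not say how many compressed bytes are needed to cover N decompressed bytes, since that depends on the compression ratio.

```javascript
// Flag bits of the FLG byte (RFC 1952, section 2.3.1).
const FHCRC = 0x02;
const FEXTRA = 0x04;
const FNAME = 0x08;
const FCOMMENT = 0x10;

// Return the index just past the zero byte that terminates a string field.
function skipZeroTerminated(buf, offset) {
  const end = buf.indexOf(0, offset);
  if (end === -1) throw new Error('gzip header is truncated');
  return end + 1;
}

// Return the number of bytes the gzip header occupies before the deflate data.
function gzipHeaderLength(buf) {
  if (buf.length < 10 || buf[0] !== 0x1f || buf[1] !== 0x8b) {
    throw new Error('not a gzip stream');
  }
  const flags = buf[3];
  let offset = 10; // magic (2), method (1), flags (1), mtime (4), xfl (1), os (1)

  if (flags & FEXTRA) {
    if (offset + 2 > buf.length) throw new Error('gzip header is truncated');
    offset += 2 + buf.readUInt16LE(offset); // XLEN, then XLEN bytes of extra data
  }
  if (flags & FNAME) offset = skipZeroTerminated(buf, offset);
  if (flags & FCOMMENT) offset = skipZeroTerminated(buf, offset);
  if (flags & FHCRC) offset += 2; // CRC16 of the header

  if (offset > buf.length) throw new Error('gzip header is truncated');
  return offset;
}
```

Files written by tools that store the original file name will have a longer header than the 10-byte minimum, so a fixed allowance is only safe if you control how the files were produced.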

mscdex added the question and zlib labels Nov 25, 2015

evanlucas closed this as completed Feb 2, 2016

jhamhader mentioned this issue Mar 17, 2016

Gzip decoder too strict #5761

Closed

addaleax mentioned this issue Apr 5, 2016

zlib: Make the finish flush flag configurable #6069

Closed

