
Unexpected EOF when inflating ZIP #208

Open
blacha opened this issue Feb 12, 2024 · 9 comments
Comments

@blacha

blacha commented Feb 12, 2024

The problem

I have a zip file that I can decompress with other decompressors (Ubuntu's unzip, yauzl, python3), but fflate is unable to decompress it.

Error: unexpected EOF
    at inflt ([worker eval]:233:25)
    at Inflate.c ([worker eval]:289:18)
    at Inflate.push ([worker eval]:294:29)
    at [worker eval]:263:40
    at MessagePort.<anonymous> ([worker eval]:298:86)

How to reproduce

Using a simple script

import { readFileSync } from 'node:fs';
import { AsyncUnzipInflate, Unzip } from 'fflate';

const bytes = readFileSync('./address.zip');
const unzip = new Unzip((file) => {
  console.log(file.name);
  file.ondata = (err, data, final) => {
    if (err) throw err;
    console.log('got', file.name, data.length, final);
  };
  file.start();
});

unzip.register(AsyncUnzipInflate);
unzip.push(bytes);

with my sample zip file: https://public.chard.com/fflate/address.zip

@101arrowz
Owner

I'll look into this and let you know what the issue is soon. The streaming unzip API has needed a bit of work for a while now...

@dwsilk

dwsilk commented Mar 5, 2024

Still unsure of the issue here @101arrowz?

@101arrowz
Owner

Sorry, I've been busy for the last few weeks - I probably won't be able to get to this for a while. To be honest I would just advise against using the streaming unzip API until I can get around to fixing it.

@johan-tribus

Hi @101arrowz,

I'm encountering the same problem with this file: https://github.com/johan-tribus/fflate/blob/master/file.zip
Have you had a chance to investigate the issue yet?

@SvV-CWinJS

Hi 101arrowz

Thank you for this great tool.

I have the same problem, using unzipSync.
Strangely, though, I remember seeing it work on the same file several weeks ago.
This makes me think it is perhaps a certain race condition that mostly does not work out (or worse: a Windows update :D).

I hope you find and solve the problem easily.

@khoa-superfine

Hi @101arrowz

Thank you for this great tool.

I had the same problem when using decompressSync.
Strangely enough, it works with a downloaded zip file, but after I unzip that file and compress it again on macOS, I get the error: Unexpected EOF. I have compressed files and used decompressSync before and it worked fine.

I hope you find and solve the problem easily.

@kajkal

kajkal commented Feb 2, 2025

@johan-tribus I've looked at the file you've attached. The problem is simple and should be easy to fix.

First, a bit of context: a zip file consists of several building blocks and usually looks like this:

[local file header 1]        <- file 1 metadata, filename etc, how long the [file data 1] block will be
[file data 1]                <- content of the file 1
[data descriptor 1]          <- optional, added when this .zip archive was written as a stream and [file data 1]
...                             block final size was not known at the time of writing [local file header 1]
[local file header n]
[file data n]
[data descriptor n]
[central directory header 1] <- added at the very end, the directory of all files in this .zip archive,
...                             has correct information about the length and position of all [file data x] blocks
[central directory header n]
[end of central directory record]

When reading a .zip archive with fflate.unzipSync or fflate.unzip, the entire file buffer has to be loaded into memory, but this makes it possible to quickly locate the [central directory header] blocks and, with their help, locate and load the bytes of [file data x] for the selected file (or for all files in the archive).
When reading a .zip archive as a stream, the bytes are read from the beginning and the data from [central directory header x] is not yet available, so you have to rely on [local file header x]. But when the .zip was also written as a stream, [local file header x] may not tell you how many bytes the [file data x] block contains; you must read all subsequent bytes until you come across [data descriptor x].
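The whole-buffer case can be sketched in isolation. This is an illustrative helper (names are mine, not fflate's actual code): with the full file in memory you can scan backwards for the [end of central directory record] signature 'PK\x05\x06' (0x06054b50 little-endian) and jump straight to the central directory:

```typescript
// Sketch only: why whole-buffer unzip is simple. With the complete file in
// memory, scan backwards for the end of central directory record signature
// 'PK\x05\x06' (0x06054b50 little-endian) and jump straight to the directory.
const EOCD_SIG = 0x06054b50;

// Read a little-endian u32 at offset `off`.
const u32le = (buf: Uint8Array, off: number): number =>
  (buf[off] | (buf[off + 1] << 8) | (buf[off + 2] << 16) | (buf[off + 3] << 24)) >>> 0;

// The EOCD record is at least 22 bytes and may be followed by a trailing
// comment, so the last valid position is buf.length - 22.
function findEocdOffset(buf: Uint8Array): number {
  for (let i = buf.length - 22; i >= 0; i--) {
    if (u32le(buf, i) === EOCD_SIG) return i;
  }
  return -1; // not a zip file (or truncated)
}
```

A streaming reader never gets this shortcut, because the EOCD sits at the very end of the file.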

In the file you attached, fflate, while iterating over such a [file data x] block, encounters something that looks like [data descriptor x] but in fact is not. Starting at byte 13104 of your .zip file there is a 4-byte sequence (50 4B 07 08) that is used to identify the start of a [data descriptor] block. However, this is just a coincidence; the actual [data descriptor] does not start until byte 201194.
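To illustrate how such a coincidence arises, here is a small sketch (hypothetical helper names, not fflate code) that scans a buffer for the 4-byte descriptor signature. Compressed data can contain this byte sequence by chance, so every hit is only a candidate and needs further validation:

```typescript
// Sketch only (hypothetical helpers, not fflate code): the data descriptor
// signature 'PK\x07\x08' is just four bytes, so compressed data can contain
// it by coincidence. A streaming reader that scans for it therefore only
// finds *candidate* descriptors, which must be validated further.
const DATA_DESCRIPTOR_SIG = 0x08074b50; // 50 4B 07 08 on disk (little-endian)

const readU32 = (buf: Uint8Array, off: number): number =>
  (buf[off] | (buf[off + 1] << 8) | (buf[off + 2] << 16) | (buf[off + 3] << 24)) >>> 0;

// Every offset at which the signature appears; each is only a candidate.
function findDescriptorCandidates(buf: Uint8Array): number[] {
  const hits: number[] = [];
  for (let i = 0; i + 4 <= buf.length; i++) {
    if (readU32(buf, i) === DATA_DESCRIPTOR_SIG) hits.push(i);
  }
  return hits;
}
```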

The problem is here:

fflate/src/index.ts

Lines 3610 to 3613 in f787356

if (sig == 0x8074B50) {
is = i += 12 + (oc == -2 && 8), f = 3, this.c = 0;
break;
} else if (sig == 0x2014B50) {

fflate should verify that it has actually encountered a [data descriptor] block. I don't fully understand these parts of the code (the magic negative values are a mystery to me), but a fix should look something like this:

if (sig == 0x8074B50) {
  const z64 = oc == -2; // is this file in Zip64 format, `oc == -2` seems to mean this, I am not sure
  const _sc = z64 ? b8(buf, i + 8) : b4(buf, i + 8); // read 'compressed size' from data descriptor block
  if (_sc == i) { // assuming that `i` is how many bytes of [file data x] block were read
    is = i += 12 + (oc == -2 && 8), f = 3, this.c = 0;
    break;
  } else {
    // just a coincidence, ignore it
  }
} else if (sig == 0x2014B50) {

Alternatively, the crc-32 could be verified, but the byte count is simpler.
Maybe verify by byte count first and then by crc-32 to be 100% sure?
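For reference, the crc-32 in a ZIP data descriptor is the standard IEEE polynomial. fflate has its own CRC code internally; this standalone sketch is just to illustrate what the "verify by crc-32" check would compare against the decompressed bytes:

```typescript
// Sketch of a standard CRC-32 (the IEEE polynomial ZIP uses); not fflate's
// internal implementation, just an illustration of the crc-32 check idea.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  CRC_TABLE[n] = c >>> 0;
}

function crc32(data: Uint8Array): number {
  let c = 0xffffffff;
  for (let i = 0; i < data.length; i++) {
    c = CRC_TABLE[(c ^ data[i]) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}
```

The standard check value makes this easy to sanity-check: crc32 of the ASCII string '123456789' is 0xCBF43926.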

@johan-tribus

Hi all,

Thank you @kajkal for your extensive explanation and suggested fix. I tested it and it seems to work as expected.
I think a check on the compressed size should be good enough; I suppose a crc-32 check on the resulting file is done in some other part of the code. I still struggle to understand it all.
I've opened a PR to address the issue using your fix.

@kajkal

kajkal commented Mar 5, 2025

@johan-tribus of the two assumptions in my suggested fix, only one is true: oc == -2 does indicate a file with a Data Descriptor in Zip64 format (and oc == -1 indicates a file with a standard Data Descriptor). Unfortunately, i does not indicate how many bytes have been read from the [file data x] block, and as far as I have been able to determine, such information is not currently stored by fflate; at least, not always:

import fs from 'node:fs';
import * as fflate from 'fflate';

const filepath = './file.zip';
const unzipper = new fflate.Unzip();
unzipper.register(fflate.UnzipInflate);
unzipper.onfile = (file) => {
  console.log('file found', { name: file.name, compression: file.compression });
  let i = 0;
  file.ondata = (err, data, final) => {
    if (err) throw err;
    console.log(`  got decompressed chunk ${i++}`, { bytes: data.length, final });
  };
  file.start();
};

// my suggestion works when the file is loaded in one piece
unzipper.push(fs.readFileSync(filepath), true);

// but fails when the file is loaded in chunks
const rs = fs.createReadStream(filepath, { highWaterMark: 64 * 1024 /*bytes*/ });
rs.on('data', (chunk) => unzipper.push(chunk));
rs.on('end', () => unzipper.push(new Uint8Array(), true));

// therefore, it cannot be assumed that `i` always means how many bytes of [file data x] block were read

From what I understand, it would be possible to keep a 'historical' (multi-chunk friendly) compressed-size value in the UnzipDecoder instance: with each call to UnzipDecoder->push (this applies to UnzipPassThrough, UnzipInflate and AsyncUnzipInflate), evaluate this.sc += data.length; then, when verifying against the Data Descriptor, you could do if (_sc == this.d.sc + i), which should give the correct value.
That would be my naive solution; it would be good to hear how @101arrowz would approach this, there is probably a much simpler and more elegant solution 😉
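The accumulation idea can be sketched in isolation (hypothetical class and names, mine rather than fflate's): the decoder tracks how many compressed bytes the previous chunks contained, so a candidate descriptor found at offset i of the current chunk is checked against the total bytes consumed so far, not just the offset within one chunk:

```typescript
// Hypothetical sketch of the suggestion above (names are mine, not fflate's):
// the decoder accumulates the compressed bytes consumed in previous chunks,
// so a descriptor's declared size can be compared against the total number of
// [file data x] bytes read so far, not just the offset within one chunk.
class CompressedSizeTracker {
  sc = 0; // compressed bytes consumed in fully processed chunks

  // Call once per chunk handed to the decoder's push().
  consume(chunk: Uint8Array): void {
    this.sc += chunk.length;
  }

  // Would a candidate descriptor found at offset `i` of the chunk currently
  // being scanned match the size it declares?
  matches(declaredSize: number, i: number): boolean {
    return declaredSize === this.sc + i;
  }
}
```

With a single whole-file push this.sc stays 0 and the check degenerates to the earlier _sc == i suggestion, which is why that version only worked for unchunked input.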
