
Unexpected EOF when inflating ZIP #208

Open
blacha opened this issue Feb 12, 2024 · 9 comments
Comments

@blacha

blacha commented Feb 12, 2024

The problem

I have a zip file that I can decompress with other decompressors (Ubuntu's unzip, yauzl, python3), but fflate is unable to decompress it.

Error: unexpected EOF
    at inflt ([worker eval]:233:25)
    at Inflate.c ([worker eval]:289:18)
    at Inflate.push ([worker eval]:294:29)
    at [worker eval]:263:40
    at MessagePort.<anonymous> ([worker eval]:298:86)

How to reproduce

Using a simple script

import { readFileSync } from 'node:fs';
import { AsyncUnzipInflate, Unzip } from 'fflate';

const bytes = readFileSync('./address.zip');
const unzip = new Unzip((file) => {
  console.log(file.name);
  file.ondata = (err, data, final) => {
    if (err) throw err;
    console.log('got', file.name, data.length, final);
  };
  file.start();
});

unzip.register(AsyncUnzipInflate);
unzip.push(bytes);

with my sample zip file: https://public.chard.com/fflate/address.zip

@101arrowz
Owner

I'll look into this and let you know what the issue is soon. The streaming unzip API has needed a bit of work for a while now...

@dwsilk

dwsilk commented Mar 5, 2024

Still unsure of the issue here @101arrowz?

@101arrowz
Owner

Sorry, I've been busy for the last few weeks - I probably won't be able to get to this for a while. To be honest I would just advise against using the streaming unzip API until I can get around to fixing it.

@johan-tribus

Hi @101arrowz,

I'm encountering the same problem with this file: https://github.com/johan-tribus/fflate/blob/master/file.zip
Have you had a chance to investigate the issue yet?

@SvV-CWinJS

Hi 101arrowz

Thank you for this great tool.

I have the same problem, using unzipSync.
Strangely, though, I remember seeing it work on the same file several weeks ago.
This makes me think it is perhaps a certain race condition that mostly does not work out (or worse: a Windows update :D).

I hope you find and solve the problem easily.

@khoa-superfine

Hi @101arrowz

Thank you for this great tool.

I had the same problem when using decompressSync.
Strangely enough, it works with a downloaded zip file, but after I unzip that file and compress it again on macOS, I get the error: Unexpected EOF. I have compressed files and used decompressSync before and it worked fine.

I hope you find and solve the problem easily.

@kajkal

kajkal commented Feb 2, 2025

@johan-tribus I've looked at the file you've attached. The problem is simple and should be easy to fix.

First, a bit of context: a zip file consists of several building blocks and usually looks like this:

[local file header 1]        <- file 1 metadata, filename etc, how long the [file data 1] block will be
[file data 1]                <- content of the file 1
[data descriptor 1]          <- optional, added when this .zip archive was written as a stream and [file data 1]
...                             block final size was not known at the time of writing [local file header 1]
[local file header n]
[file data n]
[data descriptor n]
[central directory header 1] <- added at the very end, the directory of all files in this .zip archive,
...                             has correct information about the length and position of all [file data x] blocks
[central directory header n]
[end of central directory record]

When reading a .zip archive with fflate.unzipSync or fflate.unzip, the entire file buffer has to be loaded into memory, but this makes it possible to quickly locate the [central directory header] blocks and, with their help, locate and load the bytes of [file data x] for the selected file (or for all files in the archive).
When reading a .zip archive as a stream, the bytes are read from the beginning and the data from [central directory header x] is not yet available, so you have to rely on [local file header x]. But when the .zip was also written as a stream, [local file header x] may not tell you how many bytes the [file data x] block contains; you must read all subsequent bytes until you come across [data descriptor x].
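The whole-buffer case can be sketched in isolation. This is an illustrative helper (names are mine, not fflate's actual code): with the full file in memory you can scan backwards for the [end of central directory record] signature 'PK\x05\x06' (0x06054b50 little-endian) and jump straight to the central directory:

```typescript
// Sketch only: why whole-buffer unzip is simple. With the complete file in
// memory, scan backwards for the end of central directory record signature
// 'PK\x05\x06' (0x06054b50 little-endian) and jump straight to the directory.
const EOCD_SIG = 0x06054b50;

// Read a little-endian u32 at offset `off`.
const u32le = (buf: Uint8Array, off: number): number =>
  (buf[off] | (buf[off + 1] << 8) | (buf[off + 2] << 16) | (buf[off + 3] << 24)) >>> 0;

// The EOCD record is at least 22 bytes and may be followed by a trailing
// comment, so the last valid position is buf.length - 22.
function findEocdOffset(buf: Uint8Array): number {
  for (let i = buf.length - 22; i >= 0; i--) {
    if (u32le(buf, i) === EOCD_SIG) return i;
  }
  return -1; // not a zip file (or truncated)
}
```

A streaming reader never gets this shortcut, because the EOCD sits at the very end of the file.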

In the file you attached, fflate, while iterating over such a [file data x] block, encounters something that looks like [data descriptor x] but in fact is not. Starting at byte 13104 of your .zip file there is a 4-byte sequence (50 4B 07 08) that is used to identify the start of a [data descriptor] block. However, this is just a coincidence; the actual [data descriptor] does not start until byte 201194.
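To illustrate how such a coincidence arises, here is a small sketch (hypothetical helper names, not fflate code) that scans a buffer for the 4-byte descriptor signature. Compressed data can contain this byte sequence by chance, so every hit is only a candidate and needs further validation:

```typescript
// Sketch only (hypothetical helpers, not fflate code): the data descriptor
// signature 'PK\x07\x08' is just four bytes, so compressed data can contain
// it by coincidence. A streaming reader that scans for it therefore only
// finds *candidate* descriptors, which must be validated further.
const DATA_DESCRIPTOR_SIG = 0x08074b50; // 50 4B 07 08 on disk (little-endian)

const readU32 = (buf: Uint8Array, off: number): number =>
  (buf[off] | (buf[off + 1] << 8) | (buf[off + 2] << 16) | (buf[off + 3] << 24)) >>> 0;

// Every offset at which the signature appears; each is only a candidate.
function findDescriptorCandidates(buf: Uint8Array): number[] {
  const hits: number[] = [];
  for (let i = 0; i + 4 <= buf.length; i++) {
    if (readU32(buf, i) === DATA_DESCRIPTOR_SIG) hits.push(i);
  }
  return hits;
}
```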

The problem is here:

fflate/src/index.ts

Lines 3610 to 3613 in f787356

if (sig == 0x8074B50) {
is = i += 12 + (oc == -2 && 8), f = 3, this.c = 0;
break;
} else if (sig == 0x2014B50) {

fflate should verify that it has actually encountered a [data descriptor] block. I don't fully understand these parts of the code (the magic negative values are a mystery to me), but a fix should look something like this:

if (sig == 0x8074B50) {
  const z64 = oc == -2; // is this file in Zip64 format, `oc == -2` seems to mean this, I am not sure
  const _sc = z64 ? b8(buf, i + 8) : b4(buf, i + 8); // read 'compressed size' from data descriptor block
  if (_sc == i) { // assuming that `i` is how many bytes of [file data x] block were read
    is = i += 12 + (oc == -2 && 8), f = 3, this.c = 0;
    break;
  } else {
    // just a coincidence, ignore it
  }
} else if (sig == 0x2014B50) {

Alternatively, the crc-32 could be verified, but the byte count is simpler.
Maybe verify by byte count first and then by crc-32 to be 100% sure?
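For reference, the crc-32 in a ZIP data descriptor is the standard IEEE polynomial. fflate has its own CRC code internally; this standalone sketch is just to illustrate what the "verify by crc-32" check would compare against the decompressed bytes:

```typescript
// Sketch of a standard CRC-32 (the IEEE polynomial ZIP uses); not fflate's
// internal implementation, just an illustration of the crc-32 check idea.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  CRC_TABLE[n] = c >>> 0;
}

function crc32(data: Uint8Array): number {
  let c = 0xffffffff;
  for (let i = 0; i < data.length; i++) {
    c = CRC_TABLE[(c ^ data[i]) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}
```

The standard check value makes this easy to sanity-check: crc32 of the ASCII string '123456789' is 0xCBF43926.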

@johan-tribus

Hi all,

Thank you @kajkal for your extensive explanation and suggested fix. I tested it and it seems to work as expected.
I think a check on the compressed size should be good enough; I suppose a crc-32 check on the resulting file is done in some other part of the code. I still struggle to understand it all.
I've opened a PR to address the issue using your fix.

@kajkal

kajkal commented Mar 5, 2025

@johan-tribus of the two assumptions in my suggested fix, only one is true: oc == -2 does indicate a file with a Data Descriptor in Zip64 format (and oc == -1 indicates a file with a standard Data Descriptor). Unfortunately, i does not indicate how many bytes have been read from the [file data x] block, and as far as I have been able to determine, such information is not currently stored by fflate; at least, not always:

import fs from 'node:fs';
import * as fflate from 'fflate';

const filepath = './file.zip';
const unzipper = new fflate.Unzip();
unzipper.register(fflate.UnzipInflate);
unzipper.onfile = (file) => {
  console.log('file found', { name: file.name, compression: file.compression });
  let i = 0;
  file.ondata = (err, data, final) => {
    if (err) throw err;
    console.log(`  got decompressed chunk ${i++}`, { bytes: data.length, final });
  };
  file.start();
};

// my suggestion works when the file is loaded in one piece
unzipper.push(fs.readFileSync(filepath), true);

// but fails when the file is loaded in chunks
const rs = fs.createReadStream(filepath, { highWaterMark: 64 * 1024 /*bytes*/ });
rs.on('data', (chunk) => unzipper.push(chunk));
rs.on('end', () => unzipper.push(new Uint8Array(), true));

// therefore, it cannot be assumed that `i` always means how many bytes of [file data x] block were read

From what I understand, it would be possible to keep a 'historical' (multi-chunk friendly) compressed-size value in the UnzipDecoder instance: with each call to UnzipDecoder->push (this applies to UnzipPassThrough, UnzipInflate and AsyncUnzipInflate), evaluate this.sc += data.length; then, when verifying against the Data Descriptor, you could do if (_sc == this.d.sc + i), which should give the correct value.
That would be my naive solution; it would be good to hear how @101arrowz would approach this, there is probably a much simpler and more elegant solution 😉
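The accumulation idea can be sketched in isolation (hypothetical class and names, mine rather than fflate's): the decoder tracks how many compressed bytes the previous chunks contained, so a candidate descriptor found at offset i of the current chunk is checked against the total bytes consumed so far, not just the offset within one chunk:

```typescript
// Hypothetical sketch of the suggestion above (names are mine, not fflate's):
// the decoder accumulates the compressed bytes consumed in previous chunks,
// so a descriptor's declared size can be compared against the total number of
// [file data x] bytes read so far, not just the offset within one chunk.
class CompressedSizeTracker {
  sc = 0; // compressed bytes consumed in fully processed chunks

  // Call once per chunk handed to the decoder's push().
  consume(chunk: Uint8Array): void {
    this.sc += chunk.length;
  }

  // Would a candidate descriptor found at offset `i` of the chunk currently
  // being scanned match the size it declares?
  matches(declaredSize: number, i: number): boolean {
    return declaredSize === this.sc + i;
  }
}
```

With a single whole-file push this.sc stays 0 and the check degenerates to the earlier _sc == i suggestion, which is why that version only worked for unchunked input.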
