zip 2.1.3 regressed large zips with >64k files #189
Comments
Another interesting result here is that performance also suffers because it is collecting into two
I suspect #93 is likely the culprit. Will take a look at this.
Thanks for taking a look here. I was able to quickly reproduce this by using an older version of This is to mimic the usage within https://github.com/getsentry/symbolic/blob/master/symbolic-debuginfo/src/sourcebundle.rs as closely as possible. I can confirm using that testcase that version
    // Imports needed by the test below.
    use std::io::{Cursor, Read, Write};

    // `zip064` and `zip213` are presumably two versions of the zip crate,
    // pulled in under renamed dependency names: the older one writes the
    // archive and the newer one reads it back.
    #[test]
    fn test_64k() {
        // Prefix the archive with a small header before the zip data starts.
        let mut buffer = vec![];
        buffer.write_all(b"SYSB").unwrap();
        buffer.write_all(&0u32.to_le_bytes()).unwrap();
        let cursor = Cursor::new(buffer);
        let mut zipwriter = zip064::write::ZipWriter::new(cursor);
        let opt =
            zip064::write::FileOptions::default().last_modified_time(zip064::DateTime::default());
        // Write 100,000 tiny files, well past the 64k entry limit.
        for i in 0..100_000 {
            let file_contents = format!("{i}.txt");
            zipwriter.start_file(&file_contents, opt).unwrap();
            zipwriter.write_all(file_contents.as_bytes()).unwrap();
        }
        let cursor = zipwriter.finish().unwrap();
        // Every file must be found by name and contain its own name.
        let mut zipreader = zip213::read::ZipArchive::new(cursor).unwrap();
        for i in 0..100_000 {
            let expected_contents = format!("{i}.txt");
            let mut file = zipreader.by_name(&expected_contents).unwrap();
            let mut file_contents = String::new();
            file.read_to_string(&mut file_contents).unwrap();
            assert_eq!(file_contents, expected_contents);
        }
    }
Feel free to adapt this to your test suite, though mind you that it runs very slowly on a debug build.
Done adapting it, and I've now used it to test a fix with some refactoring that avoids enumerating the files repeatedly and unnecessarily. When writing a lot of tiny files, it's important to choose
I have a huge zip file with well over 64k files.
Version `2.1.1` was able to correctly parse that because inside of `get_metadata`, there would only be one `ok_results`, the one with the correct number of files.
Version `2.1.3`, however, regressed that behavior: it parses both the zip64 and zip32 indices, reads both into an `IndexMap`, and then picks the wrong one, which has the number of files capped at 64k.
As the `dir_start` is the same for both, `max_by_key` picks the last one as per its documentation.
As mentioned, version `2.1.1` rejects the index with the capped number of files and it fails somewhere in `central_header_to_zip_file`, though I haven't debugged it deeper.
This regression might be related to 8efd233 or 68f7f5d, which are touching the relevant code, though I'm not quite sure about that.
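To illustrate the tie behavior described above, here is a minimal standalone sketch. It is not the crate's actual code: the `Candidate` struct, its fields, the two `pick_*` helpers, and the ordering of the two candidates are all hypothetical stand-ins. The only library fact relied on is that `Iterator::max_by_key` returns the last element when several are equally maximum.

```rust
// Hypothetical stand-in for one parsed central directory candidate.
// The real types inside the zip crate look different.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candidate {
    label: &'static str,
    dir_start: u64,
    number_of_files: u64,
}

// Keys only on dir_start. When both candidates share the same dir_start,
// max_by_key returns the LAST of the equally maximal elements.
fn pick_by_dir_start(candidates: &[Candidate]) -> Option<Candidate> {
    candidates.iter().copied().max_by_key(|c| c.dir_start)
}

// One possible tie-break (an assumption, not the fix that was merged):
// prefer the candidate that reports more files when dir_start is equal.
fn pick_with_tie_break(candidates: &[Candidate]) -> Option<Candidate> {
    candidates
        .iter()
        .copied()
        .max_by_key(|c| (c.dir_start, c.number_of_files))
}

fn main() {
    let zip64 = Candidate {
        label: "zip64",
        dir_start: 4096,
        number_of_files: 100_000,
    };
    // The zip32 record stores the entry count in 16 bits, so it is capped.
    let zip32 = Candidate {
        label: "zip32",
        dir_start: 4096,
        number_of_files: u16::MAX as u64,
    };
    // Assumed order: zip64 first, zip32 last.
    let candidates = [zip64, zip32];

    println!("by dir_start only: {:?}", pick_by_dir_start(&candidates));
    println!("with tie-break:    {:?}", pick_with_tie_break(&candidates));
}
```

With the assumed ordering, keying on `dir_start` alone selects the capped zip32 candidate, while the tuple key selects the zip64 one regardless of the order in which the candidates were collected.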