High-performance parallelized implementations of common zip file operations.
See discussion in pex-tool/pex#2158.
This crate adds some hacks to the widely-used zip crate (see the diff at https://github.com/zip-rs/zip/compare/master...cosmicexplorer:zip:merge-entries?expand=1). When the merge feature is enabled in this fork of zip, two crimes are unveiled:
merge_archive():
- This will copy over the contents of another zip file into the current one without decompressing or otherwise deserializing any of its data.
- This enables parallelization of arbitrary zip commands, as multiple zip files can be created in parallel and then merged afterwards.
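The idea behind merge_archive() can be modeled with the standard library alone. This is a sketch under stated assumptions, not the fork's implementation: the Archive and Entry types and the add, merge, get, and build_and_merge_in_parallel functions are all hypothetical. Each archive is a body of already-compressed entry bytes plus a directory recording where each entry starts, loosely standing in for a zip file's entries and central directory. Merging appends the other body verbatim and only shifts the recorded offsets, so no entry is ever decoded; the independent archives are built on separate threads first.

```rust
use std::thread;

/// Hypothetical directory record: an entry name plus where its stored
/// (already compressed) bytes live inside the archive body.
#[derive(Clone, Debug, PartialEq)]
struct Entry {
    name: String,
    offset: u64,
    len: u64,
}

/// Hypothetical simplified archive: raw entry bytes plus a directory,
/// loosely modeling a zip file's entries and central directory.
#[derive(Debug, Default)]
struct Archive {
    body: Vec<u8>,
    directory: Vec<Entry>,
}

impl Archive {
    /// Append one entry; `stored` is treated as already-compressed bytes.
    fn add(&mut self, name: &str, stored: &[u8]) {
        let offset = self.body.len() as u64;
        self.body.extend_from_slice(stored);
        self.directory.push(Entry {
            name: name.to_string(),
            offset,
            len: stored.len() as u64,
        });
    }

    /// Copy another archive in verbatim: its bytes are never decoded,
    /// only its directory offsets are shifted past our existing body.
    fn merge(&mut self, other: Archive) {
        let shift = self.body.len() as u64;
        self.body.extend_from_slice(&other.body);
        for mut entry in other.directory {
            entry.offset += shift;
            self.directory.push(entry);
        }
    }

    /// Return an entry's stored bytes by name, if present.
    fn get(&self, name: &str) -> Option<&[u8]> {
        let e = self.directory.iter().find(|e| e.name == name)?;
        let start = e.offset as usize;
        Some(&self.body[start..start + e.len as usize])
    }
}

/// Build `n` single-entry archives on separate threads, then merge them
/// in thread order into one archive.
fn build_and_merge_in_parallel(n: usize) -> Archive {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            thread::spawn(move || {
                let mut part = Archive::default();
                part.add(&format!("file{i}.txt"), format!("payload {i}").as_bytes());
                part
            })
        })
        .collect();

    let mut merged = Archive::default();
    for handle in handles {
        merged.merge(handle.join().expect("worker thread panicked"));
    }
    merged
}
```

For example, build_and_merge_in_parallel(4) yields one archive with four entries whose bytes were copied rather than re-encoded. A real zip merge must additionally write a new central directory and end-of-central-directory record with the adjusted offsets, which this model leaves out.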
finish_into_readable():
- Creating a writable ZipWriter and then converting it into a readable ZipArchive is a very common operation when merging zip files.
- This likely has zero performance benefit, but it is a good example of the types of investigations you can do with the zip format, especially against the well-written zip crate.
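The writer-to-reader conversion behind finish_into_readable() can likewise be illustrated with the standard library alone. In this sketch ToyWriter, ToyReader, read_chunk, and roundtrip are hypothetical, and the length-prefixed record layout is a stand-in rather than the zip format. What it shows is the shape of the idea: a writer over a handle implementing Read + Write + Seek can flush, rewind, and hand that same handle to a reader, instead of closing the output and reopening it as input.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

/// Hypothetical toy writer: each entry is stored as
/// [name length: u32 LE][name][data length: u32 LE][data].
struct ToyWriter<W: Read + Write + Seek> {
    inner: W,
}

/// Hypothetical toy reader over the same handle the writer produced.
struct ToyReader<R: Read + Seek> {
    inner: R,
}

impl<W: Read + Write + Seek> ToyWriter<W> {
    fn new(inner: W) -> Self {
        ToyWriter { inner }
    }

    fn add(&mut self, name: &str, data: &[u8]) -> io::Result<()> {
        self.inner.write_all(&(name.len() as u32).to_le_bytes())?;
        self.inner.write_all(name.as_bytes())?;
        self.inner.write_all(&(data.len() as u32).to_le_bytes())?;
        self.inner.write_all(data)
    }

    /// Flush, rewind, and move the *same* handle into a reader rather
    /// than closing it and reopening the output as a fresh input.
    fn finish_into_readable(mut self) -> io::Result<ToyReader<W>> {
        self.inner.flush()?;
        self.inner.seek(SeekFrom::Start(0))?;
        Ok(ToyReader { inner: self.inner })
    }
}

impl<R: Read + Seek> ToyReader<R> {
    /// Read every (name, data) pair from the start of the handle to its end.
    fn entries(&mut self) -> io::Result<Vec<(String, Vec<u8>)>> {
        let end = self.inner.seek(SeekFrom::End(0))?;
        self.inner.seek(SeekFrom::Start(0))?;
        let mut out = Vec::new();
        while self.inner.stream_position()? < end {
            let name = read_chunk(&mut self.inner)?;
            let data = read_chunk(&mut self.inner)?;
            let name = String::from_utf8(name)
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            out.push((name, data));
        }
        Ok(out)
    }
}

/// Read a u32 little-endian length prefix followed by that many bytes.
fn read_chunk<T: Read>(r: &mut T) -> io::Result<Vec<u8>> {
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let mut buf = vec![0u8; u32::from_le_bytes(len) as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

/// Write two entries into an in-memory buffer, then read them back
/// through the converted reader.
fn roundtrip() -> io::Result<Vec<(String, Vec<u8>)>> {
    let mut w = ToyWriter::new(Cursor::new(Vec::new()));
    w.add("a.txt", b"hello")?;
    w.add("b.txt", b"world")?;
    w.finish_into_readable()?.entries()
}
```

A real ZipWriter must also write the central directory when it finishes, before the result can be opened as a ZipArchive; this toy format has no directory, so rewinding is enough.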
We mainly need compatibility with Python's zipfile and zipimport modules (see pex-tool/pex#2158 (comment)). Also see the zipimport PEP. I currently believe that this program's output will work perfectly against zipfile and zipimport.
- benchmark zip creation (vs zip crate)
- benchmark zip merging (vs zip crate)
  - this should also really be done in the zip-merge crate, too