Extract .jar and .zip files concurrently, use buffer for all io.Copy operations #779

egibs · 2025-01-25T18:51:21Z

We can speed up the extraction of large packages like Spark and Trino by using concurrency, since both contain many .jar files. Both .jar and .zip archives can be extracted in parallel rather than sequentially, which helps here because these packages, when fully extracted, amount to roughly 1e5 files.

Additionally, we can improve overall memory usage by using a buffer pool and io.CopyBuffer across all extraction methods.

Finally, I fixed .xz extraction, which was not using a limit reader; this slipped through previous optimizations.

Signed-off-by: egibs <[email protected]>

stevebeattie

LGTM, and saw a slight performance increase in limited testing. Thanks!

egibs added 2 commits January 25, 2025 10:38

Improve efficiency and performance of zip extractions

f5d0f42

Signed-off-by: egibs <[email protected]>

Use buffer for all io.Copy operations; add limit reader for .xz files

aaa74d3

Signed-off-by: egibs <[email protected]>

egibs requested a review from stevebeattie January 25, 2025 18:51

egibs changed the title ~~Extract .jar and .zip files conncurrently, use buffer for all io.Copy operations~~ Extract .jar and .zip files concurrently, use buffer for all io.Copy operations Jan 25, 2025

egibs added 3 commits January 25, 2025 15:12

Consolidate consts

2ed60e9

Signed-off-by: egibs <[email protected]>

Keep parameters on a single line

54864e1

Signed-off-by: egibs <[email protected]>

Merge branch 'main' into zip-improvements

9dbd38c

stevebeattie approved these changes Jan 27, 2025

stevebeattie merged commit e7d91da into chainguard-dev:main Jan 27, 2025
9 checks passed

BrewTestBot mentioned this pull request Jan 27, 2025

malcontent 1.8.6 Homebrew/homebrew-core#205653

Merged

egibs deleted the zip-improvements branch January 28, 2025 00:50
