fix: avoid scanning through all local file headers when opening an archive #281

jrudolph · 2025-01-17T10:49:45Z

Fixes #280

The idea is to calculate ZipFileData.data_start lazily, so that opening a file no longer accesses every local file header up front. This required a change to the signature of ZipFileData.data_start() (which appears to be non-public anyway).
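To illustrate the lazy approach, here is a minimal sketch, not the code from this PR. `EntryData` is a hypothetical stand-in for `ZipFileData`, and the use of `std::sync::OnceLock` for caching is an assumption. The only format facts relied on are the fixed 30-byte local file header, its `PK\x03\x04` signature, and the little-endian name and extra-field lengths at offsets 26 and 28.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::sync::OnceLock;

/// Hypothetical stand-in for the crate's per-entry metadata (not the real `ZipFileData`).
struct EntryData {
    /// Offset of the local file header, as recorded in the central directory.
    header_start: u64,
    /// Offset of the entry's payload; empty until the first call to `data_start`.
    data_start: OnceLock<u64>,
}

impl EntryData {
    fn new(header_start: u64) -> Self {
        EntryData { header_start, data_start: OnceLock::new() }
    }

    /// Returns the payload offset, reading the local file header only on first use.
    fn data_start<R: Read + Seek>(&self, reader: &mut R) -> io::Result<u64> {
        // Fast path: an earlier call already resolved the offset, so no I/O is needed.
        if let Some(&start) = self.data_start.get() {
            return Ok(start);
        }
        reader.seek(SeekFrom::Start(self.header_start))?;
        let mut header = [0u8; 30]; // fixed-size part of the local file header
        reader.read_exact(&mut header)?;
        if header[..4] != [0x50, 0x4b, 0x03, 0x04] {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "invalid local file header signature",
            ));
        }
        // The variable-length file name and extra field follow the fixed part.
        let name_len = u64::from(u16::from_le_bytes([header[26], header[27]]));
        let extra_len = u64::from(u16::from_le_bytes([header[28], header[29]]));
        let start = self.header_start + 30 + name_len + extra_len;
        Ok(*self.data_start.get_or_init(|| start))
    }
}
```

With this shape, opening an archive only needs the central directory. The seek to a local file header happens the first time an entry's data is requested, which is why the accessor needs the reader as a parameter.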

Pr0methean · 2025-02-25T21:51:29Z

src/read.rs

@@ -1068,14 +1065,6 @@ pub(crate) fn central_header_to_zip_file<R: Read + Seek>(
        ));
    }

-    let data_start = find_data_start(&file, reader)?;
-
-    if data_start > central_directory.directory_start {


Shouldn't we still check for this eventually?

jrudolph · 2025-03-17

The question is what the ultimate purpose of this check is. It looks a bit like a conservative check of some specification requirement.

But is it required for correctness or for ruling out weird edge cases? After all, no tests fail after removing this check.

In a way, being able to do this check runs counter to the idea of this PR, which is to avoid scanning through the file for random access. Even if we defer it, e.g. to ZipFileData.data_start below, or move it into find_data_start, the random-access use case I have in mind here will never execute the check.

Pr0methean · 2025-03-17

IIRC it does relate to edge cases that have come up in fuzzing, such as when archives are concatenated or nested, or when the magic bytes occur in filenames. Even if the spec is ambiguous in those cases about which archive to extract, which I'm pretty sure is true for concatenation when the second one is under 64 KiB, we should still consistently choose one or the other.

Pr0methean

Sounds good in principle, but I don't like the idea of totally removing the validation when we could just defer it.

Signed-off-by: Chris Hennick <[email protected]>

Pr0methean

After this is merged I'll look at where the data-start-after-header-start check might be added back without an additional seek, to ensure we're not reading an old central directory that's been superseded (at least, not by adding or updating files).

Lynnesbian · 2025-04-09T22:56:39Z

@Pr0methean Just a suggestion: If you can't find a way to add the check back without incurring a performance penalty, maybe it could be made optional with a parameter on the Config struct?
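A minimal sketch of what such an opt-in check could look like, combined with the deferral discussed above. Everything here is hypothetical: `ReadOptions`, its `validate_data_start` field, `DataStartError`, and `check_data_start` are illustrative names, not part of the crate's `Config` struct. The comparison itself mirrors the removed `data_start > central_directory.directory_start` condition.

```rust
/// Hypothetical reader options; the real `Config` struct has no such field.
#[derive(Clone, Copy, Debug)]
struct ReadOptions {
    /// When true, reject entries whose data would start past the central directory.
    validate_data_start: bool,
}

/// Hypothetical error describing an entry that fails the check.
#[derive(Debug, PartialEq, Eq)]
struct DataStartError {
    data_start: u64,
    directory_start: u64,
}

/// Meant to run when an entry's data start is first resolved, not while the archive is opened.
fn check_data_start(
    options: ReadOptions,
    data_start: u64,
    directory_start: u64,
) -> Result<u64, DataStartError> {
    // Same condition as the removed check, gated behind the option.
    if options.validate_data_start && data_start > directory_start {
        return Err(DataStartError { data_start, directory_start });
    }
    Ok(data_start)
}
```

Because both offsets are already known once an entry's local header has been read, a check of this shape needs no additional seek. Entries that are never accessed are simply never validated.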

jrudolph and others added 2 commits January 17, 2025 11:47

fix: avoid scanning through all local file headers while opening an a…

41c262f

…rchive Fixes zip-rs#280

Merge branch 'master' into 280-do-not-scan-file-early

0553873

Pr0methean requested changes Feb 25, 2025

Pr0methean requested changes Mar 17, 2025

Pr0methean added 3 commits March 16, 2025 20:46

Merge branch 'master' into 280-do-not-scan-file-early

466693f

Merge branch 'master' into 280-do-not-scan-file-early

97e4680

Merge branch 'master' into 280-do-not-scan-file-early

ac7520b

Signed-off-by: Chris Hennick <[email protected]>

Pr0methean enabled auto-merge April 3, 2025 17:39

Pr0methean approved these changes Apr 3, 2025

Pr0methean added this pull request to the merge queue Apr 3, 2025

Merged via the queue into zip-rs:master with commit f4d71a4 Apr 3, 2025
39 checks passed

Pr0methean mentioned this pull request Apr 3, 2025

chore: release v2.6.1 #334

Merged
