Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: compress/gzip: add distinct error for trailing garbage data #67986

Closed
rgmz opened this issue Jun 14, 2024 · 3 comments
Closed

proposal: compress/gzip: add distinct error for trailing garbage data #67986

rgmz opened this issue Jun 14, 2024 · 3 comments

Comments

@rgmz
Copy link

rgmz commented Jun 14, 2024

Proposal Details

Context

When using the compress/gzip package to decompress gzip files, receiving a gzip: invalid header error can indicate two distinct possibilities.

  1. The file is not a valid gzip file.

    Example test case (click to expand)
    import (
        "compress/gzip"
        "io"
        "os"
        "testing"
    )
    
    func TestNotAValidGzipFile(t *testing.T) {
        f, err := os.Open("/tmp/python-3.12.3-amd64.exe") // Arbitrary example, any non-gzip file will suffice.
        if err != nil {
    	    t.Fatal(err)
        }
        defer f.Close()
    
        rc, err := gzip.NewReader(f)
        if err != nil {
    	    t.Fatal(err)
        }
        defer rc.Close()
    
        _, err = io.ReadAll(rc)
        if err != nil {
    	    t.Fatal(err)
        }
    }
    
    // === RUN   TestNotAValidGzipFile
    //  gzip_test.go:19: gzip: invalid header
  2. The file is a valid gzip file, but contains invalid trailing data.

    Example test case (click to expand)
    func TestValidGzipFileWithTrailingData(t *testing.T) {
      // Reproducer file. There are many examples of this.
      // https://github.com/udacity/self-driving-car-sim/blob/4b1f739ebda9ed4920fe895ee3677bd4ccb79218/Assets/Standard%20Assets/Environment/SpeedTree/Conifer/Conifer_Desktop.spm
      f, err := os.Open("/tmp/Conifer_Desktop.spm")
      if err != nil {
        t.Fatal(err)
      }
      defer f.Close()
      
      rc, err := gzip.NewReader(f)
      if err != nil {
        t.Fatal(err)
      }
      defer rc.Close()
      
      _, err = io.ReadAll(rc)
      if err != nil {
        t.Fatal(err)
      }
    }
    
    // === RUN   TestValidGzipFileWithTrailingData
    //  gzip_test.go:19: gzip: invalid header

Scenario 2 can be especially confusing because Go's implementation of compress/gzip rejects invalid trailing data, while many popular applications and languages do not. Hence, the ambiguity of this error can lead people to believe that Go is rejecting a valid gzip file.

$ file -i Conifer_Desktop.spm
Conifer_Desktop.spm: application/gzip; charset=binary
$ gzip -S .spm -d /tmp/Conifer_Desktop.spm
gzip: stdin: decompression OK, trailing garbage ignored. 

Side-note: #47809 (comment) explains why this choice was made and how to handle invalid trailing data using gzip.Reader.Multistream.

Proposal Details

I propose adding a distinct error for when subsequent inputs in a data stream don't have valid headers (i.e., trailing garbage). This would provide clarity for users who many not realize that they need to disable gzip.Reader.Multistream. Additionally, it could be used to programmatically call Multistream(false) for specific files with trailing garbage.

I will leave the specific error message and behave for future discussion, should there be interest in implementing this proposal.


Caveat: I am not familiar with the implementation of gzip.Reader. This change may not be possible or desirable due to technical limitations.

@rgmz rgmz added the Proposal label Jun 14, 2024
@gabyhelp
Copy link

Similar Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@rgmz
Copy link
Author

rgmz commented Jun 14, 2024

Similar Issues

https://github.com/golang/go/issues/47809
https://github.com/golang/go/issues/61797

Drats, I spent quite a bit of time searching existing issues. Not sure how I missed the second one.

@rsc
Copy link
Contributor

rsc commented Jun 28, 2024

Duplicate of #61797.

@rsc rsc closed this as completed Jun 28, 2024
@rsc rsc added the gabywins label Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Incoming
Development

No branches or pull requests

4 participants