Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overreading with v1.3.1 #108

Closed
klauspost opened this issue Sep 12, 2022 · 10 comments · Fixed by #109
Closed

Overreading with v1.3.1 #108

klauspost opened this issue Sep 12, 2022 · 10 comments · Fixed by #109

Comments

@klauspost
Copy link
Contributor

It seems you are still over-reading with the latest release. Here is a reproducer test:

func TestWriteTo(t *testing.T) {
	const length = 958506
	addBuf := []byte(`1234`)
	bs := New(length)
	var buf bytes.Buffer
	_, err := bs.WriteTo(&buf)
	if err != nil {
		t.Fatal(err)
	}
	buf.Write(addBuf)
	
	// Generate test input for regression tests:
	if false {
		gzout := bytes.NewBuffer(nil)
		gz, err := gzip.NewWriterLevel(gzout, 9)
		if err != nil {
			t.Fatal(err)
		}
		gz.Write(buf.Bytes())
		gz.Close()
		t.Log("Encoded:", base64.StdEncoding.EncodeToString(gzout.Bytes()))
	}
	bs = New(length)
	_, err = bs.ReadFrom(&buf)
	if err != nil {
		t.Fatal(err)
	}
	more, err := io.ReadAll(&buf)
	if err != nil {
		t.Fatal(err)
	}
	if !bytes.Equal(more, addBuf) {
		t.Fatalf("extra mismatch. got %v, want %v", more, addBuf)
	}
}

Works with v1.2.0, but outputs:

=== RUN   TestWriteTo
    bitset_test.go:1684: extra mismatch. got [], want [49 50 51 52]
--- FAIL: TestWriteTo (0.00s)

You can extend it for more sizes.

@klauspost
Copy link
Contributor Author

klauspost commented Sep 12, 2022

You should add regression tests. Here is a template:


func TestReadFrom(t *testing.T) {
	addBuf := []byte(`1234`)
	tests := []struct {
		length uint
		input  string // base64+gzipped
	}{{
		length: 958506,
		input:  "H4sIAAAAAAAC/+zAORUAIAwFsK8AIWwcgpHKq4lOScp4MwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATdY+9wcAAP//b+hYaRTUAQA=",
	}}

	for _, test := range tests {
		var buf bytes.Buffer
		b, err := base64.StdEncoding.DecodeString(test.input)
		if err != nil {
			t.Fatal(err)
		}
		gz, err := gzip.NewReader(bytes.NewBuffer(b))
		if err != nil {
			t.Fatal(err)
		}
		io.Copy(&buf, gz)
		gz.Close()

		bs := New(test.length)
		_, err = bs.ReadFrom(&buf)
		if err != nil {
			t.Fatal(err)
		}
		more, err := io.ReadAll(&buf)
		if err != nil {
			t.Fatal(err)
		}
		if !bytes.Equal(more, addBuf) {
			t.Fatalf("extra mismatch. got %v, want %v", more, addBuf)
		}
	}
}

You can also add some fixed content to the bf itself.

@klauspost
Copy link
Contributor Author

@thanhpk

@lemire
Copy link
Member

lemire commented Sep 12, 2022

@klauspost Would you kindly produce a pull request ?

@klauspost
Copy link
Contributor Author

@lemire With the tests?

@lemire
Copy link
Member

lemire commented Sep 12, 2022

You have identified a bug in the ReadFrom function, I invite you to produce a patch and to submit it as a pull request.

@lemire
Copy link
Member

lemire commented Sep 12, 2022

Here is the function, can you point at the bug please?

// ReadFrom reads a BitSet from a stream written using WriteTo
func (b *BitSet) ReadFrom(stream io.Reader) (int64, error) {
	var length uint64

	// Read length first
	err := binary.Read(stream, binaryOrder, &length)
	if err != nil {
		return 0, err
	}
	newset := New(uint(length))

	if uint64(newset.length) != length {
		return 0, errors.New("unmarshalling error: type mismatch")
	}

	// Read remaining bytes as set
	// current implementation bufio.Reader is more memory efficient than
	// binary.Read for large set
	reader := bufio.NewReader(stream)
	var item = make([]byte, binary.Size(uint64(0))) // one uint64
	nWords := uint64(wordsNeeded(uint(length)))
	for i := uint64(0); i < nWords; i++ {
		if _, err := reader.Read(item); err != nil {
			return 0, err
		}
		newset.set[i] = binaryOrder.Uint64(item)
	}

	*b = *newset
	return int64(b.BinaryStorageSize()), nil
}

@klauspost
Copy link
Contributor Author

Bug is using bufio.NewReader, which reads ahead into buffer. I will submit a PR.

@lemire
Copy link
Member

lemire commented Sep 12, 2022

Thanks.

klauspost added a commit to klauspost/bitset-1 that referenced this issue Sep 12, 2022
`bufio.NewReader` usually reads ahead. Limit the input to the expected size.

Return `io.ErrUnexpectedEOF` when stream is unexpectedly truncated.

Add regression tests (and apparently some Go 1.19 formatting)

Fixes bits-and-blooms#108
@klauspost
Copy link
Contributor Author

Global Endianness switch. oh, wow.

@klauspost
Copy link
Contributor Author

Anyway PR sent in #109

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants