panic out of range in bytes queue #148
Hi, can you add a bit more detail, please? What is the problem?
@cristaloleg, I can dig a little deeper into the bytes_queue to figure out what the right fix might be, but the underlying issue is that line 223 (at bbf64ae) results in a panic because index+headerEntrySize+blockSize >= len(q.array). There is a bounds check on line 218 (at bbf64ae), but there is no bounds check for the result of adding blockSize to that. Does that make sense?
Oh, finally I got it. I thought the problem was on line 222, as in the original message. It's funny that there was no panic before. I think I checked this code somewhere in late 2017 and everything was fine. I might be wrong 😉
@cristaloleg it might be an uncommon edge case, or a misuse of the API somehow, but I did see this panic in logs using this code. If I have time, I can try to find a small reproducible test, but it might be tough without enough knowledge of this code. I do experience this panic without changing this library, so some permutation of the usage of this API will cause the panic.
@cristaloleg I hit this panic while using an iterator to check all entries whose names match a given pattern and delete them, while other processes are adding entries. It is also possible that two of these iterators run concurrently, but they cannot try to delete the same entries. Is this a supported pattern?
@olivierperes Can you provide code to demonstrate this behavior?
This is the function that deletes entries. Unfortunately I cannot provide a unit test; my program can run for about 12 hours before the crash occurs.

```go
// cleanCache deletes from the cache all elements whose name has the given prefix.
func cleanCache(prefix string) error {
	iterator := cache.Iterator()
	for iterator.SetNext() {
		entry, err := iterator.Value()
		if err != nil {
			return err
		}
		k := entry.Key()
		if strings.HasPrefix(k, prefix) {
			err := cache.Delete(k)
			if err != nil && err != bigcache.ErrEntryNotFound {
				return err
			}
		}
	}
	return nil
}
```
Lines 23 to 27 in c00e2be
So here (Line 81 in e24eb22) we may allocate less memory than we need. I think the bug may be here (Lines 134 to 136 in e24eb22): we do not check whether all the data was successfully copied.
Here is a small visualization: https://storytime.dev/preview/5d8b85caddbb4e10b4566fc9
Hello @janisz. Just to be clear, is someone working on this, or are you expecting a PR? In principle I would volunteer, but in this case it seems to require very good knowledge of the code.
I tried to reproduce this issue with the approach I presented above, but I can't do it. It's hard for me to figure out the order of insertions/deletions that gets the queue into an invalid state.
How about using https://github.com/dvyukov/go-fuzz to reproduce this bug? I came up with the following piece of code. I have no experience in fuzzing, so it might be totally wrong, but at least we can iterate on it. We know the bug is in the queue, and my approach is to use the random input to perform random operations on the queue (Push with a random length, and Pop).

```go
package fuzz

import (
	"bytes"
	"encoding/binary"
	"io"

	. "github.com/allegro/bigcache/queue"
)

func Fuzz(data []byte) int {
	queue := NewBytesQueue(0, 4*1024*1024*1024, false)
	r := bytes.NewReader(data)
	for {
		push, n, err := nextMove(r)
		if err == io.EOF {
			return 1
		}
		if err != nil {
			panic(err)
		}
		if push {
			queue.Push(blob('a', int(n)))
		} else {
			queue.Pop()
		}
	}
}

func nextMove(r io.Reader) (push bool, n int16, err error) {
	boolean := make([]byte, 1)
	_, err = r.Read(boolean)
	if err != nil {
		return false, n, err
	}
	if boolean[0] == 0 {
		return false, n, nil
	}
	integer := make([]byte, 2) // an int16 is 2 bytes
	_, err = r.Read(integer)
	if err != nil {
		return false, n, err
	}
	err = binary.Read(bytes.NewReader(integer), binary.BigEndian, &n)
	return true, n, err
}

func blob(char byte, len int) []byte {
	b := make([]byte, len)
	for index := range b {
		b[index] = char
	}
	return b
}
```
This is similar to: https://github.com/janisz/bigcache/commit/e73c96d8b9c30e2f5247ce96521edbd62b5d096b/checks
I'm also not an expert, but you can prolly come up with a test case like you have already and then just loop over it N times with a fuzzer providing some data and then when it crashes, we'll see what happens. /shrug |
```diff
@@ -46,6 +48,10 @@ func readTimestampFromEntry(data []byte) uint64 {
 func readKeyFromEntry(data []byte) string {
 	length := binary.LittleEndian.Uint16(data[timestampSizeInBytes+hashSizeInBytes:])
+	if headersSizeInBytes+int(length) > len(data) {
+		panic(fmt.Sprintf("\n%s\n", hex.Dump(data)))
+	}
+
 	// copy on read
 	dst := make([]byte, length)
 	copy(dst, data[headersSizeInBytes:headersSizeInBytes+length])
```

It looks like two entries can overwrite each other.
Race condition or bad locking? |
I think it could be bad locking.
I saw this before the PR, and I would agree it's likely bad locking. Ideally, bigcache should deterministically lock on any operation, but I'm not really sure if that's reasonable. |
This just hit me
I prepared a test for this issue. Unfortunately it's far from minimal, but it's still a reproducible example:

```go
func Test_issue_148(t *testing.T) {
	const n = 9000000
	var message = blob('a', 512)
	cache, _ := NewBigCache(Config{
		Shards:             1,
		LifeWindow:         time.Hour,
		MaxEntriesInWindow: 10,
		MaxEntrySize:       len(message) + 1,
		HardMaxCacheSize:   n >> 11,
	})
	for i := 0; i < n; i++ {
		err := cache.Set(strconv.Itoa(i), message)
		if err != nil {
			t.Fatal(err)
		}
	}
	wg := sync.WaitGroup{}
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			cache.Get(strconv.Itoa(rand.Intn(n)))
			wg.Done()
		}()
	}
	wg.Wait()
}
```
The fact this feels like a transient locking issue is strange. I might try to take some time this week to work on it since I have some cycles. |
I found 3 problems:
@janisz what is the status of the issue? |
We still need to fix it |
I think #232 tentatively fixes this. I want to keep this open for a bit to see if we're still running across it. @olivierperes @codyohlsen can you reproduce the issue on the latest commit? |
I changed job and no longer have access to the code that used BigCache, so I won’t be able to test. |
I think I found the problem. It's here: Line 138 in 21e5ca5. On big queues we may overflow uint32: for example, if we stored a value under index 4294968891, the hashmap records the index as 1595, so when we try to read it we land in the middle of some unrelated entry, which does not parse as a valid header and results in a panic. I'll try to provide a patch for this.
There is a possibility we run into an int32 overflow problem. To prevent this, let's use uint64 everywhere. https://github.com/allegro/bigcache/blob/21e5ca5c3d539f94e8dc563350acd97c5400154f/shard.go#L138 Fixes: allegro#148
Wow! Looks like it, nice find.
* Fix iterator styling issues (#247); issues reported in #246
* Fix #160 (#246)
* Update to latest golang (#248)
* Initial prep for v3. Signed-off-by: Mike Lloyd <[email protected]>
* Use uint64 instead of uint32. There is a possibility we run into an int32 overflow problem. To prevent this, let's use uint64 everywhere. https://github.com/allegro/bigcache/blob/21e5ca5c3d539f94e8dc563350acd97c5400154f/shard.go#L138 Fixes: #148
* Fix CI
* Do not run on 1.13
* Do not run long test
* Optimize append (#249): add a benchmark for Append, halve byte copies, reduce allocs in Append and in the test construct

Co-authored-by: Oleg Kovalov <[email protected]>
Co-authored-by: Mike Lloyd <[email protected]>
Co-authored-by: Fabian Gärtner <[email protected]>
Co-authored-by: S@P <[email protected]>
Co-authored-by: Fabianexe <[email protected]>
@codyohl @cristaloleg can we close it? It was fixed by #236.
@janisz Yes, I don't observe this anymore. I'm all good with closing!
bigcache/queue/bytes_queue.go
Line 222 in bbf64ae
It appears there is a bounds check before reading the blockSize, but the code then assumes that adding the block size stays in bounds.