Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

map delete not finding keys when value is largeish, key hash seems to be incorrect #3358

Closed
anuraaga opened this issue Dec 28, 2022 · 34 comments
Assignees
Labels

Comments

@anuraaga
Copy link
Contributor

anuraaga commented Dec 28, 2022

I am currently debugging this code that tries to delete the keys in a map (to empty it while retaining bucket storage)

https://github.com/corazawaf/coraza/blob/v3/dev/internal/corazawaf/rulegroup.go#L105

(for the below, note that I tried copying the keys into a slice before deleting to see if it was iterator+delete behavior problem but it was all the same so simplified back to normal map iteration).

I noticed a memory leak and after adding if len(transformationCache) > 0 { panic } after the deletion could verify that the map wasn't getting cleared with tinygo wasm. It works fine with Go.

I have added this instrumentation to hashmap.go

https://github.com/anuraaga/tinygo/compare/custom-gc...anuraaga:tinygo:hashmapdelete-debug?expand=1

And shrunk down the test to similar one as last time

https://github.com/corazawaf/coraza-proxy-wasm/compare/main...anuraaga:coraza-1225?expand=1#diff-173fbfd8d8844658344b121461b4290d0a85230caae9825240705df8130e8b75R33
~/go/bin/tinygo build -scheduler=none -target=wasi -opt=2 -o mapbench.wasm

The debug output looks something like this

hashmapBinaryGet 0x0001019c 0x0000ffd0 16 1666840535
could find key
hashmapBinaryDelete 0x0001019c 0x0000ffb8 16 1666840535
delete could find key 0x0001019c 0x0000ffb8 1666840535
hashmapBinaryGet 0x0001019c 0x0000ffd0 16 1666840535
couldn't find key

The key address for get is 0x0000ffd0 and delete is 0x0000ffb8. That being said the hash is the same in this example so it's being able to clear the map but with the same instrumentation when looking at the output for the original coraza code, the hash values were also different. I'm not sure why I wasn't able to reproduce this hash value difference, but either way, the key is in a local variable k, which there is only one of, so intuitively I suppose those addresses must be the same and the difference is unexpected.

One weird thing I found while making the repro is it seems the value struct needs to be more than 2 strings worth of size to exhibit the behavior - with three fields, get and delete have different addresses, while with two fields they are the same.

Looked through the code in hashmap.go and map.go (compiler) and couldn't see anything suspicious, the code paths for get/lookup vs delete look basically the same for both, but the difference does cause our real-world code to fail with the map not being cleared. With the code looking the same, the issue may be in IR reduction?

Note the above simpleish test case approach is also applied to the real world code here (which is where I was observing the address+key value difference)

https://github.com/corazawaf/coraza/compare/v3/dev...anuraaga:scratch?expand=1

The output looks like this (we can see the different hash values)

[2022-12-28 04:04:56.210312][25][info][wasm] [source/extensions/common/wasm/context.cc:1170] wasm log coraza-filter my_vm_id: hashmapBinaryGet 0x09887b70 0x0000fb98 16 2748313615
[2022-12-28 04:04:56.210312][25][info][wasm] [source/extensions/common/wasm/context.cc:1170] wasm log coraza-filter my_vm_id: could find key
[2022-12-28 04:04:56.210312][25][info][wasm] [source/extensions/common/wasm/context.cc:1170] wasm log coraza-filter my_vm_id: hashmapBinaryDelete 0x09887b70 0x0000fbf8 16 3003228291
[2022-12-28 04:04:56.210313][25][info][wasm] [source/extensions/common/wasm/context.cc:1170] wasm log coraza-filter my_vm_id: delete couldn't find key 0x09887b70
[2022-12-28 04:04:56.210313][25][info][wasm] [source/extensions/common/wasm/context.cc:1170] wasm log coraza-filter my_vm_id: hashmapBinaryGet 0x09887b70 0x0000fb98 16 2748313615
[2022-12-28 04:04:56.210314][25][info][wasm] [source/extensions/common/wasm/context.cc:1170] wasm log coraza-filter my_vm_id: could find key
@anuraaga
Copy link
Contributor Author

Digging through LLVM docs, I did come to understand that because get and delete create a temporarly alloca, it is conceivable for those to be different and then different addresses (though that would be an unfortunate deoptimization if copying the key struct from execution stack to c stack twice). I guess there is something about the real-world code that is causing the contents of the allocas to be different, which I still wasn't able to repro in the simpler example...

@anuraaga
Copy link
Contributor Author

anuraaga commented Dec 28, 2022

For context, changing the map value type to pointer solved the issue, though is not ideal since we didn't want to heap allocate for this cache

corazawaf/coraza#554

This is because it seems like the problematic behavior only happens with larger value size. Or perhaps just coincidence / luck

@anuraaga anuraaga changed the title map delete not finding keys when value is largeish, key address seems to be incorrect map delete not finding keys when value is largeish, key hash seems to be incorrect Dec 28, 2022
@dgryski
Copy link
Member

dgryski commented Dec 28, 2022

I've done a lot of work on the hashing code and the hash maps. I'll take a look at this.

@dgryski dgryski self-assigned this Dec 28, 2022
@dgryski
Copy link
Member

dgryski commented Jan 3, 2023

Hmm... can you come up with a reproducer based on the mapbench code that doesn't require patching the runtime to demonstrate it? (I've patched my runtime but it's much easier if we have something that can if if ... { panic }

@anuraaga
Copy link
Contributor Author

anuraaga commented Jan 5, 2023

@dgryski Do you need a small amount of code or a small command that reliably reproduces? I can send the latter easily based on coraza-proxy-wasm if it will help, if the former I'll try some more and see what I can do

@dgryski
Copy link
Member

dgryski commented Jan 5, 2023

Both are nice. The small amount of code is better but the small command is useful until that arrives.

@anuraaga
Copy link
Contributor Author

anuraaga commented Jan 5, 2023

Thanks @dgryski.

You'll need your TinyGo built with #3302 and docker available. ftw will fail after a few requests

git clone https://github.com/anuraaga/coraza-proxy-wasm.git --branch repro-mapdelete
cd coraza-proxy-wasm
LOCAL_TINYGO=1 go run mage.go build
go run mage.go ftw
tail -n 20 build/ftw-envoy.log

@dgryski
Copy link
Member

dgryski commented Jan 10, 2023

Smaller reproducer:

~/go/src/github.com/dgryski/bug $ cat bug.go
package main

type V struct {
	buf [256]byte
	s1  string
	s2  string
	s3  string
}

func main() {
	m := make(map[int]V)

	m[0] = V{s1: "hello", s2: ",", s3: "world"}

	println(m[0].s1)
	delete(m, 0)
	println(m[0].s1)
}
~/go/src/github.com/dgryski/bug $ go run bug.go
hello

~/go/src/github.com/dgryski/bug $ tinygo run bug.go
hello
hello

Note that if the buf struct is after the strings, then the bug is not present.

@dgryski
Copy link
Member

dgryski commented Jan 10, 2023

Oh duh. The hashmap code limits value size to uint8 for some reason: https://github.com/tinygo-org/tinygo/blob/release/src/runtime/hashmap.go#L19

That's an easy enough fix.

@dgryski
Copy link
Member

dgryski commented Jan 10, 2023

#3377

@dgryski
Copy link
Member

dgryski commented Jan 11, 2023

Annoyingly while this is a bug I'm not certain this actually related to the issue at hand, given that the values in the test program are much smaller.

@dgryski
Copy link
Member

dgryski commented Jan 11, 2023

I wonder if this is the same memory corruption from the garbage collector missing pointers that we've been seeing elsewhere?

@dgryski
Copy link
Member

dgryski commented Jan 11, 2023

I'm having real trouble reproducing this on stock tinygo and wasmtime. :(

@aykevl
Copy link
Member

aykevl commented Jan 11, 2023

@anuraaga it would be very helpful if you could reproduce this with stock TinyGo. If you can't, I have to assume it's related to the custom GC somehow.

Some things that could help:

  • use -tags=runtime_asserts to do a bit more checking
  • use 8d929ad (this should hopefully be merged soon)
  • try adding runGC() inside runtime.alloc to make the chance of hitting this bug higher (this is very slow), something like this:
        gcTotalAlloc += uint64(size)
        gcMallocs++
 
+       runGC()
+
        neededBlocks := (size + (bytesPerBlock - 1)) / bytesPerBlock
 
        // Continue looping until a run of free blocks has been found that fits the

@anuraaga
Copy link
Contributor Author

I'll see if I can figure something out. One thing, from the original message

The key address for get is 0x0000ffd0 and delete is 0x0000ffb8. That being said the hash is the same in this example so it's being able to clear the map

Is this expected? This is with TinyGo dev, no custom GC - while the map is getting cleared in this case still, the addresses being different seems unexpected to me.

@dgryski
Copy link
Member

dgryski commented Jan 12, 2023

Yes. because the unsafe.Pointer is a pointer to the value struct of the key, and that particular value struct is copied from the range iterator. It's not unexpected to me that those two keys have different pointer values.

@anuraaga
Copy link
Contributor Author

anuraaga commented Jan 12, 2023

Thanks - I guess each operation is copying the key to the stack and its unfortunate those don't get coallesced but agree that's probably just a lack of optimization and not likely a bug (but still not 100% sure on that).

Anyways, while it's still not a small code example, I updated the same branch as #3358 (comment) to unplug the custom GC and malloc, now it'll fail with just stock dev TinyGo.

@aykevl
Copy link
Member

aykevl commented Jan 12, 2023

The key address for get is 0x0000ffd0 and delete is 0x0000ffb8. That being said the hash is the same in this example so it's being able to clear the map

Is this expected? This is with TinyGo dev, no custom GC - while the map is getting cleared in this case still, the addresses being different seems unexpected to me.

It's a stack address, so yeah there is a good chance it is different each time depending on what optimizations happen or don't happen. The address itself is not relevant to the hashmap though: it only looks at the value stored at the address.

@dgryski
Copy link
Member

dgryski commented Jan 12, 2023

@anuraaga Have you gotten the mapbench to fail on wasmtime or wazero rather than as part of the waf?

@anuraaga
Copy link
Contributor Author

@dgryski Unfortunately continuing tweaks to make mapbench look more like the WAF code has not repro'd the bug.

However, that reminds me that the WAF code itself can actually be run with wazero. So instead of go run mage.go ftw, you can just use go run mage.go test to get the failure with wazero - while wazero doesn't seem to be printing the panic string message, the stack trace is the transormation cache not empty line. I guess this shows it's not an Envoy-specific issue.

I know a smaller code example is important but not sure what else to try to get it to repro in mapbench.

I do still want to challenge the stack address observations once more. The code is this

for k := range m {
			if _, ok := m[k]; !ok {
				println("couldn't find key")
			} else {
				println("could find key")
			}
			delete(m, k)
			if _, ok := m[k]; !ok {
				println("couldn't find key")
			} else {
				println("could find key")
			}
		}

There is just one stack frame and one local variable k. Yes, it can result in a different stack address for those two allocas, but do we really expect it? It seems like a really simple optimization being missed if so and I can't shake the feeling that the fact that those addresses are different is related to the underlying bug. This wasn't happening with a smaller struct size, just a larger one, which would not seem to have a semantic impact on those allocas.

--- FAIL: TestLifecycle (0.41s)
    --- FAIL: TestLifecycle/wasm (0.41s)
        --- FAIL: TestLifecycle/wasm/request_body_accepted (0.02s)
panic: wasm error: unreachable
wasm stack trace:
	.runtime._panic(i32,i32)
		0x5a9a2: /Users/anuraag/git/tinygo/src/runtime/runtime_tinygowasm.go:72:6 (inlined)
		         /Users/anuraag/git/tinygo/src/runtime/panic.go:52:7 (inlined)
		         /Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:309:15 (inlined)
		         /Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:310:15 (inlined)
		         /Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:309:15 (inlined)
		         /Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:310:15 (inlined)
		         /Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:309:15 (inlined)
		         /Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:310:15 (inlined)
		         /Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:309:15 (inlined)
		         /Users/anuraag/git/tinygo/src/runtime/algorithm.go:363:16 (inlined)
		         /Users/anuraag/git/tinygo/src/runtime/algorithm.go:839:22 (inlined)
		         /Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:223:27 (inlined)
		         /Users/anuraag/git/tinygo/src/runtime/arch_tinygowasm_malloc.go:529:34
	.(*github.com/corazawaf/coraza/v3/internal/corazawaf.RuleGroup).Eval(i32,i32)
		0x1266de: /Users/anuraag/go/pkg/mod/github.com/anuraaga/coraza/[email protected]/internal/corazawaf/rulegroup.go:108:8 (inlined)
		          <Go interface method> (inlined)
		          /Users/anuraag/go/pkg/mod/github.com/petar-dambovaliev/[email protected]/ahocorasick.go:131:34 (inlined)
		          /Users/anuraag/go/pkg/mod/github.com/wasilibs/[email protected]/re2.go:681:13 (inlined)
		          /Users/anuraag/go/pkg/mod/github.com/wasilibs/[email protected]/re2.go:677:11 (inlined)
		          /Users/anuraag/go/pkg/mod/github.com/wasilibs/[email protected]/re2.go:646:17 (inlined)

@dgryski
Copy link
Member

dgryski commented Jan 13, 2023

I'll double check the code we're generating for the map operations.

@dgryski
Copy link
Member

dgryski commented Jan 13, 2023

When generating map operations, the LLVM code looks like

		// key can be compared with runtime.memequal
		// Store the key in an alloca, in the entry block to avoid dynamic stack
		// growth.
		mapKeyAlloca, mapKeyPtr, mapKeySize := b.createTemporaryAlloca(key.Type(), "hashmap.key")
		b.CreateStore(key, mapKeyAlloca)
		// Fetch the value from the hashmap.
		params := []llvm.Value{m, mapKeyPtr, mapValuePtr, mapValueSize}
		commaOkValue = b.createRuntimeCall("hashmapBinaryGet", params, "")
		b.emitLifetimeEnd(mapKeyPtr, mapKeySize)

So, because the get/delete/set operations take a pointer to the key, we create an alloca and store the key into it. If llvm doesn't otherwise coalesce that store with the fact that the key value is already a stack value, then that probably is an optimization opportunity for us. However, while the extra key copy is a performance hit I don't think it's the source of this bug (which I have yet to replicate...)

Edit: The punchline here of course is that the extra copy was the source of this bug...

@dgryski
Copy link
Member

dgryski commented Jan 13, 2023

Got tinygo runtime logging enabled with

diff --git a/proxywasm/proxytest/wasmwrapper.go b/proxywasm/proxytest/wasmwrapper.go
index 8a515cb..38e6ce5 100644
--- a/proxywasm/proxytest/wasmwrapper.go
+++ b/proxywasm/proxytest/wasmwrapper.go
@@ -17,6 +17,7 @@ package proxytest
 import (
        "context"
        "io"
+       "os"
        "reflect"
        "unsafe"

@@ -84,7 +85,8 @@ type vmContext struct {
 // Note: Currently only HTTP plugins are supported.
 func NewWasmVMContext(wasm []byte) (WasmVMContext, error) {
        ctx := context.Background()
-       r := wazero.NewRuntime(ctx)
+       wazeroconfig := wazero.NewModuleConfig().WithStdout(os.Stderr).WithStderr(os.Stderr)
+       r := wazero.NewRuntimeWithConfig(ctx, wazero.NewRuntimeConfig().WithDebugInfoEnabled(true))

        _, err := wasi_snapshot_preview1.Instantiate(ctx, r)
        if err != nil {
@@ -96,7 +98,12 @@ func NewWasmVMContext(wasm []byte) (WasmVMContext, error) {
                return nil, err
        }

-       mod, err := r.InstantiateModuleFromBinary(ctx, wasm)
+       compiled, err := r.CompileModule(ctx, wasm)
+       if err != nil {
+               return nil, err
+       }
+
+       mod, err := r.InstantiateModule(ctx, compiled, wazeroconfig)
        if err != nil {
                return nil, err
        }

and of course a go.mod line to point to my local copy

replace github.com/tetratelabs/proxy-wasm-go-sdk => ../proxy-wasm-go-sdk

@dgryski
Copy link
Member

dgryski commented Jan 13, 2023

The issue has to do with struct padding.

The fields for the key struct look like:

type transformationKey struct {
        argKey            uintptr
        argIndex          int
        argVariable       variables.RuleVariable
        transformationsID int
}

The argVariable member is a uint8, meaning there are padding bytes before the transformationsID.

Rearranging the fields like this passes:

type transformationKey struct {
        argKey            uintptr
        argIndex          int
        transformationsID int
        argVariable       variables.RuleVariable
}

Explicitly account for those bytes with _ [3]byte after the argVariable also solves it. Ok, time to dig deeper.

@aykevl
Copy link
Member

aykevl commented Jan 13, 2023

... oh no. This looks like a rather gnarly bug.

One fix would be to:

  1. store the zero value in the alloca, to make sure it's zeroed entirely
  2. individually store all fields

But that's probably not the most efficient way to do this.
There's a good chance there are other places with a similar bug.

@dgryski
Copy link
Member

dgryski commented Jan 14, 2023

"proof" via hexudmps:

hexdump of key in loop before first map lookup:
00000000:  00 00 00 00  00 00 00 00  25 00 00 00  01 00 00 00   |........%.......|

binary get  0x00178ea0 0x0000fd00 16 2046442687
hexdump of key in "get"
00000000:  00 00 00 00  00 00 00 00  25 9c 1c 00  01 00 00 00   |........%.......|

hexdump of key in loop before delete
00000000:  00 00 00 00  00 00 00 00  25 00 00 00  01 00 00 00   |........%.......|

binary del  0x00178ea0 0x0000fd60 16 45773010
hexdump of key in delete:
00000000:  00 00 00 00  00 00 00 00  25 00 00 00  01 00 00 00   |........%.......|

binary get  0x00178ea0 0x0000fd00 16 2046442687
hexdump of key in get after delete:
00000000:  00 00 00 00  00 00 00 00  25 9c 1c 00  01 00 00 00   |........%.......|

So, again looks like memory corruption so it's tricky to replicate exactly once I add the debug logging, but in any event we can see that the 3 extra bytes 9c 1c 00 are "invading" the struct and throwing off the hash calculation and the equality test.

@dgryski
Copy link
Member

dgryski commented Jan 14, 2023

It seems like there's going to be a bunch of places where we ask LLVM to "store" a value, and it copied over the bytes that are "live", but really we'd prefer to have just a straight memcpy.

@anuraaga
Copy link
Contributor Author

Thanks for the investigation, looks promising!

For reference found this issue in rust. Didn't read it all but the idea of synthetic padding fields generated by TinyGo for structs seems sort of ok

rust-lang/rust#70230

I don't know if an alloca store would ever be conceptually a memcpy - the fields of the struct could just be in individual slots on the execution stack in wasm, or registers in a cpu, and would be individually copied onto the stack I guess. I guess that means that synthetic padding fields have the downside of likely using more execution stack / register.

Otherwise probably need to memzero after each alloca.

@aykevl
Copy link
Member

aykevl commented Jan 15, 2023

Yeah, I think a memzero is probably the best thing to do here. LLVM should be able to optimize it away when it isn't necessary, I believe.

https://llvm.org/docs/LangRef.html#llvm-memset-intrinsics

@dgryski
Copy link
Member

dgryski commented Jan 15, 2023

Do we need those memsets every time we do stores for structs with padding or just for map keys? Can we get away with just doing it the first time we create the value? Can we tag the hasher functions somehow so that llvm knows the padding needs to be cleared?

@anuraaga
Copy link
Contributor Author

anuraaga commented Jan 16, 2023

I suppose it is just about map keys / general hashing - in that case, I wonder if fixing #3354's optimization issue with the reflection-based key computation and always using it instead of the memory-based hash is an alternative solution. It would be similar to Go's hash which always iterates over the fields

https://github.com/golang/go/blob/c22a6c3b906cd37616d76da5f504c4c3e5677d94/src/runtime/alg.go#L179

Ideally the fields could be passed directly (like Go's structtype) instead of indirectly through reflection, but I guess that would be too big of a change.

@aykevl
Copy link
Member

aykevl commented Jan 16, 2023

So I looked at the LangRef and we may have a bit of a problem here:

The contents of memory are updated to contain <value> at the location specified by the <pointer> operand. If <value> is of scalar type then the number of bytes written does not exceed the minimum number of bytes needed to hold all bits of the type. For example, storing an i24 writes at most three bytes. When writing a value of a type like i20 with a size that is not an integral number of bytes, it is unspecified what happens to the extra bits that do not belong to the type, but they will typically be overwritten. If <value> is of aggregate type, padding is filled with undef. If <pointer> is not a well-defined value, the behavior is undefined.

(Emphasis mine). Structs are one of the aggregate types. What this says is that the padding bytes in a struct are undefined when stored. So even if the memory is cleared before the store, LLVM is allowed to fill the padding again with garbage (it probably won't, but it is allowed to).
One way around this is to split up the store manually in IR and explicitly zero the struct padding. This is a bit tricky to get right but certainly not impossible. It probably won't hurt optimizations significantly as LLVM itself also has a pass that splits up large stores in a similar way.

Do we need those memsets every time we do stores for structs with padding or just for map keys?

No, only when the alloca itself is used as raw memory (like when it is hashed etc). Normal loads/stores (like *ssa.Alloc) should be fine.

Can we tag the hasher functions somehow so that llvm knows the padding needs to be cleared?

I'm not aware of anything like that.

I wonder if fixing #3354's optimization issue with the reflection-based key computation and always using it instead of the memory-based hash is an alternative solution.

We could, but it is always going to be slower than hashing the memory directly. Remember, reflect is slow (especially in TinyGo).
It does give a quick fix though: either temporarily do all hashing of structs via reflect, or somehow detect padding bytes and use the reflect based hashing for structs with padding bytes. One relatively easy way to detect padding bytes is to sum up all the sizes of the struct fields (recursively) and compare it to the size LLVM calculates for it.

@dgryski
Copy link
Member

dgryski commented Jan 23, 2023

I'm going to start on a "zero-the-undef-bytes-for-hashing" patch and see what that looks like.

dgryski added a commit to dgryski/tinygo that referenced this issue Feb 10, 2023
dgryski added a commit to dgryski/tinygo that referenced this issue Feb 10, 2023
dgryski added a commit to dgryski/tinygo that referenced this issue Feb 16, 2023
dgryski added a commit to dgryski/tinygo that referenced this issue Feb 16, 2023
dgryski added a commit to dgryski/tinygo that referenced this issue Feb 16, 2023
dgryski added a commit to dgryski/tinygo that referenced this issue Feb 16, 2023
dgryski added a commit to dgryski/tinygo that referenced this issue Feb 16, 2023
dgryski added a commit to dgryski/tinygo that referenced this issue Feb 16, 2023
dgryski added a commit to dgryski/tinygo that referenced this issue Feb 17, 2023
dgryski added a commit to dgryski/tinygo that referenced this issue Feb 18, 2023
@dgryski
Copy link
Member

dgryski commented Mar 9, 2023

Fix for this is in dev.

@dgryski dgryski added the next-release Will be part of next release label Mar 9, 2023
conejoninja pushed a commit to conejoninja/tinygo that referenced this issue Mar 22, 2023
@aykevl aykevl closed this as completed in 4766217 Jun 10, 2023
@deadprogram deadprogram removed the next-release Will be part of next release label Jun 14, 2023
LiuJiazheng pushed a commit to LiuJiazheng/tinygo that referenced this issue Aug 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants