consensus/ethash: avoid runtime errors due to OOD on mmap writes #23799
holiman merged 3 commits into ethereum:master
Conversation
Force-pushed 16cd366 to 7b04f14
In https://pkg.go.dev/syscall they say: Deprecated: this package is locked down. Callers should use the corresponding package in the golang.org/x/sys repository instead. That is also where updates required by new systems or versions should be applied. See https://golang.org/s/go1.4-syscall for more information.
do you need to specify all the possible $GOARCH in there? A simple

- //go:build linux && (386 || amd64 || arm || arm64 || mips || mips64 || mips64le || ppc || ppc64 || ppc64le || riscv64 || s390x || sparc64)
+ //go:build linux

should work, I think, as I see no reason for fallocate not to be present for a given architecture.
Hm, indeed, it is present in all files. I was confused because they didn't put it in syscall_linux.go, but instead added it individually in all the per-arch files.
Force-pushed 7b04f14 to 7ecade8
Fixed, ptal

@gballet lgty?
  logger.Error("Failed to generate mapped ethash dataset", "err", err)
- d.dataset = make([]uint32, dsize/2)
+ d.dataset = make([]uint32, dsize/4)
Note for reviewers: check line 330, and you'll see that this is the same as what's used already whenever we don't use disk-based dag.
…ereum#23799) When we map a file for generating the DAG, we do a simple truncate to e.g. 1Gb. This is fine, even if we have nowhere near 1Gb disk available, as the actual file doesn't take up the full 1Gb, merely a few bytes. When we start generating into it, however, it eventually crashes with an unexpected fault address. This change fixes it (on linux systems) by using the Fallocate syscall, which preallocates sufficient space on disk to avoid that situation. Co-authored-by: Felix Lange <fjl@twurst.com>
This PR avoids a pretty rare crash.
Some more info here: edsrzf/mmap-go#12 (comment) .
When we map a file for generating the DAG, we do a simple truncate to e.g. 1Gb. This is fine, even if we have nowhere near 1Gb of disk available, as the actual file doesn't take up the full 1Gb, merely a few bytes. When we start generating into it, however, it eventually crashes with an `unexpected fault address`.
There are two ways we can fix this, the simplest being to use the `fallocate` linux system call.

The second way is to convert the internals of ethash; instead of generating into a `[]byte`, we generate into an interface which supports `WriteAt`. Then we can first generate into a file (or into a wrapped memory slice), and later do a readonly mmap on it.

This PR implements the first approach. I tried the second approach, but it becomes a pretty large refactor, especially since it also touches the cache generator (which uses the same scheme), and the cache generator is not as 'linear': it reads from itself in a way the DAG generator does not, and is thus more difficult to convert. In the end, it seemed a bit wasteful to do a major refactoring of ethash at this point.
This PR also fixes an over-allocation that could happen when the mmap fails and we fall back to generating the DAG in memory, and adds some more cleanup in case of failures.
Re testing
I found this error when I tried running `go test . -run - -bench BenchmarkHashimotoFullMmap/WithLock -v`. It generates the DAG into the system temp directory, which in my case was limited to 1Gb. Without this fix, the benchmark crashed, but it runs fine with this fix.

Re platforms
I got the supported-platform info from this