Skip to content

consensus/ethash: avoid runtime errors due to OOD on mmap writes#23799

Merged
holiman merged 3 commits intoethereum:masterfrom
holiman:ethash_fixmem
Nov 2, 2021
Merged

consensus/ethash: avoid runtime errors due to OOD on mmap writes#23799
holiman merged 3 commits intoethereum:masterfrom
holiman:ethash_fixmem

Conversation

@holiman
Copy link
Copy Markdown
Contributor

@holiman holiman commented Oct 24, 2021

This PR avoids a pretty rare crash.

Some more info here: edsrzf/mmap-go#12 (comment) .

When we map a file for generating the DAG, we do a simple truncate to e.g. 1Gb. This is fine, even if we have nowhere near 1Gb disk available, as the actual file doesn't take up the full 1Gb, merely a few bytes. When we start generating into it, however, it eventually crashes with a unexpected fault address .

Examples:

There are two ways we can fix this, the simplest being to use the fallocate linux system call.
The second way is to convert the internals of ethash; instead of generating into a []byte, we generate into an interface which supports WriteAt. Then we can first generate into a file (or into a wrapped memory slice), and later do a readonly-mmap on it.

This PR implements the first approach. I tried the second approach, but it becomes a pretty large refactor, especially since it also touches the cache generator (which uses the same schema), and the cache-generator is not as 'linear' -- it reads from self in a way which the DAG-generator does not, and thus is more difficult to convert this way. In the end, I thought it seems a bit wasteful to do a major refactoring of ethash at this point.

This PR also fixes an over-allocation that could happen if the mmap fails, and we generate the dag in-memory, and does some more cleanup in case of failures.

Re testing

I found this error when I tried running go test . -run - -bench BenchmarkHashimotoFullMmap/WithLock -v. It generates the dag to the system temp, which in my case was limited to 1Gb. Without this fix, the benchmark crashed, but it runs fine with this fix.

Re platforms

I got the supported-platform info from this

[user@work syscall]$ grep Fallocate * 
grep: js: Is a directory
syscall_linux.go://sys	Fallocate(fd int, mode uint32, off int64, len int64) (err error)
zsyscall_linux_386.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_amd64.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_arm.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_arm64.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_mips.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_mips64.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_mips64le.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_mipsle.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_ppc64.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_ppc64le.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_riscv64.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {
zsyscall_linux_s390x.go:func Fallocate(fd int, mode uint32, off int64, len int64) (err error) {

Comment thread consensus/ethash/mmap_help_linux.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In https://pkg.go.dev/syscall they say: Deprecated: this package is locked down. Callers should use the corresponding package in the golang.org/x/sys repository instead. That is also where updates required by new systems or versions should be applied. See https://golang.org/s/go1.4-syscall for more information.

Comment thread consensus/ethash/mmap_help_linux.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to specify all the possible $GOARCH in there? A simple

Suggested change
//go:build linux && (386 || amd64 || arm || arm64 || mips || mips64 || mips64le || ppc || pcc64 || pcc64le || riscv64 || s390x || sparc64)
//go:build linux

should work, I think, as I see no reason for fallocate not to be present for a given architecture.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, indeed, it is present in all files. I was confused because they didn't put it in in syscall_linux.go, but instead added it individually in all files.

@holiman
Copy link
Copy Markdown
Contributor Author

holiman commented Oct 26, 2021

Fixed, ptal

@holiman
Copy link
Copy Markdown
Contributor Author

holiman commented Oct 27, 2021

@gballet lgty?

logger.Error("Failed to generate mapped ethash dataset", "err", err)

d.dataset = make([]uint32, dsize/2)
d.dataset = make([]uint32, dsize/4)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: check line 330, and you'll see that this is the same as what's used already whenever we don't use disk-based dag.

@fjl fjl removed the status:triage label Nov 2, 2021
@holiman holiman added this to the 1.10.12 milestone Nov 2, 2021
@holiman holiman merged commit 178debe into ethereum:master Nov 2, 2021
sidhujag pushed a commit to syscoin/go-ethereum that referenced this pull request Nov 2, 2021
…ereum#23799)

When we map a file for generating the DAG, we do a simple truncate to e.g. 1Gb. This is fine, even if we have nowhere near 1Gb disk available, as the actual file doesn't take up the full 1Gb, merely a few bytes. When we start generating into it, however, it eventually crashes with a unexpected fault address .

This change fixes it (on linux systems) by using the Fallocate syscall, which preallocates suffcient space on disk to avoid that situation. 


Co-authored-by: Felix Lange <fjl@twurst.com>
@holiman holiman deleted the ethash_fixmem branch November 10, 2021 18:32
yongjun925 pushed a commit to DODOEX/go-ethereum that referenced this pull request Dec 3, 2022
…ereum#23799)

When we map a file for generating the DAG, we do a simple truncate to e.g. 1Gb. This is fine, even if we have nowhere near 1Gb disk available, as the actual file doesn't take up the full 1Gb, merely a few bytes. When we start generating into it, however, it eventually crashes with a unexpected fault address .

This change fixes it (on linux systems) by using the Fallocate syscall, which preallocates suffcient space on disk to avoid that situation. 


Co-authored-by: Felix Lange <fjl@twurst.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants