Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lotus-miner / markets subsystem crash - keywords: 'panic: must not use a read-write blockstore after finalizing' #6968

Closed
4 tasks done
benjaminh83 opened this issue Aug 2, 2021 · 9 comments
Assignees
Labels
kind/bug Kind: Bug P1 P1: Must be resolved
Milestone

Comments

@benjaminh83
Copy link

Checklist

  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • I am reporting a bug w.r.t one of the M1 tags. If not, choose another issue option here.
  • I am not reporting a bug around deal making. If yes, create a M1 Bug Report For Deal Making.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.

Lotus component

lotus miner - mining and block production

Lotus Tag and Version

lotus version
Daemon:  1.11.1-rc1+mainnet+git.3be268182+api1.3.1
Local: lotus version 1.11.1-rc1+mainnet+git.3be268182

lotus-miner version
Daemon:  1.11.1-rc1+mainnet+git.3be268182+api1.2.0
Local: lotus-miner version 1.11.1-rc1+mainnet+git.3be268182

M1.3.3

Describe the Bug

Seems like I experienced some instability with M1.3.3
Basically the lotus-miner process (I only have one, as currently running monolith) crashed at 4.30AM this morning, only 1.5h from my first deadline.
It seems to have something to do with this, but it a pretty big problem that the failure leads to a total crash of the miner. I think this is the key event:

2021-08-02T04:24:53.835+0200	WARN	dt_graphsync	graphsync/graphsync.go:209	graphsync error: Request Failed - Unknown Reason
panic: must not use a read-write blockstore after finalizing

Logging Information

2021-08-02T04:24:38.920+0200	INFO	net/identify	identify/id.go:376	failed negotiate identify protocol with peer	{"peer": "12D3KooWPHrBvfpQrVziZVbWFi1FiZodTD7G5AvnvMD4omot8SV9", "error": "stream reset"}
2021-08-02T04:24:43.758+0200	INFO	net/identify	identify/id.go:376	failed negotiate identify protocol with peer	{"peer": "12D3KooWJE7d5eBmCa6vYUhC7GQSF6gw7F9A1SjNG2SydKN87Wgy", "error": "stream reset"}
2021-08-02T04:24:51.053+0200	INFO	net/identify	identify/id.go:376	failed negotiate identify protocol with peer	{"peer": "12D3KooWEpFvqf6pXcejvBryHJDQSkzWdPFVxz54rHqxiJnNerP9", "error": "stream reset"}
2021-08-02T04:24:53.835+0200	INFO	graphsync	requestmanager/requestmanager.go:420	graphsync request complete	{"request id": 33, "peer": "12D3KooWGBWx9gyUFTVQcKMTenQMSyE2ad9m7c9fpjS4NMjoDien", "total time": 2080.931689718}
2021-08-02T04:24:53.835+0200	WARN	dt_graphsync	graphsync/graphsync.go:209	graphsync error: Request Failed - Unknown Reason
panic: must not use a read-write blockstore after finalizing
goroutine 53395 [running]:
github.com/ipld/go-car/v2/blockstore.(*ReadWrite).panicIfFinalized(...)
	/home/acemgt/go/pkg/mod/github.com/ipld/go-car/[email protected]/blockstore/readwrite.go:289
github.com/ipld/go-car/v2/blockstore.(*ReadWrite).Get(0xc026e362c0, 0xc05132fbf0, 0x24, 0x18, 0xc035d23940, 0x563f66, 0xc0242fa8e0)
	/home/acemgt/go/pkg/mod/github.com/ipld/go-car/[email protected]/blockstore/readwrite.go:377 +0x9f
github.com/ipfs/go-graphsync/storeutil.LoaderForBlockstore.func1(0x384fb10, 0xc132fcba20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x384fb10, ...)
	/home/acemgt/go/pkg/mod/github.com/ipfs/[email protected]/storeutil/storeutil.go:22 +0x5e
github.com/ipfs/go-graphsync/requestmanager/asyncloader.setupAttemptQueue.func1(0xc000000021, 0x384fb10, 0xc132fcba20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/acemgt/go/pkg/mod/github.com/ipfs/[email protected]/requestmanager/asyncloader/asyncloader.go:359 +0x19b
github.com/ipfs/go-graphsync/requestmanager/asyncloader/loadattemptqueue.(*LoadAttemptQueue).AttemptLoad(0xc0b2848c40, 0xc000000021, 0x384fb10, 0xc132fcba20, 0xc07f05e9c0, 0x1)
	/home/acemgt/go/pkg/mod/github.com/ipfs/[email protected]/requestmanager/asyncloader/loadattemptqueue/loadattemptqueue.go:49 +0x63
github.com/ipfs/go-graphsync/requestmanager/asyncloader/loadattemptqueue.(*LoadAttemptQueue).RetryLoads(0xc0b2848c40)
	/home/acemgt/go/pkg/mod/github.com/ipfs/[email protected]/requestmanager/asyncloader/loadattemptqueue/loadattemptqueue.go:83 +0xa8
github.com/ipfs/go-graphsync/requestmanager/asyncloader.(*newResponsesAvailableMessage).handle(0xc0cc1dfc60, 0xc015386600)
	/home/acemgt/go/pkg/mod/github.com/ipfs/[email protected]/requestmanager/asyncloader/asyncloader.go:336 +0x396
github.com/ipfs/go-graphsync/requestmanager/asyncloader.(*AsyncLoader).run(0xc015386600)
	/home/acemgt/go/pkg/mod/github.com/ipfs/[email protected]/requestmanager/asyncloader/asyncloader.go:201 +0x44
created by github.com/ipfs/go-graphsync/requestmanager/asyncloader.(*AsyncLoader).Startup
	/home/acemgt/go/pkg/mod/github.com/ipfs/[email protected]/requestmanager/asyncloader/asyncloader.go:68 +0x65

Repo Steps

Run M1.3.3 in a monolith setup.
Multiple incoming (more or less faulty deals from Estuary)

@benjaminh83
Copy link
Author

I have the full log file for the complete miner run - start to crash (if needed by the devs)

@dirkmc
Copy link
Contributor

dirkmc commented Aug 2, 2021

@benjaminh83 thanks for bug report. Please do post the full log file here.

@benjaminh83
Copy link
Author

miner-m1.3.3-log5.log

@jennijuju jennijuju added P1 P1: Must be resolved and removed need/triage labels Aug 2, 2021
@jennijuju jennijuju added this to the 1.11.1 milestone Aug 2, 2021
@raulk raulk changed the title Lotus-miner crash - keywords: 'panic: must not use a read-write blockstore after finalizing' lotus-miner / markets subsystem crash - keywords: 'panic: must not use a read-write blockstore after finalizing' Aug 2, 2021
@dirkmc
Copy link
Contributor

dirkmc commented Aug 2, 2021

I think what's happening is

  1. Client sends cancel message to provider
  2. Provider closes context on graphsync request to cancel it (but doesn't wait for request to finish being cancelled)
  3. Provider calls Finalize() on blockstore
  4. Somehow blockstore.Get() is called

Step 4 could be because eg a block has arrived while the cancel message is being processed, and graphsync does a read on the blockstore to check if the block is a duplicate.

Currently there's no way to know when the event queue has completely drained after cancelling the graphsync context. So at the moment we already have hacks in place to wait 100ms after cancelling the graphsync context when restarting a data transfer.

I suggest we add a Cancel method to the graphsync interface, that cancels the request and drains event queues etc before returning. That would allow us to take care of this panic and also get rid of these hacks: ipfs/go-graphsync#184

@Meatball13
Copy link

I had the same error message occur overnight on my split miner lotus-miner process (the Miner/Storage/Sealing miner stayed up and operational.)

2021-08-01T22:24:47.178-0400	WARN	dt_graphsync	graphsync/graphsync.go:209	graphsync error: Request Failed - Unknown Reason
2021-08-01T22:24:47.178-0400	INFO	graphsync	requestmanager/requestmanager.go:420	graphsync request complete	{"request id": 11, "peer": "12D3KooWGBWx9gyUFTVQcKMTenQMSyE2ad9m7c9fpjS4NMjoDien", "total time": 2149.050854644}
panic: must not use a read-write blockstore after finalizing

goroutine 711 [running]:
github.com/ipld/go-car/v2/blockstore.(*ReadWrite).panicIfFinalized(...)
	/home/tbtminer/go/pkg/mod/github.com/ipld/go-car/[email protected]/blockstore/readwrite.go:289
github.com/ipld/go-car/v2/blockstore.(*ReadWrite).Get(0xc02a1be8f0, 0xc0a04c2ff0, 0x24, 0x18, 0xc014387940, 0x563fa6, 0xc0383a8fa8)
	/home/tbtminer/go/pkg/mod/github.com/ipld/go-car/[email protected]/blockstore/readwrite.go:377 +0x9f
github.com/ipfs/go-graphsync/storeutil.LoaderForBlockstore.func1(0x3862b10, 0xc031649420, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3862b10, ...)
	/home/tbtminer/go/pkg/mod/github.com/ipfs/[email protected]/storeutil/storeutil.go:22 +0x5e
github.com/ipfs/go-graphsync/requestmanager/asyncloader.setupAttemptQueue.func1(0xc00000000b, 0x3862b10, 0xc031649420, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/tbtminer/go/pkg/mod/github.com/ipfs/[email protected]/requestmanager/asyncloader/asyncloader.go:359 +0x19b
github.com/ipfs/go-graphsync/requestmanager/asyncloader/loadattemptqueue.(*LoadAttemptQueue).AttemptLoad(0xc01e1017c0, 0xc00000000b, 0x3862b10, 0xc031649420, 0xc029be76e0, 0x1)
	/home/tbtminer/go/pkg/mod/github.com/ipfs/[email protected]/requestmanager/asyncloader/loadattemptqueue/loadattemptqueue.go:49 +0x63
github.com/ipfs/go-graphsync/requestmanager/asyncloader/loadattemptqueue.(*LoadAttemptQueue).RetryLoads(0xc01e1017c0)
	/home/tbtminer/go/pkg/mod/github.com/ipfs/[email protected]/requestmanager/asyncloader/loadattemptqueue/loadattemptqueue.go:83 +0xa8
github.com/ipfs/go-graphsync/requestmanager/asyncloader.(*newResponsesAvailableMessage).handle(0xc017927ce0, 0xc01410a780)
	/home/tbtminer/go/pkg/mod/github.com/ipfs/[email protected]/requestmanager/asyncloader/asyncloader.go:336 +0x396
github.com/ipfs/go-graphsync/requestmanager/asyncloader.(*AsyncLoader).run(0xc01410a780)
	/home/tbtminer/go/pkg/mod/github.com/ipfs/[email protected]/requestmanager/asyncloader/asyncloader.go:201 +0x44
created by github.com/ipfs/go-graphsync/requestmanager/asyncloader.(*AsyncLoader).Startup
	/home/tbtminer/go/pkg/mod/github.com/ipfs/[email protected]/requestmanager/asyncloader/asyncloader.go:68 +0x65

@Meatball13
Copy link

Meatball13 commented Aug 2, 2021

Think this is something deal related external to our miners. If you look at the time stamps, both Benjamin and I crashed at the same time. I'm EDT, so it occurred at 2:24 AM UTC, I think Benjamin is an hour or two ahead of UTC and his crashed at 4:24 Local Time. Also verified that TippyFlits crashed at 2:24 AM UTC.

@dirkmc
Copy link
Contributor

dirkmc commented Aug 2, 2021

@Meatball13 please provide your full logs, at least for a couple of hours before the crash

@Meatball13
Copy link

Unfortunately I was half asleep when I noticed it was down and restarted them without keeping the logs. Seems like Stuberman also had the same issue, at the same time, and it might have been related to a big batch of deals that came out to all of us at that time. Discussion ongoing here: https://filecoinproject.slack.com/archives/C029ETPJ6BB/p1627877131340100

@raulk
Copy link
Member

raulk commented Aug 9, 2021

Thanks for the report. Resolved in m1.3.4 after merging filecoin-project/go-data-transfer#229 and ipld/go-car#195.

@raulk raulk closed this as completed Aug 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Kind: Bug P1 P1: Must be resolved
Projects
None yet
Development

No branches or pull requests

5 participants