Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Deal Making Issue] Client gets stuck in StorageDealAwaitingPreCommit even after deal has failed on miner side #261

Closed
ghost opened this issue Apr 20, 2021 · 6 comments
Labels
area/markets P2 P2: Should be resolved

Comments

@ghost
Copy link

ghost commented Apr 20, 2021

Basic Information

When a miner fails 3 times to seal, it will give up and purge the sector. The problem this issue describes is that on the client side there is never any notification that the deal has failed. The client just hangs in a StorageDealAwaitingPreCommit state forever.

I propose that the miner reports back on a final failure state like this so the client status can change to StorageDealError or some other error state.

Describe the problem

On the client side (1.7.0), I'm seeing repeated instances of deals getting stuck in StorageDealAwaitingPreCommit even days after the miner has failed to seal and given up on the sector.

Version

  • Client:
$ lotus version
Daemon:  1.7.0-dev+mainnet+git.a40fb1e1d+api1.1.0
Local: lotus version 1.7.0-dev+mainnet+git.a40fb1e1d
  • Miner
$ lotus version
Daemon:  1.6.0-rc2+mainnet+git.80224bcb8+api1.1.0
Local: lotus version 1.6.0-rc2+mainnet+git.80224bcb8

# Note: this is after a recent downgrade to 1.6.0; one of the failed deals shown below was made last week when miner was on 1.7.0.
# But this bug report is not about the miner-side issue! It is about the client UX, ie, not telling the user that the deal has failed.

To Reproduce

  1. Make a deal from client:
$ lotus client deal --fast-retrieval --verified-deal bafykbzacebzqkg44jer45btajfibw4hnzjpfoxh6f63eg4ridgfcget2wxbyw f0127896  0.0000002 518400
bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei
  1. Wait ~5-10 minutes for client-side deal to advance to StorageDealAwaitingPreCommit .
  2. Now wait 3-5 days or longer. The StorageDealAwaitingPreCommit status never changes to a final error status.

Deal status

The output of lotus client list-deals -v:

ubuntu@dealbot-mainnet:~$ lotus client list-deals -v
Created          DealCid                                                      DealId   Provider  State                         On Chain?         Slashed?     PieceCID                                                          Size       Price          Duration  TransferChannelID                                                                                                              TransferStatus  Verified  Message
Apr 16 14:39:48  bafyreicqyarmouxojweqio7fmd55d6qe6qphraq2trkn7cvjjgxg4s3jhy  1742150  f0127896  StorageDealAwaitingPreCommit  Y (epoch 0)       Y (epoch 0)  baga6ea4seaqn2csbrgsdpavbcek5rvj7z52gahph65pta53oanbmqnmw6tgnkoq  1.984 GiB  0.1039372 FIL  519686    12D3KooWLXgNd8PfyRf7eC7xKPVnuZCb65SA7ZcqEFp4zUeC2yah-12D3KooWQrZnhRaBecSZYKR815zH65xkg1TvfyPWUGYwehHvVEKB-1617824480496011385  Completed       true
Apr 20 14:33:21  bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei  1770094  f0127896  StorageDealAwaitingPreCommit  N                 N            baga6ea4seaqn2csbrgsdpavbcek5rvj7z52gahph65pta53oanbmqnmw6tgnkoq  1.984 GiB  0.1039398 FIL  519699    12D3KooWLXgNd8PfyRf7eC7xKPVnuZCb65SA7ZcqEFp4zUeC2yah-12D3KooWQrZnhRaBecSZYKR815zH65xkg1TvfyPWUGYwehHvVEKB-1617824480496011386  Completed       true

lotus-miner storage-deals list -v (relevant lines only):

$ lotus-miner storage-deals list -v
Apr 16 17:39:48  true   bafyreicqyarmouxojweqio7fmd55d6qe6qphraq2trkn7cvjjgxg4s3jhy  1742150  StorageDealError                         f1wdxdpqh3hirrhp353i4o6ld7bsw6evh3v7i5jtq                                               2GiB    0.1039372 FIL             519686   12D3KooWLXgNd8PfyRf7eC7xKPVnuZCb65SA7ZcqEFp4zUeC2yah-12D3KooWQrZnhRaBecSZYKR815zH65xkg1TvfyPWUGYwehHvVEKB-1617824480496011385  error awaiting deal pre-commit: failed to set up called handler: called check error (h: 686834): failed to look up deal on chain: deal 1742150 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed
Apr 20 17:33:21  true   bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei  1770094  StorageDealAwaitingPreCommit             f1wdxdpqh3hirrhp353i4o6ld7bsw6evh3v7i5jtq                                               2GiB    0.1039398 FIL             519699   12D3KooWLXgNd8PfyRf7eC7xKPVnuZCb65SA7ZcqEFp4zUeC2yah-12D3KooWQrZnhRaBecSZYKR815zH65xkg1TvfyPWUGYwehHvVEKB-1617824480496011386

Lotus daemon and miner logs

This issue is about client UX, not the underlying miner bug(s) that caused these deals to fail miner-side. Nevertheless, here are the relevant miner logs.

This is for the 'jhy deal started last Friday and where client side is still stuck in StorageDealAwaitingPreCommit (miner was running 1.7.0):

$ cat lotus-miner-logs* | grep bafyreicqyarmouxojweqio7fmd55d6qe6qphraq2trkn7cvjjgxg4s3jhy
2021-04-17T03:21:24.726+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventRestart", "proposal CID": "bafyreicqyarmouxojweqio7fmd55d6qe6qphraq2trkn7cvjjgxg4s3jhy", "state": "StorageDealAwaitingPreCommit", "message": ""}
2021-04-20T12:36:49.398+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventRestart", "proposal CID": "bafyreicqyarmouxojweqio7fmd55d6qe6qphraq2trkn7cvjjgxg4s3jhy", "state": "StorageDealAwaitingPreCommit", "message": ""}
2021-04-20T12:37:14.888+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "", "proposal CID": "bafyreicqyarmouxojweqio7fmd55d6qe6qphraq2trkn7cvjjgxg4s3jhy", "state": "StorageDealFailing", "message": "error awaiting deal pre-commit: failed to set up called handler: called check error (h: 686834): failed to look up deal on chain: deal 1742150 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed"}
2021-04-20T12:37:14.890+0300	WARN	providerstates	providerstates/provider_states.go:536	deal bafyreicqyarmouxojweqio7fmd55d6qe6qphraq2trkn7cvjjgxg4s3jhy failed: error awaiting deal pre-commit: failed to set up called handler: called check error (h: 686834): failed to look up deal on chain: deal 1742150 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed
2021-04-20T12:37:14.900+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventFailed", "proposal CID": "bafyreicqyarmouxojweqio7fmd55d6qe6qphraq2trkn7cvjjgxg4s3jhy", "state": "StorageDealError", "message": "error awaiting deal pre-commit: failed to set up called handler: called check error (h: 686834): failed to look up deal on chain: deal 1742150 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed"}

This is for the 'dei deal that was started 4 hours ago and is still in StorageDealAwaitingPreCommit (miner running 1.6.0`):

$ cat lotus-miner-logs* | grep bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei
2021-04-20T17:33:21.924+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventOpen", "proposal CID": "bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei", "state": "StorageDealValidating", "message": ""}
2021-04-20T17:33:21.932+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventDealDeciding", "proposal CID": "bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei", "state": "StorageDealAcceptWait", "message": ""}
2021-04-20T17:33:21.942+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventDataRequested", "proposal CID": "bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei", "state": "StorageDealWaitingForData", "message": ""}
2021-04-20T17:33:22.070+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventDataTransferInitiated", "proposal CID": "bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei", "state": "StorageDealTransferring", "message": ""}
2021-04-20T17:37:55.880+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventDataTransferCompleted", "proposal CID": "bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei", "state": "StorageDealVerifyData", "message": ""}
2021-04-20T17:38:14.816+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventVerifiedData", "proposal CID": "bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei", "state": "StorageDealReserveProviderFunds", "message": ""}
2021-04-20T17:38:14.822+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventFundsReserved", "proposal CID": "bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei", "state": "StorageDealReserveProviderFunds", "message": ""}
2021-04-20T17:38:14.827+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventFunded", "proposal CID": "bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei", "state": "StorageDealPublish", "message": ""}
2021-04-20T18:38:14.993+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventDealPublishInitiated", "proposal CID": "bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei", "state": "StorageDealPublishing", "message": ""}
2021-04-20T18:44:00.484+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventFundsReleased", "proposal CID": "bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei", "state": "StorageDealPublishing", "message": ""}
2021-04-20T18:44:00.489+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventDealPublished", "proposal CID": "bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei", "state": "StorageDealStaged", "message": ""}
2021-04-20T18:44:09.211+0300	INFO	markets	loggers/loggers.go:20	storage provider event	{"name": "ProviderEventDealHandedOff", "proposal CID": "bafyreiffnyigpe3l32gmnkjbv7px6rb3a5xqxgqshd5uq5nup6qsi4gdei", "state": "StorageDealAwaitingPreCommit", "message": ""}

Right now, on the miner, lotus client list-deals is is still showing StorageDealAwaitingPreCommit even though both deals have already failed permanently on the miner:

$ lotus client list-deals
DealCid      DealId   Provider  State                         On Chain?         Slashed?     PieceCID     Size       Price          Duration  Verified
...xg4s3jhy  1742150  f0127896  StorageDealAwaitingPreCommit  Y (epoch 0)       Y (epoch 0)  ...w6tgnkoq  1.984 GiB  0.1039372 FIL  519686    true
...qsi4gdei  1770094  f0127896  StorageDealAwaitingPreCommit  N                 N            ...w6tgnkoq  1.984 GiB  0.1039398 FIL  519699    true

Code modifications
None

@ghost
Copy link
Author

ghost commented Apr 20, 2021

PS I can provide shell access to both the client and the miner.

@ghost
Copy link
Author

ghost commented Apr 21, 2021

Interestingly, in the last ~10 hours since I opened this issue, both deals have left StorageDealAwaitingPreCommit.

The ...jhy deal from 3 days ago did eventually fail and timeout :

$ lotus client list-deals --show-failed
...xg4s3jhy  1742150  f0127896  StorageDealError   Y (epoch 0)       Y (epoch 0)  ...w6tgnkoq  1.984 GiB  0.1039372 FIL           519686    true
  Message: error waiting for deal pre-commit message to appear on chain: handling applied event: deal 1742150 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed

And the ...dei deal that was started about 4 hours before I opened the original issue succeeded:

$ lotus client list-deals
...qsi4gdei  1770094  f0127896  StorageDealActive  Y (epoch 688144)  N         ...w6tgnkoq  1.984 GiB  0.1039398 FIL  519699    true

Maybe the real issues here are:

  1. Why did it take 3 days for the client to realize the jhy deal had failed the previous Friday? Could we have some UX where the client can get an update every 5 minutes or so that the deal is still on track? If enough of these messages are missed, the client would give up.
  2. I think I just jumped the gun on including dei in this issue report because I had watched these same symptoms so many times that I assumed it should have advanced states within 4 hours (when I wrote this issue). So, again, maybe more frequent updates from the miner basically saying "yes, StorageDealAwaitingPreCommit is still the true state" would have persuaded me to give that one more time?

@dkkapur dkkapur added P2 P2: Should be resolved and removed need/triage labels May 19, 2021
@RobQuistNL
Copy link
Contributor

This stuff is still happening.

I have a bunch of deals that are failed in the storage-deals list, but are active on chain, and successful on the client side.

A ton with this error;

failed to look up deal on chain: deal (dealid) not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed

@RobQuistNL
Copy link
Contributor

Example log;

{"level":"info","ts":"2021-11-11T10:40:09.717Z","logger":"markets","caller":"loggers/loggers.go:20","msg":"storage provider event","name":"ProviderEventOpen","proposal CID":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge","state":"StorageDealValidating","message":""}
{"level":"info","ts":"2021-11-11T10:40:09.720Z","logger":"markets","caller":"loggers/loggers.go:20","msg":"storage provider event","name":"ProviderEventDealDeciding","proposal CID":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge","state":"StorageDealAcceptWait","message":""}
{"level":"info","ts":"2021-11-11T10:40:10.135Z","logger":"markets","caller":"loggers/loggers.go:20","msg":"storage provider event","name":"ProviderEventDataRequested","proposal CID":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge","state":"StorageDealWaitingForData","message":""}
{"level":"info","ts":"2021-11-11T10:47:19.101Z","logger":"markets","caller":"loggers/loggers.go:20","msg":"storage provider event","name":"ProviderEventVerifiedData","proposal CID":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge","state":"StorageDealReserveProviderFunds","message":""}
{"level":"info","ts":"2021-11-11T10:47:19.104Z","logger":"markets","caller":"loggers/loggers.go:20","msg":"storage provider event","name":"ProviderEventFundsReserved","proposal CID":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge","state":"StorageDealReserveProviderFunds","message":""}
{"level":"info","ts":"2021-11-11T10:47:19.104Z","logger":"markets","caller":"loggers/loggers.go:20","msg":"storage provider event","name":"ProviderEventFunded","proposal CID":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge","state":"StorageDealPublish","message":""}
{"level":"info","ts":"2021-11-11T10:57:19.219Z","logger":"markets","caller":"loggers/loggers.go:20","msg":"storage provider event","name":"ProviderEventDealPublishInitiated","proposal CID":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge","state":"StorageDealPublishing","message":""}
{"level":"info","ts":"2021-11-11T11:03:30.587Z","logger":"markets","caller":"loggers/loggers.go:20","msg":"storage provider event","name":"ProviderEventFundsReleased","proposal CID":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge","state":"StorageDealPublishing","message":""}
{"level":"info","ts":"2021-11-11T11:03:30.588Z","logger":"markets","caller":"loggers/loggers.go:20","msg":"storage provider event","name":"ProviderEventDealPublished","proposal CID":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge","state":"StorageDealStaged","message":""}
{"level":"info","ts":"2021-11-11T11:03:30.588Z","logger":"providerstates","caller":"providerstates/provider_states.go:329","msg":"handing off deal to sealing subsystem","pieceCid":"baga6ea4seaqpgmwj5yz4kbpy4l5a75y5lbkjq5o572sjt6pioxxxukms3szskdq","proposalCid":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge"}
{"level":"info","ts":"2021-11-11T11:07:16.626Z","logger":"providerstates","caller":"providerstates/provider_states.go:376","msg":"successfully handed off deal to sealing subsystem","pieceCid":"baga6ea4seaqpgmwj5yz4kbpy4l5a75y5lbkjq5o572sjt6pioxxxukms3szskdq","proposalCid":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge"}
{"level":"info","ts":"2021-11-11T11:07:16.626Z","logger":"markets","caller":"loggers/loggers.go:20","msg":"storage provider event","name":"ProviderEventDealHandedOff","proposal CID":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge","state":"StorageDealAwaitingPreCommit","message":""}
{"level":"info","ts":"2021-11-11T11:23:25.261Z","logger":"markets","caller":"loggers/loggers.go:20","msg":"storage provider event","name":"ProviderEventDealPrecommitFailed","proposal CID":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge","state":"StorageDealFailing","message":"error awaiting deal pre-commit: failed to set up called handler: called check error (h: 1274108): failed to look up deal on chain: deal 2744407 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed"}
{"level":"warn","ts":"2021-11-11T11:23:25.261Z","logger":"providerstates","caller":"providerstates/provider_states.go:561","msg":"deal bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge failed: error awaiting deal pre-commit: failed to set up called handler: called check error (h: 1274108): failed to look up deal on chain: deal 2744407 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed"}
{"level":"info","ts":"2021-11-11T11:23:25.261Z","logger":"markets","caller":"loggers/loggers.go:20","msg":"storage provider event","name":"ProviderEventFailed","proposal CID":"bafyreidsf5msroslsubmaxmq7iewtma4ycr3jq634pwa2l2j7mmq46buge","state":"StorageDealError","message":"error awaiting deal pre-commit: failed to set up called handler: called check error (h: 1274108): failed to look up deal on chain: deal 2744407 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed"}

@jennijuju jennijuju transferred this issue from filecoin-project/lotus Feb 21, 2022
@LexLuthr
Copy link
Collaborator

I am closing this issue as boost uses a different method to track deal state within the pipeline. Please feel free to reopen if this can be reproduced on boost.

@RobQuistNL
Copy link
Contributor

Well thats nice, but its an error that is happening in markets (which is still being used) and then @jennijuju moved it away from the markets project and into the boost project - so instead of closing it, shouldn't it be moved back to the Lotus repo?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/markets P2 P2: Should be resolved
Projects
Status: Done
Development

No branches or pull requests

4 participants