Cardano-wallet-shelley workers crash #1708

piotr-iohk · 2020-06-02T10:48:29Z

Context

Information	-
Version	cardano-wallet: 2020.5.6 (git revision: `3fb783f`), cardano-node: 1.12 044e82d68a5e5d05258c93eda6c00d8541e4d977
Platform	Linux
Installation	From source

1.

I observed that workers on cardano-wallet tend to die while wallet is just running on top of cardano-node. After running wallet overnight on a VPS server I had:

2 out of 3 of my Byron wallets showing status: not_responding
Listing of Shelley wallets was giving 500 :

Code = 500,
Message = {"code":"wallet_not_responding","message":"That's embarassing. My associated worker for e9bc1ad5c4273eed7813483a40342e01ad47e243 is no longer responding. This is not something that is supposed to happen. The worker must have left a trace in the logs of severity 'Error' when it died which might explain the cause. Said differently, this wallet won't be accessible until the server is restarted but there are good chances it'll recover itself upon restart."}

2.

I was able to reproduce the situation by starting and stopping cardano-node while wallet was running. (however it was intermittent, i.e. not every cardano-node restart was causing that).

There was in the wallet.log upon restarting cardano-node (and workers crash as a result):

[cardano-wallet.network:Info:119][2020-06-02 08:16:13.26 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Send MsgProposeVersions (fromList [(NodeToClientV_2,TInt 42)])
[cardano-wallet.network:Warning:121][2020-06-02 08:16:13.26 UTC] Couldn't connect to node (x11). Retrying in a bit...
[cardano-wallet.network:Info:121][2020-06-02 08:16:13.26 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Send MsgProposeVersions (fromList [(NodeToClientV_2,TInt 42)])
[cardano-wallet.network:Warning:125][2020-06-02 08:16:13.27 UTC] Couldn't connect to node (x11). Retrying in a bit...
[cardano-wallet.network:Warning:127][2020-06-02 08:16:13.27 UTC] Couldn't connect to node (x11). Retrying in a bit...
[cardano-wallet.network:Warning:129][2020-06-02 08:16:13.27 UTC] Couldn't connect to node (x11). Retrying in a bit...
[cardano-wallet.network:Warning:123][2020-06-02 08:16:13.27 UTC] Couldn't connect to node (x11). Retrying in a bit...
[cardano-wallet.network:Info:119][2020-06-02 08:16:13.27 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Recv MsgAcceptVersion NodeToClientV_2 (TInt 42)
[cardano-wallet.network:Info:121][2020-06-02 08:16:13.27 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Recv MsgAcceptVersion NodeToClientV_2 (TInt 42)
[cardano-wallet.network:Info:16][2020-06-02 08:16:13.27 UTC] Intersection found: 5e708589
[cardano-wallet.network:Info:72][2020-06-02 08:16:13.27 UTC] Intersection found: 5e708589
[cardano-wallet.wallet-engine:Error:30][2020-06-02 08:16:13.28 UTC] Worker has exited unexpectedly: ExceptionInLinkedThread ThreadId 127 Network.Socket.connect: <socket: 23>: resource exhausted (Resource temporarily unavailable)
[cardano-wallet.wallet-engine:Error:51][2020-06-02 08:16:13.29 UTC] Worker has exited unexpectedly: ExceptionInLinkedThread ThreadId 129 Network.Socket.connect: <socket: 23>: resource exhausted (Resource temporarily unavailable)
[cardano-wallet.wallet-engine:Error:42][2020-06-02 08:16:13.29 UTC] Worker has exited unexpectedly: ExceptionInLinkedThread ThreadId 125 Network.Socket.connect: <socket: 23>: resource exhausted (Resource temporarily unavailable)
[cardano-wallet.wallet-engine:Error:61][2020-06-02 08:16:13.29 UTC] Worker has exited unexpectedly: ExceptionInLinkedThread ThreadId 123 Network.Socket.connect: <socket: 23>: resource exhausted (Resource temporarily unavailable)
[cardano-wallet.wallet-engine:Info:16][2020-06-02 08:16:13.36 UTC] c3ce146c: Applying blocks [34.18962 ... 34.18962]
[cardano-wallet.wallet-engine:Info:16][2020-06-02 08:16:13.36 UTC] c3ce146c: Creating checkpoint at 5e708589<-[52a91198-34.18962#31204]

3.

Finally I managed to observed the situation when the worker of my wallet dies without any intervention from my side.

The worker crashes on 2020-06-02 09:59:26.14 UTC. Here are corresponding logs from wallet and the node around that time.

Wallet.log:

[cardano-wallet.api-server:Info:209][2020-06-02 09:59:25.10 UTC] [RequestId 27] 200 OK in 0.035328973s
[cardano-wallet.network:Warning:11][2020-06-02 09:59:25.12 UTC] Couldn't connect to node (x22). Retrying in a bit...
[cardano-wallet.network:Info:11][2020-06-02 09:59:25.12 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Send MsgProposeVersions (fromList [(NodeToClientV_2,TInt 42)])
[cardano-wallet.network:Info:11][2020-06-02 09:59:25.12 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Recv MsgAcceptVersion NodeToClientV_2 (TInt 42)
[cardano-wallet.network:Warning:159][2020-06-02 09:59:26.13 UTC] Couldn't connect to node (x18). Retrying in a bit...
[cardano-wallet.network:Info:159][2020-06-02 09:59:26.13 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Send MsgProposeVersions (fromList [(NodeToClientV_2,TInt 42)])
[cardano-wallet.network:Warning:157][2020-06-02 09:59:26.13 UTC] Couldn't connect to node (x18). Retrying in a bit...
[cardano-wallet.network:Info:157][2020-06-02 09:59:26.13 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Send MsgProposeVersions (fromList [(NodeToClientV_2,TInt 42)])
[cardano-wallet.network:Warning:162][2020-06-02 09:59:26.13 UTC] Couldn't connect to node (x18). Retrying in a bit...
[cardano-wallet.network:Info:162][2020-06-02 09:59:26.13 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Send MsgProposeVersions (fromList [(NodeToClientV_2,TInt 42)])
[cardano-wallet.network:Warning:155][2020-06-02 09:59:26.13 UTC] Couldn't connect to node (x18). Retrying in a bit...
[cardano-wallet.network:Info:159][2020-06-02 09:59:26.13 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Recv MsgAcceptVersion NodeToClientV_2 (TInt 42)
[cardano-wallet.network:Info:157][2020-06-02 09:59:26.13 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Recv MsgAcceptVersion NodeToClientV_2 (TInt 42)
[cardano-wallet.network:Warning:161][2020-06-02 09:59:26.13 UTC] Couldn't connect to node (x18). Retrying in a bit...
[cardano-wallet.network:Info:161][2020-06-02 09:59:26.13 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Send MsgProposeVersions (fromList [(NodeToClientV_2,TInt 42)])
[cardano-wallet.network:Warning:165][2020-06-02 09:59:26.13 UTC] Couldn't connect to node (x18). Retrying in a bit...
[cardano-wallet.network:Info:165][2020-06-02 09:59:26.13 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Send MsgProposeVersions (fromList [(NodeToClientV_2,TInt 42)])
[cardano-wallet.network:Info:165][2020-06-02 09:59:26.13 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Recv MsgAcceptVersion NodeToClientV_2 (TInt 42)
[cardano-wallet.network:Info:162][2020-06-02 09:59:26.13 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Recv MsgAcceptVersion NodeToClientV_2 (TInt 42)
[cardano-wallet.network:Info:161][2020-06-02 09:59:26.13 UTC] Mux ConnectionId {localAddress = LocalAddress {getFilePath = ""}, remoteAddress = LocalAddress {getFilePath = "/root/node/relay/node.socket"}} Recv MsgAcceptVersion NodeToClientV_2 (TInt 42)
[cardano-wallet.network:Info:23][2020-06-02 09:59:26.13 UTC] Intersection found: 46ad783d
[cardano-wallet.network:Info:35][2020-06-02 09:59:26.13 UTC] Intersection found: 46ad783d
[cardano-wallet.network:Info:26][2020-06-02 09:59:26.13 UTC] Intersection found: 46ad783d
[cardano-wallet.network:Info:38][2020-06-02 09:59:26.13 UTC] Intersection found: 46ad783d
[cardano-wallet.network:Info:14][2020-06-02 09:59:26.13 UTC] Intersection found: 46ad783d
[cardano-wallet.wallet-engine:Error:19][2020-06-02 09:59:26.14 UTC] Worker has exited unexpectedly: ExceptionInLinkedThread ThreadId 155 Network.Socket.connect: <socket: 36>: resource exhausted (Resource temporarily unavailable)
[cardano-wallet.wallet-engine:Info:23][2020-06-02 09:59:26.14 UTC] 3f532c05: Applying blocks [35.3545 ... 35.3545]

Node.log:

[testnet:cardano.node.ChainDB:Info:95][2020-06-02 09:59:25.12 UTC] New reader was created
[testnet:cardano.node.ChainDB:Info:95][2020-06-02 09:59:25.12 UTC] ReaderNoLongerInMem
[testnet:cardano.node.ChainDB:Debug:95] [2020-06-02 09:59:25.12 UTC] TraceImmDBEvent
[testnet:cardano.node.ChainDB:Debug:95] [2020-06-02 09:59:25.12 UTC] TraceImmDBEvent
[testnet:cardano.node.ChainDB:Debug:95] [2020-06-02 09:59:25.12 UTC] TraceImmDBEvent
[testnet:cardano.node.ChainDB:Info:36][2020-06-02 09:59:25.17 UTC] Valid candidate (Point 759545, HashHeader {unHashHeader = b3c4529ef1be49f701d85f967c8e09a32f30fb2639de80451c5311fbd2a0acf8})
[testnet:cardano.node.ChainDB:Notice:36][2020-06-02 09:59:25.17 UTC] Chain extended, new tip: (Point 759545, HashHeader {unHashHeader = b3c4529ef1be49f701d85f967c8e09a32f30fb2639de80451c5311fbd2a0acf8})
[testnet:cardano.node.ChainDB:Debug:105] [2020-06-02 09:59:26.13 UTC] before next, messages elided = 1802406411496672
[testnet:cardano.node.ChainDB:Debug:105] [2020-06-02 09:59:26.13 UTC] Scheduled a garbage collection for 757251
[testnet:cardano.node.ChainDB:Info:105][2020-06-02 09:59:26.13 UTC] New reader was created
[testnet:cardano.node.ChainDB:Info:108][2020-06-02 09:59:26.13 UTC] New reader was created
[testnet:cardano.node.ChainDB:Info:114][2020-06-02 09:59:26.13 UTC] New reader was created
[testnet:cardano.node.ChainDB:Info:120][2020-06-02 09:59:26.13 UTC] New reader was created
[testnet:cardano.node.ChainDB:Info:125][2020-06-02 09:59:26.13 UTC] New reader was created
[testnet:cardano.node.ChainDB:Debug:88] [2020-06-02 09:59:33.16 UTC] Block added to queue: (Point 759572, HashHeader {unHashHeader = b2126d1f495a976d73eb9135550faa86c6b9d6cdff5f90fd20649d61c0ddf4a8}) queue size 1
[testnet:cardano.node.ChainDB:Notice:36][2020-06-02 09:59:33.16 UTC] Chain extended, new tip: (Point 759572, HashHeader {unHashHeader = b2126d1f495a976d73eb9135550faa86c6b9d6cdff5f90fd20649d61c0ddf4a8})
[testnet:cardano.node.ChainDB:Debug:36] [2020-06-02 09:59:33.19 UTC] before next, messages elided = 1802414403318408
[testnet:cardano.node.ChainDB:Debug:36] [2020-06-02 09:59:33.19 UTC] Chain added block (Point 759571, HashHeader {unHashHeader = db7db23ce97c4a399835c13bb854ad95b00416880b61c1d38b7aecfce951908e})
[testnet:cardano.node.ChainDB:Info:36][2020-06-02 09:59:33.19 UTC] Block fits onto some fork: (Point 759571, HashHeader {unHashHeader = db7db23ce97c4a399835c13bb854ad95b00416880b61c1d38b7aecfce951908e})

Steps to Reproduce

Start a wallet connected to F&F testnet and create a few wallets (Byron and Shelley)

Expected behavior

Wallet should be running with no crashes.

Actual behavior

Wallet workers tend to crash which makes them not operational until wallet server is restarted.

Resolution

Analysis

The error Network.Socket.connect: <socket: 24>: resource exhausted (Resource temporarily unavailable) means that connect() returns EAGAIN.

This is because localSnocket sets up a listen queue of length 1. By contrast the TCP version socketSnocket uses a listen queue of length 8.

Our code uses localSnocket through the connectClient function. This function is used multiple times at once - one node client per wallet, one "global" client, and another node client for stake pools monitoring. We handle some errors, but the "resource exhausted" error is considered fatal.

Solution

In the event of connect() returning EAGAIN, we should retry, ~~backing off for a small random interval~~ => Handle Network.Socket.connect: resource exhausted exceptions #1961.
We can add a regression test which connects multiple network layers in parallel to the same node socket => Handle Network.Socket.connect: resource exhausted exceptions #1961.
We can add HasCallStack through the network layer, and print backtraces in the "Worker has exited unexpectedly" log message ⇒ Add more HasCallStack to shelley network layer #1969.

QA

Regression test added in lib/shelley/test/unit/Cardano/Wallet/Shelley/NetworkSpec.hs

The text was updated successfully, but these errors were encountered:

rvl · 2020-07-02T04:20:30Z

I saw this when running latest master revision on shelley_testnet last night.

Only one of the two wallet workers has crashed. cardano-node is still running.

It is still possible to get the details of the wallet which did not crash. But listing wallets fails.

Luckily I have been running with | tee -a cardano-wallet.log, so at least there is an Info-level log to look through.

More details to come...

piotr-iohk · 2020-07-02T19:29:32Z

@rvl I observe it pretty regularly on Ubuntu test env I'm using. If there are any DEBUG level logs that are worth turning on let me know. Attaching info - level wallet log from an "incident":

wallet.log

Relevant excerpt (I think):

[cardano-wallet.network:Warning:670] [2020-07-02 18:37:58.44 UTC] Connection lost with the node.
[cardano-wallet.wallet-engine:Error:29] [2020-07-02 18:37:58.44 UTC] 73d38c71: Unexpected error following the chain: ExceptionInLinkedThread ThreadId 665 (MuxError {errorType = MuxBearerClosed, errorMsg = "<socket: 16> closed when reading data, waiting on next header True", errorStack = []})
[cardano-wallet.network:Notice:29] [2020-07-02 18:37:58.44 UTC] Destroying cursor connection at ThreadId 665
[cardano-wallet.wallet-engine:Error:55] [2020-07-02 18:37:58.44 UTC] 1ceb45b3: Unexpected error following the chain: ExceptionInLinkedThread ThreadId 666 (MuxError {errorType = MuxBearerClosed, errorMsg = "<socket: 23> closed when reading data, waiting on next header True", errorStack = []})
[cardano-wallet.network:Notice:55] [2020-07-02 18:37:58.44 UTC] Destroying cursor connection at ThreadId 666
[cardano-wallet.wallet-engine:Error:71] [2020-07-02 18:37:58.44 UTC] bed3fc77: Unexpected error following the chain: ExceptionInLinkedThread ThreadId 668 (MuxError {errorType = MuxBearerClosed, errorMsg = "<socket: 27> closed when reading data, waiting on next header True", errorStack = []})
[cardano-wallet.network:Notice:71] [2020-07-02 18:37:58.44 UTC] Destroying cursor connection at ThreadId 668
[cardano-wallet.wallet-engine:Error:45] [2020-07-02 18:37:58.44 UTC] 1b0aa249: Unexpected error following the chain: ExceptionInLinkedThread ThreadId 670 (MuxError {errorType = MuxBearerClosed, errorMsg = "<socket: 31> closed when reading data, waiting on next header True", errorStack = []})
[cardano-wallet.network:Notice:45] [2020-07-02 18:37:58.44 UTC] Destroying cursor connection at ThreadId 670
[cardano-wallet.network:Info:29] [2020-07-02 18:37:58.44 UTC] Looking for an intersection with the node's local chain with: 01010101<-[3bfd2a29-0.0#0], b4503aca<-[df9984c1-15.10480#15000], 81cffd86<-[14e09790-15.12360#15100], 7a0a4f91<-[2c1c5d7b-15.12840#15123], 2c1c5d7b<-[22e52c78-15.12880#15124], 22e52c78<-[f2fb4d76-15.12889#15125], f2fb4d76<-[8c49d6ed-15.12911#15126], 8c49d6ed<-[c15f16be-15.12960#15127], c15f16be<-[1838faad-15.12965#15128], 1838faad<-[41787df4-15.12993#15129], 41787df4<-[f33a143f-15.13000#15130], f33a143f<-[09eaac01-15.13039#15131], 09eaac01<-[2ee8ff25-15.13040#15132], 2ee8ff25<-[67da4ece-15.13053#15133]
[cardano-wallet.network:Info:45] [2020-07-02 18:37:58.45 UTC] Looking for an intersection with the node's local chain with: 01010101<-[3bfd2a29-0.0#0], b4503aca<-[df9984c1-15.10480#15000], 81cffd86<-[14e09790-15.12360#15100], 7a0a4f91<-[2c1c5d7b-15.12840#15123], 2c1c5d7b<-[22e52c78-15.12880#15124], 22e52c78<-[f2fb4d76-15.12889#15125], f2fb4d76<-[8c49d6ed-15.12911#15126], 8c49d6ed<-[c15f16be-15.12960#15127], c15f16be<-[1838faad-15.12965#15128], 1838faad<-[41787df4-15.12993#15129], 41787df4<-[f33a143f-15.13000#15130], f33a143f<-[09eaac01-15.13039#15131], 09eaac01<-[2ee8ff25-15.13040#15132], 2ee8ff25<-[67da4ece-15.13053#15133]
[cardano-wallet.network:Info:71] [2020-07-02 18:37:58.45 UTC] Looking for an intersection with the node's local chain with: 01010101<-[3bfd2a29-0.0#0], b4503aca<-[df9984c1-15.10480#15000], 81cffd86<-[14e09790-15.12360#15100], 7a0a4f91<-[2c1c5d7b-15.12840#15123], 2c1c5d7b<-[22e52c78-15.12880#15124], 22e52c78<-[f2fb4d76-15.12889#15125], f2fb4d76<-[8c49d6ed-15.12911#15126], 8c49d6ed<-[c15f16be-15.12960#15127], c15f16be<-[1838faad-15.12965#15128], 1838faad<-[41787df4-15.12993#15129], 41787df4<-[f33a143f-15.13000#15130], f33a143f<-[09eaac01-15.13039#15131], 09eaac01<-[2ee8ff25-15.13040#15132], 2ee8ff25<-[67da4ece-15.13053#15133]
[cardano-wallet.network:Info:55] [2020-07-02 18:37:58.45 UTC] Looking for an intersection with the node's local chain with: 01010101<-[3bfd2a29-0.0#0], b4503aca<-[df9984c1-15.10480#15000], 81cffd86<-[14e09790-15.12360#15100], 7a0a4f91<-[2c1c5d7b-15.12840#15123], 2c1c5d7b<-[22e52c78-15.12880#15124], 22e52c78<-[f2fb4d76-15.12889#15125], f2fb4d76<-[8c49d6ed-15.12911#15126], 8c49d6ed<-[c15f16be-15.12960#15127], c15f16be<-[1838faad-15.12965#15128], 1838faad<-[41787df4-15.12993#15129], 41787df4<-[f33a143f-15.13000#15130], f33a143f<-[09eaac01-15.13039#15131], 09eaac01<-[2ee8ff25-15.13040#15132], 2ee8ff25<-[67da4ece-15.13053#15133]
[cardano-wallet.network:Warning:673] [2020-07-02 18:37:58.45 UTC] Connection lost with the node.
[cardano-wallet.network:Warning:716] [2020-07-02 18:37:58.45 UTC] Connection lost with the node.
[cardano-wallet.pools-engine:Error:102] [2020-07-02 18:37:58.45 UTC] Unexpected error following the chain: ExceptionInLinkedThread ThreadId 673 (MuxError {errorType = MuxBearerClosed, errorMsg = "<socket: 35> closed when reading data, waiting on next header True", errorStack = []})
[cardano-wallet.network:Notice:102] [2020-07-02 18:37:58.45 UTC] Destroying cursor connection at ThreadId 673
[cardano-wallet.pools-engine:Error:102] [2020-07-02 18:37:58.45 UTC] Chain follower exited with error. Worker will no longer monitor stake pools.
[cardano-wallet.pools-engine:Notice:102] [2020-07-02 18:37:58.45 UTC] Worker has exited: main action is over.
[cardano-wallet.wallet-engine:Error:90] [2020-07-02 18:37:58.46 UTC] b1fb8632: Unexpected error following the chain: ExceptionInLinkedThread ThreadId 716 (MuxError {errorType = MuxBearerClosed, errorMsg = "<socket: 36> closed when reading data, waiting on next header True", errorStack = []})
[cardano-wallet.network:Notice:90] [2020-07-02 18:37:58.46 UTC] Destroying cursor connection at ThreadId 716
[cardano-wallet.network:Info:90] [2020-07-02 18:37:58.46 UTC] Looking for an intersection with the node's local chain with: 01010101<-[3bfd2a29-0.0#0], b4503aca<-[df9984c1-15.10480#15000], 81cffd86<-[14e09790-15.12360#15100], 7a0a4f91<-[2c1c5d7b-15.12840#15123], 2c1c5d7b<-[22e52c78-15.12880#15124], 22e52c78<-[f2fb4d76-15.12889#15125], f2fb4d76<-[8c49d6ed-15.12911#15126], 8c49d6ed<-[c15f16be-15.12960#15127], c15f16be<-[1838faad-15.12965#15128], 1838faad<-[41787df4-15.12993#15129], 41787df4<-[f33a143f-15.13000#15130], f33a143f<-[09eaac01-15.13039#15131], 09eaac01<-[2ee8ff25-15.13040#15132], 2ee8ff25<-[67da4ece-15.13053#15133]
[cardano-wallet.network:Notice:11] [2020-07-02 18:37:58.69 UTC] Couldn't connect to node (x2). Retrying in a bit...
[cardano-wallet.network:Notice:13] [2020-07-02 18:37:58.69 UTC] Couldn't connect to node (x2). Retrying in a bit...
[cardano-wallet.network:Notice:814] [2020-07-02 18:37:58.70 UTC] Couldn't connect to node (x2). Retrying in a bit...
[cardano-wallet.network:Notice:816] [2020-07-02 18:37:58.70 UTC] Couldn't connect to node (x2). Retrying in a bit...
[cardano-wallet.network:Notice:817] [2020-07-02 18:37:58.70 UTC] Couldn't connect to node (x2). Retrying in a bit...
[cardano-wallet.network:Notice:820] [2020-07-02 18:37:58.70 UTC] Couldn't connect to node (x2). Retrying in a bit...
[cardano-wallet.network:Notice:824] [2020-07-02 18:37:58.71 UTC] Couldn't connect to node (x2). Retrying in a bit...
[cardano-wallet.network:Warning:13] [2020-07-02 18:37:58.94 UTC] Couldn't connect to node (x3). Retrying in a bit...
[cardano-wallet.network:Warning:11] [2020-07-02 18:37:58.94 UTC] Couldn't connect to node (x3). Retrying in a bit...
[cardano-wallet.network:Warning:814] [2020-07-02 18:37:58.95 UTC] Couldn't connect to node (x3). Retrying in a bit...
[cardano-wallet.network:Warning:816] [2020-07-02 18:37:58.95 UTC] Couldn't connect to node (x3). Retrying in a bit...
[cardano-wallet.network:Warning:817] [2020-07-02 18:37:58.95 UTC] Couldn't connect to node (x3). Retrying in a bit...
[cardano-wallet.network:Warning:820] [2020-07-02 18:37:58.95 UTC] Couldn't connect to node (x3). Retrying in a bit...
[cardano-wallet.network:Warning:824] [2020-07-02 18:37:58.96 UTC] Couldn't connect to node (x3). Retrying in a bit...
[cardano-wallet.network:Warning:11] [2020-07-02 18:37:59.44 UTC] Couldn't connect to node (x4). Retrying in a bit...
[cardano-wallet.network:Warning:13] [2020-07-02 18:37:59.44 UTC] Couldn't connect to node (x4). Retrying in a bit...
[cardano-wallet.network:Warning:816] [2020-07-02 18:37:59.45 UTC] Couldn't connect to node (x4). Retrying in a bit...
[cardano-wallet.network:Warning:820] [2020-07-02 18:37:59.45 UTC] Couldn't connect to node (x4). Retrying in a bit...
[cardano-wallet.network:Warning:817] [2020-07-02 18:37:59.45 UTC] Couldn't connect to node (x4). Retrying in a bit...
[cardano-wallet.network:Warning:814] [2020-07-02 18:37:59.45 UTC] Couldn't connect to node (x4). Retrying in a bit...
[cardano-wallet.network:Warning:824] [2020-07-02 18:37:59.46 UTC] Couldn't connect to node (x4). Retrying in a bit...
[cardano-wallet.network:Warning:13] [2020-07-02 18:38:00.19 UTC] Couldn't connect to node (x5). Retrying in a bit...
[cardano-wallet.network:Warning:11] [2020-07-02 18:38:00.19 UTC] Couldn't connect to node (x5). Retrying in a bit...
[cardano-wallet.network:Warning:816] [2020-07-02 18:38:00.20 UTC] Couldn't connect to node (x5). Retrying in a bit...
[cardano-wallet.network:Warning:820] [2020-07-02 18:38:00.20 UTC] Couldn't connect to node (x5). Retrying in a bit...
[cardano-wallet.network:Warning:817] [2020-07-02 18:38:00.20 UTC] Couldn't connect to node (x5). Retrying in a bit...
[cardano-wallet.network:Warning:814] [2020-07-02 18:38:00.20 UTC] Couldn't connect to node (x5). Retrying in a bit...
[cardano-wallet.network:Warning:824] [2020-07-02 18:38:00.21 UTC] Couldn't connect to node (x5). Retrying in a bit...
[cardano-wallet.network:Warning:11] [2020-07-02 18:38:01.44 UTC] Couldn't connect to node (x6). Retrying in a bit...
[cardano-wallet.network:Info:11] [2020-07-02 18:38:01.44 UTC] conn::/root/node/relay/node.socket Send MsgProposeVersions (fromList [(NodeToClientV_2,TInt 42)])
[cardano-wallet.network:Warning:13] [2020-07-02 18:38:01.44 UTC] Couldn't connect to node (x6). Retrying in a bit...
[cardano-wallet.network:Info:13] [2020-07-02 18:38:01.44 UTC] conn::/root/node/relay/node.socket Send MsgProposeVersions (fromList [(NodeToClientV_2,TInt 42)])
[cardano-wallet.network:Info:13] [2020-07-02 18:38:01.44 UTC] conn::/root/node/relay/node.socket Recv MsgAcceptVersion NodeToClientV_2 (TInt 42)
[cardano-wallet.network:Info:11] [2020-07-02 18:38:01.44 UTC] conn::/root/node/relay/node.socket Recv MsgAcceptVersion NodeToClientV_2 (TInt 42)
[cardano-wallet.network:Warning:816] [2020-07-02 18:38:01.45 UTC] Couldn't connect to node (x6). Retrying in a bit...
[cardano-wallet.network:Info:816] [2020-07-02 18:38:01.45 UTC] conn::/root/node/relay/node.socket Send MsgProposeVersions (fromList [(NodeToClientV_2,TInt 42)])
[cardano-wallet.network:Warning:820] [2020-07-02 18:38:01.45 UTC] Couldn't connect to node (x6). Retrying in a bit...
[cardano-wallet.network:Info:820] [2020-07-02 18:38:01.45 UTC] conn::/root/node/relay/node.socket Send MsgProposeVersions (fromList [(NodeToClientV_2,TInt 42)])
[cardano-wallet.network:Warning:817] [2020-07-02 18:38:01.45 UTC] Couldn't connect to node (x6). Retrying in a bit...
[cardano-wallet.network:Warning:814] [2020-07-02 18:38:01.45 UTC] Couldn't connect to node (x6). Retrying in a bit...
[cardano-wallet.network:Warning:824] [2020-07-02 18:38:01.46 UTC] Couldn't connect to node (x6). Retrying in a bit...
[cardano-wallet.network:Info:816] [2020-07-02 18:38:01.47 UTC] conn::/root/node/relay/node.socket Recv MsgAcceptVersion NodeToClientV_2 (TInt 42)
[cardano-wallet.network:Info:820] [2020-07-02 18:38:01.47 UTC] conn::/root/node/relay/node.socket Recv MsgAcceptVersion NodeToClientV_2 (TInt 42)
[cardano-wallet.network:Info:55] [2020-07-02 18:38:01.47 UTC] Intersection found: 67da4ece
[cardano-wallet.network:Info:71] [2020-07-02 18:38:01.47 UTC] Intersection found: 67da4ece
[cardano-wallet.wallet-engine:Error:19] [2020-07-02 18:38:01.47 UTC] Worker has exited unexpectedly: ExceptionInLinkedThread ThreadId 814 Network.Socket.connect: <socket: 27>: resource exhausted (Resource temporarily unavailable)
[cardano-wallet.wallet-engine:Error:33] [2020-07-02 18:38:01.47 UTC] Worker has exited unexpectedly: ExceptionInLinkedThread ThreadId 817 Network.Socket.connect: <socket: 27>: resource exhausted (Resource temporarily unavailable)

piotr-iohk · 2020-07-18T11:38:12Z

I saw manifestation of this in Daedalus few times. In Daedalus it looks then as if all wallets suddenly disappeared. That is because listing Shelley wallets in case of this error results in:

Code = 500,
Message = {"code":"wallet_not_responding","message":"That's embarassing. My associated worker for e9bc1ad5c4273eed7813483a40342e01ad47e243 is no longer responding. 
This is not something that is supposed to happen. 
The worker must have left a trace in the logs of severity 'Error' when it died which might explain the cause. 
Said differently, this wallet won't be accessible until the server is restarted but there are good chances it'll recover itself upon restart."}

Luckily restarting Daedalus usually helps and everything is back to normal.

KtorZ · 2020-07-27T12:19:36Z

@rvl From a few experiments I conducted the other day (observing crashes in the integration test cluster), I observed 3 failures that had the same symptoms and led to a "wallet_not_responding" kind of situation.

Symptoms were, at some point in time, the wallet would fail to acquire a point on chain:

Error when querying local state parameters for DelegationRewardsClient: AcquireFailurePointNotOnChain

And, within a few seconds later, the worker would suddenly exits with a:

Worker has exited unexpectedly: ExceptionInLinkedThread ThreadId 1972 Network.Socket.connect: <socket: 31>: resource exhausted (Resource temporarily unavailable)

With no clear sign that any re-connection attempts did occur or anything really abnormal happened (the acquire failure is itself so-to-speak expected under some circumstances and shouldn't have any major impact on the life of the worker).

From there, any subsequent request would return a 500 Internal Server Error with wallet_not_responding as a code, including for newly created wallets after the worker of an existing wallet initially crashed. This suggests some sort of failure on a shared components between wallets (possibly, the network layer?).

Anviking · 2020-07-27T12:36:08Z

If I run 3 instances of

set -x CARDANO_NODE_SOCKET_PATH /private/var/folders/h_/mlzym27d5lz_fx3ph2d547700000gn/T/test-7d0bcbd6e21b57e6/node/node.socket
while [ 1 ]
do
  cardano-cli shelley query tip --testnet-magic 764824073
done

against the relay node while the integration tests are running, I get many

cardano-cli: Network.Socket.connect: <socket: 12>: does not exist (Connection refused)

, but that is different from Resource temporarily unavailable

My original suspicious was whether our strategy to close-and re-open the connection when the wallet worker rolls back actually was sound, as we can rollback often.

1961: Handle Network.Socket.connect: resource exhausted exceptions r=KtorZ a=rvl ### Issue Number Relates to #1708 ### Overview - Adds regression test - Fixes unhandled exception resulting from parallel local snocket connections (see issue description for more details). Co-authored-by: Rodney Lorrimar <[email protected]> Co-authored-by: IOHK <[email protected]>

1969: Add more HasCallStack to shelley network layer r=KtorZ a=rvl ### Issue Number Relates to #1708 ### Overview - Adds HasCallStack annotations where possible, based on the principle that future bugs often appear in the same place as past bugs. - Make the parallel socket connections test more robust by removing the threadDelay. ### Comments - I was unable to add `HasCallStack` to `connectClient`. This requires a change in the ouroboros-network library. Co-authored-by: Rodney Lorrimar <[email protected]>

piotr-iohk · 2020-07-29T08:25:14Z

lgtm. I haven't seen that lately 🎉

Fell-x27 · 2021-07-30T13:29:23Z

After a year its still actual.
[cardano-wallet.network:Warning:958] [2021-07-30 13:26:58.03 UTC] Connection lost with the node. Network.Socket.recvBuf: resource vanished (Connection reset by peer)

and then worker becomes crushed

piotr-iohk · 2021-07-30T14:03:14Z

Thanks. We are aware of this, it's been tracked in our Jira backlog (ADP-871) and is currently our highest priority bug... unfortunately a little bit on hold recently due to Alonzo work.

34r7h · 2022-02-08T08:45:31Z

plz update if possible

piotr-iohk · 2022-02-08T11:07:35Z

The issue is fixed since v2021-11-11, and should be good also in current wallet version v2022-01-18.

piotr-iohk added BUG? and removed BUG? labels Jun 2, 2020

piotr-iohk changed the title ~~Cardano-wallet workers crash~~ Cardano-wallet-shelley workers crash Jun 9, 2020

KtorZ added MINOR-BUG:CONFIRMED and removed MAJOR-BUG:CONFIRMED labels Jun 26, 2020

piotr-iohk mentioned this issue Jul 14, 2020

Testing byron txs and migrations in Shelley #1900

Closed

piotr-iohk added BUG:MAJOR and removed BUG:MINOR labels Jul 27, 2020

rvl self-assigned this Jul 27, 2020

rvl mentioned this issue Jul 27, 2020

Handle Network.Socket.connect: resource exhausted exceptions #1961

Merged

rvl added a commit that referenced this issue Jul 28, 2020

Add backtraces for bug #1708

b1faa72

rvl mentioned this issue Jul 28, 2020

Add more HasCallStack to shelley network layer #1969

Merged

rvl added a commit that referenced this issue Jul 28, 2020

Add backtraces for bug #1708

8408897

piotr-iohk closed this as completed Jul 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cardano-wallet-shelley workers crash #1708

Cardano-wallet-shelley workers crash #1708

piotr-iohk commented Jun 2, 2020 •

edited by rvl

Loading

rvl commented Jul 2, 2020

piotr-iohk commented Jul 2, 2020

piotr-iohk commented Jul 18, 2020

KtorZ commented Jul 27, 2020

Anviking commented Jul 27, 2020 •

edited

Loading

piotr-iohk commented Jul 29, 2020

Fell-x27 commented Jul 30, 2021 •

edited

Loading

piotr-iohk commented Jul 30, 2021

34r7h commented Feb 8, 2022

piotr-iohk commented Feb 8, 2022

Cardano-wallet-shelley workers crash #1708

Cardano-wallet-shelley workers crash #1708

Comments

piotr-iohk commented Jun 2, 2020 • edited by rvl Loading

Context

1.

2.

3.

Steps to Reproduce

Expected behavior

Actual behavior

Resolution

Analysis

Solution

QA

rvl commented Jul 2, 2020

piotr-iohk commented Jul 2, 2020

piotr-iohk commented Jul 18, 2020

KtorZ commented Jul 27, 2020

Anviking commented Jul 27, 2020 • edited Loading

piotr-iohk commented Jul 29, 2020

Fell-x27 commented Jul 30, 2021 • edited Loading

piotr-iohk commented Jul 30, 2021

34r7h commented Feb 8, 2022

piotr-iohk commented Feb 8, 2022

piotr-iohk commented Jun 2, 2020 •

edited by rvl

Loading

Anviking commented Jul 27, 2020 •

edited

Loading

Fell-x27 commented Jul 30, 2021 •

edited

Loading