This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

2.7.2 sync blocks extremely slow #11494

Open

fish-sammy opened this issue Feb 15, 2020 · 45 comments

@fish-sammy

fish-sammy commented Feb 15, 2020

  • Parity Ethereum version: 2.7.2
  • Operating system: Linux (debian:buster-slim)
  • Installation: built from source
  • Fully synchronized: yes
  • Network: ethereum
  • Restarted: yes

I have been running 2 Parity nodes for Ethereum mainnet in a Kubernetes cluster and just upgraded them from 2.5.13 to 2.7.2. Now syncing blocks has become extremely slow (~0.04 blk/s), and they cannot even keep up with the chain.

My TOML configuration file for both nodes is as follows:

# This config should be placed in following path:
#   ~/.local/share/io.parity.ethereum/config.toml

[parity]
# Ethereum Mainnet Network
chain = "mainnet"
# Parity continuously syncs the chain
mode = "active"
# No updates will be auto-installed
auto_update = "none"
# Disables auto downloading of new releases. Not recommended.
no_download = true
# Blockchain and settings will be stored in /data.
base_path = "/data"

[rpc]
# Allows Cross-Origin Requests from domain '*'.
cors = ["*"]
#  JSON-RPC will be listening for connections on IP all.
interface = "all"
# Only selected APIs will be exposed over this interface.
apis = ["web3", "eth", "net", "traces"]
# Allow connections only using specified addresses.
hosts = ["all"]
# JSON-RPC over HTTP will be accessible on port 9091.
port = 9091
# Threads for handling incoming connections for HTTP JSON-RPC server.
server_threads = 16

[websockets]
#  JSON-RPC will be listening for connections on IP all.
interface = "all"
# Allows connecting from Origin 'all'.
origins = ["all"]
# Only selected APIs will be exposed over this interface.
apis = ["pubsub"]
# JSON-RPC over WebSockets will be accessible on port 9092.
port = 9092

[ipc]
# You won't be able to use IPC to interact with Parity.
disable = true

[footprint]
# Compute and Store tracing data. (Enables trace_* APIs).
tracing = "on"
# Prune old state data. Maintains journal overlay - fast but extra 50MB of memory used.
pruning = "fast"
# Will keep up to 20000 old state entries.
pruning_history = 20000
# Will keep up to 8192 MB of old state entries.
pruning_memory = 8192
# If defined will never use more than 32768 MB for all caches. (Overrides other cache settings.)
cache_size = 32768
# Initial number of verifier threads
num_verifiers = 16
# Number of threads will vary depending on the workload. Not guaranteed to be faster.
scale_verifiers = true

[mining]
# Maximum number of transactions per sender in the queue. By default it's 1% of the entire queue, but not less than 16.
tx_queue_per_sender = 256
# Maximum amount of memory that can be used by the transaction queue. Setting this parameter to 0 disables limiting.
tx_queue_mem_limit = 32
# Parity will keep/relay at most 8196 transactions in queue.
tx_queue_size = 8196

[network]
# Override the bootnodes from selected chain file.
bootnodes = ["enode://d860a01f9722d78051619d1e2351aba3f43f943f6f00718d1b9baa4101932a1f5011f16bb2b1bb35db20d6fe28fa0bf09636d26a87d31de9ec6203eeedb1f666@18.138.108.67:30303", "enode://22a8232c3abc76a16ae9d6c3b164f98775fe226f0917b0ca871128a74a8e9630b458460865bab457221f1d448dd9791d24c4e5d88786180ac185df813a68d4de@3.209.45.79:30303", "enode://ca6de62fce278f96aea6ec5a2daadb877e51651247cb96ee310a318def462913b653963c155a0ef6c7d50048bba6e6cea881130857413d9f50a621546b590758@34.255.23.113:30303", "enode://279944d8dcd428dffaa7436f25ca0ca43ae19e7bcf94a8fb7d1641651f92d121e972ac2e8f381414b80cc8e5555811c2ec6e1a99bb009b3f53c4c69923e11bd8@35.158.244.151:30303", "enode://8499da03c47d637b20eee24eec3c356c9a2e6148d6fe25ca195c7949ab8ec2c03e3556126b0d7ed644675e78c4318b08691b7b57de10e5f0d40d05b09238fa0a@52.187.207.27:30303", "enode://103858bdb88756c71f15e9b5e09b56dc1be52f0a5021d46301dbbfb7e130029cc9d0d6f73f693bc29b665770fff7da4d34f3c6379fe12721b5d7a0bcb5ca1fc1@191.234.162.198:30303", "enode://715171f50508aba88aecd1250af392a45a330af91d7b90701c436b618c86aaa1589c9184561907bebbb56439b8f8787bc01f49a7c77276c58c1b09822d75e8e8@52.231.165.108:30303", "enode://5d6d7cd20d6da4bb83a1d28cadb5d409b64edf314c0335df658c1a54e32c7c4a7ab7823d57c39b6a757556e68ff1df17c748b698544a55cb488b52479a92b60f@104.42.217.25:30303", "enode://979b7fa28feeb35a4741660a16076f1943202cb72b6af70d327f053e248bab9ba81760f39d0701ef1d8f89cc1fbd2cacba0710a12cd5314d5e0c9021aa3637f9@5.1.83.226:30303"]

Each node is running in a Kubernetes pod with SSD, 16 CPUs, and 64 GB of memory. What I notice is that, despite scale_verifiers being enabled, both nodes only consume 1 CPU all the time, as shown below.

eth-parity-0       1025m        34660Mi
eth-parity-1       1028m        34502Mi
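
(For reference, a minimal sketch of how such per-pod usage figures are typically read in Kubernetes, assuming metrics-server is installed in the cluster; the label selector is hypothetical:)

# CPU is in millicores (1025m ≈ 1 core of the 16 available), memory in MiB
kubectl top pods -l app=eth-parity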

The logs look much like this:

2020-02-15 05:34:51 UTC Syncing #9485347 0x7061…06ce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  393 Qed  #9485728   23/25 peers     58 MiB chain   24 GiB db   34 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:08 UTC Syncing #9485347 0x7061…06ce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  396 Qed  #9485730   23/25 peers     58 MiB chain   24 GiB db   34 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:08 UTC Syncing #9485347 0x7061…06ce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  393 Qed  #9485728   23/25 peers     58 MiB chain   24 GiB db   34 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:17 UTC Syncing #9485347 0x7061…06ce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  397 Qed  #9485731   23/25 peers     58 MiB chain   24 GiB db   34 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:17 UTC 10 -> GetNodeData:   item 0/1 – slow state fetch for hash 0xe78a53e5f6cc66d9eb8582fc2b221dc3eff288f6c30ba22359029b0687d62fa8; took Instant { tv_sec: 6415631, tv_nsec: 100498033 }
2020-02-15 05:35:25 UTC Syncing #9485347 0x7061…06ce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  385 Qed  #9485731   23/25 peers     58 MiB chain   24 GiB db   33 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:25 UTC Syncing #9485347 0x7061…06ce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  397 Qed  #9485731   23/25 peers     58 MiB chain   24 GiB db   34 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:25 UTC Syncing #9485348 0x5ecf…7599     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  385 Qed  #9485731   23/25 peers     58 MiB chain   24 GiB db   33 MiB queue  249 KiB sync  RPC:  0 conn,    0 req/s,  430 µs
2020-02-15 05:35:25 UTC 10 -> GetNodeData:   item 0/1 – slow state fetch for hash 0xd01481ee138007c9f0d27f1c562c6f6f8f197bf20ec889f4bffbba12fab72223; took Instant { tv_sec: 6415638, tv_nsec: 480517832 }

Can someone please point out what could have gone wrong with my settings?

@fish-sammy
Author

fish-sammy commented Feb 15, 2020

I turned on debug logging for the sync module, and I can't find any problem in there other than that it is deactivating peers all the time:

2020-02-15 07:31:09 UTC IO Worker #0 DEBUG sync  Peer 79 timeout while we were asking them for ForkHeader; disconnecting.
2020-02-15 07:31:09 UTC IO Worker #0 DEBUG sync  Disconnected 79
2020-02-15 07:31:09 UTC IO Worker #2 DEBUG sync  Deactivating peer 59
2020-02-15 07:31:09 UTC IO Worker #2 DEBUG sync  Connected 10:Geth/v1.9.7-stable-a718daa6/linux-amd64/go1.13.4
2020-02-15 07:31:10 UTC IO Worker #1 DEBUG sync  Deactivating peer 43
2020-02-15 07:31:17 UTC IO Worker #0 INFO import  Syncing #9485532 0xfce6…4249     0.06 blk/s    2.0 tx/s    0.1 Mgas/s      0+  609 Qed  #9486145   23/25 peers     30 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:31:25 UTC IO Worker #1 INFO import  Syncing #9485532 0xfce6…4249     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     31 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:31:33 UTC IO Worker #3 INFO import  Syncing #9485533 0x5925…2ea8     0.06 blk/s    2.1 tx/s    0.2 Mgas/s      0+  609 Qed  #9486145   24/25 peers     31 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:31:41 UTC IO Worker #0 INFO import  Syncing #9485534 0x68b7…e3a3     0.06 blk/s    2.1 tx/s    0.2 Mgas/s      0+  609 Qed  #9486145   24/25 peers     35 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:31:50 UTC IO Worker #1 INFO import  Syncing #9485534 0x68b7…e3a3     0.06 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     35 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:31:58 UTC IO Worker #3 INFO import  Syncing #9485534 0x68b7…e3a3     0.12 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     35 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:31:58 UTC IO Worker #0 INFO import  Syncing #9485535 0xd326…aa8e     0.06 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     35 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:14 UTC IO Worker #0 INFO import  Syncing #9485535 0xd326…aa8e     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     40 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:14 UTC IO Worker #3 INFO import  Syncing #9485535 0xd326…aa8e     0.06 blk/s    9.3 tx/s    0.6 Mgas/s      0+  609 Qed  #9486145   24/25 peers     41 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:31 UTC IO Worker #3 INFO import  Syncing #9485536 0xbe02…1efd     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     42 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:31 UTC IO Worker #0 INFO import  Syncing #9485536 0xbe02…1efd     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+  609 Qed  #9486145   24/25 peers     43 MiB chain   24 GiB db   51 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:31 UTC IO Worker #3 DEBUG sync  Deactivating peer 69
2020-02-15 07:32:32 UTC IO Worker #0 DEBUG sync  Deactivating peer 62
2020-02-15 07:32:32 UTC IO Worker #2 DEBUG sync  Deactivating peer 35
2020-02-15 07:32:33 UTC IO Worker #3 DEBUG sync  Deactivating peer 60
2020-02-15 07:32:55 UTC IO Worker #3 INFO import  Syncing #9485537 0xbdd8…3dee     0.06 blk/s    5.8 tx/s    0.3 Mgas/s      0+  597 Qed  #9486145   24/25 peers     43 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:55 UTC IO Worker #0 INFO import  Syncing #9485536 0xbe02…1efd     0.06 blk/s    5.8 tx/s    0.3 Mgas/s      0+  597 Qed  #9486145   24/25 peers     43 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:32:55 UTC IO Worker #2 INFO import  Syncing #9485537 0xbdd8…3dee     0.06 blk/s    5.7 tx/s    0.3 Mgas/s      0+  597 Qed  #9486145   24/25 peers     43 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  454 µs
2020-02-15 07:33:04 UTC IO Worker #3 DEBUG sync  Disconnected 10
2020-02-15 07:33:04 UTC IO Worker #0 DEBUG sync  Deactivating peer 9
2020-02-15 07:33:04 UTC IO Worker #3 DEBUG sync  Deactivating peer 28
2020-02-15 07:33:04 UTC IO Worker #1 DEBUG sync  Deactivating peer 32
2020-02-15 07:33:05 UTC IO Worker #2 DEBUG sync  Deactivating peer 59
2020-02-15 07:33:05 UTC IO Worker #2 DEBUG sync  Deactivating peer 90
2020-02-15 07:33:06 UTC IO Worker #3 DEBUG sync  Deactivating peer 68
2020-02-15 07:33:06 UTC IO Worker #0 DEBUG sync  Deactivating peer 59
2020-02-15 07:33:06 UTC IO Worker #3 DEBUG sync  Deactivating peer 72
2020-02-15 07:33:08 UTC IO Worker #3 DEBUG sync  Deactivating peer 10
2020-02-15 07:33:10 UTC IO Worker #3 DEBUG sync  Deactivating peer 35
2020-02-15 07:33:10 UTC IO Worker #3 DEBUG sync  Deactivating peer 7
2020-02-15 07:33:11 UTC IO Worker #0 DEBUG sync  Deactivating peer 22
2020-02-15 07:33:11 UTC IO Worker #0 DEBUG sync  Deactivating peer 45
2020-02-15 07:33:11 UTC IO Worker #0 DEBUG sync  Deactivating peer 28
2020-02-15 07:33:12 UTC IO Worker #2 DEBUG sync  Deactivating peer 78
2020-02-15 07:33:12 UTC IO Worker #3 DEBUG sync  Deactivating peer 72
2020-02-15 07:33:12 UTC IO Worker #2 DEBUG sync  Deactivating peer 67
2020-02-15 07:33:13 UTC IO Worker #3 DEBUG sync  Deactivating peer 65
2020-02-15 07:33:13 UTC IO Worker #1 DEBUG sync  Connected 90:Geth/v1.8.21-stable-ca0ce3e9/linux-amd64/go1.11.2
2020-02-15 07:33:13 UTC IO Worker #0 DEBUG sync  Disconnected 90
2020-02-15 07:33:13 UTC IO Worker #2 DEBUG sync  Connected 69:Geth/v1.8.21-stable-ca0ce3e9/linux-amd64/go1.11.2
2020-02-15 07:33:13 UTC IO Worker #0 DEBUG sync  Disconnected 69
2020-02-15 07:33:14 UTC IO Worker #2 DEBUG sync  Deactivating peer 59
2020-02-15 07:33:15 UTC IO Worker #2 DEBUG sync  Deactivating peer 69
2020-02-15 07:33:15 UTC IO Worker #2 DEBUG sync  Connected 67:Geth/v1.9.8-unstable-8c1e8de8-20191125/linux-amd64/go1.13
2020-02-15 07:33:19 UTC IO Worker #0 INFO import  Syncing #9485537 0xbdd8…3dee     0.04 blk/s    1.0 tx/s    0.0 Mgas/s      0+  597 Qed  #9486145   24/25 peers     44 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:33:19 UTC IO Worker #3 INFO import  Syncing #9485538 0xcbe1…715e     0.04 blk/s    1.0 tx/s    0.0 Mgas/s      0+  597 Qed  #9486145   23/25 peers     46 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:33:19 UTC IO Worker #2 INFO import  Syncing #9485538 0xcbe1…715e     0.04 blk/s    1.0 tx/s    0.0 Mgas/s      0+  597 Qed  #9486145   24/25 peers     46 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:33:28 UTC IO Worker #0 WARN sync  67 -> GetNodeData:   item 0/2 – slow state fetch for hash 0xb7e0794e3682be3c65cee8b3fcafd510fdebab7572b7e256ca22cf8ccfa7725d; took Instant { tv_sec: 6422723, tv_nsec: 831792813 }
2020-02-15 07:33:35 UTC IO Worker #2 DEBUG sync  Disconnected 67
2020-02-15 07:33:35 UTC IO Worker #0 DEBUG sync  Deactivating peer 6
2020-02-15 07:33:35 UTC IO Worker #2 DEBUG sync  Deactivating peer 9
2020-02-15 07:33:35 UTC IO Worker #0 DEBUG sync  Deactivating peer 29
2020-02-15 07:33:36 UTC IO Worker #0 DEBUG sync  Deactivating peer 68
2020-02-15 07:33:36 UTC IO Worker #0 DEBUG sync  Deactivating peer 26
2020-02-15 07:33:36 UTC IO Worker #3 WARN sync  67 -> GetNodeData:   item 0/2 – slow state fetch for hash 0x77a74e359f735aa00acd12839ec0eacba67d51eb7d932294bdf41f998f33ae23; took Instant { tv_sec: 6422728, tv_nsec: 186277804 }
2020-02-15 07:33:36 UTC IO Worker #3 DEBUG sync  "Error sending nodes: Expired"
2020-02-15 07:33:36 UTC IO Worker #3 DEBUG sync  Deactivating peer 61
2020-02-15 07:33:36 UTC IO Worker #3 DEBUG sync  Deactivating peer 62
2020-02-15 07:33:38 UTC IO Worker #1 DEBUG sync  Deactivating peer 69
2020-02-15 07:33:39 UTC IO Worker #2 DEBUG sync  Deactivating peer 71
2020-02-15 07:33:41 UTC IO Worker #3 DEBUG sync  Connected 62:Geth/v1.9.9-stable-01744997/windows-amd64/go1.13.4
2020-02-15 07:33:42 UTC IO Worker #3 DEBUG sync  Deactivating peer 12
2020-02-15 07:33:44 UTC IO Worker #3 INFO import  Syncing #9485538 0xcbe1…715e     0.04 blk/s    0.7 tx/s    0.1 Mgas/s      0+  597 Qed  #9486145   24/25 peers     46 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:33:44 UTC IO Worker #2 INFO import  Syncing #9485539 0xdf08…16f0     0.04 blk/s    0.7 tx/s    0.1 Mgas/s      0+  597 Qed  #9486145   23/25 peers     48 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:33:44 UTC IO Worker #0 INFO import  Syncing #9485539 0xdf08…16f0     0.04 blk/s    0.7 tx/s    0.1 Mgas/s      0+  597 Qed  #9486145   24/25 peers     48 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:33:46 UTC IO Worker #2 DEBUG sync  Disconnected 73
2020-02-15 07:33:48 UTC IO Worker #0 DEBUG sync  Deactivating peer 69
2020-02-15 07:33:49 UTC IO Worker #3 DEBUG sync  Deactivating peer 76
2020-02-15 07:33:49 UTC IO Worker #0 DEBUG sync  Deactivating peer 17
2020-02-15 07:33:50 UTC IO Worker #2 DEBUG sync  Deactivating peer 73
2020-02-15 07:33:50 UTC IO Worker #2 DEBUG sync  Deactivating peer 22
2020-02-15 07:33:50 UTC IO Worker #3 DEBUG sync  Deactivating peer 76
2020-02-15 07:33:51 UTC IO Worker #1 DEBUG sync  Deactivating peer 18
2020-02-15 07:33:51 UTC IO Worker #3 DEBUG sync  Deactivating peer 83
2020-02-15 07:33:52 UTC IO Worker #0 DEBUG sync  Deactivating peer 35
2020-02-15 07:33:53 UTC IO Worker #1 DEBUG sync  Deactivating peer 75
2020-02-15 07:33:53 UTC IO Worker #1 DEBUG sync  Deactivating peer 73
2020-02-15 07:33:53 UTC IO Worker #2 DEBUG sync  Connected 79:Geth/v1.9.10-stable-58cf5686/windows-amd64/go1.13.6
2020-02-15 07:34:08 UTC IO Worker #2 INFO import  Syncing #9485539 0xdf08…16f0     0.04 blk/s    2.4 tx/s    0.2 Mgas/s      0+  597 Qed  #9486145   23/25 peers     48 MiB chain   24 GiB db   50 MiB queue  901 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:34:08 UTC IO Worker #3 INFO import  Syncing #9485540 0xdd5b…a8ff     0.04 blk/s    2.4 tx/s    0.2 Mgas/s      0+  597 Qed  #9486145   24/25 peers     48 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs
2020-02-15 07:34:08 UTC IO Worker #0 INFO import  Syncing #9485540 0xdd5b…a8ff     0.04 blk/s    2.4 tx/s    0.2 Mgas/s      0+  597 Qed  #9486145   24/25 peers     48 MiB chain   24 GiB db   50 MiB queue  902 KiB sync  RPC:  0 conn,    0 req/s,  248 µs

@fish-sammy
Author

fish-sammy commented Feb 15, 2020

I started another node to re-sync the entire chain with the exact same pod spec/Docker image/configuration file, and weirdly it is able to utilize more CPUs and sync initially.

eth-parity-2       5246m        2882Mi

and

2020-02-15 08:00:14 UTC IO Worker #0 INFO import  Syncing  #374270 0x1e46…4939  1343.40 blk/s 1595.6 tx/s   43.9 Mgas/s   9527+11007 Qed   #394791   23/25 peers    215 MiB chain    2 GiB db   50 MiB queue   50 MiB sync  RPC:  0 conn,    0 req/s,  112 µs

However, it ran into the same issue after a while, and now it looks the same as my original nodes:

eth-parity-2       1059m        24888Mi

and

2020-02-15 14:41:09 UTC Syncing #1140796 0xd203…be2c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+24181 Qed  #1164977   22/25 peers    582 MiB chain   12 GiB db  121 MiB queue   69 MiB sync  RPC:  0 conn,    0 req/s,  107 µs
2020-02-15 14:41:29 UTC Syncing #1140796 0xd203…be2c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+24181 Qed  #1164977   22/25 peers    582 MiB chain   12 GiB db  121 MiB queue   69 MiB sync  RPC:  0 conn,    0 req/s,  107 µs
2020-02-15 14:41:29 UTC Syncing #1140796 0xd203…be2c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+24181 Qed  #1164977   22/25 peers    582 MiB chain   12 GiB db  121 MiB queue   69 MiB sync  RPC:  0 conn,    0 req/s,  107 µs
2020-02-15 14:41:29 UTC Syncing #1140796 0xd203…be2c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+24181 Qed  #1164977   22/25 peers    582 MiB chain   12 GiB db  121 MiB queue   69 MiB sync  RPC:  0 conn,    0 req/s,  107 µs
2020-02-15 14:41:29 UTC Syncing #1140796 0xd203…be2c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+24181 Qed  #1164977   22/25 peers    582 MiB chain   12 GiB db  121 MiB queue   69 MiB sync  RPC:  0 conn,    0 req/s,  107 µs

Not sure if it is relevant at all, but I use a custom Dockerfile which just downloads the executable, as follows:

FROM debian:buster-slim as runner

ENV VERSION 2.7.2
ENV SHA256SUM fe992f0c9b229a4406e82b9ff6d388f4acb4d6ce2782cb79b7bc379e7965ae34
# show backtraces
ENV RUST_BACKTRACE 1

RUN set -ex \
  && apt-get update \
  && apt-get install -qq --no-install-recommends ca-certificates wget curl \
  && rm -rf /var/lib/apt/lists/*

RUN set -ex \
  && wget -q "https://releases.parity.io/ethereum/v${VERSION}/x86_64-unknown-linux-gnu/parity" \
  && echo "${SHA256SUM} parity" | sha256sum -c \
  && chmod u+x parity \
  && mv parity /usr/local/bin/ \
  && apt-get remove -qq wget

COPY ./config.testnet.toml /root/config.testnet.toml
COPY ./config.toml /root/config.toml
COPY ./docker-entrypoint.sh /root/entrypoint.sh
COPY ./check-readiness.sh /check-readiness.sh
RUN chmod +x /check-readiness.sh
COPY ./check-liveness.sh /check-liveness.sh
RUN chmod +x /check-liveness.sh

EXPOSE 9091 9092
ENTRYPOINT ["/root/entrypoint.sh"]
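
(For completeness, a minimal sketch of how an image like this might be built and run; the tag and the host data path are hypothetical, and the actual entrypoint script is not shown in this thread:)

docker build -t parity-custom:2.7.2 .
# Mount the SSD-backed data directory at /data, matching base_path in config.toml
docker run -d -p 9091:9091 -p 9092:9092 -v /mnt/ssd/parity:/data parity-custom:2.7.2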

@dvdplm
Collaborator

dvdplm commented Feb 15, 2020

The GetNodeData entries in the logs are likely Geth nodes fast-syncing off your node (it could also be light clients, but that seems much less likely); the slow database fetches are really very, very slow. I have never seen anything that bad, tbh.
I would definitely try with a shorter pruning history; 20 000 blocks is very high. Are you sure you need that much state?
Even if you do need the state for that many old blocks, as an experiment I'd dial it down to, say, 100, and also remove the increased verifier threads and the extra cache settings; we've seen reports of memory leaks in some configurations. In other words, try with a config as close to the defaults as possible and let us know if the node behaves better. If that helps, we can turn your settings back on one at a time. Thanks!

@fish-sammy
Author

fish-sammy commented Feb 16, 2020

@dvdplm Thanks for replying! Unfortunately, due to the nature of the project I am working on, we need to keep 3 days' worth of state data to be able to replay transactions. We have been running previous versions of Parity with the same pruning history for almost 2 years and never had any problem with it. That said, I will try reducing the pruning history and get back to you here. I am just wondering: if I reduce the pruning history right now and increase it later, will the node be able to get back those state data retroactively? Or will it just continue storing more from the point when I turn it back up?
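
(For context: at mainnet's ~13 s average block time, 3 days is roughly 3 × 86400 / 13 ≈ 19,900 blocks, which is presumably where the pruning_history = 20000 figure above comes from.)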

@fish-sammy
Author

fish-sammy commented Feb 16, 2020

@dvdplm I reduced my configuration to the following, as you suggested:

# This config should be placed in following path:
#   ~/.local/share/io.parity.ethereum/config.toml

[parity]
# Ethereum Mainnet Network
chain = "mainnet"
# Parity continuously syncs the chain
mode = "active"
# No updates will be auto-installed
auto_update = "none"
# Disables auto downloading of new releases. Not recommended.
no_download = true
# Blockchain and settings will be stored in /data.
base_path = "/data"

[rpc]
# Allows Cross-Origin Requests from domain '*'.
cors = ["*"]
#  JSON-RPC will be listening for connections on IP all.
interface = "all"
# Only selected APIs will be exposed over this interface.
apis = ["web3", "eth", "net", "traces"]
# Allow connections only using specified addresses.
hosts = ["all"]
# JSON-RPC over HTTP will be accessible on port 9091.
port = 9091

[websockets]
#  JSON-RPC will be listening for connections on IP all.
interface = "all"
# Allows connecting from Origin 'all'.
origins = ["all"]
# Only selected APIs will be exposed over this interface.
apis = ["pubsub"]
# JSON-RPC over WebSockets will be accessible on port 9092.
port = 9092

[ipc]
# You won't be able to use IPC to interact with Parity.
disable = true

[footprint]
# Compute and Store tracing data. (Enables trace_* APIs).
tracing = "on"
# Prune old state data. Maintains journal overlay - fast but extra 50MB of memory used.
pruning = "fast"
# Will keep up to 100 old state entries.
pruning_history = 100

The node is now not syncing at all:

2020-02-16 04:17:19 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   22/25 peers      8 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:17:37 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   22/25 peers      5 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:17:46 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   22/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:17:46 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   22/25 peers      8 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:17:54 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   22/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:03 UTC 82 -> GetNodeData:   item 0/2 – slow state fetch for hash 0x6886e89b660f08cf1192ea5b1d524bb678ad82c96dcff9149aa5fd0c9dfb7b1d; took Instant { tv_sec: 27684, tv_nsec: 539620713 }
2020-02-16 04:18:03 UTC 82 -> GetNodeData:   item 0/178 – slow state fetch for hash 0x424d087e449e866929ffaff1ba29ca6938e652626309900478977dad4d6b5622; took Instant { tv_sec: 27678, tv_nsec: 576652674 }
2020-02-16 04:18:12 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      5 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:12 UTC 82 -> GetNodeData:   item 0/2 – slow state fetch for hash 0x6402f32704242f3a5e9fa5fc997ba4eeb42cdb37d22712bfce1770fa60751c40; took Instant { tv_sec: 27688, tv_nsec: 384819723 }
2020-02-16 04:18:21 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   22/25 peers      6 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:21 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      9 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:30 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      6 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:48 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      6 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:57 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:18:57 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:19:06 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      8 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:19:24 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      4 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:19:32 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      5 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:19:32 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      5 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:19:41 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:19:58 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:20:07 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      5 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:20:07 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      6 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:20:16 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      7 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs
2020-02-16 04:20:34 UTC Syncing #9486815 0x4078…67d2     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 4976 Qed  #9491799   21/25 peers      8 MiB chain   24 GiB db  456 MiB queue    7 MiB sync  RPC:  0 conn,    0 req/s,    0 µs

Is it possible that my database was corrupted as a result of upgrading from 2.5.13 to 2.7.2, in a way that does not crash Parity but only significantly slows it down? If that is the case, this might be reproducible, as both of my nodes are having the same issue.

@EthanHee

I have the same problem when I use the docker image btccom/parity:v2.5.13-btcpool. Here are my config.toml and logs:

# Parity Config Generator
# https://paritytech.github.io/parity-config-generator/
#
# This config should be placed in following path:
#   ~/.local/share/io.parity.ethereum/config.toml

[parity]
# Ethereum Main Network (ETH)
chain = "foundation"
# Parity continuously syncs the chain
mode = "active"
# Disables auto downloading of new releases. Not recommended.
no_download = true

[ipc]
# You won't be able to use IPC to interact with Parity.
disable = true

#[dapps]
# You won't be able to access any web Dapps.
#disable = true

[rpc]
#  JSON-RPC will be listening for connections on IP 0.0.0.0.
interface = "0.0.0.0"
# Allows Cross-Origin Requests from domain '*'.
cors = ["*"]

[mining]
# Account address to receive reward when block is mined.
author = "0xaa185ca60063d8418e0f623c5d39713bcc13041a"
# Blocks that you mine will have this text in extra data field.
# Set a different node-id for each node to ensure results of getwork are unique.
extra_data = "Easy to Mine"
# Prepare a block to seal even when there are no miners connected.
force_sealing = true
# Force the node to author new blocks when a new uncle block is imported.
reseal_on_uncle = true
# New pending block will be created for all transactions (both local and external).
reseal_on_txs = "all"
# New pending block will be created only once per 4000 milliseconds.
reseal_min_period = 4000
# Parity will keep/relay at most 8192 transactions in queue.
tx_queue_size = 8192
tx_queue_per_sender = 128

[network]
# Parity will sync by downloading latest state first. Node will be operational in couple minutes.
warp = true
# Specify a path to a file with peers' enodes to be always connected to.
reserved_peers = "/home/parity/.local/share/io.parity.ethereum/peer.list"
# Parity will try to maintain connections to at least 100 peers.
min_peers = 100
# Parity will maintain at most 200 peers.
max_peers = 200

[misc]
logging = "own_tx=trace,sync=info,chain=info,network=info,miner=trace"
log_file = "/home/parity/.local/share/io.parity.ethereum/parity.log"

[footprint]
# Prune old state data. Maintains journal overlay - fast but extra 50MB of memory used.
pruning = "fast"
# If defined will never use more than 1024 MB for all caches. (Overrides other cache settings.)
cache_size = 1024

[snapshots]
# Disables automatic periodic snapshots.
disable_periodic = true

And the logs:

2020-02-16 08:31:03.882859050 UTC IO Worker #1 INFO import  Syncing #9465825 0xa517…b340     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    3 req/s,   14 µs
2020-02-16 08:31:08.862218367 UTC IO Worker #1 INFO import  Syncing #9465825 0xa517…b340     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:13.867002799 UTC IO Worker #1 INFO import  Syncing #9465826 0x54b9…9d14     0.20 blk/s   35.8 tx/s    2.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-16 08:31:18.863398454 UTC IO Worker #1 INFO import  Syncing #9465826 0x54b9…9d14     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:23.862675134 UTC IO Worker #1 INFO import  Syncing #9465826 0x54b9…9d14     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-16 08:31:28.864881371 UTC IO Worker #2 INFO import  Syncing #9465827 0x3bab…263e     0.20 blk/s   49.4 tx/s    2.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    2 req/s,   13 µs
2020-02-16 08:31:33.864014689 UTC IO Worker #2 INFO import  Syncing #9465827 0x3bab…263e     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:38.869048210 UTC IO Worker #1 INFO import  Syncing #9465827 0x3bab…263e     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:43.872461997 UTC IO Worker #1 INFO import  Syncing #9465829 0x178e…48e3     0.40 blk/s   50.0 tx/s    2.1 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 166 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:48.880003171 UTC IO Worker #1 INFO import  Syncing #9465829 0x178e…48e3     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 166 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:53.865408752 UTC IO Worker #1 INFO import  Syncing #9465829 0x178e…48e3     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   97/100 peers   69 MiB chain 166 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:31:58.866663314 UTC IO Worker #2 INFO import  Syncing #9465830 0x932f…196f     0.20 blk/s   25.6 tx/s    2.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 166 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-16 08:32:03.269516973 UTC http.worker00 WARN rpc  eth_accounts is deprecated and will be removed in future versions: Account management is being phased out see #9997 for alternatives.
2020-02-16 08:32:03.875034170 UTC IO Worker #2 INFO import  Syncing #9465830 0x932f…196f     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 166 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    3 req/s,   14 µs
2020-02-16 08:32:08.870466519 UTC IO Worker #2 INFO import  Syncing #9465830 0x932f…196f     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 166 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-16 08:32:13.879979218 UTC IO Worker #2 INFO import  Syncing #9465831 0x645e…de83     0.20 blk/s   42.6 tx/s    2.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:32:18.878494405 UTC IO Worker #1 INFO import  Syncing #9465832 0x735e…235f     0.20 blk/s   26.8 tx/s    2.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-16 08:32:23.886999404 UTC IO Worker #1 INFO import  Syncing #9465832 0x735e…235f     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:32:28.866493074 UTC IO Worker #1 INFO import  Syncing #9465833 0xff0a…abf0     0.20 blk/s   25.6 tx/s    2.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-16 08:32:33.870349018 UTC IO Worker #1 INFO import  Syncing #9465833 0xff0a…abf0     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 2367 Qed  #9468204   98/100 peers   69 MiB chain 167 MiB db 258 MiB queue 1 GiB sync  RPC:  0 conn,    1 req/s,   14 µs

The sync speed cannot even keep up with mainnet's.

@EthanHee

What can I do about this?

@dvdplm
Collaborator

dvdplm commented Feb 16, 2020

I am just wondering: if I reduce the pruning history right now and increase it later, will the node be able to get back those state data retroactively? Or will it just continue storing more from the point when I turn it back up?

No, once pruned there is no way to get the pruned state back; it will, as you say, continue storing more from the point you increase it again.

We have upgraded all of our bootnodes from 2.5 to 2.7 and have not seen these issues, so whatever this is, I don't think it affects everyone (which makes it trickier to debug, of course).

@sammy1991106 Do you think you could share your rocksdb LOG file with us? You can find it in <your base path>/chains/ethereum/db/<your account>/overlayrecent/db/LOG. It's a long shot, but maybe you're hitting some kind of pathological scenario with respect to write amplification. :/
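
(A minimal sketch of grabbing the tail of that file given the config above, where base_path = "/data"; the glob stands in for the per-node db directory hash:)

tail -n 500 /data/chains/ethereum/db/*/overlayrecent/db/LOG > LOG-0.txt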

@fish-sammy
Author

fish-sammy commented Feb 16, 2020

@dvdplm Yes, absolutely! I attached the LOGs from both nodes. Let me know if you find anything.

LOG-0.txt
LOG-1.txt

@fish-sammy
Author

fish-sammy commented Feb 16, 2020

@dvdplm In the meantime, is it doable to warp sync to a specific block X and then turn on tracing and full sync from there?
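
(For reference, warp sync can already be pointed at a minimum snapshot block; a sketch assuming the TOML key mirrors the --warp-barrier CLI flag, with the block number taken from the experiment described later in this thread:)

[network]
warp = true
# Never attempt regular sync before warping to at least this block.
warp_barrier = 9450000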

@EthanHee

I run another parity node as follows:

ExecStart=/usr/bin/parity --author "*******************" --mode active --pruning fast --db-compaction ssd --cache-size 1024 --config /etc/parity/config.toml --log-file /root/parity.log

And here is my config.toml:

# This config should be placed in following path:
#   %AppData%\Parity\Ethereum\config.toml

[rpc]
#  JSON-RPC will be listening for connections on IP 0.0.0.0.
interface = "0.0.0.0"
# Only selected APIs will be exposed over this interface.
apis = ["web3", "eth", "pubsub", "net", "parity", "parity_pubsub", "traces", "shh", "shh_pubsub", "personal", "parity_accounts", "secretstore", "rpc", "private"]

The log is as follows:

2020-02-17 14:30:22  Syncing #9469862 0x3e34…fdce     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   90 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    3 req/s,   14 µs
2020-02-17 14:30:28  Syncing #9469863 0xce69…1a9b     0.20 blk/s   29.6 tx/s    2.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   90 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:30:33  Syncing #9469864 0x5b65…6aae     0.20 blk/s   11.4 tx/s    0.9 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   89 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-17 14:30:37  Syncing #9469864 0x5b65…6aae     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   89 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-17 14:30:43  Syncing #9469864 0x5b65…6aae     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   89 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    3 req/s,   14 µs
2020-02-17 14:30:47  Syncing #9469865 0x9d00…5ead     0.20 blk/s   29.2 tx/s    2.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   90 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:30:53  Syncing #9469866 0x0dfa…0905     0.20 blk/s    9.6 tx/s    2.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   90 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    3 req/s,   14 µs
2020-02-17 14:30:55  eth_accounts is deprecated and will be removed in future versions: Account management is being phased out see #9997 for alternatives.
2020-02-17 14:30:58  Syncing #9469867 0x6ef2…f0a9     0.20 blk/s    6.2 tx/s    2.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   89 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    3 req/s,   14 µs
2020-02-17 14:31:03  Syncing #9469867 0x6ef2…f0a9     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   89 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:31:07  Syncing #9469867 0x6ef2…f0a9     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   89 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    2 req/s,   14 µs
2020-02-17 14:31:13  Syncing #9469868 0xf944…3f57     0.20 blk/s   29.4 tx/s    2.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   90 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:31:17  Syncing #9469869 0xe821…9f5a     0.20 blk/s    8.2 tx/s    2.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   91 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:31:22  Syncing #9469869 0xe821…9f5a     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   91 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:31:28  Syncing #9469870 0xd581…2243     0.20 blk/s   20.2 tx/s    1.5 Mgas/s      0+ 1493 Qed  #9471364   35/50 peers      9 MiB chain   90 MiB db  140 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs
2020-02-17 14:31:32  Syncing #9469871 0x3a49…797e     0.20 blk/s   11.4 tx/s    2.0 Mgas/s      0+ 1481 Qed  #9471364   35/50 peers      9 MiB chain   91 MiB db  139 MiB queue   13 MiB sync  RPC:  0 conn,    1 req/s,   14 µs

The parity version is:

root@iZj6c20llck3brqzgtwgh7Z:~# parity --version
Parity Ethereum Client.
  version Parity-Ethereum/v2.7.2-stable-2662d19-20200206/x86_64-unknown-linux-gnu/rustc1.41.0
Copyright 2015-2020 Parity Technologies (UK) Ltd.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

By Wood/Paronyan/Kotewicz/Drwięga/Volf/Greeff
   Habermeier/Czaban/Gotchac/Redman/Nikolsky
   Schoedon/Tang/Adolfsson/Silva/Palm/Hirsz et al.

The same result: it syncs very, very slowly.

Hoping for a response; thank you.

@dvdplm
Collaborator

dvdplm commented Feb 18, 2020

warp sync to a specific block X, and then turn on tracing and full sync from there?

I'm afraid that is not currently possible. I don't know of a deep technical reason for it, though, so I think it could be done. But it's not a solution to your current problem.

@dvdplm
Collaborator

dvdplm commented Feb 18, 2020

@EthanHee So your node is not a tracing node, then? And with 2.5 it syncs significantly quicker?

@fish-sammy
Author

fish-sammy commented Feb 18, 2020

@dvdplm I have re-synced two 2.5.13 nodes and two 2.7.2 nodes, with warp sync turned on and a warp barrier targeting block 9450000. I realized that we only use trace_replayTransaction, which is part of the ad-hoc tracing API and does not require the trace db. Here is what I observed.

  1. 2.5.13 syncs to the tip of the chain way faster than 2.7.2. I can't really say how much faster because I ended up killing the 2.7.2 nodes before they caught up. By the time the 2.5.13 nodes were caught up, the 2.7.2 ones were still > 20000 blocks away from the tip.
  2. After both 2.5.13 and 2.7.2 finished the initial state download and were running full sync, I redirected our system to run against them to see how trace_replayTransaction performs. What we basically do is simply replay all transactions in blocks and ask for stateDiff only. The 2.5.13 nodes were way faster. I didn't do the math, but visually we were getting all state diffs in ~15 seconds per block with 2.7.2 vs. less than 10 seconds per block with 2.5.13. (We experimented with both for about 2 hours, so it should not just be that transaction counts or gas costs in blocks are uneven.)
  3. When the trace_replayTransaction calls were flooded to the nodes, 2.5.13 was able to use all the CPUs available to it, while 2.7.2 only used 3 or 4 out of 16 cores (we set the RPC server_threads to match the number of CPUs on the machine); a sketch of that load pattern is at the end of this comment.

Let us know if you find anything. We would love to provide logs, debugging, and PRs to help solve the issue.
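
(To make the load pattern concrete, a minimal sketch of flooding a node with concurrent trace_replayTransaction calls; the tx-hashes.txt input, the parallelism of 16, and port 9091 are assumptions based on the config above:)

# tx-hashes.txt holds one transaction hash per line; -P 16 issues 16 requests in parallel.
xargs -P 16 -I {} curl -s -H 'Content-Type: application/json' \
  -d '{"method":"trace_replayTransaction","params":["{}", ["stateDiff"]],"id":1,"jsonrpc":"2.0"}' \
  http://localhost:9091 -o /dev/null < tx-hashes.txt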

@dvdplm
Collaborator

dvdplm commented Feb 19, 2020

2. After both 2.5.13 and 2.7.2 finished the initial state download and were running full sync, I redirected our system to run against them to see how trace_replayTransaction performs. What we basically do is simply replay all transactions in blocks and ask for stateDiff only. The 2.5.13 nodes were way faster. I didn't do the math, but visually we were getting all state diffs in ~15 seconds per block with 2.7.2 vs. less than 10 seconds per block with 2.5.13. (We experimented with both for about 2 hours, so it should not just be that transaction counts or gas costs in blocks are uneven.)

Do I understand you correctly that here the two nodes are not archive nodes and not tracing, just plain old vanilla full nodes? I have benchmarked an archive+tracing node at both 2.5 and 2.7 today and have not found any significant performance difference, but my node is not fully synced; I have ~2.5 million blocks in the DB. Here are some numbers running trace_replayBlockTransactions for block 0x1a3d52, which returns a big trace (more than 50 MB) on both versions:

2.7, single big trace, single thread
------------------------------------
	$ wrk --latency -c 1 -d 1m -t 1 --timeout 5 -s ./trace.lua http://127.0.0.1:8545
	Running 1m test @ http://127.0.0.1:8545
	  1 threads and 1 connections
	  Thread Stats   Avg      Stdev     Max   +/- Stdev
	    Latency     1.08s   111.18ms   1.42s    74.55%
	    Req/Sec     0.22      0.42     1.00     78.18%
	  Latency Distribution
	     50%    1.06s
	     75%    1.16s
	     90%    1.25s
	     99%    1.42s
	  55 requests in 1.00m, 2.72GB read
	Requests/sec:      0.92
	Transfer/sec:     46.39MB
	
2.5, single big trace, single thread
------------------------------------
	$ wrk --latency -c 1 -d 1m -t 1 --timeout 5 -s ./trace.lua http://127.0.0.1:8545
	Running 1m test @ http://127.0.0.1:8545
	  1 threads and 1 connections
	  Thread Stats   Avg      Stdev     Max   +/- Stdev
	    Latency     1.15s   104.33ms   1.43s    65.38%
	    Req/Sec     0.02      0.14     1.00     98.08%
	  Latency Distribution
	     50%    1.14s
	     75%    1.23s
	     90%    1.29s
	     99%    1.43s
	  52 requests in 1.00m, 2.57GB read
	Requests/sec:      0.87
	Transfer/sec:     43.83MB

Do you think you could run the following on your 2.5 and 2.7 nodes and report back what timings you get?

time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["0x1a3d52", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:8545 > out-2.7.json

It's the same block I used with wrk above, and here I see response times between 1.3 and 1.8 seconds on both versions, consistent with the wrk results. Thanks!

EDIT: I ran my benchmark with --mode offline to reduce the noise on the node.

@richardpringle

richardpringle commented Feb 19, 2020

Hey @dvdplm, I'm working with @sammy1991106 on this. While we wait for him to get you those numbers, I will say that one of the main concerns is performance under load. It seems as though 2.5 was able to better manage a flurry of trace_replayTransaction requests while it was syncing; it appears 2.5 is able to parallelize more work than 2.7. I'll have to take another look at the changes between 2.5 and 2.7 and see if I have any suspicions about what could have affected this.

In other words, I suspect that a single request on a single CPU while in offline mode will not yield a significant difference. @sammy1991106, I'll let you prove or disprove my hypothesis.

@richardpringle

Possible related issues:
#11502
#11503

@dvdplm
Collaborator

dvdplm commented Feb 19, 2020

I'll have to take another look at the changes between 2.5 and 2.7 and see if I have any suspicions about what could have affected this.

Lovely, this definitely could use more eyes. One place to look is the switch to jsonrpc v14 and the changes related to threading, e.g. this. During our testing we did not see any regressions, but we did not test on tracing nodes.

In other words, I suspect that a single request on a single cpu while in offline mode will not yield a significant difference.

I agree; what I was after is confirmation that what I'm seeing on my system matches up with what you are seeing. If, on the other hand, we have a severe regression on a single query on a single thread, I'd be more inclined to look into the database layer.

@richardpringle

richardpringle commented Feb 20, 2020

I have to re-sync to get block 0x1a3d52, but in the meantime I'll post some "benchmarks" I ran with the state that we had.

Block: 0x9146a2

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.2M  100 27.2M  100   122  13.3M     59  0:00:02  0:00:02 --:--:-- 13.3M
real	0m2.130s
user	0m0.005s
sys	0m0.035s

$ echo $BLOCK $IMAGE
0x9146a2 parity-2.5.13
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.2M  100 27.2M  100   122  4716k     20  0:00:06  0:00:05  0:00:01 7521k
real	0m5.994s
user	0m0.008s
sys	0m0.037s

$ echo $BLOCK $IMAGE
0x9146a2 parity-2.6.8
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.2M  100 27.2M  100   122  11.0M     49  0:00:02  0:00:02 --:--:-- 11.0M
real	0m2.557s
user	0m0.004s
sys	0m0.039s

$ echo $BLOCK $IMAGE
0x9146a2 parity-2.7.2

and block 0x914699

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.9M  100 17.9M  100   122  4248k     28  0:00:04  0:00:04 --:--:-- 4248k
real	0m4.396s
user	0m0.006s
sys	0m0.026s

$ echo $BLOCK $IMAGE
0x914699 parity-2.5.13
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.9M  100 17.9M  100   122  4360k     28  0:00:04  0:00:04 --:--:-- 4360k
real	0m4.274s
user	0m0.006s
sys	0m0.023s

$ echo $BLOCK $IMAGE
0x914699 parity-2.6.8
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.9M  100 17.9M  100   122  5255k     34  0:00:03  0:00:03 --:--:-- 5255k
real	0m3.576s
user	0m0.010s
sys	0m0.023s

$ echo $BLOCK $IMAGE
0x914699 parity-2.7.2

I noticed that if I run the query twice I get a significant speed-up, so obviously there's some caching going on; I'm not sure that running a proper load test really gives useful results beyond the 99th percentile.

@dvdplm, I'll do the same thing for the block you were using too.

@richardpringle

richardpringle commented Feb 21, 2020

Here are the "benchmarks" with block 0x1a3d52:

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION-$BLOCK.json
Warning: The file name argument '-0x1a3d52.json' looks like a flag.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 50.6M  100 50.6M  100   122  15.4M     37  0:00:03  0:00:03 --:--:-- 15.4M
real	0m3.279s
user	0m0.008s
sys	0m0.066s

$ echo $BLOCK $IMAGE
0x1a3d52 parity-2.5.13
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION-$BLOCK.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 50.6M  100 50.6M  100   122   9.9M     24  0:00:05  0:00:05 --:--:-- 13.0M
real	0m5.090s
user	0m0.008s
sys	0m0.066s

$ echo $BLOCK $VERSION
0x1a3d52 parity-2.6.8
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION-$BLOCK.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 50.6M  100 50.6M  100   122  16.8M     40  0:00:03  0:00:02  0:00:01 16.8M
real	0m3.004s
user	0m0.008s
sys	0m0.068s

$ echo $BLOCK $VERSION
0x1a3d52 parity-2.7.2

Interestingly enough, I decided to run that last query a few more times to see if there was any speed-up, and it didn't improve by much; below is the fastest attempt:

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION-$BLOCK.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 50.6M  100 50.6M  100   122  20.1M     48  0:00:02  0:00:02 --:--:-- 20.1M
real	0m2.662s
user	0m0.012s
sys	0m0.071s

$ echo $BLOCK $VERSION
0x1a3d52 parity-2.7.2

Since the original posting of this issue, we have found optimizations related to our trace_replayTransaction strategy; however, that does not help us if the node eventually stops syncing. Since we noticed issues related to concurrency, and there have been some changes in thread management, it almost seems like our nodes are deadlocking during the syncing process (they always eventually stop syncing altogether on 2.7.2). Is this possible?

@dvdplm
Collaborator

dvdplm commented Feb 25, 2020

While it's possible you're seeing a deadlock, I would be more inclined to suspect thread starvation, where too small a pool of threads is continuously busy answering RPC requests rather than making progress on sync. That's a guess at this stage, though.
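
If thread starvation is indeed the cause, one speculative mitigation (illustrative values, not an official recommendation) would be to shrink the HTTP JSON-RPC thread pool so more cores stay available for block import:

[rpc]
# Fewer threads answering HTTP JSON-RPC leaves more CPU time for sync.
server_threads = 4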

@richardpringle

We actually saw the syncing issue while the RPC server was off.

@kryvel

kryvel commented Mar 1, 2020

Same issue here
Parity Ethereum Client.
version Parity-Ethereum/v2.7.2-stable-2662d1925-20200206/x86_64-unknown-linux-gnu/rustc1.39.0

The RPC server is on, but not used. The initial blockchain download took more than 2 weeks...
Now I'm at block 8455500, syncing at 2 blk/s.

@kcanty

kcanty commented Mar 3, 2020

Also have this issue. Is there any way to revert the DB change and go back to 2.5.13 while keeping the data? Early testing did not show this bug, but now, due to a stuck sync and falling behind, it is a major issue, as 2.7.2 cannot catch back up. @dvdplm or anyone?

Any help or suggestions are greatly appreciated.

@ordian
Collaborator

ordian commented Mar 4, 2020

@kcanty we're investigating the problem at the moment.

In the meantime, you can use this tool to downgrade the database from 2.7 to 2.5.

@or2008

or2008 commented Mar 15, 2020

Any update on this one?

@ianhe8x

ianhe8x commented Apr 12, 2020

I got the same issue. I have two nodes running on AWS: one is an m5.large VM and the other runs in a Kubernetes cluster.
The VM syncs fine, and I used its disk snapshot to create the other node in Kubernetes.
The container node, when using v2.7.2, can never catch up with the latest block, mostly at 0.00 blk/s
(with a lot of GetNodeData slow warnings).
Then I downgraded its DB to 2.5 and ran it with v2.5.13; it went to around 0.6 to 1 blk/s.

@jiangxjcn

I got the same issue with Parity 2.7.2.
After resyncing with Parity 2.5.3 (I started a new Parity client on the same machine), everything seems to be good.

@ianhe8x

ianhe8x commented Apr 23, 2020


Update from my side:
My problem was I/O: a disk restored from a snapshot needs to warm up before it reaches its full IOPS. After warming up, my node has synced to the latest block.
Anyway, I think 2.7.2 is definitely slower than 2.5.x.
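
For reference, a minimal warm-up sketch for an EBS volume restored from a snapshot, reading every block once so the volume reaches full IOPS (the device name /dev/nvme1n1 is an assumption; adjust it for your instance, and note this is a read-only pass):

sudo fio --filename=/dev/nvme1n1 --rw=read --bs=1M --iodepth=32 \
  --ioengine=libaio --direct=1 --name=volume-initialize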

@CherryDT

I have the same issue with Parity 2.7.2. It started misbehaving all of a sudden: it syncs extremely slowly and shows "slow fetch" messages as well.

I'm a bit afraid of downgrading, though, because I don't want to make the problem worse; I need this node to be working as soon as possible...
I'm also worried that get.parity.io seems to no longer exist. Parity was installed on this machine using bash <(curl https://get.parity.io -Lk) -r stable, so if I mess up, I may not even be able to easily fix the mess afterwards. It's on Ubuntu.

What is the general advice in this case? Is the downgrade "safe"?

@kfo2010

kfo2010 commented Apr 30, 2020

On version 2.7.2...
It was synced and running fine for weeks... then, all of a sudden and for no apparent reason, excessive warnings:

2020-04-30 08:25:49 UTC IO Worker #1 WARN sync 9 -> GetNodeData: item 69/384 – slow state fetch for hash 0x60ffe6e056a92b05df4a3de978857e5bdf80c8110165a260df9d092aadeec00d; took Instant { tv_sec: 911, tv_nsec: 784678089 }
2020-04-30 08:25:55 UTC IO Worker #3 WARN sync 9 -> GetNodeData: item 159/384 – slow state fetch for hash 0xa429e294c3388463f448a8ac1c45ba461fbaa0fa3f1151d67b17b9f98f7bcf76; took Instant { tv_sec: 917, tv_nsec: 98440592 }
2020-04-30 08:25:55 UTC IO Worker #3 WARN sync 9 -> GetNodeData: item 54/384 – slow state fetch for hash 0x048003563c0c3a3ae64ffd0261affa50d2c1cde1a251fd8ee025e49d370ead56; took Instant { tv_sec: 917, tv_nsec: 438773547 }
2020-04-30 08:25:57 UTC IO Worker #3 WARN sync 9 -> GetNodeData: item 5/384 – slow state fetch for hash 0xdf6a6790a8d7bf0fda2b38aaa5b48aa870e02438e32874fb1ddd40e6841ead4a; took Instant { tv_sec: 919, tv_nsec: 108621033 }
2020-04-30 08:25:57 UTC IO Worker #0 WARN sync 9 -> GetNodeData: item 11/384 – slow state fetch for hash 0x85a0e528b093e7f7dd027c0f743f6dd65d74d2d7129a7d659963cc559c541539; took Instant { tv_sec: 919, tv_nsec: 544842060 }
2020-04-30 08:25:59 UTC IO Worker #1 WARN sync 9 -> GetNodeData: item 218/384 – slow state fetch for hash 0x6796bd131c3245f4dcf2d3ffa589a0aa2399c822a054ccb9751ad4a605be1b53; took Instant { tv_sec: 920, tv_nsec: 869450369 }

@fish-sammy
Author

Just an update: we are still having the same issue with OpenEthereum 3.0.0, with the following configuration:

# This config should be placed in following path:
#   ~/.local/share/io.parity.ethereum/config.toml

[parity]
# Ethereum Mainnet Network
chain = "mainnet"
# Parity continuously syncs the chain
mode = "active"
# No updates will be auto-installed
auto_update = "none"
# Disables auto downloading of new releases. Not recommended.
no_download = true
# Blockchain and settings will be stored in /data.
base_path = "/data"

[rpc]
# Allows Cross-Origin Requests from domain '*'.
cors = ["*"]
#  JSON-RPC will be listening for connections on IP all.
interface = "all"
# Only selected APIs will be exposed over this interface.
apis = ["web3", "eth", "net", "traces"]
# Allow connections only using specified addresses.
hosts = ["all"]
# JSON-RPC over HTTP will be accessible on port 9091.
port = 9091
# Threads for handling incoming connections for HTTP JSON-RPC server.
server_threads = 16

[websockets]
#  JSON-RPC will be listening for connections on IP all.
interface = "all"
# Allows connecting from Origin 'all'.
origins = ["all"]
# Only selected APIs will be exposed over this interface.
apis = ["pubsub"]
# JSON-RPC over WebSockets will be accessible on port 9092.
port = 9092

[ipc]
# You won't be able to use IPC to interact with Parity.
disable = true

[network]
# Parity will sync by downloading the latest state first. The node will be operational in a couple of minutes.
warp = true
# Parity will maintain additional 25 peers for snapshot sync.
snapshot_peers = 25
# Disable serving light peers.
no_serve_light = true

[footprint]
# Compute and Store tracing data. (Enables trace_* APIs).
tracing = "off"
# Prune old state data. Maintains journal overlay - fast but extra 50MB of memory used.
pruning = "fast"
# Will keep up to 20000 old state entries.
pruning_history = 20000
# If defined, will never use more than 4096MB for all caches. (Overrides other cache settings).
cache_size = 4096

[mining]
# Maximum number of transactions per sender in the queue. By default it's 1% of the entire queue, but not less than 16.
tx_queue_per_sender = 256
# Maximum amount of memory that can be used by the transaction queue. Setting this parameter to 0 disables limiting.
tx_queue_mem_limit = 32
# Parity will keep/relay at most 8196 transactions in queue.
tx_queue_size = 8196

@jiangxjcn

@sammy1991106
To be honest, I think 4096MB for all caches is far from enough to sustain pruning_history = 20000.
Maybe 32768MB is also not enough. I give Parity 64000MB and it will use up to 128000MB of memory.
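
To illustrate the trade-off (the numbers below are only a rough guess, not a tested recommendation), either keep a shallower state history or budget a much larger cache:

[footprint]
# Option A: shrink the retained state history so the journal overlay stays small.
pruning_history = 2000
# Option B: keep the deep history but give the caches far more headroom.
cache_size = 32768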

@fish-sammy
Author

fish-sammy commented May 26, 2020

@jiangxjcn I would agree if that cache limit also applied to pruning_history, which I am not sure it does. The same configuration works fine on 2.5.13, though. I only see this issue with 2.7.2 and 3.0.0.

@meowsbits
Contributor

meowsbits commented May 31, 2020

Redacted comment since it was rather large and later understood as off topic. See https://github.com/openethereum/openethereum/issues/11494#issuecomment-636523424.

@richardpringle

@sammy1991106 didn't you notice the opposite (it synced with no problem) on Ethereum Classic mainnet?

@meowsbits
Contributor

For Ethereum Classic, probably related: https://github.com/openethereum/openethereum/issues/11104

@fish-sammy
Author

@richardpringle No, I didn't. Both 2.7.2 and 3.0.0 work fine for Ethereum Classic's mainnet and Kotti, as well as Ethereum's Ropsten.

@webcaetano

Same issue here.

@or2008

or2008 commented Jun 5, 2020

FYI, I just upgraded from 2.6.8 to 3.0.1 and, from 1 miserable peer, it is currently syncing with 17/25 peers.

@phiresky

phiresky commented Jun 5, 2020

For me, 2.7.2 just completely stops syncing every few days. The server still runs and RPC calls are still answered, but it just doesn't get any new blocks, even though it shows 10+ connections. I have full debug logs of a few instances, but they don't seem to say anything interesting.

I think it got better when I enabled --no-periodic-snapshot (now only once per week), but it still happens. I wrote a script that just kills and restarts the Parity node whenever it's out of date by more than 10 minutes, and that works... (see the sketch below).

Edit: probably more relevant to #11737
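
A minimal sketch of such a watchdog (this is not phiresky's actual script; it assumes the node's HTTP JSON-RPC listens on localhost:8545, that curl and jq are available, and that the node runs as a systemd unit named parity.service):

#!/bin/bash
# Restart the node when the latest block's timestamp is more than 10 minutes old.
LATEST_TS_HEX=$(curl -s -H 'Content-type: application/json' \
  -d '{"method":"eth_getBlockByNumber","params":["latest",false],"id":1,"jsonrpc":"2.0"}' \
  http://localhost:8545 | jq -r '.result.timestamp')
# Bail out if the RPC itself is unreachable or returned nothing.
if [ -z "$LATEST_TS_HEX" ] || [ "$LATEST_TS_HEX" = "null" ]; then exit 1; fi
BLOCK_TS=$(printf '%d' "$LATEST_TS_HEX")   # hex (0x...) to decimal
NOW=$(date +%s)
if [ $((NOW - BLOCK_TS)) -gt 600 ]; then
  echo "Latest block is $((NOW - BLOCK_TS))s old; restarting the node"
  systemctl restart parity.service
fi

Run it from cron every few minutes and adjust the 600-second threshold to taste.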

@avolkov-dev

I saw these errors on OpenEthereum 3.0.1. Any ideas how to resolve this?

@avolkov-dev

Will try, thanks!

@kcanty

kcanty commented Jun 18, 2020

@avolkov-dev Yes, the script from @phiresky works well for us. In production, we ensure the script is running with monit and have the systemd service file for Parity set to Restart=always.
