Heavy HTTP requests halt the client #3536

@ghost

Description

When I query the node's HTTP API continuously, the node either halts or restarts several times in a row.

Endpoints being called:

  1. eth/v1/beacon/states/finalized/validators/BLS_KEY
  2. eth/v1/beacon/states/finalized/finality_checkpoints

Approximate request rate is 3 requests / second.
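
A minimal sketch of a script that generates roughly this load, in case it helps with reproduction. It assumes the HTTP API listens on 127.0.0.1:5052 (the address shown in the logs below); the names `build_urls`, `http_get` and `poll` are illustrative only and not part of Lighthouse.

```python
import time
import urllib.request
from itertools import cycle

# Assumption: the beacon node's HTTP API address, as reported in the logs.
BASE_URL = "http://127.0.0.1:5052"


def build_urls(base_url, validator_id):
    """Return the two endpoint URLs named in this issue."""
    return [
        f"{base_url}/eth/v1/beacon/states/finalized/validators/{validator_id}",
        f"{base_url}/eth/v1/beacon/states/finalized/finality_checkpoints",
    ]


def http_get(url, timeout=10.0):
    """Fetch a URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def poll(urls, total_requests, rate_per_sec=3.0, fetch=http_get, sleep=time.sleep):
    """Alternate between the URLs at roughly rate_per_sec.

    Returns a list of (url, outcome) pairs, where outcome is the status
    code or the repr of the error that was raised.
    """
    interval = 1.0 / rate_per_sec
    results = []
    for _, url in zip(range(total_requests), cycle(urls)):
        try:
            results.append((url, fetch(url)))
        except OSError as exc:  # URLError and socket timeouts subclass OSError
            results.append((url, repr(exc)))
        sleep(interval)
    return results
```

Calling `poll(build_urls(BASE_URL, "BLS_KEY"), total_requests=30)` would send about ten seconds of traffic; `BLS_KEY` is the placeholder from the endpoint list and has to be replaced with a real validator public key. The requests are sent sequentially, so the real rate drops below 3 requests / second once the node starts responding slowly.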

Through experimentation I found that the halt is caused by heavy use of endpoints 1 and 2. Once I remove the --http flag and restart the client, the sync starts to catch up.

Version

v3.0.0 non-portable linux

Present Behaviour

The logs show that the sync distance (in slots) keeps increasing.

First set of logs:

-- Logs begin at Tue 2022-08-30 15:44:25 UTC. --
Sep 02 14:19:51 ip-172-31-4-210 lighthouse[44527]: Sep 02 14:19:51.852 CRIT Beacon block processing error           error: ValidatorPubkeyCacheLockTimeout, service: beacon
Sep 02 14:19:51 ip-172-31-4-210 lighthouse[44527]: Sep 02 14:19:51.853 WARN BlockProcessingFailure                  outcome: ValidatorPubkeyCacheLockTimeout, msg: unexpected condition in processing block.
Sep 02 14:19:54 ip-172-31-4-210 lighthouse[44527]: Sep 02 14:19:54.005 INFO Syncing                                 est_time: --, distance: 34 slots (6 mins), peers: 21, service: slot_notifier
Sep 02 14:19:54 ip-172-31-4-210 lighthouse[44527]: Sep 02 14:19:54.005 WARN Syncing deposit contract block cache    est_blocks_remaining: 4300, service: slot_notifier
Sep 02 14:20:06 ip-172-31-4-210 lighthouse[44527]: Sep 02 14:20:06.101 INFO Syncing                                 est_time: --, distance: 35 slots (7 mins), peers: 16, service: slot_notifier
Sep 02 14:20:06 ip-172-31-4-210 lighthouse[44527]: Sep 02 14:20:06.125 WARN Syncing deposit contract block cache    est_blocks_remaining: 3521, service: slot_notifier
Sep 02 14:20:18 ip-172-31-4-210 lighthouse[44527]: Sep 02 14:20:18.005 INFO Syncing                                 est_time: 3 mins, speed: 0.14 slots/sec, distance: 31 slots (6 mins), peers: 22, service: slot_notifier
Sep 02 14:20:18 ip-172-31-4-210 lighthouse[44527]: Sep 02 14:20:18.005 WARN Syncing deposit contract block cache    est_blocks_remaining: 2833, service: slot_notifier
Sep 02 14:20:30 ip-172-31-4-210 lighthouse[44527]: Sep 02 14:20:30.002 INFO Syncing                                 est_time: 5 mins, speed: 0.11 slots/sec, distance: 32 slots (6 mins), peers: 26, service: slot_notifier
Sep 02 14:20:30 ip-172-31-4-210 lighthouse[44527]: Sep 02 14:20:30.002 WARN Syncing deposit contract block cache    est_blocks_remaining: 2040, service: slot_notifier
Sep 02 14:20:34 ip-172-31-4-210 lighthouse[44527]: Sep 02 14:20:34.611 INFO Sync state updated                      new_state: Synced, old_state: Syncing Head Chain, service: sync
Sep 02 14:20:34 ip-172-31-4-210 lighthouse[44527]: Sep 02 14:20:34.614 INFO Subscribed to topics                    topics: ["/eth2/c2ce3aa8/beacon_block/ssz_snappy", "/eth2/c2ce3aa8/beacon_aggregate_and_proof/ssz_snappy", "/eth2/c2ce3aa8/voluntary_exit/ssz_snappy", "/eth2/c2ce3aa8/proposer_slashing/ssz_snappy", "/eth2/c2ce3aa8/attester_slashing/ssz_snappy", "/eth2/c2ce3aa8/sync_committee_contribution_and_proof/ssz_snappy"], service: network
Sep 02 14:20:41 ip-172-31-4-210 systemd[1]: consensus_client.service: Main process exited, code=killed, status=9/KILL
Sep 02 14:20:41 ip-172-31-4-210 systemd[1]: consensus_client.service: Failed with result 'signal'.
Sep 02 14:20:52 ip-172-31-4-210 systemd[1]: consensus_client.service: Scheduled restart job, restart counter is at 1.
Sep 02 14:20:52 ip-172-31-4-210 systemd[1]: Stopped Lighthouse Consensus client.
Sep 02 14:20:52 ip-172-31-4-210 systemd[1]: Started Lighthouse Consensus client.
Sep 02 14:20:52 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:20:52.930 INFO Logging to file                         path: "/home/ubuntu/consensus/beacon/logs/beacon.log"
Sep 02 14:20:53 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:20:52.948 INFO Lighthouse started                      version: Lighthouse/v3.0.0-18c61a5
Sep 02 14:20:53 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:20:52.949 INFO Configured for network                  name: goerli
Sep 02 14:20:53 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:20:52.967 INFO Data directory initialised              datadir: /home/ubuntu/consensus
Sep 02 14:20:53 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:20:52.977 INFO Deposit contract                        address: 0xff50ed3d0ec03ac01d4c79aad74928bff48a7b2b, deploy_block: 4367322
Sep 02 14:20:53 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:20:53.626 INFO Hot-Cold DB initialized                 split_state: 0xcb0999c801f90f55361838cc12a84b3a0972d39abe1596665452766bea458665, split_slot: 3801600, service: freezer_db
Sep 02 14:20:53 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:20:53.733 INFO Refusing to checkpoint sync             msg: database already exists, use --purge-db to force checkpoint sync, service: beacon
Sep 02 14:20:53 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:20:53.734 INFO Starting beacon chain                   method: resume, service: beacon
Sep 02 14:21:48 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:21:48.358 INFO Block production enabled                method: json rpc via http, endpoints: Auth { endpoint: "http://localhost:8551/", jwt_path: "/home/ubuntu/execution/geth/jwtsecret", jwt_id: None, jwt_version: None }
Sep 02 14:21:51 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:21:51.350 INFO Beacon chain initialized                head_slot: 3801664, head_block: 0xd315…7160, head_state: 0x1c01…5bbf, service: beacon
Sep 02 14:21:51 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:21:51.350 INFO Timer service started                   service: node_timer
Sep 02 14:21:51 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:21:51.358 INFO UPnP Attempting to initialise routes    service: UPnP
Sep 02 14:21:51 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:21:51.382 INFO Libp2p Starting                         bandwidth_config: 3-Average, peer_id: 16Uiu2HAmFdSH4dQimCrzswetJGuNbD9HSnFM53eadWxKJyGP2K5q, service: libp2p
Sep 02 14:21:51 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:21:51.391 INFO ENR Initialised                         tcp: Some(9000), udp: Some(9000), ip: Some(3.71.250.111), id: 0x6d0a..8b26, seq: 3, enr: enr:-Ly4QNWW3pMeus1LYyifTz8AINPXl44KKHy2MLhqMmnAnAgMZ04Q1ueefPN70gae5JIZK87uLAxo-dtdhEr_SZYoItQDh2F0dG5ldHOIAAAAAAAAAACEZXRoMpDCzjqoAgAQIP__________gmlkgnY0gmlwhANH-m-Jc2VjcDI1NmsxoQMsLTcOH0dR-EUsJJiIOJLpaDprJ12AXqeCjcwu2wMICIhzeW5jbmV0cwCDdGNwgiMog3VkcIIjKA, service: libp2p
Sep 02 14:21:51 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:21:51.414 INFO Listening established                   address: /ip4/0.0.0.0/tcp/9000/p2p/16Uiu2HAmFdSH4dQimCrzswetJGuNbD9HSnFM53eadWxKJyGP2K5q, service: libp2p
Sep 02 14:21:51 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:21:51.484 INFO HTTP API started                        listen_address: 127.0.0.1:5052
Sep 02 14:21:51 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:21:51.542 INFO Execution engine online                 service: exec
Sep 02 14:21:51 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:21:51.543 INFO Issuing forkchoiceUpdated               forkchoice_state: ForkChoiceState { head_block_hash: 0x050e25d6b6a9ef0e2bbf1f88dc151991fae0b21ebbed8763c3ffb8e7df9fc930, safe_block_hash: 0x1efc8e7fe5e1cb5a5ffe5e5a52eec9ff00823794909fbe40fe3e1a99b218f010, finalized_block_hash: 0x31e7adca5f80369338f6bee0ce65ba5d6c42992f79e1003c2328f503ccbf4052 }, service: exec
Sep 02 14:21:52 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:21:52.104 INFO Sync state updated                      new_state: Syncing Head Chain, old_state: Stalled, service: sync
Sep 02 14:21:53 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:21:53.209 INFO Imported deposit log(s)                 new: 822, total: 178915, latest_block: 7514568, service: deposit_contract_rpc
Sep 02 14:22:01 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:22:01.427 INFO UPnP not available                      error: IO error: Resource temporarily unavailable (os error 11), service: UPnP
Sep 02 14:22:06 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:22:06.001 INFO Syncing                                 est_time: --, distance: 46 slots (9 mins), peers: 10, service: slot_notifier
Sep 02 14:22:06 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:22:06.002 WARN Syncing deposit contract block cache    est_blocks_remaining: 5048, service: slot_notifier

Second set of logs:

Sep 02 14:22:06 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:22:06.001 INFO Syncing                                 est_time: --, distance: 46 slots (9 mins), peers: 10, service: slot_notifier
Sep 02 14:22:06 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:22:06.002 WARN Syncing deposit contract block cache    est_blocks_remaining: 5048, service: slot_notifier
Sep 02 14:22:18 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:22:18.012 INFO Syncing                                 est_time: --, distance: 47 slots (9 mins), peers: 9, service: slot_notifier
Sep 02 14:22:18 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:22:18.014 WARN Syncing deposit contract block cache    est_blocks_remaining: 5048, service: slot_notifier
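
To make the trend in the logs above easier to follow, the sync distance can be pulled out of the `slot_notifier` lines. This is a rough sketch that assumes the log format shown in this issue; `sync_distances` is a hypothetical helper, and `SAMPLE` is copied from the second set of logs.

```python
import re

# Matches the INFO "Syncing" lines from slot_notifier and captures the
# Lighthouse timestamp (without milliseconds) and the distance in slots.
DISTANCE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2})\.\d{3} INFO Syncing\s+.*?distance: (\d+) slots"
)


def sync_distances(log_text):
    """Return (time, distance_in_slots) pairs in log order."""
    pairs = []
    for line in log_text.splitlines():
        match = DISTANCE_RE.search(line)
        if match:
            pairs.append((match.group(1), int(match.group(2))))
    return pairs


SAMPLE = """\
Sep 02 14:22:06 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:22:06.001 INFO Syncing                                 est_time: --, distance: 46 slots (9 mins), peers: 10, service: slot_notifier
Sep 02 14:22:06 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:22:06.002 WARN Syncing deposit contract block cache    est_blocks_remaining: 5048, service: slot_notifier
Sep 02 14:22:18 ip-172-31-4-210 lighthouse[44582]: Sep 02 14:22:18.012 INFO Syncing                                 est_time: --, distance: 47 slots (9 mins), peers: 9, service: slot_notifier
"""

print(sync_distances(SAMPLE))  # [('14:22:06', 46), ('14:22:18', 47)]
```

The WARN lines about the deposit contract block cache are skipped because the pattern only matches INFO entries.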

Expected Behaviour

The sync distance should decrease, or at least an error should be logged to indicate the problem.

Steps to resolve

Suggestions:

  • Investigate why so few requests crash the node
  • Add error messages that indicate the problem
