
Conversation

@michaelsproul michaelsproul commented Jun 6, 2024

Issue Addressed

Closes:

Proposed Changes

Resolve a long-standing performance pitfall involving the decompression of pubkeys on startup. This PR improves Lighthouse's startup time dramatically.
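
In rough terms (a minimal sketch with hypothetical types, not Lighthouse's actual pubkey cache API): the on-disk cache stores each validator key in its 96-byte uncompressed form, so the expensive BLS point decompression is paid once during the schema migration rather than on every startup.

```rust
// Hypothetical sketch of the idea, not Lighthouse's real types or storage code.

/// 48-byte compressed encoding: compact, but loading it requires an
/// expensive curve operation to recover the full point.
struct CompressedPubkey([u8; 48]);

/// 96-byte uncompressed encoding: twice the bytes, but loading it is
/// essentially a byte copy.
struct UncompressedPubkey([u8; 96]);

/// Placeholder for the expensive decompression, done once at migration time.
fn decompress(_key: &CompressedPubkey) -> UncompressedPubkey {
    UncompressedPubkey([0u8; 96])
}

/// v21-style migration sketch: decompress every stored key once and write
/// the uncompressed bytes back to the on-disk cache.
fn migrate(stored: Vec<CompressedPubkey>) -> Vec<UncompressedPubkey> {
    stored.iter().map(decompress).collect()
}

/// Startup after the migration: no per-key decompression, just a load.
fn load_cache(stored: Vec<UncompressedPubkey>) -> Vec<UncompressedPubkey> {
    stored
}

fn main() {
    let cache = load_cache(migrate(vec![CompressedPubkey([0u8; 48])]));
    println!("loaded {} uncompressed pubkeys", cache.len());
}
```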

Additional Info

I propose we merge this PR as the first DB schema change adopted from tree-states, after the Electra PR which implements v20 is merged:

Blocked on a fix to the v20 schema:

@michaelsproul michaelsproul added the work-in-progress, optimization, database, backwards-incompat, and blocked labels and removed the work-in-progress label Jun 6, 2024
@michaelsproul michaelsproul force-pushed the uncompressed-pubkeys branch from 420a715 to 2579248 Compare June 27, 2024 05:59
@michaelsproul michaelsproul added the ready-for-review and v5.3.0 labels Jun 27, 2024
@michaelsproul
Member Author

Ready for review for 5.3.0. Let's goooo 🚀

Collaborator

@dapplion dapplion left a comment

Looks good to me as is! Love how straightforward it is.

I feel we could drop the need to compute and store compressed keys in the cache. Those are available on any state and seem to be used exclusively for sync committees.
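
One possible shape for that, sketched with hypothetical names (not the actual cache layout): keep only the decompressed key in each cache entry and recompress on demand for the occasional sync-committee caller, since compressing a point is much cheaper than decompressing one.

```rust
// Hypothetical sketch of the suggestion above, not Lighthouse's real cache.

struct DecompressedPubkey([u8; 96]);
struct CompressedPubkey([u8; 48]);

struct PubkeyCacheEntry {
    /// Only the decompressed key is stored; the compressed form is derived
    /// when a (rare) sync-committee caller asks for it.
    decompressed: DecompressedPubkey,
}

impl PubkeyCacheEntry {
    fn compressed(&self) -> CompressedPubkey {
        // Placeholder for re-encoding the point (x coordinate plus flags);
        // not a real compression routine.
        let mut bytes = [0u8; 48];
        bytes.copy_from_slice(&self.decompressed.0[..48]);
        CompressedPubkey(bytes)
    }
}

fn main() {
    let entry = PubkeyCacheEntry {
        decompressed: DecompressedPubkey([0u8; 96]),
    };
    let _compressed = entry.compressed();
}
```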

michaelsproul commented Jun 27, 2024

Good idea. We can look into that in a future PR, maybe alongside consolidating the pubkey cache on the beacon state with the global one

@michaelsproul
Member Author

There's something wrong with this PR and I haven't worked out what. I'm getting an error about the op pool being corrupt?

Jun 28 06:50:51.097 INFO Logging to file path: "/home/michael/.lighthouse/mainnet/beacon/logs/beacon.log"
Jun 28 06:50:51.102 INFO Lighthouse started version: Lighthouse/v5.2.1-cc5789b
Jun 28 06:50:51.102 INFO Configured for network name: mainnet
Jun 28 06:50:51.102 INFO Data directory initialised datadir: /home/michael/.lighthouse/mainnet
Jun 28 06:50:51.105 INFO Deposit contract address: 0x00000000219ab540356cbb839cbe05303d7705fa, deploy_block: 11184524
Jun 28 06:50:51.249 INFO Hot-Cold DB initialized split_state: 0x50d483a6162e195975ed3582b517aeb6a08c76a099e55e683c7471b1c284abf2, split_slot: 9394368, service: freezer_db
Jun 28 06:50:51.249 INFO Blob DB initialized oldest_blob_slot: Some(Slot(9392448)), path: "/home/michael/.lighthouse/mainnet/beacon/blobs_db", service: freezer_db
Jun 28 06:50:51.249 INFO Upgrading from v19 to v20
Jun 28 06:50:51.394 INFO Upgrading from v20 to v21
Jun 28 06:51:01.205 INFO Public key decompression in progress keys_decompressed: 200000
Jun 28 06:51:11.024 INFO Public key decompression in progress keys_decompressed: 400000
Jun 28 06:51:20.861 INFO Public key decompression in progress keys_decompressed: 600000
Jun 28 06:51:30.683 INFO Public key decompression in progress keys_decompressed: 800000
Jun 28 06:51:40.366 INFO Public key decompression in progress keys_decompressed: 1000000
Jun 28 06:51:49.616 INFO Public key decompression in progress keys_decompressed: 1200000
Jun 28 06:51:59.236 INFO Public key decompression in progress keys_decompressed: 1400000
Jun 28 06:52:02.636 INFO Public key decompression complete
Jun 28 06:52:03.501 INFO Refusing to checkpoint sync msg: database already exists, use --purge-db to force checkpoint sync, service: beacon
Jun 28 06:52:03.501 INFO Starting beacon chain method: resume, service: beacon
Jun 28 06:52:04.350 CRIT Failed to start beacon node reason: DB error whilst reading persisted op pool: SszDecodeError(OffsetIntoFixedPortion(4))
Jun 28 06:52:04.350 INFO Internal shutdown received reason: Failed to start beacon node
Jun 28 06:52:04.350 INFO Shutting down.. reason: Failure("Failed to start beacon node")

This would suggest the v20 migration is screwed, but in isolation the v20 migration works fine
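
For context on the error above: OffsetIntoFixedPortion is the SSZ decoder's way of saying that an offset in a container with variable-length fields points inside the container's fixed-size portion, which is the usual symptom of decoding bytes that were written under a different schema. A hand-rolled sketch of that check (illustrative only, not Lighthouse's actual SSZ decoder):

```rust
// Illustrative only: mimics the offset sanity check an SSZ decoder performs
// on a container that has variable-length fields.

#[derive(Debug)]
enum DecodeError {
    OffsetIntoFixedPortion(usize),
    TooShort,
}

/// Read the first 4-byte little-endian offset and require that it points at
/// or beyond the end of the fixed-size portion of the container.
fn check_first_offset(bytes: &[u8], fixed_portion_len: usize) -> Result<usize, DecodeError> {
    if bytes.len() < 4 {
        return Err(DecodeError::TooShort);
    }
    let offset = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    if offset < fixed_portion_len {
        // Bytes written under one schema and decoded under another typically
        // fail here: the first word is no longer a plausible offset.
        return Err(DecodeError::OffsetIntoFixedPortion(offset));
    }
    Ok(offset)
}

fn main() {
    // A container whose fixed portion is 8 bytes, fed bytes whose first
    // offset word is 4: decoding fails with OffsetIntoFixedPortion(4),
    // matching the shape of the error in the log above.
    println!("{:?}", check_first_offset(&[4, 0, 0, 0, 0, 0, 0, 0], 8));
}
```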

@michaelsproul michaelsproul force-pushed the uncompressed-pubkeys branch from 2579248 to cc5789b Compare June 28, 2024 06:57
@michaelsproul michaelsproul added the waiting-on-author label and removed the ready-for-review label Jun 28, 2024
@michaelsproul
Member Author

The bug was in the v20 migration as suspected:

@michaelsproul michaelsproul added the ready-for-review and blocked labels and removed the waiting-on-author and ready-for-review labels Jul 2, 2024
@michaelsproul michaelsproul added the ready-for-merge label Jul 2, 2024
@michaelsproul michaelsproul added the ready-for-review label and removed the blocked and ready-for-merge labels Jul 3, 2024
chong-he commented Jul 3, 2024

It's working, but I don't see Upgrading from v19 to v20 in my log (I was on v5.2.1 with a v19 database). I guess that's because I am not running this PR: #5712?

Jul 03 02:57:05.200 INFO Logging to file                         path: "/home/hi/.lighthouse/mainnet/beacon/logs/beacon.log"
Jul 03 02:57:05.226 INFO Lighthouse started                      version: Lighthouse/v5.2.1-2da5e5e
Jul 03 02:57:05.226 INFO Configured for network                  name: mainnet
Jul 03 02:57:05.233 INFO Data directory initialised              datadir: /home/hi/.lighthouse/mainnet
Jul 03 02:57:05.249 INFO Deposit contract                        address: 0x00000000219ab540356cbb839cbe05303d7705fa, deploy_block: 11184524
Jul 03 02:57:05.516 INFO Hot-Cold DB initialized                 split_state: 0xe64c0d60eac08824895ce932236bb33c58b61c51f80fb5a27929a0ccb2055c40, split_slot: 9429152, service: freezer_db
Jul 03 02:57:05.518 INFO Blob DB initialized                     oldest_blob_slot: Some(Slot(9429152)), path: "/home/hi/.lighthouse/mainnet/beacon/blobs_db", service: freezer_db
Jul 03 02:57:05.522 INFO Upgrading from v20 to v21
Jul 03 02:57:18.316 INFO Public key decompression in progress    keys_decompressed: 200000
Jul 03 02:57:31.084 INFO Public key decompression in progress    keys_decompressed: 400000
Jul 03 02:57:43.867 INFO Public key decompression in progress    keys_decompressed: 600000
Jul 03 02:57:56.631 INFO Public key decompression in progress    keys_decompressed: 800000
Jul 03 02:58:09.394 INFO Public key decompression in progress    keys_decompressed: 1000000
Jul 03 02:58:22.177 INFO Public key decompression in progress    keys_decompressed: 1200000
Jul 03 02:58:34.936 INFO Public key decompression in progress    keys_decompressed: 1400000
Jul 03 02:58:39.691 INFO Public key decompression complete
Jul 03 02:58:41.082 INFO Refusing to checkpoint sync             msg: database already exists, use --purge-db to force checkpoint sync, service: beacon
Jul 03 02:58:41.082 INFO Starting beacon chain                   method: resume, service: beacon
Jul 03 02:58:45.051 INFO Block production enabled                method: json rpc via http, endpoint: Auth { endpoint: "http://localhost:8551/", jwt_path: "/home/hi/.ethereum/geth/jwtsecret", jwt_id: None, jwt_version: None }
Jul 03 02:58:45.076 WARN Error connecting to eth1 node endpoint  endpoint: http://localhost:8551/, auth=true, service: deposit_contract_rpc
Jul 03 02:58:45.076 ERRO Error updating deposit contract cache   error: Invalid endpoint state: RequestFailed("eth_chainId call failed HttpClient(url: http://localhost:8551/, kind: request, detail: error trying to connect: tcp connect error: Connection refused (os error 111))"), retry_millis: 60000, service: deposit_contract_rpc
Jul 03 02:58:50.062 INFO Beacon chain initialized                head_slot: 9429152, head_block: 0xc9a7…189a, head_state: 0xe64c…5c40, service: beacon
Jul 03 02:58:50.063 INFO Timer service started                   service: node_timer
Jul 03 02:58:50.064 INFO UPnP Attempting to initialise routes
Jul 03 02:58:50.064 INFO Execution payloads are pruned           service: freezer_db
Jul 03 02:58:50.076 INFO ENR Initialised                         quic6: None, quic4: Some(9001), udp6: None, tcp6: None, tcp4: Some(9000), udp4: None, ip4: None, id: 0x31d6..d831, seq: 85, enr: enr:-LW4QGU5CCe81bQUD3IleeHYQyb87Lb5r0HGKuqx4QH77BUkRABySr_vou7mX33Ea7rsGwfmOGsZY5TiPR_ntJDbnvxVh2F0dG5ldHOIAAAAAAAAAACEZXRoMpBqlaGpBAAAAP__________gmlkgnY0hHF1aWOCIymJc2VjcDI1NmsxoQLHRreJ_fLUw0whRbgMLqrYY3av40ZGUyqKXWqQYD2wP4hzeW5jbmV0cwCDdGNwgiMo, service: libp2p
Jul 03 02:58:50.093 INFO Libp2p Starting                         bandwidth_config: 3-Average, peer_id: 16Uiu2HAm8qZeuGM1Y6Z5DCvHAPiBDXrPnp5dyTCCgoJtYCB7pr62, service: libp2p
Jul 03 02:58:50.096 INFO Listening established                   address: /ip4/0.0.0.0/tcp/9000/p2p/16Uiu2HAm8qZeuGM1Y6Z5DCvHAPiBDXrPnp5dyTCCgoJtYCB7pr62, service: libp2p
Jul 03 02:58:50.097 INFO Listening established                   address: /ip4/0.0.0.0/udp/9001/quic-v1/p2p/16Uiu2HAm8qZeuGM1Y6Z5DCvHAPiBDXrPnp5dyTCCgoJtYCB7pr62, service: libp2p
Jul 03 02:58:50.112 INFO Deterministic long lived subnets enabled, subscription_duration_in_epochs: 256, subnets_per_node: 2, service: attestation_service
Jul 03 02:58:50.112 INFO Subscribing to long-lived subnets       subnets: [SubnetId(52), SubnetId(53)], service: attestation_service
Jul 03 02:58:50.115 INFO HTTP API started                        listen_address: 127.0.0.1:5052

It is running normally afterwards

@michaelsproul
Member Author

Oh that was a log I added when debugging this PR, and then reverted (because the same commit added some asserts). I'll re-add it

Thanks for testing!

@michaelsproul
Member Author

@mergify queue

mergify bot commented Jul 4, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at d84e3e3

mergify bot added a commit that referenced this pull request Jul 4, 2024
@mergify mergify bot merged commit d84e3e3 into sigp:unstable Jul 4, 2024
@michaelsproul michaelsproul deleted the uncompressed-pubkeys branch July 4, 2024 05:57