Skip to content

Conversation

@mattac21
Copy link
Contributor

@mattac21 mattac21 commented Sep 25, 2025

Description

When load testing the EVM mempool with BlockSTM, I encountered the following panic (on the main branch):

GetBlock called for invalid block number - this indicates a reorg attempt block_number=1165 module=Blockchain
panic: GetBlock should never be called on a Cosmos chain due to instant finality - this indicates a reorg is being attempted
goroutine 459284 [running]:
github.com/cosmos/evm/mempool.(*Blockchain).GetBlock(0xc004108488, {0x1f, 0xa6, 0xe4, 0x89, 0x81, 0x52, 0x12, 0x8e, 0x83, ...}, ...)
	github.com/cosmos/[email protected]/mempool/blockchain.go:162 +0x365
github.com/cosmos/evm/mempool/txpool/legacypool.(*LegacyPool).reset(0xc001e1a1e0, 0xc0017ff408, 0xc0616b8288)
	github.com/cosmos/[email protected]/mempool/txpool/legacypool/reset_production.go:28 +0x225
github.com/cosmos/evm/mempool/txpool/legacypool.(*LegacyPool).runReorg(0xc001e1a1e0, 0xc002b0a1c0, 0xc03e190450, 0x0, 0xc02f9961e0)
	github.com/cosmos/[email protected]/mempool/txpool/legacypool/legacypool.go:1285 +0x185
created by github.com/cosmos/evm/mempool/txpool/legacypool.(*LegacyPool).scheduleReorgLoop in goroutine 188
	github.com/cosmos/[email protected]/mempool/txpool/legacypool/legacypool.go:1214 +0x18a

Upon further investigation, legacypool.Reset will panic if it is called with newHead=oldHead+2. Based on comments, reset thinks that this is a reorg, making the assumption that reset be never be called with a skipped header in between oldHead and newHead. However this is possible for Cosmos chains as demonstrated by the test that reproduces the behavior.

In the test we use the legacypool.BroadcastFn to simulate slow processing during runReorg, however it could be anything slows that function down. If at the same time during this processing, multiple new headers are pushed to newHeadCh and processed in the txpool.loop(), then the next time legacypool.Reset is called, it will be called where newHead.ParentHash != oldhead.Hash, causing a panic since GetBlock will be called on a Cosmos chain, which is not expected.

Since this function would only panic on this condition (and we are already replacing the function during tests), I simply removed the contents of the inner if statement that would panic if we encountered it with this state. From my understanding this is a valid state and there is no need to panic here.

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • tackled an existing issue or discussed with a team member
  • left instructions on how to review the changes
  • targeted the main branch

@mattac21 mattac21 changed the title Fix: Subpool Reset Panic fix: Subpool Reset Panic Sep 25, 2025
@mattac21 mattac21 changed the title fix: Subpool Reset Panic fix(mempool): legacy subpool panic on skipped header during reset Sep 26, 2025
@mattac21 mattac21 marked this pull request as ready for review September 26, 2025 16:38
Copy link
Contributor

@aljo242 aljo242 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with small nit

@vladjdk vladjdk enabled auto-merge October 9, 2025 15:26
@vladjdk vladjdk added this pull request to the merge queue Oct 10, 2025
Merged via the queue into main with commit f0cedd6 Oct 10, 2025
26 of 28 checks passed
@vladjdk vladjdk deleted the ma/fix/subpool-reset-panic branch October 10, 2025 14:12
@cloudgray cloudgray mentioned this pull request Oct 11, 2025
3 tasks
zsystm pushed a commit to zsystm/evm that referenced this pull request Nov 2, 2025
…smos#668)

* add test to demonstrate subpool reset being called with invalid block hashes

* make test fail on hash mismatch

* update txpool cosmos reorg test to use real legacypool as the subpool

* update comment

* fix chan pointer madness

* remove logic that would only panic for cosmos chains from legacypool when reset was called with a skipped header

* remove unnecessary long sleep from test for success case

* fix race in test

* changelog

* reference pr link in log
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants