Fix: Register all BFT sub-protocols during IBFT2→QBFT consensus migration #1

Merged
SaeeDawod merged 1 commit into main from fix/consensus-migration-protocol-selection
Dec 1, 2025
@saeeddawod (Collaborator) commented Dec 1, 2025

Summary

This PR fixes a critical bug that prevents successful IBFT2 → QBFT consensus migration in Besu. During migration, nodes would stall and fail to produce blocks because they could only speak one BFT wire protocol instead of both.

Problem

When running a consensus migration from IBFT2 to QBFT using a transitions configuration in genesis, the network would stall before reaching the fork block. Nodes could not exchange IBFT2 messages even though they were still running IBFT2 consensus pre-fork.

Root Cause

In ConsensusScheduleBesuControllerBuilder.createSubProtocolConfiguration(), the original code was:

return besuControllerBuilderSchedule
    .get(besuControllerBuilderSchedule.keySet().stream().skip(1).findFirst().orElseThrow())
    .createSubProtocolConfiguration(ethProtocolManager, maybeSnapProtocolManager);

Issues with this approach:

  1. Non-deterministic ordering: besuControllerBuilderSchedule is a Map (HashMap), so keySet().stream().skip(1) produces unpredictable results
  2. Only one protocol registered: Only the sub-protocol from ONE consensus mechanism was registered
  3. Peer communication failure: Nodes couldn't exchange IBFT2 messages → block production stalls
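
To illustrate the first two issues, here is a minimal sketch that uses plain strings as stand-ins for the controller builders. The class and method names are hypothetical and not part of Besu. It shows that `skip(1).findFirst()` returns exactly one entry, and that which entry it returns depends on the map's iteration order, which `HashMap` does not guarantee:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: keys are fork block numbers, values name the consensus builder.
final class SkipOneSelectionSketch {

  // Mirrors the shape of the original selection: take the second key in iteration order.
  static String selectLikeOriginal(final Map<Long, String> schedule) {
    final Long key = schedule.keySet().stream().skip(1).findFirst().orElseThrow();
    return schedule.get(key);
  }

  // Builds an insertion-ordered map so the iteration order is controlled explicitly.
  static Map<Long, String> scheduleOf(
      final long firstBlock,
      final String firstBuilder,
      final long secondBlock,
      final String secondBuilder) {
    final Map<Long, String> schedule = new LinkedHashMap<>();
    schedule.put(firstBlock, firstBuilder);
    schedule.put(secondBlock, secondBuilder);
    return schedule;
  }
}
```

With the same two entries, `scheduleOf(0L, "IBFT2", 50L, "QBFT")` selects the QBFT builder while `scheduleOf(50L, "QBFT", 0L, "IBFT2")` selects the IBFT2 builder. In both cases the other builder's sub-protocol is never registered.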

Solution

The fix registers all BFT sub-protocols from all scheduled consensus mechanisms:

  • IBF/1 (IBFT2) - for pre-migration communication
  • istanbul/100 (QBFT) - for post-migration communication
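
A minimal sketch of the merge, assuming each scheduled builder contributes a list of sub-protocol names. The `SubProtocolMergeSketch` class and its string-based model are illustrative stand-ins, not Besu's `SubProtocolConfiguration` API:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class SubProtocolMergeSketch {

  // Visits builders in ascending fork-block order and keeps each protocol name once.
  static List<String> mergeSubProtocolNames(final Map<Long, List<String>> protocolsByForkBlock) {
    final Set<String> merged = new LinkedHashSet<>();
    // TreeMap sorts by key, so the result does not depend on the input map's iteration order.
    for (final List<String> names : new TreeMap<>(protocolsByForkBlock).values()) {
      merged.addAll(names);
    }
    return new ArrayList<>(merged);
  }
}
```

For a schedule where block 0 contributes `eth` and `IBF` and block 50 contributes `eth` and `istanbul`, the merged list contains `eth`, `IBF`, and `istanbul`, so peers can negotiate either BFT protocol.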

Testing

Migration Test Results

| Phase | Block | Consensus | Status |
|---|---|---|---|
| Pre-fork | #47-49 | IBFT2 | ✅ IbftBesuControllerBuilder |
| Fork | #50 | QBFT | ✅ QbftBesuControllerBuilder |
| Post-fork | #50-100+ | QBFT | ✅ Continuous block production |

Log Evidence

# Pre-fork (IBFT2)
IbftBesuControllerBuilder | Produced #49 / 0 tx / ...

# Fork block (QBFT takes over)  
QbftBesuControllerBuilder | Imported empty block #50 / 0 tx / ...

# Post-fork (QBFT continues)
QbftBesuControllerBuilder | Produced empty block #73 / 0 tx / ...

Checklist

  • Manual migration test (IBFT2 → QBFT) - PASSED
  • No database changes
  • spotless / unit tests / acceptance tests

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

Summary by Sourcery

Register all relevant BFT sub-protocols during consensus transitions to ensure nodes can communicate both before and after IBFT2→QBFT migration.

Bug Fixes:

  • Fix non-deterministic and incomplete sub-protocol registration that could prevent IBFT2 messages from being exchanged during consensus migration.

Enhancements:

  • Merge and deduplicate sub-protocol configurations from all scheduled consensus builders in a deterministic order to support multiple BFT wire protocols across transitions.

Summary by cubic

Fixes IBFT2→QBFT migration by registering all required BFT wire protocols so nodes can communicate before and after the fork and keep producing blocks.

  • Bug Fixes
    • Merge sub-protocol configs from all scheduled builders, sorted by block number.
    • Register both IBF/1 (IBFT2) and istanbul/100 (QBFT).
    • Deduplicate protocols by name to avoid double registration.
    • Removes non-deterministic selection that caused IBFT2 message failures and stalls.

Written for commit c0f000d. Summary will update automatically on new commits.

@gemini-code-assist

Summary of Changes

Hello @saeeddawod, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue that caused network stalls during IBFT2 to QBFT consensus migrations in Besu. The fix ensures that nodes correctly register and utilize all required BFT wire protocols throughout the transition, enabling seamless communication and continuous block production across the fork block.

Highlights

  • Critical Bug Fix: Addresses a critical bug that prevented successful IBFT2 to QBFT consensus migration in Besu, causing nodes to stall and fail block production.
  • Root Cause Identified: The problem stemmed from the createSubProtocolConfiguration() method, which non-deterministically registered only one BFT wire protocol, preventing nodes from communicating across the migration boundary.
  • Comprehensive Protocol Registration: The solution ensures that all necessary BFT sub-protocols (IBF/1 for IBFT2 and istanbul/100 for QBFT) are registered, enabling seamless communication both before and after the migration block.

@sourcery-ai sourcery-ai Bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • The de-duplication logic relies on SubProtocol.getName() uniqueness; consider documenting or enforcing the assumption that protocols with the same name (e.g. Eth) are wire-compatible so silently dropping later managers is always safe.
  • Since you sort by block number and keep the first occurrence when de-duplicating, it would help to clarify in a comment which consensus builder is expected to win in case of duplicate sub-protocol names (earliest vs latest transition) to avoid future confusion when adding new transitions.
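
The "earliest transition wins" behaviour raised in the second point can be sketched as follows. The types are simplified stand-ins (a protocol name mapped to a label for the manager that handles it), and the class name is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class FirstOccurrenceWinsSketch {

  // For each protocol name, keeps the manager from the earliest fork block that declares it.
  static Map<String, String> mergeManagers(
      final Map<Long, Map<String, String>> managersByForkBlock) {
    final Map<String, String> merged = new LinkedHashMap<>();
    for (final Map<String, String> managers : new TreeMap<>(managersByForkBlock).values()) {
      // putIfAbsent leaves an existing mapping untouched, so later duplicates are dropped.
      managers.forEach(merged::putIfAbsent);
    }
    return merged;
  }
}
```

Under this model, if the builders at blocks 0 and 50 both declare `eth`, the manager supplied by the block 0 builder is the one that stays registered. That is only safe if both managers are interchangeable, which is the assumption the first point asks to document.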

## Individual Comments

### Comment 1
<location> `app/src/main/java/org/hyperledger/besu/controller/ConsensusScheduleBesuControllerBuilder.java:237` </location>
<code_context>
+    // We merge all sub-protocol configurations, which registers both:
+    // - IBF/1 (IBFT2) for pre-migration communication
+    // - istanbul/100 (QBFT) for post-migration communication
+    final SubProtocolConfiguration mergedConfig = new SubProtocolConfiguration();
+    final java.util.Set<String> addedProtocolNames = new java.util.HashSet<>();
+
</code_context>

<issue_to_address>
**issue (complexity):** Consider refactoring the new sub-protocol merging logic into helper methods that encapsulate merging and de-duplication and avoid index-based parallel lists for clearer, safer code.

You can keep the new behavior but simplify the control-flow and data handling by pushing the merging/dedup logic into helpers and avoiding index-based parallel lists.

### 1. Extract the merge logic into a helper

Move the merging into a dedicated method so `createSubProtocolConfiguration` reads as “collect and merge configs” instead of doing all the work inline:

```java
@Override
protected SubProtocolConfiguration createSubProtocolConfiguration(
    final EthProtocolManager ethProtocolManager,
    final Optional<SnapProtocolManager> maybeSnapProtocolManager) {

  final SubProtocolConfiguration mergedConfig = new SubProtocolConfiguration();

  for (Map.Entry<Long, BesuControllerBuilder> entry :
      besuControllerBuilderSchedule.entrySet().stream()
          .sorted(Map.Entry.comparingByKey())
          .toList()) {

    final SubProtocolConfiguration builderConfig =
        entry.getValue().createSubProtocolConfiguration(ethProtocolManager, maybeSnapProtocolManager);

    mergeDistinctByName(mergedConfig, builderConfig);
  }

  return mergedConfig;
}

private void mergeDistinctByName(
    final SubProtocolConfiguration target,
    final SubProtocolConfiguration source) {

  final Set<String> existingNames = new HashSet<>();
  target.getSubProtocols().forEach(sp -> existingNames.add(sp.getName()));

  final List<SubProtocol> subProtocols = source.getSubProtocols();
  final List<ProtocolManager> protocolManagers = source.getProtocolManagers();

  for (int i = 0; i < subProtocols.size(); i++) {
    final SubProtocol subProtocol = subProtocols.get(i);
    if (existingNames.add(subProtocol.getName())) {
      target.withSubProtocol(subProtocol, protocolManagers.get(i));
    }
  }
}
```

This keeps all existing behavior (sorted by block, merged, deduped) but moves the complexity into a named method with a clear responsibility.

### 2. Avoid manual synchronization of parallel lists

If you can add a small helper on `SubProtocolConfiguration` (or a local helper) that returns paired entries, you can remove the index-based loop entirely:

```java
private static final class SubProtocolEntry {
  final SubProtocol protocol;
  final ProtocolManager manager;

  SubProtocolEntry(final SubProtocol protocol, final ProtocolManager manager) {
    this.protocol = protocol;
    this.manager = manager;
  }
}

private List<SubProtocolEntry> entriesOf(final SubProtocolConfiguration config) {
  final List<SubProtocolEntry> entries = new ArrayList<>();
  final List<SubProtocol> subProtocols = config.getSubProtocols();
  final List<ProtocolManager> protocolManagers = config.getProtocolManagers();
  for (int i = 0; i < subProtocols.size(); i++) {
    entries.add(new SubProtocolEntry(subProtocols.get(i), protocolManagers.get(i)));
  }
  return entries;
}

private void mergeDistinctByName(
    final SubProtocolConfiguration target,
    final SubProtocolConfiguration source) {

  final Set<String> existingNames = new HashSet<>();
  target.getSubProtocols().forEach(sp -> existingNames.add(sp.getName()));

  for (SubProtocolEntry entry : entriesOf(source)) {
    if (existingNames.add(entry.protocol.getName())) {
      target.withSubProtocol(entry.protocol, entry.manager);
    }
  }
}
```

This still respects the existing API but makes the relationship between `SubProtocol` and `ProtocolManager` explicit and less error-prone.

Either of these small refactors will keep the migration feature and deterministic behavior intact while making the method easier to read and maintain.
</issue_to_address>

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

@SaeeDawod SaeeDawod merged commit 3e3e9e2 into main Dec 1, 2025
3 checks passed

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request correctly fixes a critical bug in the consensus migration from IBFT2 to QBFT by ensuring all necessary BFT sub-protocols are registered. The approach of iterating through all scheduled consensus builders, sorting them for determinism, and merging their sub-protocol configurations is sound. My review includes a few suggestions to improve code clarity and robustness.

entry.getValue().createSubProtocolConfiguration(ethProtocolManager, maybeSnapProtocolManager);
final List<SubProtocol> subProtocols = builderConfig.getSubProtocols();
final List<ProtocolManager> protocolManagers = builderConfig.getProtocolManagers();
for (int i = 0; i < subProtocols.size(); i++) {

high

The code assumes that subProtocols and protocolManagers lists have the same size. This is a common pattern with parallel collections and can be fragile. If protocolManagers were ever shorter than subProtocols, this would cause an IndexOutOfBoundsException. To make the code more robust, I recommend adding a precondition check to validate that the lists are of equal length before iterating, for example:

if (subProtocols.size() != protocolManagers.size()) {
  throw new IllegalStateException("Sub-protocols and protocol managers lists must have the same size.");
}
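
One way to apply this check once and reuse it is a small pairing helper. The sketch below is generic and does not use Besu types; `ParallelListsSketch` and `zipStrict` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ParallelListsSketch {

  // Pairs element i of the first list with element i of the second,
  // failing fast when the sizes differ.
  static <A, B> List<Map.Entry<A, B>> zipStrict(final List<A> first, final List<B> second) {
    if (first.size() != second.size()) {
      throw new IllegalStateException(
          "Lists must have the same size: " + first.size() + " vs " + second.size());
    }
    final List<Map.Entry<A, B>> pairs = new ArrayList<>(first.size());
    for (int i = 0; i < first.size(); i++) {
      // Map.entry rejects null keys and values, which also surfaces missing managers early.
      pairs.add(Map.entry(first.get(i), second.get(i)));
    }
    return pairs;
  }
}
```

Callers can then iterate over the pairs without an index, so a size mismatch is reported as a descriptive `IllegalStateException` instead of an `IndexOutOfBoundsException` partway through the loop.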

Comment on lines +253 to +256
if (!addedProtocolNames.contains(subProtocol.getName())) {
  mergedConfig.withSubProtocol(subProtocol, protocolManager);
  addedProtocolNames.add(subProtocol.getName());
}

medium

The logic to check for an element's presence in a Set before adding it can be made more concise. The Set.add() method returns true if the element was not already in the set, so you can perform the check and the addition in a single step within the if condition. This is a common Java idiom that improves readability.

                if (addedProtocolNames.add(subProtocol.getName())) {
                  mergedConfig.withSubProtocol(subProtocol, protocolManager);
                }
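
As a standalone illustration of the idiom, the following sketch removes duplicates from a list of names while preserving first-seen order. The class name is hypothetical and the strings stand in for sub-protocol names:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SetAddIdiomSketch {

  // Returns the input names with duplicates removed, preserving first-seen order.
  static List<String> distinctInOrder(final List<String> names) {
    final Set<String> seen = new HashSet<>();
    final List<String> result = new ArrayList<>();
    for (final String name : names) {
      // Set.add returns true only when the element was not already present.
      if (seen.add(name)) {
        result.add(name);
      }
    }
    return result;
  }
}
```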
