Skip to content

feat (chain): implement genesis UTXO extraction workaround for missing events#100

Merged
cosmir17 merged 7 commits intomainfrom
feat/genesis-utxo-extraction
Jun 16, 2025
Merged

feat (chain): implement genesis UTXO extraction workaround for missing events#100
cosmir17 merged 7 commits intomainfrom
feat/genesis-utxo-extraction

Conversation

@cosmir17
Copy link
Copy Markdown
Contributor

@cosmir17 cosmir17 commented Jun 13, 2025

Implements #98 & ticket : https://shielded.atlassian.net/browse/PM-17350

Summary

Implements a workaround to extract genesis UTXOs by applying the genesis transaction to a clean ledger state, enabling the indexer to detect pre-funded accounts that are missing due to Substrate's event blocking during genesis.

Context

Problem: The midnight-indexer cannot detect pre-funded UTXOs in the genesis block because midnight-node does not emit UnshieldedTokens events during genesis block construction. This is due to Substrate PR #5463 which intentionally blocks all events when block_number.is_zero().

Impact: Pre-funded development accounts (e.g., 0000000000000000000000000000000000000000000000000000000000000001) with 5M tokens each are invisible to wallets and applications relying on the indexer.

Technical Solution

Root Cause Analysis

  • Genesis UTXOs exist in node storage but no events are emitted to notify indexers
  • Standard event-driven indexing pattern fails for block height 0
  • Investigation confirmed this is a Substrate framework limitation, not a midnight-node bug

Implementation Approach

Added special-case logic in chain-indexer/src/application.rs that:

  1. Detects genesis blocks (height = 0) during processing
  2. Applies genesis transaction to a clean LedgerState using existing apply_transaction method
  3. Extracts UTXOs from the resulting ledger state's UTXO collection
  4. Populates transaction with extracted UTXOs in created_unshielded_utxos field
  5. Continues normal processing - UTXOs are stored and exposed via GraphQL API

Key Changes

Files Modified:

  • chain-indexer/src/application.rs (+99 lines): Core extraction logic
  • chain-indexer/src/domain/zswap.rs (+1 line): Export LedgerState type
  • chain-indexer/src/infra/node/runtimes.rs (+30/-26 lines): Support for both runtime versions
  • chain-indexer/examples/test_genesis.rs (+89 lines): Integration test

Architecture Decision:

  • Workaround stays contained in indexer (no node changes required)
  • Maintains event-driven architecture for all blocks except genesis
  • Uses existing ledger state application logic to ensure consistency

Test Results

Successfully tested with the example program:

## BLOCK: height=0, hash=fe98007b9e3bd24ffa78defa4af75085bb40e5438dc79ddb135c7fb02fd1eebc
*** TESTING GENESIS UTXO EXTRACTION ***
Genesis extraction completed
*** AFTER EXTRACTION - UTXOs: 28 ***

GraphQL query shows genesis UTXOs are now properly indexed and queryable, with correct values (5M tokens each) and owner addresses.

Screenshot 2025-06-13 at 21 40 55
note 'ignore the ordering' : #99

Impact

  • ✅ Resolves genesis UTXO visibility - Pre-funded accounts now appear in indexer
  • ✅ Enables wallet development - Applications can query genesis balances via GraphQL
  • ✅ Maintains consistency - No special client-side handling required
  • ✅ Zero breaking changes - Existing APIs work unchanged

Future Considerations

This is a workaround for a node-side limitation. A follow-up spike ticket (PM-17351) explores potential node-side solutions like:

  • Genesis runtime API for UTXO data
  • StateKey storage query endpoints
  • Genesis event emission workarounds

The current solution provides immediate functionality while keeping long-term architectural options open.

@cosmir17 cosmir17 requested a review from hseeberger June 13, 2025 21:14
@cosmir17 cosmir17 self-assigned this Jun 13, 2025
@cosmir17 cosmir17 requested a review from a team as a code owner June 13, 2025 21:14
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 13, 2025

Logo
Checkmarx One – Scan Summary & Details48abe9e4-e32a-4222-bdc7-75511323d45a

Great job, no security vulnerabilities found in this Pull Request

Copy link
Copy Markdown
Contributor

@hseeberger hseeberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite some comments. Please address, then I need to review again.

Comment thread chain-indexer/src/application.rs Outdated
if let Err(e) =
extract_genesis_unshielded_utxos(&mut block, &ledger_state, network_id).await
{
warn!("genesis UTXO extraction failed: {}", e);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? I think we should fail if this important feature fails.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So please follow the so far consistently applied approach for error handling to fail fast also here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed from warning to fail-fast error handling.

Comment thread chain-indexer/src/application.rs Outdated
#[trace]
pub async fn extract_genesis_unshielded_utxos(
block: &mut Block,
_ledger_state: &LedgerState,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pass the ledger state if you do not use it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed unused _ledger_state parameter

Comment thread chain-indexer/src/application.rs Outdated
let genesis_transaction = block
.transactions
.get_mut(0)
.context("cannot get genesis transaction")?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We follow the convention to not use "cannot" for context invocations, because it would lead to a chain of "cannot"s. So please remove here and elsewhere.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed "cannot" from all .context() calls

Comment thread chain-indexer/src/application.rs Outdated
ledger_state: &LedgerState,
genesis_transaction: &mut Transaction,
) -> Result<usize, anyhow::Error> {
use std::ops::Deref;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change your IDE settings to not always add the use lines in the function scope. This is annoying to review and we likely miss some.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved all imports to file top - no more function-scoped imports anywhere.

Comment thread chain-indexer/src/application.rs Outdated
let utxo_state = &midnight_ledger_state.utxo;
let mut extracted_count = 0;
for utxo in utxo_state.utxos.iter() {
let utxo_data = utxo.deref();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrite this line and the following to not explicitly call deref. I guess you can omit this line and for the below one use convert_midnight_utxo_to_indexer_utxo(&*utxo_data, ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - removed the explicit deref and simplified

Comment thread chain-indexer/src/application.rs Outdated
fn convert_user_address_to_unshielded(
user_address: &midnight_coin_structure::coin::UserAddress,
) -> Result<indexer_common::domain::UnshieldedAddress, anyhow::Error> {
let address_bytes: [u8; 32] = user_address.0.0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove type annotation and fold this simple expression into the next line.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

spent_unshielded_utxos_info.insert(tx_hash.into(), abstracted_spent);
}
// Use the most recent transaction hash, or fallback for genesis/failed transactions
let tx_hash = current_tx_hash.unwrap_or_else(|| [0u8; 32].into());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand this change. Please add a better/more descriptive comment.
I thought that for the genesis block there are not events, so how is this change related to the topic of this PR?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand this change. Please add a better/more descriptive comment.

Enhanced comment with context.

I thought that for the genesis block there are not events, so how is this change related to the topic of this PR?

You're absolutely correct that genesis blocks don't emit events - that's exactly why we need the workaround in application.rs.

This runtimes.rs change is not related to genesis events. It's a separate fix for failed transactions in general (any block height) that don't have transaction context.

The issue: When transactions fail and no TxStart/TxSuccess event provides context, current_tx_hash becomes None. But UnshieldedTokens events can still be emitted for these failed transactions. Without a transaction hash, these events would be dropped.

The fix: Use a fallback hash [0u8; 32] to group events from failed transactions, ensuring they're still processed even without transaction context.

This change improves event processing reliability for failed transactions across all blocks, not just genesis. The genesis-specific logic is entirely in application.rs where we extract UTXOs directly from ledger state.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise my explanation was incorrect.

the [0u8; 32] fallback isn't about 'completely failed transactions' having valid UTXOs.

What's actually happening:

  1. From my investigation: Unshielded functionality in midnight-node is in a transitional/unstable state - all unshielded tests are ignored with #[ignore = "TODO UNSHIELDED"]
  2. Event emission issues: There are known gaps in UnshieldedTokens event emission, particularly around:
    - Empty transactions (no UTXO changes)
    - Partial transaction failures where successful segments still emit events
    - Event ordering issues
  3. The fallback hash: Groups 'orphaned' UnshieldedTokens events that lack proper transaction context - not from failed transactions, but from events that get emitted when current_tx_hash is None due to processing issues.

Given that unshielded functionality appears to be incomplete/unstable in midnight-node, this change might be addressing event processing gaps rather than legitimate UTXO creation from failed transactions.

Should I investigate the specific scenarios where current_tx_hash becomes None to verify these events are actually valid? The midnight-node codebase suggests there may be broader unshielded functionality issues we need to account for.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you ask Andy.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, while not part of this PR, I think that there is no code setting the current_tx_hash back to None at the end of processing the UnshieldedTokens event. So if the "in the next round" there is no TxApplied or TxPartialSuccess, then the old current_tx_hash will be used.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am doing an investigation now.

Copy link
Copy Markdown
Contributor Author

@cosmir17 cosmir17 Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right to question this, and I've now done a thorough investigation in the midnight-node codebase to understand exactly when current_tx_hash is None.

Key Finding: The [0u8; 32] fallback hash addresses legitimate system transactions, not failed user transactions.

Primary scenario: System transactions like block rewards that create UTXOs via mint_coins without going through normal transaction processing:

// In on_finalize - Block reward minting (pallets/midnight/src/lib.rs)
match LedgerApi::mint_coins(&network_id, &state_key, reward, &beneficiary[..]) {
Ok(new_state_key) => {
Self::deposit_event(Event::PayoutMinted(...));
}
}

These system operations:

  • Create legitimate unshielded UTXOs
  • Don't have transaction hashes (they're not user transactions)
  • Represent valid state changes that should be indexed

Your concern about completely failed transactions having valid UTXOs doesn't apply here because:

  • Completely failed transactions return early and emit no events
  • Partial failures only emit events for successful segments (failed segments filtered out)
  • UTXOs with [0u8; 32] hash come from system transactions, not failed user transactions

The fallback hash is legitimate and correctly handles system transactions that create UTXOs outside the normal transaction flow. These represent actual ledger state changes that the indexer should capture.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a (short) respective comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while not part of this PR, I think that there is no code setting the current_tx_hash back to None at the end of processing the UnshieldedTokens event. So if the "in the next round" there is no TxApplied or TxPartialSuccess, then the old current_tx_hash will be used.

You're absolutely right - that's a real bug I missed! The current_tx_hash isn't reset to None after processing, so subsequent UnshieldedTokens events without corresponding TxApplied/TxPartialSuccess events would incorrectly use stale transaction hashes from previous transactions.

This means the fallback hash mechanism isn't working as intended. We need to fix this by resetting the transaction hash context appropriately.

The fix should be:
Event::Midnight(midnight::Event::TxApplied(tx_applied)) => {
current_tx_hash = Some(tx_applied.tx_hash);
}
Event::Midnight(midnight::Event::TxPartialSuccess(tx_partial)) => {
current_tx_hash = Some(tx_partial.tx_hash);
}
Event::Midnight(midnight::Event::UnshieldedTokens(event_data)) => {
let tx_hash = current_tx_hash.unwrap_or_else(|| [0u8; 32].into());
// ... process event
current_tx_hash = None; // Reset after processing
}

Good catch - this explains why the fallback might not be working as expected in practice!

made a commit

Comment thread chain-indexer/src/application.rs Outdated
let extracted_count = extract_utxos_from_ledger_state(&clean_ledger_state, genesis_transaction)
.context("cannot extract UTXOs from ledger state after genesis transaction")?;

if extracted_count == 0 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this, because for mainnet there will not be any pre-funded wallets and even for the other environments it is not the responsibility of the Indexer to expect transactions there; it should index these, but not warn if they are absent.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now extracted_count becomes obsolete and the function extract_utxos_from_ledger_state can be changed to return the extracted utxos instead of unit, making it less imperative. Then the mutation of the transaction can happen here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed assumption about genesis UTXOs.

Now extracted_count becomes ....

Yes - changed extract_utxos_from_ledger_state to return Vec instead of usize, removed the genesis_transaction mutation parameter, and moved the mutation to the caller:

let utxos = extract_utxos_from_ledger_state(&clean_ledger_state)?;
genesis_transaction.created_unshielded_utxos.extend(utxos);

This makes it functional instead of imperative, and extracted_count is now obsolete.

Comment thread chain-indexer/src/application.rs Outdated
// Handle genesis block: extract any pre-funded unshielded UTXOs
if block.height == 0 {
if let Err(e) =
extract_genesis_unshielded_utxos(&mut block, &ledger_state, network_id).await
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the signature of extract_genesis_unshielded_utxos only taking shared references and returning a "unit result", also looking at its implementation below: where are the extracted UTXOs stored?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! The original signature was incorrect - it took &Block (shared reference) but needed to mutate the genesis transaction.

Fixed by changing to &mut Block and implementing the mutation inside the function

Comment thread chain-indexer/src/application.rs Outdated
);
}

// Handle genesis block: extract any pre-funded unshielded UTXOs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add trailing dot (please start memoizing this, I am adding this comment in too many reviews).

@hseeberger
Copy link
Copy Markdown
Contributor

I wonder whether we should add a test to e2e.rs to IndexerData::collect (at the end of the function) which for a network ID other than MainNet verifies that there are UTXOs for the genesis block.

@cosmir17
Copy link
Copy Markdown
Contributor Author

cosmir17 commented Jun 16, 2025

I wonder whether we should add a test to e2e.rs to IndexerData::collect (at the end of the function) which for a network ID other than MainNet verifies that there are UTXOs for the genesis block.

Added a test in e2e.rs IndexerData::collect().

@cosmir17 cosmir17 requested a review from hseeberger June 16, 2025 12:11
Comment thread indexer-tests/src/e2e.rs Outdated
let genesis_transaction = &genesis_block.transactions[0];

// Genesis transaction should have created unshielded UTXOs.
assert!(!genesis_transaction.unshielded_created_outputs.is_empty(),);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the trailing comma.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here an below.

Comment thread chain-indexer/src/application.rs Outdated
);
}

// Handle genesis block: extract any pre-funded unshielded UTXOs.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The called function uses a clean ledger state. But why? The transactions of the genesis block are applied to the ledger state in zswap::apply_transaction_mut. I suggest to remove this section and the extract_genesis_unshielded_utxos (which should have been called inject_...) and then put

 let utxos = extract_utxos_from_ledger_state(&clean_ledger_state)?;
genesis_transaction.created_unshielded_utxos.extend(utxos);

as well as extract_utxos_from_ledger_state and needed functions to zswap::apply_transaction_mut. There line 155. Check if the block_parent_hash is [0; 32].

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! Using a clean ledger state was inefficient and architecturally wrong.

Problem with original approach:

  • Applied genesis transaction to clean state just to extract UTXOs
  • Separate concern from natural transaction processing flow
  • Duplicate work since transaction was already being applied in zswap::apply_transaction_mut

Fixed with your suggested approach:

  • Moved logic to zswap::apply_transaction_mut at line 155+ where transaction processing naturally occurs
  • Uses block_parent_hash == ByteArray([0; 32]) for genesis detection as suggested
  • Extracts UTXOs from existing ledger state after transaction application
  • Removed separate extract_genesis_unshielded_utxos function entirely
  • Inlined helper functions as you noted they were unnecessary

The architecture is much cleaner now - genesis UTXO extraction happens naturally within transaction processing rather than as a separate step. Thanks for the excellent architectural insight!

Comment thread chain-indexer/src/application.rs Outdated
) -> Result<Vec<UnshieldedUtxo>, anyhow::Error> {
let midnight_ledger_state = &ledger_state.0.0;
let utxo_state = &midnight_ledger_state.utxo;
let mut utxos = Vec::new();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change imperative to Rust idiomatic Iterator pattern.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converted to functional iterator pattern

Comment thread chain-indexer/src/application.rs Outdated
}

/// Convert midnight-ledger UserAddress to indexer UnshieldedAddress.
fn convert_user_address_to_unshielded(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need a function for this one-liner.

Comment thread chain-indexer/src/application.rs Outdated
}

/// Convert a midnight-ledger Utxo to the indexer's UnshieldedUtxo format.
fn convert_midnight_utxo_to_indexer_utxo(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do not need a function for this (almost) one-liner.

@cosmir17 cosmir17 requested a review from hseeberger June 16, 2025 15:35
Copy link
Copy Markdown
Contributor

@hseeberger hseeberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the additional nit-picks, then merge.

Comment thread chain-indexer/examples/node.rs Outdated
@@ -1,3 +1,16 @@
// This file is part of midnight-indexer.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examples do not need this header. How was it added? Manually? Usually you should run https://github.com/midnightntwrk/midnight-indexer/blob/main/.license_headers.sh to add headers. Please remove this one.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread chain-indexer/examples/test_genesis.rs Outdated
@@ -0,0 +1,80 @@
// This file is part of midnight-indexer.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this one.

Comment thread chain-indexer/src/domain/zswap.rs Outdated
/// Extract UTXOs from the midnight-ledger state and convert them to indexer format.
fn extract_utxos_from_ledger_state(
ledger_state: &LedgerState,
) -> Result<Vec<UnshieldedUtxo>, Error> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be no error case. Remove the Result here and the result handling above.

let tx_hash = current_tx_hash.unwrap_or_else(|| [0u8; 32].into());

if !event_data.created.is_empty() {
let abstracted_created = event_data.created
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the abstracted prefix, because it causes confusion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

created_unshielded_utxos_info.insert(tx_hash.into(), abstracted_created);
}
if !event_data.spent.is_empty() {
let abstracted_spent = event_data.spent
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like above.

@cosmir17 cosmir17 enabled auto-merge (squash) June 16, 2025 19:34
@cosmir17 cosmir17 merged commit 8c37254 into main Jun 16, 2025
14 checks passed
@cosmir17 cosmir17 deleted the feat/genesis-utxo-extraction branch June 16, 2025 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants