Skip to content

SIMD#83 - SVM changes to allow conflicting entries#1468

Closed
Huzaifa696 wants to merge 14 commits intoanza-xyz:masterfrom
rakurai-io:allow-self-conflicting-entries-in-svm
Closed

SIMD#83 - SVM changes to allow conflicting entries#1468
Huzaifa696 wants to merge 14 commits intoanza-xyz:masterfrom
rakurai-io:allow-self-conflicting-entries-in-svm

Conversation

@Huzaifa696
Copy link
Copy Markdown

@Huzaifa696 Huzaifa696 commented May 23, 2024

Problem

Leader pipeline can only process entries with non-conflicting transactions which limits the innovation in scheduling strategy and thus better TPS.

Summary of Changes

Changes to enable SVM to handle self-conflicting entries:

  • Move all the account state changing functions (like fee and rent deduction) from loading accounts function to the execution part of the batch processing.
  • Perform the account state changing operations on tx-by-tx basis rather than at entry level. Carry the temp state of accounts during fee deduction and execution of txns. This is to use the intermediate account state for any conflicting txn in the same entry.

Related to:
#1025

@mergify mergify Bot requested a review from a team May 23, 2024 10:58
@buffalojoec buffalojoec added the CI Pull Request is ready to enter CI label May 23, 2024
@anza-team anza-team removed the CI Pull Request is ready to enter CI label May 23, 2024
@apfitzge apfitzge self-requested a review May 24, 2024 06:48
@Huzaifa696 Huzaifa696 force-pushed the allow-self-conflicting-entries-in-svm branch from 6de4977 to c5df124 Compare May 24, 2024 10:05
@apfitzge apfitzge added the CI Pull Request is ready to enter CI label May 28, 2024
@anza-team anza-team removed the CI Pull Request is ready to enter CI label May 28, 2024
Copy link
Copy Markdown

@apfitzge apfitzge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some initial comments on interface.
Think there's a few checks that should be moved to be using updated acct state, also can probably reduce the number of account loads by changing the interface.

Comment thread svm/src/transaction_processor.rs Outdated
Comment thread svm/src/transaction_processor.rs Outdated
Comment thread svm/src/account_loader.rs Outdated
Comment thread runtime/src/bank.rs Outdated
Comment on lines +3507 to +3512
let nonce_hash = tx.message().recent_blockhash();
if dedup_nonce_lookup.contains(nonce_hash) {
return Err(TransactionError::BlockhashNotFound);
} else {
dedup_nonce_lookup.insert(*nonce_hash);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check can make sense to be here for block-validation but not block-production (currently).

Imagine we have 2 txs in a batch, both using same nonce account.
The first one fails due to some non-recordable error, 2nd one is a recordable tx.

Block-validation, it doesn't particularly matter if 1st or 2nd one causes an error - the block is invalid if it contains a non-recordable error.
Block-production is a different story though. 1st tx has non-recordable error so it will eventually get dropped; but 2nd tx would get dropped here early on and now neither tx makes it into the block.
With the current separate batches, we'd still have recorded the 2nd tx in a separate batch.

Copy link
Copy Markdown
Author

@Huzaifa696 Huzaifa696 May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense, if we remove this check from here then if multiple txs try to use the same nonce and if there are no non-recordable error in the first tx then all the subsequent txs would fail due to SystemError::NonceBlockhashNotExpired as the nonce account state is updated for the second tx.

So, the error type would be SystemError::NonceBlockhashNotExpired rather than BlockhashNotFound, right?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should ideally keep the error the same. It's easier to be confident that a tx is still protocol violating vs not if we don't change the error.

Comment thread svm/src/transaction_processor.rs Outdated
@Huzaifa696 Huzaifa696 force-pushed the allow-self-conflicting-entries-in-svm branch from 52815ff to ecf0f99 Compare June 12, 2024 14:05
Copy link
Copy Markdown

@apfitzge apfitzge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like there's still a few bugs which I commented on. Also a few nits.

Need to go through all the tests next.

Comment thread runtime/src/bank.rs Outdated
Comment thread svm/src/account_loader.rs Outdated
Comment thread svm/src/transaction_processor.rs Outdated
Comment thread svm/tests/integration_test.rs Outdated
all_transactions.push(sanitized_transaction);
transaction_checks.push(Err(TransactionError::BlockhashNotFound));

// A transaction for two transafers with same fee payer
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// A transaction for two transafers with same fee payer
// A transaction for two transfers with same fee payer

Comment thread svm/tests/integration_test.rs Outdated
Copy link
Copy Markdown

@apfitzge apfitzge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a pretty lengthy comment on one part.

I still think this PR is too limited in it's approach and should be doing the account loading from map at the last minute rather than early on + mutating as is currently implemented.

Comment thread svm/src/account_loader.rs Outdated
Comment thread svm/src/account_loader.rs Outdated
Comment thread svm/src/account_loader.rs Outdated
Comment thread svm/src/account_loader.rs Outdated
account_overrides: Option<&AccountOverrides>,
loaded_programs: &ProgramCacheForTxBatch,
unique_loaded_accounts: &mut UniqueLoadedAccounts,
) -> Result<LoadedTransaction> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not bind ourselves to this current return type.
LoadedTransaction should be the transaction state that is ready to be processed immediately. We ideally are not letting that be some stateful thing we're mutating, it should just be constructed after all checks immediately before execution, having the nonce and all up-to-date account info.

Here's how I view things:

This is called in load_accounts.
Ideally that function should only populate the unique_loaded_accounts map, it should not populate LoadedTransaction. Only errors are from program accounts not loading.

We should have a separate function which calculates the program indices, returning error if something fails there.

Finally we should get the LoadedTransaction immediately before execution of each transaction. Getting the up to date account, nonce. running checks on data size, rent, etc.

My imagined flow is:

// Load all accounts into `unique_loaded_accounts`; basic sanity checks on programs.
let initial_load_results = load_accounts(txs, &mut unique_loaded_accounts/* other args */);

// Calculate program indexes given the load results.
let program_indexes_results = calculate_program_indexes(txs, &initial_load_results);

// Loop over txs to execute
for (tx, load_result, program_index_result) in txs.iter().zip(initial_load_results).zip(program_indexes_results) {
    // do error checks
    
    let loaded_transaction = load_transaction(tx, load_result, program_index_result, /* other */);
    
    // execute
    
    if executed_successfully {
        // store all accounts back into the map
        update_unique_loaded_accounts(/*stuff*/);
    } else if executed_but_failed {
        // only update fee-payer, nonce.
        limited_update_unique_loaded_accounts(/*stuff*/);
    }
} 

Copy link
Copy Markdown
Author

@Huzaifa696 Huzaifa696 Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All suggested changes as per the pseudo code are incorporated. Just an addition that I've created Vec<Result<LoadedTransaction, TransactionError>> before the execution loop so that it can be utilized in commit_transactions() down the line. One thing that can be done to improve performance is to declare this vector as thread_local to avoid re-allocation on every entry. Let me know your thoughts on that?

We're currently working on updating the unit tests according to these changes.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update:
No need for pre-allocation of Vec<Result<LoadedTransaction, TransactionError>> after rebasing as LoadedTransaction is now moved inside ExecutedTransaction.

Unit tests also updated and verified with the latest changes.

@apfitzge
Copy link
Copy Markdown

Also please run clippy and fmt on your PR before committing; it will make review process more straight-forward if I do not have to also comment on fixing basic CI operations

@ryoqun
Copy link
Copy Markdown
Member

ryoqun commented Jun 23, 2024

i think this needs rebase on #1636

Also, maybe we need program_cache changes as well?

@ryoqun
Copy link
Copy Markdown
Member

ryoqun commented Jun 23, 2024

also needs to ensure there's no problem of non-disjoint entries for acccount_lookup_table functionality and find_send_votes() code-path.

Comment thread svm/src/transaction_processor.rs Outdated
Comment thread svm/src/transaction_processor.rs Outdated
Comment thread svm/src/transaction_processor.rs Outdated
Comment thread svm/src/transaction_processor.rs Outdated
@apfitzge
Copy link
Copy Markdown

Also, maybe we need program_cache changes as well?

From my discussions with @alessandrod, program updates are not "available" in the slot they happen in. i.e. if there's a program upgrade in slot N all txs in slot N will use the old program bytes. All txs in slot >N would see the upgrade.

So I don't think there's any issues with program-cache, at least from the program modification perspective.

also needs to ensure there's no problem of non-disjoint entries for acccount_lookup_table functionality and find_send_votes() code-path.

Not sure about find_send_votes, so agree that needs to be checked.

I think ALT functionality should be good already. ALT resolution is "locked" to the state at start of the slot - i.e. changes are not available until next slot.

@apfitzge apfitzge self-assigned this Jul 17, 2024
@Huzaifa696 Huzaifa696 force-pushed the allow-self-conflicting-entries-in-svm branch from be317ea to 1827c74 Compare July 24, 2024 14:16
@Huzaifa696
Copy link
Copy Markdown
Author

Huzaifa696 commented Jul 25, 2024

i think this needs rebase on #1636

Also, maybe we need program_cache changes as well?

One significant change I noticed while rebasing is that the fee validation has been moved into a separate step and is now performed on an entire entry at once. Previously, this validation was done on a per-transaction basis during account loading. This change might cause issues in self-conflicting batches where transaction fee validation results change from transaction to transaction.

I am considering moving the fee validation to a point where it is done before the execution of each transaction. Any thoughts on this approach?

@Huzaifa696 Huzaifa696 force-pushed the allow-self-conflicting-entries-in-svm branch from a043e20 to ddd7761 Compare July 29, 2024 06:40
@Huzaifa696 Huzaifa696 marked this pull request as ready for review July 30, 2024 08:06
@Huzaifa696 Huzaifa696 requested review from apfitzge and ryoqun July 30, 2024 08:11
@Huzaifa696 Huzaifa696 force-pushed the allow-self-conflicting-entries-in-svm branch from 2058d19 to dfcf5aa Compare August 8, 2024 13:05
@Huzaifa696
Copy link
Copy Markdown
Author

Rebased and incorporated all refactoring changes.

@Huzaifa696 Huzaifa696 force-pushed the allow-self-conflicting-entries-in-svm branch from dfcf5aa to d69c3bf Compare August 8, 2024 13:10
@Huzaifa696 Huzaifa696 force-pushed the allow-self-conflicting-entries-in-svm branch from cf45ca0 to 96ae1a2 Compare September 10, 2024 10:14
Ubuntu and others added 13 commits September 10, 2024 10:15
check acct state change before updating in local lookup

increased lookup size, no need for nonce changes

add test

reverted changes to combine temp state in exec and loading accts

combine temp state dureing exec and acct loading

adjusted unit tests according to changed behaviour

update unique acct lookup after fee deduction

add tests

add test: InsufficientFundsForFee

update acct after nonce update, add feature gate to rent collection

update intermediate loaaded accts state

add test: Duplicate transactions

fixed unit tests except nonce issues

add test: nonce hash

full bug - nonce not updated in loaded tx

fmt code
- moved duplicate txns check before execution
- moved unique_loaded_accts struct inside load_accounts to avoid double loading
- moved the data size limit check to be done on per tx basis
- moved the duplicate nonce check at a later point where no non-recordable
 error could occur
- removed account size checking from loading accounts function
- dont update accounts for failed transactions
- remove extraneous comments
- incorporated the suggested refactoring
- removed all complilation errors
-  moved from hashset to ahashset
-  minimize copying in making LoadedTransaction
- pre-allocation of loaded_transactions vec
- removed loadedTransactions vec
- incorporated new timing counters
- updated interfaces for the used structs after rebasing
- removed test_load_transaction_accounts_fee_payer as rent and fee deduction has been removed from load_accounts
- bug fixes
- matched the interface changes in svm
- Refactored code according to the new API of load_accounts
- Refactored unit tests in integration_tests
- Removed test_inspect_account_fee_payer as we do inspection in
load_accounts instead for all accounts including the fee payer
- Removed test_process_transactions_account_in_use as this
goes against the SIMD-83 assumptions
@apfitzge apfitzge closed this Jan 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants