SIMD#83 - SVM changes to allow conflicting entries #1468
Huzaifa696 wants to merge 14 commits into anza-xyz:master from
6de4977 to
c5df124
Compare
apfitzge
left a comment
some initial comments on interface.
I think there are a few checks that should be moved so they use the updated acct state; we can also probably reduce the number of account loads by changing the interface.
let nonce_hash = tx.message().recent_blockhash();
if dedup_nonce_lookup.contains(nonce_hash) {
    return Err(TransactionError::BlockhashNotFound);
} else {
    dedup_nonce_lookup.insert(*nonce_hash);
}
There was a problem hiding this comment.
I think this check can make sense to be here for block-validation but not block-production (currently).
Imagine we have 2 txs in a batch, both using same nonce account.
The first one fails due to some non-recordable error, 2nd one is a recordable tx.
For block-validation, it doesn't particularly matter whether the 1st or the 2nd one causes an error: the block is invalid if it contains a non-recordable error.
Block-production is a different story though. 1st tx has non-recordable error so it will eventually get dropped; but 2nd tx would get dropped here early on and now neither tx makes it into the block.
With the current separate batches, we'd still have recorded the 2nd tx in a separate batch.
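A minimal sketch of the scenario above, under stated assumptions: `Tx`, `record_with_early_dedup`, `record_with_late_dedup` and `example_batch` are hypothetical names invented for illustration, not the SVM's types or functions. It contrasts deduplicating nonces up front with deduplicating only once a tx is known to be recordable.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a transaction; only the fields the example needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Tx {
    pub id: u32,
    pub nonce_hash: [u8; 32],
    /// Whether later processing would hit a non-recordable error.
    pub has_non_recordable_error: bool,
}

/// Early dedup: any tx whose nonce hash was already seen in the batch is
/// rejected immediately. Returns the ids of the txs that end up recorded.
pub fn record_with_early_dedup(batch: &[Tx]) -> Vec<u32> {
    let mut seen = HashSet::new();
    batch
        .iter()
        // `insert` returns false for a duplicate, which drops the tx here.
        .filter(|tx| seen.insert(tx.nonce_hash))
        .filter(|tx| !tx.has_non_recordable_error)
        .map(|tx| tx.id)
        .collect()
}

/// Late dedup: a nonce is only marked as used once a tx is known to be
/// recordable, so an earlier dropped tx does not block a later one.
pub fn record_with_late_dedup(batch: &[Tx]) -> Vec<u32> {
    let mut used = HashSet::new();
    let mut recorded = Vec::new();
    for tx in batch {
        if tx.has_non_recordable_error {
            continue; // dropped; its nonce stays available
        }
        if used.insert(tx.nonce_hash) {
            recorded.push(tx.id);
        }
    }
    recorded
}

/// The batch from the comment: two txs sharing one nonce, the first of
/// which fails with a non-recordable error.
pub fn example_batch() -> Vec<Tx> {
    let nonce_hash = [7u8; 32];
    vec![
        Tx { id: 1, nonce_hash, has_non_recordable_error: true },
        Tx { id: 2, nonce_hash, has_non_recordable_error: false },
    ]
}
```

With `example_batch()`, the early variant records nothing, while the late variant still records tx 2, which matches what separate batches would have produced.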
There was a problem hiding this comment.
Makes sense. If we remove this check from here, then when multiple txs try to use the same nonce and the first tx has no non-recordable error, all the subsequent txs would fail with SystemError::NonceBlockhashNotExpired, since the nonce account state has already been updated by the time the second tx runs.
So, the error type would be SystemError::NonceBlockhashNotExpired rather than BlockhashNotFound, right?
There was a problem hiding this comment.
I think we should ideally keep the error the same. It's easier to be confident that a tx is still protocol violating vs not if we don't change the error.
52815ff to
ecf0f99
Compare
apfitzge
left a comment
seems like there's still a few bugs which I commented on. Also a few nits.
Need to go through all the tests next.
all_transactions.push(sanitized_transaction);
transaction_checks.push(Err(TransactionError::BlockhashNotFound));

// A transaction for two transafers with same fee payer
There was a problem hiding this comment.
- // A transaction for two transafers with same fee payer
+ // A transaction for two transfers with same fee payer
apfitzge
left a comment
Left a pretty lengthy comment on one part.
I still think this PR is too limited in its approach: it should load accounts from the map at the last minute, rather than loading early on and then mutating, as is currently implemented.
    account_overrides: Option<&AccountOverrides>,
    loaded_programs: &ProgramCacheForTxBatch,
    unique_loaded_accounts: &mut UniqueLoadedAccounts,
) -> Result<LoadedTransaction> {
There was a problem hiding this comment.
I think we should not bind ourselves to this current return type.
LoadedTransaction should be the transaction state that is ready to be processed immediately. We ideally are not letting that be some stateful thing we're mutating, it should just be constructed after all checks immediately before execution, having the nonce and all up-to-date account info.
Here's how I view things:
This is called in load_accounts.
Ideally that function should only populate the unique_loaded_accounts map, it should not populate LoadedTransaction. Only errors are from program accounts not loading.
We should have a separate function which calculates the program indices, returning error if something fails there.
Finally, we should get the LoadedTransaction immediately before execution of each transaction: getting the up-to-date account and nonce, and running checks on data size, rent, etc.
My imagined flow is:
// Load all accounts into `unique_loaded_accounts`; basic sanity checks on programs.
let initial_load_results = load_accounts(txs, &mut unique_loaded_accounts /* other args */);
// Calculate program indexes given the load results.
let program_indexes_results = calculate_program_indexes(txs, &initial_load_results);
// Loop over txs to execute
for ((tx, load_result), program_index_result) in txs
    .iter()
    .zip(initial_load_results)
    .zip(program_indexes_results)
{
    // do error checks
    let loaded_transaction = load_transaction(tx, load_result, program_index_result, /* other */);
    // execute
    if executed_successfully {
        // store all accounts back into the map
        update_unique_loaded_accounts(/*stuff*/);
    } else if executed_but_failed {
        // only update fee-payer, nonce.
        limited_update_unique_loaded_accounts(/*stuff*/);
    }
}
There was a problem hiding this comment.
All suggested changes from the pseudocode are incorporated. One addition: I've created a Vec<Result<LoadedTransaction, TransactionError>> before the execution loop so that it can be used in commit_transactions() down the line. One thing that could improve performance is declaring this vector as thread_local to avoid re-allocation on every entry. Let me know your thoughts on that.
We're currently working on updating the unit tests according to these changes.
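For reference, a minimal sketch of the thread-local reuse idea mentioned above. The `LoadedTransaction` and `TransactionError` types here are hypothetical placeholders, and `LOADED` and `process_entry` are names invented for illustration; only `thread_local!`, `RefCell` and `Vec::clear` are standard library facilities.

```rust
use std::cell::RefCell;

/// Hypothetical placeholder types; the real ones live in the SVM crates.
#[derive(Debug)]
pub struct LoadedTransaction {
    pub tx_index: usize,
}

#[derive(Debug)]
pub enum TransactionError {
    AccountNotFound,
}

thread_local! {
    /// One buffer per thread, reused across entries.
    static LOADED: RefCell<Vec<Result<LoadedTransaction, TransactionError>>> =
        RefCell::new(Vec::new());
}

/// Processes one entry using the thread-local buffer.
/// Returns (number of Ok results, buffer capacity after processing).
pub fn process_entry(num_txs: usize) -> (usize, usize) {
    LOADED.with(|cell| {
        let mut buf = cell.borrow_mut();
        // `clear` drops the old elements but keeps the allocation.
        buf.clear();
        for i in 0..num_txs {
            // Arbitrary stand-in for real load results: even indexes load.
            buf.push(if i % 2 == 0 {
                Ok(LoadedTransaction { tx_index: i })
            } else {
                Err(TransactionError::AccountNotFound)
            });
        }
        let ok = buf.iter().filter(|r| r.is_ok()).count();
        (ok, buf.capacity())
    })
}
```

After a large entry has grown the buffer, later entries on the same thread push into the existing allocation. The trade-off is that the buffer holds on to its peak capacity for the lifetime of the thread.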
There was a problem hiding this comment.
Update:
No need for pre-allocation of Vec<Result<LoadedTransaction, TransactionError>> after rebasing as LoadedTransaction is now moved inside ExecutedTransaction.
Unit tests also updated and verified with the latest changes.
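The flow discussed in this thread can be sketched in a self-contained form. This is only an illustration under simplifying assumptions: accounts are reduced to lamport balances, a transaction is a plain transfer, and `Tx`, `Outcome` and `process_entry` are hypothetical names, not the SVM API. The point is that each transaction's view is built from the shared map right before it executes, and that a failed execution only writes back the fee payer.

```rust
use std::collections::HashMap;

pub type Pubkey = u8; // hypothetical stand-in for a real public key
pub type Lamports = u64;

/// Hypothetical transfer-only transaction.
pub struct Tx {
    pub fee_payer: Pubkey,
    pub to: Pubkey,
    pub fee: Lamports,
    pub amount: Lamports,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Succeeded,
    ExecutedButFailed,
    NotLoaded,
}

/// Executes txs in order against a shared map of up-to-date balances.
pub fn process_entry(
    txs: &[Tx],
    unique_loaded_accounts: &mut HashMap<Pubkey, Lamports>,
) -> Vec<Outcome> {
    txs.iter()
        .map(|tx| {
            // Build the per-tx view at the last minute from the shared map.
            let payer = unique_loaded_accounts
                .get(&tx.fee_payer)
                .copied()
                .unwrap_or(0);
            if payer < tx.fee {
                return Outcome::NotLoaded; // cannot pay the fee: nothing changes
            }
            let after_fee = payer - tx.fee;
            if after_fee < tx.amount {
                // Executed but failed: only the fee payer is written back.
                unique_loaded_accounts.insert(tx.fee_payer, after_fee);
                return Outcome::ExecutedButFailed;
            }
            // Succeeded: store all touched accounts back into the map.
            unique_loaded_accounts.insert(tx.fee_payer, after_fee - tx.amount);
            *unique_loaded_accounts.entry(tx.to).or_insert(0) += tx.amount;
            Outcome::Succeeded
        })
        .collect()
}
```

In a self-conflicting entry, a later transaction can spend funds that an earlier transaction in the same entry deposited, because the map already holds the updated state when the later transaction is loaded.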
Also, please run clippy and fmt on your PR before committing; it will make the review process more straightforward if I do not also have to comment on fixing basic CI operations.
i think this needs rebase on #1636. Also, maybe we need program_cache changes as well?
also needs to ensure there's no problem of non-disjoint entries for account_lookup_table functionality and
From my discussions with @alessandrod, program updates are not "available" in the slot they happen in. i.e. if there's a program upgrade in slot N all txs in slot N will use the old program bytes. All txs in slot >N would see the upgrade. So I don't think there's any issues with program-cache, at least from the program modification perspective.
Not sure about I think ALT functionality should be good already. ALT resolution is "locked" to the state at start of the slot - i.e. changes are not available until next slot.
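The visibility rule described here (a change made in slot N is only seen by transactions in slots after N) can be illustrated with a small sketch. `DelayedVisibilityCache`, `deploy` and `visible_at` are hypothetical names; this is not the actual program cache or ALT implementation, and it assumes deployments are recorded in increasing slot order.

```rust
use std::collections::HashMap;

/// Hypothetical cache where a deployment made in slot N only becomes
/// visible to lookups made in slots greater than N.
pub struct DelayedVisibilityCache {
    /// Per program: (deployment slot, bytes), assumed pushed in slot order.
    entries: HashMap<String, Vec<(u64, Vec<u8>)>>,
}

impl DelayedVisibilityCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Records a deployment or upgrade that happened in `slot`.
    pub fn deploy(&mut self, program: &str, slot: u64, bytes: Vec<u8>) {
        self.entries
            .entry(program.to_string())
            .or_default()
            .push((slot, bytes));
    }

    /// Returns the bytes a transaction executing in `slot` would see:
    /// the latest deployment from a strictly earlier slot, if any.
    pub fn visible_at(&self, program: &str, slot: u64) -> Option<&[u8]> {
        self.entries
            .get(program)?
            .iter()
            .rev()
            .find(|(deployed, _)| *deployed < slot)
            .map(|(_, bytes)| bytes.as_slice())
    }
}
```

Under this rule, transactions within a single slot never observe each other's program modifications, which is why conflicting entries within a slot would not need extra handling for that case.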
be317ea to
1827c74
Compare
One significant change I noticed while rebasing is that the fee validation has been moved into a separate step and is now performed on an entire entry at once. Previously, this validation was done on a per-transaction basis during account loading. This change might cause issues in self-conflicting batches where transaction fee validation results change from transaction to transaction. I am considering moving the fee validation to a point where it is done before the execution of each transaction. Any thoughts on this approach?
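A small sketch of why the two validation points can disagree in a self-conflicting entry. The types and function names (`FeeTx`, `validate_fees_per_entry`, `validate_fees_per_tx`) are hypothetical, and fee validation is reduced to a balance check.

```rust
use std::collections::HashMap;

pub type Pubkey = u8; // hypothetical stand-in for a real public key

/// Hypothetical transaction reduced to its fee-related fields.
pub struct FeeTx {
    pub fee_payer: Pubkey,
    pub fee: u64,
}

/// Validates every tx against the balances as they were before the entry.
pub fn validate_fees_per_entry(txs: &[FeeTx], balances: &HashMap<Pubkey, u64>) -> Vec<bool> {
    txs.iter()
        .map(|tx| balances.get(&tx.fee_payer).copied().unwrap_or(0) >= tx.fee)
        .collect()
}

/// Validates each tx right before it would execute, deducting the fee so
/// that later txs in the same entry see the updated balance.
pub fn validate_fees_per_tx(txs: &[FeeTx], balances: &HashMap<Pubkey, u64>) -> Vec<bool> {
    let mut live = balances.clone();
    txs.iter()
        .map(|tx| {
            let balance = live.entry(tx.fee_payer).or_insert(0);
            if *balance >= tx.fee {
                *balance -= tx.fee;
                true
            } else {
                false
            }
        })
        .collect()
}
```

With a fee payer holding 10 lamports and two transactions each charging a fee of 6, the per-entry check accepts both, while the per-transaction check rejects the second.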
a043e20 to
ddd7761
Compare
2058d19 to
dfcf5aa
Compare
Rebased and incorporated all refactoring changes.
dfcf5aa to
d69c3bf
Compare
cf45ca0 to
96ae1a2
Compare
check acct state change before updating in local lookup increased lookup size, no need for nonce changes add test reverted changes to combine temp state in exec and loading accts combine temp state during exec and acct loading adjusted unit tests according to changed behaviour update unique acct lookup after fee deduction add tests add test: InsufficientFundsForFee update acct after nonce update, add feature gate to rent collection update intermediate loaded accts state add test: Duplicate transactions fixed unit tests except nonce issues add test: nonce hash full bug - nonce not updated in loaded tx fmt code
- moved duplicate txns check before execution - moved unique_loaded_accts struct inside load_accounts to avoid double loading - moved the data size limit check to be done on per tx basis - moved the duplicate nonce check at a later point where no non-recordable error could occur
- removed account size checking from loading accounts function - dont update accounts for failed transactions - remove extraneous comments
- incorporated the suggested refactoring - removed all compilation errors
- moved from hashset to ahashset - minimize copying in making LoadedTransaction - pre-allocation of loaded_transactions vec
- removed loadedTransactions vec - incorporated new timing counters
- updated interfaces for the used structs after rebasing - removed test_load_transaction_accounts_fee_payer as rent and fee deduction has been removed from load_accounts - bug fixes
- matched the interface changes in svm
- Refactored code according to the new API of load_accounts - Refactored unit tests in integration_tests - Removed test_inspect_account_fee_payer as we do inspection in load_accounts instead for all accounts including the fee payer - Removed test_process_transactions_account_in_use as this goes against the SIMD-83 assumptions
96ae1a2 to
c60cb7f
Compare
Problem
The leader pipeline can only process entries with non-conflicting transactions, which limits innovation in scheduling strategy and thus the achievable TPS.
Summary of Changes
Changes to enable SVM to handle self-conflicting entries:
Related to:
#1025