-
Notifications
You must be signed in to change notification settings - Fork 624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: set storage compute cost > gas cost #8924
Conversation
54a6853
to
80e918b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, but ideally we need a test in integration-tests/src/tests/client/features
checking that we can fit less receipts in a block with introduced compute cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Related to this change, we would also need to create a tracking issue for resolving the discrepancy between gas and compute cost and link that issue in the Compute Costs NEP as agreed in https://github.com/near/NEPs/blob/master/neps/nep-0455.md#using-compute-costs.
This is a protocol feature, updating the storage cost as agreed in near#8006 to allow flat storage to be deployed without undercharging risks.
80e918b
to
cb22943
Compare
Ok, this was more annoying than I anticipated. 😩 But finally, I have a bunch of integration tests on the client level now. They verify that the new compute costs values result in tighter chunk bounds when they should, but don't affect parameters they shouldn't. @Longarithm please take a look at the tests and let me know if something is unclear. I tried my best to add descriptive comments but it still feels convoluted to me. Hint: Gas costs change between v60 and v61 for the same TX regardless of compute costs. The two big reasons are:
I solved number one by making sure I only use idempotent WASM calls, and I call them once to "warm-up" the trie in a separate block. There is still the chunk cache varying the cost but that's okay because N receipt in the same chunk will always use the exact same amount of gas. Number 2, I don't know how to cleanly solve it on an integration test level. So I just hacked my way around it. 🙊 Basically, I carefully sized my test receipts such that with and without finite-wasm enabled, the number of receipts per chunk is the same. But then with compute costs enabled, the number of receipts that fit go down, which is the main thing I wanted to test. Mh, maaaybe this whole test setup is overengineered and too complicated at this point... 🤔 If you have an idea how we can drastically simplify this, please let me know. |
Tracking issue: #8938 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I don't see a simpler way. We have environment to test just runtime without blocks/chunks execution in integration-tests/src/tests/standard_cases/mod.rs
, but it doesn't support protocol upgrades.
integration-tests/src/tests/client/features/increase_storage_compute_cost.rs
Show resolved
Hide resolved
integration-tests/src/tests/client/features/increase_storage_compute_cost.rs
Show resolved
Hide resolved
integration-tests/src/tests/client/features/increase_storage_compute_cost.rs
Outdated
Show resolved
Hide resolved
integration-tests/src/tests/client/features/increase_storage_compute_cost.rs
Outdated
Show resolved
Hide resolved
integration-tests/src/tests/client/features/increase_storage_compute_cost.rs
Outdated
Show resolved
Hide resolved
std::env::set_var("NEAR_TESTS_IMMEDIATE_PROTOCOL_UPGRADE", "1"); | ||
near_o11y::testonly::init_test_logger(); | ||
|
||
let new_protocol_version = 61; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let new_protocol_version = 61; | |
let new_protocol_version = ProtocolFeature::ComputeCosts.protocol_version(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is accurate. ProtocolFeature::ComputeCosts.protocol_version()
is when we introduce NEP-455. It's just a "coincident" that we also change storage related costs in the same version. But we could also do storage cost changes in version 62.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm. Then I would introduce separate protocol feature like StorageComputeCosts
for consistency.
And why we don't have specific test for ProtocolFeature::ComputeCosts
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we done that for gas price changes in the past? Because the change is defined in 61.yaml and we can't do cfg
for that. So it would be a completely dead enum variant that's used nowhere and therefore doesn't turn on/off anything. And the assignment in version.rs
to version 61 would make no difference at all. That's a general problem for changes that are defined by the config and not by code, like this change here.
Regarding tests, we only have tests in https://github.com/near/nearcore/blob/master/runtime/near-vm-logic/src/gas_counter.rs#L356. These are not what I consider integrations tests, though. They just check the the compute limit is as expected.
When we merged that, I felt that a integration tests as done now isn't strictly necessary, because we don't expect anything to be observable with the compute costs feature alone. Whereas now, when we set the compute costs, we know the end-to-end effect we expect and can write an integration test for it.
But now I am not sure about that reasoning. Do you think it would make sense in the future to always require integration tests when we add a protocol feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we done that for gas price changes in the past?
We did it for LowerStorageCost
, LowerDataReceiptAndEcrecoverBaseCost
, LowerRegularOpCost
and some more. I think it's a good idea to have indication of such protocol changes in the code.
Compute costs are a cornercase because they didn't change behaviour alone. And 61 in the code leaves a strong feeling that it should be tied with ProtocolFeature::ComputeCosts
.
Some weird compromise I came up with:
const INCREASED_STORAGE_COSTS_PROTOCOL_VERSION = 61;
let new_protocol_version = INCREASED_STORAGE_COSTS_PROTOCOL_VERSION;
assert!(new_protocol_version >= ProtocolFeature::ComputeCosts.protocol_version());
let old_protocol_version = new_protocol_version - 1;
...
I'm still not fully comfortable with that but don't want to think about it too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did it for LowerStorageCost, LowerDataReceiptAndEcrecoverBaseCost, LowerRegularOpCost and some more. I think it's a good idea to have indication of such protocol changes in the code.
Oh right, I didn't know that or maybe just forgot. Seems weird to me to have a feature that doesn't actually do anything.
I like that compromise! I can see this is still quite hacky... But it does reflect the (somewhat ugly) reality that we have a protocol feature that could be turned on/off independently of what we have in 61.yaml, but that would cause problems. So having an assertion instead of just assigning ProtocolFeature::ComputeCosts.protocol_version()
makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything seems to be resolved, I'll add the merge label
The code is literally removing `protocol_feature_flat_state` and moving feature to stable protocol. We also disable `test_state_sync` as this is part of refactor we can do in Q2. ## Feature to stabilize Here we stabilize Flat Storage for reads, which means that all state reads in the client during block processing will query flat storage instead of Trie. Flat Storage is another index for blockchain state, reducing number of DB accesses for state read from `2 * key.len()` in the worst case to 2. This will trigger background creation of flat storage, using 8 threads and finishing in 15h for RPC node and 2d for archival node. After that all non-contract reads will go through flat storage, for which special "chunk views" will be created. When protocol upgrade happens, contracts reads will go through flat storage as well. Also compute costs will change as Option 3 suggests [here](#8006 (comment)). It is to be merged separately, but we need to ensure that both costs change and flat storage go into next release together. ## Context Find more details in: - Overview: https://near.github.io/nearcore/architecture/storage/flat_storage.html - Approved NEP: https://github.com/near/NEPs/blob/master/neps/nep-0339.md - Tracking issue: #7327 ## Testing and QA * Flat storage structs are covered by unit tests; * Integration tests check that chain behaviour is preserved and costs are changed as expected; * Flat storage spent ~2 months in betanet with assertion that flat and trie `ValueRef`s are the same; * We were running testnet/mainnet nodes for ~2 months with the same assertion. We checked that performance is not degraded, see e.g. https://nearinc.grafana.net/d/Vg9SREA4k/flat-storage-test?orgId=1&var-chain_id=mainnet&var-node_id=logunov-mainnet-fs-1&from=1677804289279&to=1678088806154 checking that even with finality lag of 50 blocks performance is not impacted. Small exception is that we updated data layout several times during development, but we checked that results are unchanged. ## Checklist - [x] Include compute costs after they are merged - #8924 - [x] https://nayduck.near.org/#/run/2916 - [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
This is a protocol feature, updating the storage cost as agreed in #8006 to allow flat storage to be deployed without undercharging risks.
The code is literally removing `protocol_feature_flat_state` and moving feature to stable protocol. We also disable `test_state_sync` as this is part of refactor we can do in Q2. ## Feature to stabilize Here we stabilize Flat Storage for reads, which means that all state reads in the client during block processing will query flat storage instead of Trie. Flat Storage is another index for blockchain state, reducing number of DB accesses for state read from `2 * key.len()` in the worst case to 2. This will trigger background creation of flat storage, using 8 threads and finishing in 15h for RPC node and 2d for archival node. After that all non-contract reads will go through flat storage, for which special "chunk views" will be created. When protocol upgrade happens, contracts reads will go through flat storage as well. Also compute costs will change as Option 3 suggests [here](#8006 (comment)). It is to be merged separately, but we need to ensure that both costs change and flat storage go into next release together. ## Context Find more details in: - Overview: https://near.github.io/nearcore/architecture/storage/flat_storage.html - Approved NEP: https://github.com/near/NEPs/blob/master/neps/nep-0339.md - Tracking issue: #7327 ## Testing and QA * Flat storage structs are covered by unit tests; * Integration tests check that chain behaviour is preserved and costs are changed as expected; * Flat storage spent ~2 months in betanet with assertion that flat and trie `ValueRef`s are the same; * We were running testnet/mainnet nodes for ~2 months with the same assertion. We checked that performance is not degraded, see e.g. https://nearinc.grafana.net/d/Vg9SREA4k/flat-storage-test?orgId=1&var-chain_id=mainnet&var-node_id=logunov-mainnet-fs-1&from=1677804289279&to=1678088806154 checking that even with finality lag of 50 blocks performance is not impacted. Small exception is that we updated data layout several times during development, but we checked that results are unchanged. ## Checklist - [x] Include compute costs after they are merged - #8924 - [x] https://nayduck.near.org/#/run/2916 - [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
The code is literally removing `protocol_feature_flat_state` and moving feature to stable protocol. We also disable `test_state_sync` as this is part of refactor we can do in Q2. ## Feature to stabilize Here we stabilize Flat Storage for reads, which means that all state reads in the client during block processing will query flat storage instead of Trie. Flat Storage is another index for blockchain state, reducing number of DB accesses for state read from `2 * key.len()` in the worst case to 2. This will trigger background creation of flat storage, using 8 threads and finishing in 15h for RPC node and 2d for archival node. After that all non-contract reads will go through flat storage, for which special "chunk views" will be created. When protocol upgrade happens, contracts reads will go through flat storage as well. Also compute costs will change as Option 3 suggests [here](#8006 (comment)). It is to be merged separately, but we need to ensure that both costs change and flat storage go into next release together. ## Context Find more details in: - Overview: https://near.github.io/nearcore/architecture/storage/flat_storage.html - Approved NEP: https://github.com/near/NEPs/blob/master/neps/nep-0339.md - Tracking issue: #7327 ## Testing and QA * Flat storage structs are covered by unit tests; * Integration tests check that chain behaviour is preserved and costs are changed as expected; * Flat storage spent ~2 months in betanet with assertion that flat and trie `ValueRef`s are the same; * We were running testnet/mainnet nodes for ~2 months with the same assertion. We checked that performance is not degraded, see e.g. https://nearinc.grafana.net/d/Vg9SREA4k/flat-storage-test?orgId=1&var-chain_id=mainnet&var-node_id=logunov-mainnet-fs-1&from=1677804289279&to=1678088806154 checking that even with finality lag of 50 blocks performance is not impacted. Small exception is that we updated data layout several times during development, but we checked that results are unchanged. ## Checklist - [x] Include compute costs after they are merged - #8924 - [x] https://nayduck.near.org/#/run/2916 - [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
This is a protocol feature, updating the storage cost as agreed in #8006 to allow flat storage to be deployed without undercharging risks.
The code is literally removing `protocol_feature_flat_state` and moving feature to stable protocol. We also disable `test_state_sync` as this is part of refactor we can do in Q2. ## Feature to stabilize Here we stabilize Flat Storage for reads, which means that all state reads in the client during block processing will query flat storage instead of Trie. Flat Storage is another index for blockchain state, reducing number of DB accesses for state read from `2 * key.len()` in the worst case to 2. This will trigger background creation of flat storage, using 8 threads and finishing in 15h for RPC node and 2d for archival node. After that all non-contract reads will go through flat storage, for which special "chunk views" will be created. When protocol upgrade happens, contracts reads will go through flat storage as well. Also compute costs will change as Option 3 suggests [here](#8006 (comment)). It is to be merged separately, but we need to ensure that both costs change and flat storage go into next release together. ## Context Find more details in: - Overview: https://near.github.io/nearcore/architecture/storage/flat_storage.html - Approved NEP: https://github.com/near/NEPs/blob/master/neps/nep-0339.md - Tracking issue: #7327 ## Testing and QA * Flat storage structs are covered by unit tests; * Integration tests check that chain behaviour is preserved and costs are changed as expected; * Flat storage spent ~2 months in betanet with assertion that flat and trie `ValueRef`s are the same; * We were running testnet/mainnet nodes for ~2 months with the same assertion. We checked that performance is not degraded, see e.g. https://nearinc.grafana.net/d/Vg9SREA4k/flat-storage-test?orgId=1&var-chain_id=mainnet&var-node_id=logunov-mainnet-fs-1&from=1677804289279&to=1678088806154 checking that even with finality lag of 50 blocks performance is not impacted. Small exception is that we updated data layout several times during development, but we checked that results are unchanged. ## Checklist - [x] Include compute costs after they are merged - #8924 - [x] https://nayduck.near.org/#/run/2916 - [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
This is a protocol feature, updating the storage cost as agreed in #8006
to allow flat storage to be deployed without undercharging risks.