Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: set storage compute cost > gas cost #8924

Merged
merged 6 commits into from
Apr 21, 2023

Conversation

jakmeier
Copy link
Contributor

@jakmeier jakmeier commented Apr 18, 2023

This is a protocol feature, updating the storage cost as agreed in #8006
to allow flat storage to be deployed without undercharging risks.

@jakmeier jakmeier changed the title set new compute costs feat: set storage compute cost > gas cost Apr 18, 2023
@jakmeier jakmeier marked this pull request as ready for review April 18, 2023 13:49
@jakmeier jakmeier requested a review from a team as a code owner April 18, 2023 13:49
Copy link
Member

@Longarithm Longarithm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, but ideally we need a test in integration-tests/src/tests/client/features checking that we can fit less receipts in a block with introduced compute cost.

Copy link
Contributor

@aborg-dev aborg-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Related to this change, we would also need to create a tracking issue for resolving the discrepancy between gas and compute cost and link that issue in the Compute Costs NEP as agreed in https://github.com/near/NEPs/blob/master/neps/nep-0455.md#using-compute-costs.

This is a protocol feature, updating the storage cost as agreed in near#8006
to allow flat storage to be deployed without undercharging risks.
@jakmeier
Copy link
Contributor Author

Ok, this was more annoying than I anticipated. 😩 But finally, I have a bunch of integration tests on the client level now. They verify that the new compute costs values result in tighter chunk bounds when they should, but don't affect parameters they shouldn't.

@Longarithm please take a look at the tests and let me know if something is unclear. I tried my best to add descriptive comments but it still feels convoluted to me.

Hint: Gas costs change between v60 and v61 for the same TX regardless of compute costs. The two big reasons are:

  1. Storage operations are trie-shape dependent, so before and after upgrading has a different cost even just for that reason.
  2. Finite-wasm is also include in version 61, which increases the gas costs for WASM functions. This can also lead to fewer receipts fitting a receipt.

I solved number one by making sure I only use idempotent WASM calls, and I call them once to "warm-up" the trie in a separate block. There is still the chunk cache varying the cost but that's okay because N receipt in the same chunk will always use the exact same amount of gas.

Number 2, I don't know how to cleanly solve it on an integration test level. So I just hacked my way around it. 🙊 Basically, I carefully sized my test receipts such that with and without finite-wasm enabled, the number of receipts per chunk is the same. But then with compute costs enabled, the number of receipts that fit go down, which is the main thing I wanted to test.

Mh, maaaybe this whole test setup is overengineered and too complicated at this point... 🤔 If you have an idea how we can drastically simplify this, please let me know.

@jakmeier
Copy link
Contributor Author

Looks good to me!

Related to this change, we would also need to create a tracking issue for resolving the discrepancy between gas and compute cost and link that issue in the Compute Costs NEP as agreed in https://github.com/near/NEPs/blob/master/neps/nep-0455.md#using-compute-costs.

Tracking issue: #8938
Draft PR to link in the NEPs repo: near/NEPs#477

Copy link
Member

@Longarithm Longarithm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I don't see a simpler way. We have environment to test just runtime without blocks/chunks execution in integration-tests/src/tests/standard_cases/mod.rs, but it doesn't support protocol upgrades.

std::env::set_var("NEAR_TESTS_IMMEDIATE_PROTOCOL_UPGRADE", "1");
near_o11y::testonly::init_test_logger();

let new_protocol_version = 61;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let new_protocol_version = 61;
let new_protocol_version = ProtocolFeature::ComputeCosts.protocol_version();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is accurate. ProtocolFeature::ComputeCosts.protocol_version() is when we introduce NEP-455. It's just a "coincident" that we also change storage related costs in the same version. But we could also do storage cost changes in version 62.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm. Then I would introduce separate protocol feature like StorageComputeCosts for consistency.
And why we don't have specific test for ProtocolFeature::ComputeCosts then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we done that for gas price changes in the past? Because the change is defined in 61.yaml and we can't do cfg for that. So it would be a completely dead enum variant that's used nowhere and therefore doesn't turn on/off anything. And the assignment in version.rs to version 61 would make no difference at all. That's a general problem for changes that are defined by the config and not by code, like this change here.

Regarding tests, we only have tests in https://github.com/near/nearcore/blob/master/runtime/near-vm-logic/src/gas_counter.rs#L356. These are not what I consider integrations tests, though. They just check the the compute limit is as expected.

When we merged that, I felt that a integration tests as done now isn't strictly necessary, because we don't expect anything to be observable with the compute costs feature alone. Whereas now, when we set the compute costs, we know the end-to-end effect we expect and can write an integration test for it.

But now I am not sure about that reasoning. Do you think it would make sense in the future to always require integration tests when we add a protocol feature?

Copy link
Member

@Longarithm Longarithm Apr 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we done that for gas price changes in the past?

We did it for LowerStorageCost, LowerDataReceiptAndEcrecoverBaseCost, LowerRegularOpCost and some more. I think it's a good idea to have indication of such protocol changes in the code.

Compute costs are a cornercase because they didn't change behaviour alone. And 61 in the code leaves a strong feeling that it should be tied with ProtocolFeature::ComputeCosts.

Some weird compromise I came up with:

    const INCREASED_STORAGE_COSTS_PROTOCOL_VERSION = 61;
    let new_protocol_version = INCREASED_STORAGE_COSTS_PROTOCOL_VERSION;
	assert!(new_protocol_version >= ProtocolFeature::ComputeCosts.protocol_version());
    let old_protocol_version = new_protocol_version - 1;
    ...

I'm still not fully comfortable with that but don't want to think about it too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did it for LowerStorageCost, LowerDataReceiptAndEcrecoverBaseCost, LowerRegularOpCost and some more. I think it's a good idea to have indication of such protocol changes in the code.

Oh right, I didn't know that or maybe just forgot. Seems weird to me to have a feature that doesn't actually do anything.

I like that compromise! I can see this is still quite hacky... But it does reflect the (somewhat ugly) reality that we have a protocol feature that could be turned on/off independently of what we have in 61.yaml, but that would cause problems. So having an assertion instead of just assigning ProtocolFeature::ComputeCosts.protocol_version() makes sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything seems to be resolved, I'll add the merge label

@near-bulldozer near-bulldozer bot merged commit 84c4d6d into near:master Apr 21, 2023
@jakmeier jakmeier deleted the fs-compute-costs branch April 21, 2023 16:29
near-bulldozer bot pushed a commit that referenced this pull request Apr 21, 2023
The code is literally removing `protocol_feature_flat_state` and moving feature to stable protocol. We also disable `test_state_sync` as this is part of refactor we can do in Q2.

## Feature to stabilize

Here we stabilize Flat Storage for reads, which means that all state reads in the client during block processing will query flat storage instead of Trie. Flat Storage is another index for blockchain state, reducing number of DB accesses for state read from `2 * key.len()` in the worst case to 2.

This will trigger background creation of flat storage, using 8 threads and finishing in 15h for RPC node and 2d for archival node. After that all non-contract reads will go through flat storage, for which special "chunk views" will be created. When protocol upgrade happens, contracts reads will go through flat storage as well. Also compute costs will change as Option 3 suggests [here](#8006 (comment)). It is to be merged separately, but we need to ensure that both costs change and flat storage go into next release together.

## Context

Find more details in:
- Overview: https://near.github.io/nearcore/architecture/storage/flat_storage.html
- Approved NEP: https://github.com/near/NEPs/blob/master/neps/nep-0339.md
- Tracking issue: #7327

## Testing and QA

* Flat storage structs are covered by unit tests;
* Integration tests check that chain behaviour is preserved and costs are changed as expected;
* Flat storage spent ~2 months in betanet with assertion that flat and trie `ValueRef`s are the same;
* We were running testnet/mainnet nodes for ~2 months with the same assertion. We checked that performance is not degraded, see e.g. https://nearinc.grafana.net/d/Vg9SREA4k/flat-storage-test?orgId=1&var-chain_id=mainnet&var-node_id=logunov-mainnet-fs-1&from=1677804289279&to=1678088806154 checking that even with finality lag of 50 blocks performance is not impacted. Small exception is that we updated data layout several times during development, but we checked that results are unchanged.

## Checklist
- [x] Include compute costs after they are merged - #8924
- [x] https://nayduck.near.org/#/run/2916
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
nikurt pushed a commit that referenced this pull request Apr 25, 2023
This is a protocol feature, updating the storage cost as agreed in #8006
to allow flat storage to be deployed without undercharging risks.
nikurt pushed a commit that referenced this pull request Apr 25, 2023
The code is literally removing `protocol_feature_flat_state` and moving feature to stable protocol. We also disable `test_state_sync` as this is part of refactor we can do in Q2.

## Feature to stabilize

Here we stabilize Flat Storage for reads, which means that all state reads in the client during block processing will query flat storage instead of Trie. Flat Storage is another index for blockchain state, reducing number of DB accesses for state read from `2 * key.len()` in the worst case to 2.

This will trigger background creation of flat storage, using 8 threads and finishing in 15h for RPC node and 2d for archival node. After that all non-contract reads will go through flat storage, for which special "chunk views" will be created. When protocol upgrade happens, contracts reads will go through flat storage as well. Also compute costs will change as Option 3 suggests [here](#8006 (comment)). It is to be merged separately, but we need to ensure that both costs change and flat storage go into next release together.

## Context

Find more details in:
- Overview: https://near.github.io/nearcore/architecture/storage/flat_storage.html
- Approved NEP: https://github.com/near/NEPs/blob/master/neps/nep-0339.md
- Tracking issue: #7327

## Testing and QA

* Flat storage structs are covered by unit tests;
* Integration tests check that chain behaviour is preserved and costs are changed as expected;
* Flat storage spent ~2 months in betanet with assertion that flat and trie `ValueRef`s are the same;
* We were running testnet/mainnet nodes for ~2 months with the same assertion. We checked that performance is not degraded, see e.g. https://nearinc.grafana.net/d/Vg9SREA4k/flat-storage-test?orgId=1&var-chain_id=mainnet&var-node_id=logunov-mainnet-fs-1&from=1677804289279&to=1678088806154 checking that even with finality lag of 50 blocks performance is not impacted. Small exception is that we updated data layout several times during development, but we checked that results are unchanged.

## Checklist
- [x] Include compute costs after they are merged - #8924
- [x] https://nayduck.near.org/#/run/2916
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
nikurt pushed a commit that referenced this pull request Apr 25, 2023
The code is literally removing `protocol_feature_flat_state` and moving feature to stable protocol. We also disable `test_state_sync` as this is part of refactor we can do in Q2.

## Feature to stabilize

Here we stabilize Flat Storage for reads, which means that all state reads in the client during block processing will query flat storage instead of Trie. Flat Storage is another index for blockchain state, reducing number of DB accesses for state read from `2 * key.len()` in the worst case to 2.

This will trigger background creation of flat storage, using 8 threads and finishing in 15h for RPC node and 2d for archival node. After that all non-contract reads will go through flat storage, for which special "chunk views" will be created. When protocol upgrade happens, contracts reads will go through flat storage as well. Also compute costs will change as Option 3 suggests [here](#8006 (comment)). It is to be merged separately, but we need to ensure that both costs change and flat storage go into next release together.

## Context

Find more details in:
- Overview: https://near.github.io/nearcore/architecture/storage/flat_storage.html
- Approved NEP: https://github.com/near/NEPs/blob/master/neps/nep-0339.md
- Tracking issue: #7327

## Testing and QA

* Flat storage structs are covered by unit tests;
* Integration tests check that chain behaviour is preserved and costs are changed as expected;
* Flat storage spent ~2 months in betanet with assertion that flat and trie `ValueRef`s are the same;
* We were running testnet/mainnet nodes for ~2 months with the same assertion. We checked that performance is not degraded, see e.g. https://nearinc.grafana.net/d/Vg9SREA4k/flat-storage-test?orgId=1&var-chain_id=mainnet&var-node_id=logunov-mainnet-fs-1&from=1677804289279&to=1678088806154 checking that even with finality lag of 50 blocks performance is not impacted. Small exception is that we updated data layout several times during development, but we checked that results are unchanged.

## Checklist
- [x] Include compute costs after they are merged - #8924
- [x] https://nayduck.near.org/#/run/2916
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
nikurt pushed a commit that referenced this pull request Apr 28, 2023
This is a protocol feature, updating the storage cost as agreed in #8006
to allow flat storage to be deployed without undercharging risks.
nikurt pushed a commit that referenced this pull request Apr 28, 2023
The code is literally removing `protocol_feature_flat_state` and moving feature to stable protocol. We also disable `test_state_sync` as this is part of refactor we can do in Q2.

## Feature to stabilize

Here we stabilize Flat Storage for reads, which means that all state reads in the client during block processing will query flat storage instead of Trie. Flat Storage is another index for blockchain state, reducing number of DB accesses for state read from `2 * key.len()` in the worst case to 2.

This will trigger background creation of flat storage, using 8 threads and finishing in 15h for RPC node and 2d for archival node. After that all non-contract reads will go through flat storage, for which special "chunk views" will be created. When protocol upgrade happens, contracts reads will go through flat storage as well. Also compute costs will change as Option 3 suggests [here](#8006 (comment)). It is to be merged separately, but we need to ensure that both costs change and flat storage go into next release together.

## Context

Find more details in:
- Overview: https://near.github.io/nearcore/architecture/storage/flat_storage.html
- Approved NEP: https://github.com/near/NEPs/blob/master/neps/nep-0339.md
- Tracking issue: #7327

## Testing and QA

* Flat storage structs are covered by unit tests;
* Integration tests check that chain behaviour is preserved and costs are changed as expected;
* Flat storage spent ~2 months in betanet with assertion that flat and trie `ValueRef`s are the same;
* We were running testnet/mainnet nodes for ~2 months with the same assertion. We checked that performance is not degraded, see e.g. https://nearinc.grafana.net/d/Vg9SREA4k/flat-storage-test?orgId=1&var-chain_id=mainnet&var-node_id=logunov-mainnet-fs-1&from=1677804289279&to=1678088806154 checking that even with finality lag of 50 blocks performance is not impacted. Small exception is that we updated data layout several times during development, but we checked that results are unchanged.

## Checklist
- [x] Include compute costs after they are merged - #8924
- [x] https://nayduck.near.org/#/run/2916
- [x] Update CHANGELOG.md to include this protocol feature in the `Unreleased` section.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants