Thoughts on keys #1676

Merged
merged 4 commits into from
May 3, 2023
83 changes: 83 additions & 0 deletions docs/STORAGE_KEYS.md
@@ -0,0 +1,83 @@
# Storage keys

CosmWasm provides a generic key value store to contract developers via the
`Storage` trait. This is powerful but the nature of low level byte operations
makes it hard to use for high level storage types. In this document we discuss
the foundations of storage key composition all the way up to cw-storage-plus.

In a simple world, all you need is a `&[u8]` key which you can get e.g. using
`&17u64.to_be_bytes()`. This is an 8-byte key with an encoded integer. But if
you have multiple data types in your contract, you want to prefix those keys in
order to avoid collisions. A simple concatenation is not sufficient because you
want to avoid collisions when part of the prefixes and part of the key overlap.
E.g. `b"keya" | b"x"` and `b"key" | b"ax"` (`|` denotes concatenation) must not
have the same binary representation.
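
To make the collision concrete, here is a minimal sketch. Both helper names (`naive_key`, `length_prefixed_key`) are hypothetical and exist only for this illustration; the second one anticipates the length-prefixed layout described further down and assumes a 2 byte big endian length.

```rust
/// Naive prefixing: prefix and key are simply concatenated.
fn naive_key(prefix: &[u8], key: &[u8]) -> Vec<u8> {
    [prefix, key].concat()
}

/// Length-prefixed variant: the prefix is preceded by its length,
/// encoded as a 2 byte big endian integer.
fn length_prefixed_key(prefix: &[u8], key: &[u8]) -> Vec<u8> {
    // Assumption of this sketch: prefixes longer than 0xFFFF bytes are rejected.
    assert!(prefix.len() <= 0xFFFF, "prefix too long");
    let mut out = Vec::with_capacity(2 + prefix.len() + key.len());
    out.extend_from_slice(&(prefix.len() as u16).to_be_bytes());
    out.extend_from_slice(prefix);
    out.extend_from_slice(key);
    out
}
```

With these helpers, `naive_key(b"keya", b"x")` and `naive_key(b"key", b"ax")` both yield `b"keyax"`, whereas the length-prefixed keys start with the bytes `00 04` and `00 03` respectively and therefore differ.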

In the early days, multiple approaches of key namespacing were discussed and
were documented here: https://github.com/webmaster128/key-namespacing. The "0x00
separated ASCIIHEX" approach was never used but "Length-prefixed keys" is used.

To recap, Length-prefixed keys have the following layout:

```
len(namespace_1) | namespace_1
| len(namespace_2) | namespace_2
| len(namespace_3) | namespace_3
| ...
| len(namespace_m) | namespace_m
| key
```
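
As a worked example of this layout, the following sketch composes a raw key step by step. The function name `compose_key` is hypothetical; the 2 byte big endian length encoding and the 0xFFFF limit per component are taken from the implementation in this repo.

```rust
/// Sketch of the layout above: every namespace component is preceded by its
/// length as a 2 byte big endian integer; the key is appended without a length.
fn compose_key(namespace: &[&[u8]], key: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for component in namespace {
        // A length above 0xFFFF does not fit into two bytes, so this sketch panics.
        assert!(component.len() <= 0xFFFF, "namespace component too long");
        out.extend_from_slice(&(component.len() as u16).to_be_bytes());
        out.extend_from_slice(component);
    }
    // The key comes last and carries no length prefix of its own.
    out.extend_from_slice(key);
    out
}
```

For the namespace components `bar` and `cool` and the key `foo`, the result is `b"\x00\x03bar\x00\x04coolfoo"`. An empty namespace leaves the key unchanged.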

In this repo (package `cosmwasm-storage`), the following functions were
implemented:

```rust
pub fn to_length_prefixed(namespace: &[u8]) -> Vec<u8>

pub fn to_length_prefixed_nested(namespaces: &[&[u8]]) -> Vec<u8>

fn concat(namespace: &[u8], key: &[u8]) -> Vec<u8>
```

With the emerging cw-storage-plus we see two additions to that approach:

1. Manually creating the namespace and concatenating it with `concat` makes no
sense anymore. Instead `namespace` and `key` are always provided and a
composed database key is created.
2. Using a multi component namespace becomes the norm.

This led to the following addition in cw-storage-plus:

```rust
/// This is equivalent to concat(to_length_prefixed_nested(namespaces), key)
/// but more efficient when the intermediate namespaces often must be recalculated
pub(crate) fn namespaces_with_key(namespaces: &[&[u8]], key: &[u8]) -> Vec<u8>
```

In contrast to `concat(to_length_prefixed_nested(namespaces), key)` this direct
implementation saves one vector allocation since the final length can be
pre-computed and reserved. Also it's shorter to use.

Also since `to_length_prefixed` returns the same result as
`to_length_prefixed_nested` when called with one namespace element, there is no
good reason to preserve the single component version.
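
The two equivalences claimed above can be checked with a small sketch. The functions below are simplified stand-ins that reuse the names and signatures quoted in this document; they are not the actual cosmwasm-storage or cw-storage-plus sources, and the helper `encode_length` is written here under the assumption of a 2 byte big endian length.

```rust
/// Encodes a component length as 2 big endian bytes (panics above 0xFFFF).
fn encode_length(component: &[u8]) -> [u8; 2] {
    assert!(component.len() <= 0xFFFF, "namespace component too long");
    (component.len() as u16).to_be_bytes()
}

/// Single component version.
fn to_length_prefixed(namespace: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(namespace.len() + 2);
    out.extend_from_slice(&encode_length(namespace));
    out.extend_from_slice(namespace);
    out
}

/// Multi component version.
fn to_length_prefixed_nested(namespaces: &[&[u8]]) -> Vec<u8> {
    let size: usize = namespaces.iter().map(|n| n.len() + 2).sum();
    let mut out = Vec::with_capacity(size);
    for namespace in namespaces {
        out.extend_from_slice(&encode_length(namespace));
        out.extend_from_slice(namespace);
    }
    out
}

/// Two step path: copies the serialized namespace into a second vector.
fn concat(namespace: &[u8], key: &[u8]) -> Vec<u8> {
    let mut out = namespace.to_vec();
    out.extend_from_slice(key);
    out
}

/// One step path: the final size is known upfront, so one vector is enough.
fn namespaces_with_key(namespaces: &[&[u8]], key: &[u8]) -> Vec<u8> {
    let size: usize = key.len() + namespaces.iter().map(|n| n.len() + 2).sum::<usize>();
    let mut out = Vec::with_capacity(size);
    for namespace in namespaces {
        out.extend_from_slice(&encode_length(namespace));
        out.extend_from_slice(namespace);
    }
    out.extend_from_slice(key);
    out
}
```

In this sketch, `namespaces_with_key(ns, key)` returns the same bytes as `concat(&to_length_prefixed_nested(ns), key)`, and `to_length_prefixed(x)` returns the same bytes as `to_length_prefixed_nested(&[x])`.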

## 2023 updates

With the deprecation of cosmwasm-storage and the adoption of the system in
cw-storage-plus, it is time to do a few changes to the Length-prefixed keys
standard, without breaking existing users.

1. Remove the single component `to_length_prefixed` implementation and fully
commit to the multi-component version. This shifts focus from the recursive
implementation to the compatible iterative implementation.
2. Rename "namespaces" to just "namespace" and let one namespace have multiple
components.
3. Adopt the combined namespace + key encoder `namespaces_with_key` from
cw-storage-plus.
4. Add a decomposition implementation.
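
A decomposition could look roughly like the following sketch. The function `decompose_key` is hypothetical and not part of cosmwasm-std; it assumes the caller knows how many namespace components to expect, because the raw key itself does not mark where the namespace ends and the key begins.

```rust
/// Hypothetical helper (not part of cosmwasm-std): splits a raw storage key
/// into `component_count` namespace components and the remaining key.
/// Returns `None` if the input ends before all components could be read.
fn decompose_key(raw: &[u8], component_count: usize) -> Option<(Vec<&[u8]>, &[u8])> {
    let mut rest = raw;
    let mut components = Vec::new();
    for _ in 0..component_count {
        // Each component starts with a 2 byte big endian length.
        if rest.len() < 2 {
            return None;
        }
        let (len_bytes, tail) = rest.split_at(2);
        let len = u16::from_be_bytes([len_bytes[0], len_bytes[1]]) as usize;
        if tail.len() < len {
            return None;
        }
        let (component, tail) = tail.split_at(len);
        components.push(component);
        rest = tail;
    }
    // Whatever is left after the namespace is the key.
    Some((components, rest))
}
```

Called with the raw key `b"\x00\x03bar\x00\x04coolfoo"` and a component count of 2, this returns the components `bar` and `cool` together with the key `foo`. With a component count of 0, the whole input is returned as the key.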

Given the importance of Length-prefixed keys for the entire CosmWasm ecosystem,
those implementations should be maintained in cosmwasm-std. The generic approach
allows building all sorts of storage solutions on top of it and it allows
indexers to parse storage keys for all of them.

👍🏼

5 changes: 5 additions & 0 deletions packages/std/src/lib.rs
@@ -28,6 +28,11 @@ mod timestamp;
 mod traits;
 mod types;

+// This module is very advanced and will not be used directly by the vast majority of users.
+// We want to offer it to ensure a stable storage key composition system but don't encourage
+// contract devs to use it directly.
+pub mod storage_keys;
+
 pub use crate::addresses::{instantiate2_address, Addr, CanonicalAddr, Instantiate2AddressError};
 pub use crate::binary::Binary;
 pub use crate::coin::{coin, coins, has_coins, Coin};
@@ -6,38 +6,61 @@

 /// Calculates the raw key prefix for a given namespace as documented
 /// in https://github.com/webmaster128/key-namespacing#length-prefixed-keys
-pub fn to_length_prefixed(namespace: &[u8]) -> Vec<u8> {
-    let mut out = Vec::with_capacity(namespace.len() + 2);
-    out.extend_from_slice(&encode_length(namespace));
-    out.extend_from_slice(namespace);
+pub fn to_length_prefixed(namespace_component: &[u8]) -> Vec<u8> {
+    let mut out = Vec::with_capacity(namespace_component.len() + 2);
+    out.extend_from_slice(&encode_length(namespace_component));
+    out.extend_from_slice(namespace_component);
     out
 }

 /// Calculates the raw key prefix for a given nested namespace
 /// as documented in https://github.com/webmaster128/key-namespacing#nesting
-pub fn to_length_prefixed_nested(namespaces: &[&[u8]]) -> Vec<u8> {
+pub fn to_length_prefixed_nested(namespace: &[&[u8]]) -> Vec<u8> {
     let mut size = 0;
-    for &namespace in namespaces {
-        size += namespace.len() + 2;
+    for component in namespace {
+        size += component.len() + 2;
     }

     let mut out = Vec::with_capacity(size);
-    for &namespace in namespaces {
-        out.extend_from_slice(&encode_length(namespace));
-        out.extend_from_slice(namespace);
+    for component in namespace {
+        out.extend_from_slice(&encode_length(component));
+        out.extend_from_slice(component);
     }
     out
 }

-/// Encodes the length of a given namespace as a 2 byte big endian encoded integer
-fn encode_length(namespace: &[u8]) -> [u8; 2] {
-    if namespace.len() > 0xFFFF {
-        panic!("only supports namespaces up to length 0xFFFF")
+/// Encodes the length of a given namespace component
+/// as a 2 byte big endian encoded integer
+fn encode_length(namespace_component: &[u8]) -> [u8; 2] {
+    if namespace_component.len() > 0xFFFF {
+        panic!("only supports namespace components up to length 0xFFFF")
     }
-    let length_bytes = (namespace.len() as u32).to_be_bytes();
+    let length_bytes = (namespace_component.len() as u32).to_be_bytes();
     [length_bytes[2], length_bytes[3]]
 }

+/// Encodes a namespace + key to a raw storage key.
+///
+/// This is equivalent to concat(to_length_prefixed_nested(namespace), key)
+/// but more efficient when the namespace serialization is not persisted because
+/// here we only need one vector allocation.
+pub fn namespace_with_key(namespace: &[&[u8]], key: &[u8]) -> Vec<u8> {
+    // As documented in docs/STORAGE_KEYS.md, we know the final size of the key,
+    // which allows us to avoid reallocations of vectors.
+    let mut size = key.len();
+    for component in namespace {
+        size += 2 /* encoded component length */ + component.len() /* the actual component data */;
+    }
+
+    let mut out = Vec::with_capacity(size);
+    for component in namespace {
+        out.extend_from_slice(&encode_length(component));
+        out.extend_from_slice(component);
+    }
+    out.extend_from_slice(key);
+    out
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
@@ -69,7 +92,7 @@ mod tests {
     }

     #[test]
-    #[should_panic(expected = "only supports namespaces up to length 0xFFFF")]
+    #[should_panic(expected = "only supports namespace components up to length 0xFFFF")]
     fn to_length_prefixed_panics_for_too_long_prefix() {
         let limit = 0xFFFF;
         let long_namespace = vec![0; limit + 1];
@@ -108,6 +131,15 @@ mod tests {
         );
     }

+    #[test]
+    fn to_length_prefixed_nested_returns_the_same_as_to_length_prefixed_for_one_element() {
+        let tests = [b"" as &[u8], b"x" as &[u8], b"abababab" as &[u8]];
+
+        for test in tests {
+            assert_eq!(to_length_prefixed_nested(&[test]), to_length_prefixed(test));
+        }
+    }
+
     #[test]
     fn to_length_prefixed_nested_allows_many_long_namespaces() {
         // The 0xFFFF limit is for each namespace, not for the combination of them
@@ -169,8 +201,29 @@ mod tests {
     }

     #[test]
-    #[should_panic(expected = "only supports namespaces up to length 0xFFFF")]
+    #[should_panic(expected = "only supports namespace components up to length 0xFFFF")]
     fn encode_length_panics_for_large_values() {
         encode_length(&vec![1; 65536]);
     }
+
+    #[test]
+    fn namespace_with_key_works() {
+        // Empty namespace
+        let enc = namespace_with_key(&[], b"foo");
+        assert_eq!(enc, b"foo");
+        let enc = namespace_with_key(&[], b"");
+        assert_eq!(enc, b"");
+
+        // One component namespace
+        let enc = namespace_with_key(&[b"bar"], b"foo");
+        assert_eq!(enc, b"\x00\x03barfoo");
+        let enc = namespace_with_key(&[b"bar"], b"");
+        assert_eq!(enc, b"\x00\x03bar");
+
+        // Multi component namespace
+        let enc = namespace_with_key(&[b"bar", b"cool"], b"foo");
+        assert_eq!(enc, b"\x00\x03bar\x00\x04coolfoo");
+        let enc = namespace_with_key(&[b"bar", b"cool"], b"");
+        assert_eq!(enc, b"\x00\x03bar\x00\x04cool");
+    }
 }
5 changes: 5 additions & 0 deletions packages/std/src/storage_keys/mod.rs
@@ -0,0 +1,5 @@
mod length_prefixed;

// Please note that the entire storage_keys module is public. So be careful
// when adding elements here.
pub use length_prefixed::{namespace_with_key, to_length_prefixed, to_length_prefixed_nested};
6 changes: 4 additions & 2 deletions packages/storage/src/bucket.rs
@@ -1,11 +1,13 @@
 use serde::{de::DeserializeOwned, ser::Serialize};
 use std::marker::PhantomData;

-use cosmwasm_std::{to_vec, StdError, StdResult, Storage};
+use cosmwasm_std::{
+    storage_keys::{to_length_prefixed, to_length_prefixed_nested},
+    to_vec, StdError, StdResult, Storage,
+};
 #[cfg(feature = "iterator")]
 use cosmwasm_std::{Order, Record};

-use crate::length_prefixed::{to_length_prefixed, to_length_prefixed_nested};
 #[cfg(feature = "iterator")]
 use crate::namespace_helpers::range_with_prefix;
 use crate::namespace_helpers::{get_with_prefix, remove_with_prefix, set_with_prefix};
6 changes: 4 additions & 2 deletions packages/storage/src/lib.rs
@@ -1,13 +1,15 @@
 mod bucket;
-mod length_prefixed;
 mod namespace_helpers;
 mod prefixed_storage;
 mod sequence;
 mod singleton;
 mod type_helpers;

 pub use bucket::{bucket, bucket_read, Bucket, ReadonlyBucket};
-pub use length_prefixed::{to_length_prefixed, to_length_prefixed_nested};
 pub use prefixed_storage::{prefixed, prefixed_read, PrefixedStorage, ReadonlyPrefixedStorage};
 pub use sequence::{currval, nextval, sequence};
 pub use singleton::{singleton, singleton_read, ReadonlySingleton, Singleton};
+
+// Re-exported for backwards compatibility.
+// See https://github.com/CosmWasm/cosmwasm/pull/1676.
+pub use cosmwasm_std::storage_keys::{to_length_prefixed, to_length_prefixed_nested};
3 changes: 1 addition & 2 deletions packages/storage/src/namespace_helpers.rs
@@ -85,8 +85,7 @@ fn namespace_upper_bound(input: &[u8]) -> Vec<u8> {
 #[cfg(test)]
 mod tests {
     use super::*;
-    use crate::length_prefixed::to_length_prefixed;
-    use cosmwasm_std::testing::MockStorage;
+    use cosmwasm_std::{storage_keys::to_length_prefixed, testing::MockStorage};

     #[test]
     fn prefix_get_set() {
6 changes: 4 additions & 2 deletions packages/storage/src/prefixed_storage.rs
@@ -1,8 +1,10 @@
-use cosmwasm_std::Storage;
+use cosmwasm_std::{
+    storage_keys::{to_length_prefixed, to_length_prefixed_nested},
+    Storage,
+};
 #[cfg(feature = "iterator")]
 use cosmwasm_std::{Order, Record};

-use crate::length_prefixed::{to_length_prefixed, to_length_prefixed_nested};
 #[cfg(feature = "iterator")]
 use crate::namespace_helpers::range_with_prefix;
 use crate::namespace_helpers::{get_with_prefix, remove_with_prefix, set_with_prefix};
3 changes: 1 addition & 2 deletions packages/storage/src/singleton.rs
@@ -1,9 +1,8 @@
 use serde::{de::DeserializeOwned, ser::Serialize};
 use std::marker::PhantomData;

-use cosmwasm_std::{to_vec, StdError, StdResult, Storage};
+use cosmwasm_std::{storage_keys::to_length_prefixed, to_vec, StdError, StdResult, Storage};

-use crate::length_prefixed::to_length_prefixed;
 use crate::type_helpers::{may_deserialize, must_deserialize};

 /// An alias of Singleton::new for less verbose usage