
Linear storage design #188

Open
axic opened this issue Apr 25, 2019 · 4 comments

axic commented Apr 25, 2019

Memory-mapped linear storage in Ethereum was proposed IIRC by @AlexeyAkhunov. Here we give a potential implementation of that in ewasm.

The proposal is that instead of fixed-length/fixed-key storage values, storage is linear, extendable and accessed like memory. We can take a design cue from POSIX: mmap.

API: storageMap(memoryOffset: i32, storageOffset: i32, storageLength: i32)
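On the contract side this would surface as a plain host function import. A minimal sketch in Rust, assuming the function sits in an "ethereum" import namespace (the namespace and exact typing are assumptions, not part of the proposal):

```rust
// Hypothetical contract-side declaration mirroring the signature above.
#[allow(non_snake_case)]
#[link(wasm_import_module = "ethereum")]
extern "C" {
    fn storageMap(memory_offset: i32, storage_offset: i32, storage_length: i32);
}

fn use_mapped_storage() {
    unsafe {
        // Map 64 bytes of storage, starting at storage offset 0, into
        // contract memory at offset 1024.
        storageMap(1024, 0, 64);
    }
    // Reads and writes to memory[1024..1088] now operate on the mapped
    // region; changes are written back to storage when execution ends.
}
```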

In the naive implementation this would (see the sketch after the list):

  • keep track of each storageMap invocation
  • disallow overlapping and duplicate areas
  • load and copy storage into the memory area at the first invocation
  • write the tracked areas from memory back to storage at the termination of execution
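
A minimal host-side sketch of that bookkeeping, assuming a byte-addressed linear storage blob per contract (names like StorageMapper are hypothetical; bounds checks are elided for brevity):

```rust
/// One tracked mapping created by storageMap.
struct Mapping {
    memory_offset: u32,
    storage_offset: u32,
    length: u32,
}

/// Hypothetical host-side bookkeeping for the naive implementation.
struct StorageMapper {
    mappings: Vec<Mapping>,
}

impl StorageMapper {
    /// Naive storageMap: reject overlaps, then copy storage into memory.
    fn storage_map(
        &mut self,
        memory: &mut [u8],
        storage: &[u8], // the executing contract's linear storage
        memory_offset: u32,
        storage_offset: u32,
        length: u32,
    ) -> Result<(), &'static str> {
        let overlap = |a: u32, b: u32, len_a: u32, len_b: u32| a < b + len_b && b < a + len_a;
        for m in &self.mappings {
            // Disallow overlapping and duplicate areas, in memory and in storage.
            if overlap(memory_offset, m.memory_offset, length, m.length)
                || overlap(storage_offset, m.storage_offset, length, m.length)
            {
                return Err("overlapping or duplicate mapping");
            }
        }
        // Load and copy storage into the memory area at the first invocation.
        let (mo, so, len) = (memory_offset as usize, storage_offset as usize, length as usize);
        memory[mo..mo + len].copy_from_slice(&storage[so..so + len]);
        self.mappings.push(Mapping { memory_offset, storage_offset, length });
        Ok(())
    }

    /// At termination, write the tracked areas from memory back to storage.
    fn flush(&self, memory: &[u8], storage: &mut [u8]) {
        for m in &self.mappings {
            let (mo, so, len) =
                (m.memory_offset as usize, m.storage_offset as usize, m.length as usize);
            storage[so..so + len].copy_from_slice(&memory[mo..mo + len]);
        }
    }
}
```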

An optimised implementation with control over the VM could, while still requiring storageMap, load bytes / words / etc. on demand the first time an area is accessed, and keep track of changed areas.

In the future, when Wasm supports multiple memory segments, storage could be mapped 1-to-1 into a dedicated segment instead of into the "main" working memory, without the need for a host function.


cdetrio commented Apr 25, 2019

If the storage isn't written until execution terminates, what will happen on re-entry (the contract instance calls another contract, which calls back)?


axic commented Apr 25, 2019

I think that is the question where we'll find all the subtle details and complications.

I'd say the practical decision right now is that storage is synchronised at the point of a call, i.e. the storage is written out prior to executing the call and is reloaded after the call finishes.
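
In terms of the hypothetical StorageMapper sketch from the opening post, that could look roughly like:

```rust
// Hypothetical host-side handling of a nested call: flush mapped areas to
// storage before the call, reload them after it returns, so re-entrant
// executions observe consistent state.
fn handle_call(
    mapper: &StorageMapper,
    memory: &mut [u8],
    storage: &mut [u8],
    do_call: impl FnOnce(&mut [u8]),
) {
    mapper.flush(memory, storage); // write mapped areas out before the call
    do_call(storage);              // the callee may read or modify storage
    // Reload each mapped area so memory reflects any writes by the callee.
    for m in &mapper.mappings {
        let (mo, so, len) =
            (m.memory_offset as usize, m.storage_offset as usize, m.length as usize);
        memory[mo..mo + len].copy_from_slice(&storage[so..so + len]);
    }
}
```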

jakelang commented:

Proposal: Storage Modules

In order to handle the case of re-entrancy, we need the storage mappings (or at least info about revisions to storage) to persist across calls.

With storage modules, we have a scheme where storage is mapped into the memory of a different module than that of the executing contract.

The storage module exists from instantiation to the end of the transaction (top-level call).

Depending on the feasibility of dynamic Wasm module linking, storage modules are instantiated on first call to storageMap or on first instantiation of the corresponding contract.

A "storage module" consists of one or more linear memories, which are exported (depending on which option is used below). A running contract which maps storage to memory will import an exported memory from this module.

Options

Here are a few possible variants of the storage module scheme.

  • One storage module per contract - High instantiation overhead, option for lazy instantiation.
  • One module for all contracts, one memory segment per contract
    • Large upfront instantiation cost
    • No option for lazy memory initialization without implementation-level hacks
    • Need to know statically how many memory segments must be initialized for a transaction to minimize overhead from instantiating unneeded memories
  • One module and memory segment for all calls
    • Minimal runtime cost (one module and memory upfront, memory is grown as needed)
    • Potentially unsafe. Need to ensure isolation of different contracts' storage mappings. Could be done through checked I/O or memory segmentation
    • Because of the above note, access to mapped memory might have to be via imported functions (performance concern)

API

The API will remain roughly the same. However, with the last option, I/O methods which can trap on invalid memory access will be exposed to the contract.

Questions

  • If we minimize host function overhead, is this needed? The surrounding VM can keep track of persistent memory and storage mappings across calls and expose fast I/O methods to the contract.
  • How does this play into a potential contract linking scheme? Perhaps this is not necessary if transactions result in static linking of all involved contracts at instantiation time.

jakelang commented:

Proposal 2: Storage Paging (ROUGH(!!) DRAFT)

This is an alternative memory mapping proposal which avoids the complexities of persistent wasm modules as storage while retaining the benefits of memory-mapped access.

In this scheme, the VM around the executing module (called "host" here) maintains a data structure which persists for the lifetime of the transaction, and keeps track of mapped storage units. In operating systems parlance, this is known as the frame table.

Furthermore, each instance of a Wasm module (representing a contract call) is associated with its own table of mappings from memory offsets within the module to elements of the frame table. Such a structure is better known as a page table.

This scheme can be thought of as a rudimentary paging system intended solely for mmap-style operations, where data is mapped directly from storage into a virtual memory space. It is also adaptable to the capabilities model if we do not allow multiple virtual pages for the same frame.

The Frame Table

The frame table keeps track of mapped "frames", which are fixed-size contiguous regions of linear storage.

It is maintained by the host and persists until the end of the transaction.

Because each contract has its own linear storage, we also include the address of the contract in the frame.

The index of the frame in the frame table can then be referenced by elements of a module instance's associated page table.

FrameTable ::= vec(address: bytes20, pagenum: u64)
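
In Rust-like terms (a sketch; field names follow the definition above):

```rust
/// One frame: a fixed-size, contiguous region of a contract's linear storage.
struct Frame {
    address: [u8; 20], // contract whose linear storage this frame belongs to
    pagenum: u64,      // index of the fixed-size region within that storage
}

/// Maintained by the host; persists until the end of the transaction.
type FrameTable = Vec<Frame>;
```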

The Page Table

Every instance of a Wasm module has its own page table. Each page corresponds to an element of the frame table.

The page table is initialized on first call to storageMap and lives until the module returns.

PageTable ::= vec(offset: u32, frame: usize)
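
And the corresponding per-instance structure, as a sketch:

```rust
/// One page: maps a memory offset in a module instance onto a frame.
struct Page {
    offset: u32,  // offset into the instance's linear memory
    frame: usize, // index into the transaction-wide frame table
}

/// Per module instance; initialized on the first storageMap call.
type PageTable = Vec<Page>;
```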

Memory mappings

When a contract calls storageMap(storage_offset: u64, num: u32, dest: u32), the VM will interrupt and do the following (see the sketch after this list):

  • if storage_offset + len overflows (where len = num * PAGE_SIZE), then trap.
  • if dest + len is out of bounds or overflows, then trap.
  • if the memory to be mapped into overlaps with an already-mapped region of memory, then trap.
  • Determine which frames need to be mapped. This can be calculated as (storage_offset / PAGE_SIZE, storage_offset / PAGE_SIZE + 1, ..., storage_offset / PAGE_SIZE + num - 1).
  • For each frame, append a page to the page table. The memory offset is calculated as (dest + n * PAGE_SIZE). If the frame in question is already in the frame table, use its existing index; otherwise, append said frame to the frame table first.
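
A sketch of that handler, building on the Frame/Page structs above (PAGE_SIZE and the Trap variants are placeholders):

```rust
const PAGE_SIZE: u64 = 4096; // placeholder; see the page-size question below

enum Trap {
    Overflow,
    OutOfBounds,
    Overlap,
}

fn storage_map(
    frame_table: &mut Vec<Frame>,
    page_table: &mut Vec<Page>,
    self_address: [u8; 20], // linear storage is per-contract
    memory_len: u64,        // current size of the instance's linear memory
    storage_offset: u64,
    num: u32,
    dest: u32,
) -> Result<(), Trap> {
    let len = num as u64 * PAGE_SIZE;
    // Trap if the storage range overflows.
    storage_offset.checked_add(len).ok_or(Trap::Overflow)?;
    // Trap if the destination range overflows or is out of bounds.
    let mem_end = (dest as u64).checked_add(len).ok_or(Trap::Overflow)?;
    if mem_end > memory_len {
        return Err(Trap::OutOfBounds);
    }
    // Trap if the destination overlaps an already-mapped memory region.
    for p in page_table.iter() {
        let p_start = p.offset as u64;
        if (dest as u64) < p_start + PAGE_SIZE && p_start < mem_end {
            return Err(Trap::Overlap);
        }
    }
    // Determine which frames are touched and append one page per frame.
    let first = storage_offset / PAGE_SIZE;
    for n in 0..num as u64 {
        let pagenum = first + n;
        // Reuse the frame's index if it already exists; otherwise append it.
        let frame = match frame_table
            .iter()
            .position(|f| f.address == self_address && f.pagenum == pagenum)
        {
            Some(i) => i,
            None => {
                frame_table.push(Frame { address: self_address, pagenum });
                frame_table.len() - 1
            }
        };
        page_table.push(Page {
            offset: dest + (n * PAGE_SIZE) as u32,
            frame,
        });
    }
    Ok(())
}
```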

When a Wasm module returns or calls another Wasm contract, the mapped memory is checked for changes: any page that was mutated is written back to its corresponding frame. If a page fault is encountered, then trap.

Questions

  • The smaller the page size gets, the larger the bookkeeping footprint grows, and repeated O(n) lookups in the frame table get worse as well. OTOH, larger pages result in unnecessary I/O if the user only wanted to access a much smaller region of memory. What is the sweet spot for page size?
  • More questions later lol
