Introduce a general caching abstraction to wasix #4003

Closed
Michael-F-Bryan opened this issue Jun 16, 2023 · 2 comments
Labels
🎉 enhancement New feature! priority-medium Medium priority issue 🏚 stale Inactive issues or PR

Motivation

There are many places across the Wasmer CLI and WASIX where we want to use caching to avoid unnecessary work. At the moment, each of these places is hand-rolling its own caching solution.

Off the top of my head, I can think of the following:

  • wasmer_wasix::runtime::resolver::WapmSource - caches the response of a GetPackage() query against the registry
  • wasmer_wasix::runtime::module_cache - contains various implementations of caches for WebAssembly modules (in-memory, thread-local, filesystem, etc.)
  • wasmer_wasix::runtime::package_loader::BuiltinPackageLoader - caches *.webc files downloaded from the registry
  • wasmer_wasix::runtime::resolver::WebSource - caches *.webc files downloaded from the internet via a bare URL

These caching solutions all tend to have the same properties or requirements:

  • It's a key-value store where the keys are arbitrary strings
  • We want the files to be cached on disk so they can be picked up from subsequent wasmer runs
  • The values are often "big" and we want to mmap them (via shared_buffer::OwnedBuffer) rather than reading them into memory
  • On-disk cache entries need to be written to a temporary file first and then moved into place, so readers never see a result before it has finished being written
  • Cached values should be kept in memory once loaded from the filesystem
  • We often want a way to invalidate a cache key, whether that is via a timeout, by checking if an ETag header or hash has changed, or whatever
  • We want the option to use stale values if the "main" method of fetching the data fails (e.g. because of a network error)
  • We need to work in both sync and async contexts
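
Two of the requirements above (write to a temporary file then move into place, and fall back to a stale value when fetching fails) can be sketched with only the standard library. This is a minimal illustration, not code from wasmer: the names `atomic_write` and `fetch_or_stale` and the `<name>.<pid>.tmp` naming scheme are made up for the example, and it reads the file into a `Vec<u8>` instead of mmapping it.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `dest` without ever exposing a half-written file.
/// The data goes to a sibling temp file first and is then renamed into place.
/// (Hypothetical helper, not part of wasmer.)
pub fn atomic_write(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let file_name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name")
    })?;
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }

    // Assumed naming scheme: "<name>.<pid>.tmp" next to the destination,
    // so the rename never has to cross a filesystem boundary.
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".{}.tmp", std::process::id()));
    let tmp = dest.with_file_name(tmp_name);

    let mut file = fs::File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    drop(file);

    // Readers see either the old file or the complete new one.
    fs::rename(&tmp, dest)
}

/// Run `fetch`; on success refresh the cached copy, on failure fall back
/// to whatever is already on disk (the "stale" value).
/// (Hypothetical helper, not part of wasmer.)
pub fn fetch_or_stale<F>(cached: &Path, fetch: F) -> io::Result<Vec<u8>>
where
    F: FnOnce() -> io::Result<Vec<u8>>,
{
    match fetch() {
        Ok(fresh) => {
            atomic_write(cached, &fresh)?;
            Ok(fresh)
        }
        Err(fetch_err) => match fs::read(cached) {
            Ok(stale) => Ok(stale),
            // Nothing cached either, so report the original failure.
            Err(_) => Err(fetch_err),
        },
    }
}
```

A caller would pass the download or registry query as the `fetch` closure. Invalidation (timeouts, ETag or hash checks) is left out of this sketch and would sit in front of the `fetch()` call.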

Proposed solution

I was thinking of creating a concrete type with an API similar to a HashMap<CollectionName, HashMap<Key, Value>>, except it'll pass out instances of shared_buffer::OwnedBuffer and automatically manage the synchronisation of on-disk and in-memory caches.
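
To make the shape of that API a bit more concrete, here is a rough sketch of what such a type could look like. Everything in it is an assumption made for illustration: the `Cache` name, its methods, and the hex-encoded file names are invented, and `Arc<[u8]>` stands in for `shared_buffer::OwnedBuffer` so the example only needs the standard library.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

/// Stand-in for `shared_buffer::OwnedBuffer`: a cheaply clonable, immutable buffer.
pub type Buffer = Arc<[u8]>;

/// Hypothetical two-level cache: collection name -> key -> value,
/// backed by a directory on disk and mirrored in memory.
pub struct Cache {
    root: PathBuf,
    memory: Mutex<HashMap<String, HashMap<String, Buffer>>>,
}

impl Cache {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        Cache { root: root.into(), memory: Mutex::new(HashMap::new()) }
    }

    /// Keys are arbitrary strings, so hex-encode them to get a safe file name.
    /// (A real implementation would probably hash long keys instead.)
    /// Collection names are assumed to be plain directory-safe identifiers.
    fn path_for(&self, collection: &str, key: &str) -> PathBuf {
        let encoded: String = key.bytes().map(|b| format!("{:02x}", b)).collect();
        self.root.join(collection).join(format!("{encoded}.bin"))
    }

    fn remember(&self, collection: &str, key: &str, buffer: Buffer) {
        self.memory
            .lock()
            .unwrap()
            .entry(collection.to_string())
            .or_default()
            .insert(key.to_string(), buffer);
    }

    /// Store a value on disk (temp file, then rename) and in memory.
    pub fn insert(&self, collection: &str, key: &str, value: &[u8]) -> io::Result<Buffer> {
        let dest = self.path_for(collection, key);
        fs::create_dir_all(self.root.join(collection))?;
        let tmp = dest.with_extension("tmp");
        fs::write(&tmp, value)?;
        fs::rename(&tmp, &dest)?;

        let buffer: Buffer = Arc::from(value);
        self.remember(collection, key, buffer.clone());
        Ok(buffer)
    }

    /// Look in memory first, then on disk; disk hits are kept in memory afterwards.
    pub fn get(&self, collection: &str, key: &str) -> io::Result<Option<Buffer>> {
        let cached = self
            .memory
            .lock()
            .unwrap()
            .get(collection)
            .and_then(|entries| entries.get(key))
            .cloned();
        if let Some(hit) = cached {
            return Ok(Some(hit));
        }

        match fs::read(self.path_for(collection, key)) {
            Ok(bytes) => {
                let buffer: Buffer = Arc::from(bytes);
                self.remember(collection, key, buffer.clone());
                Ok(Some(buffer))
            }
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}
```

With something like this, `cache.insert("webc", url, &bytes)` in one `wasmer` run and `cache.get("webc", url)` in a later run would return the same bytes. The sketch is synchronous only and has no invalidation, so it covers just the in-memory/on-disk half of the requirements.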

This would probably hook into Wasmer Edge's caching facilities, too.

Additional context

This originally came up when I was working on #3983. Having one or two places where we do caching is fine, but I noticed I was doing the same in-memory/on-disk dance in several places and all of it is essentially untested.


stale bot commented Jun 19, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the 🏚 stale Inactive issues or PR label Jun 19, 2024

stale bot commented Jul 20, 2024

Feel free to reopen the issue if it has been closed by mistake.

@stale stale bot closed this as completed Jul 20, 2024