How to read arrangements from disk and cache values inside Cursor / Storage? #520
Comments
Hello! It's a good question, and I'm not certain there will be satisfying answers. I think the two ideas you've proposed run into other constraints, so let me talk those through.

First, the

Minor, but I think the problem is probably not the

But the core problem that I see is that if you want an LRU cache, or something like this, .. it will invalidate some data in response to requests for data it needs to bring in. How does this "lifetime" get communicated to Rust? I think the problem is that it is not a lexical lifetime, which means that Rust is not obviously well equipped to express it. Hypothetically, if you have a one element cache, then every new access is going to invalidate previous elements, and none of the DD code will work anyhow.

An alternate approach, making this up ad lib, is that the

This is a bit like what we've done a few times with DD, which is "let virtual memory handle it". If you treat it as just .. data that Rust has, potentially lots more than you have physical memory, there are a few controls that let you respond to "ack; I don't have the data mapped in". Virtual memory is one, and I think @antiguru may be able to speak to how

I'll keep pondering, but it does seem tricky to both 1. maintain "references" that live longer than the borrow of the cursor, and 2. adapt resources in response to the cursor position. If you see other idioms out there where folks have succeeded with this, lmk!
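To make the one-element-cache point concrete, here is a minimal sketch using only the standard library. `FakeDisk`, `OneChunkCache` and `one_chunk_demo` are hypothetical names invented for illustration; none of this is differential-dataflow API. Because `get` needs `&mut self` in order to evict, the slice it returns borrows the cache mutably, so a key has to be cloned if it is to survive the next access.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for chunked data on disk: chunk id -> keys in that chunk.
struct FakeDisk {
    chunks: HashMap<usize, Vec<String>>,
}

/// A cache that holds at most one chunk at a time (illustration only).
struct OneChunkCache {
    loaded: Option<(usize, Vec<String>)>,
    loads: usize, // number of simulated disk reads
}

impl OneChunkCache {
    fn new() -> Self {
        OneChunkCache { loaded: None, loads: 0 }
    }

    /// Returns the keys of `chunk`, evicting whatever was cached before.
    /// The returned slice borrows `self` mutably, so it cannot coexist
    /// with a later call to `get`.
    fn get<'c>(&'c mut self, disk: &FakeDisk, chunk: usize) -> &'c [String] {
        let hit = matches!(&self.loaded, Some((id, _)) if *id == chunk);
        if !hit {
            self.loads += 1;
            self.loaded = Some((chunk, disk.chunks[&chunk].clone()));
        }
        &self.loaded.as_ref().unwrap().1
    }
}

fn one_chunk_demo() -> (String, String, usize) {
    let disk = FakeDisk {
        chunks: HashMap::from([
            (0, vec!["a".to_string(), "b".to_string()]),
            (1, vec!["c".to_string()]),
        ]),
    };
    let mut cache = OneChunkCache::new();

    // Cloning is the only way to keep a key past the next access.
    let first = cache.get(&disk, 0)[0].clone();
    let second = cache.get(&disk, 1)[0].clone(); // evicts chunk 0

    // Holding a reference across the second `get` is rejected by the compiler:
    //   let r = &cache.get(&disk, 0)[0];
    //   let s = &cache.get(&disk, 1)[0];
    //   println!("{r} {s}"); // second mutable borrow while `r` is still live

    (first, second, cache.loads)
}
```

The clone is exactly what a `Copy` bound on the key type rules out for `String`, which is why this shape only illustrates the constraint rather than working around it.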
An alternate approach, which is a pretty substantial (but not unmotivated) re-design, would be to have DD expose the ability to chunk data more explicitly. Right now you can partition data between workers, which is useful. But perhaps each worker would further like to chunk their data down to smaller key ranges, say. This would allow them to "page in" the data for a chunk of keys, do that work without modifying the backing store, and then release those resources as it moves on to other chunks of keys. It doesn't address the question of what if the data for one key is too large to fit, but I guess that will always exist (as long as one is allowed to say "I only have 1KB; what now").
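A rough sketch of what chunk-at-a-time processing could look like, independent of DD. `BackingStore`, `load_chunk` and `total_value_len` are made-up names, and an in-memory `Vec` stands in for the file. Each chunk is paged in, used, and dropped inside one loop iteration, so every reference into it has an ordinary lexical lifetime.

```rust
use std::ops::Range;

/// Hypothetical backing store: sorted (key, value) rows standing in for a file.
struct BackingStore {
    rows: Vec<(u32, String)>,
}

impl BackingStore {
    /// "Page in" all rows whose key falls in `keys`.
    /// A real implementation would read from disk; this one copies from memory.
    fn load_chunk(&self, keys: Range<u32>) -> Vec<(u32, String)> {
        // `rows` is sorted by key, so binary search finds the chunk bounds.
        let lo = self.rows.partition_point(|(k, _)| *k < keys.start);
        let hi = self.rows.partition_point(|(k, _)| *k < keys.end);
        self.rows[lo..hi].to_vec()
    }
}

/// Visit the store one key range at a time, never holding more than one chunk.
fn total_value_len(store: &BackingStore, key_ranges: &[Range<u32>]) -> usize {
    let mut total = 0;
    for range in key_ranges {
        // Resources for this chunk are acquired here...
        let chunk = store.load_chunk(range.clone());
        // ...references into `chunk` are valid only inside this iteration...
        for (_key, val) in &chunk {
            total += val.len();
        }
        // ...and `chunk` is dropped here, before the next range is loaded.
    }
    total
}
```

The backing store is only ever borrowed immutably, which matches the "do that work without modifying the backing store" part of the idea. A single key whose data exceeds memory is still not handled.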
Ahah! Thank you, this all makes sense.
This works for me 😃. I store data on files and can promise that it can be accessed at any time during program execution. I was originally thrown by the Keen to hear more about
Ah, the
I'm trying to implement disk-backed arrangements, where updates are stored as immutable batches on disk and chunks of the batch are loaded into memory as you seek around the cursor.
I have set up:

- `DiskBatch` that holds a path to a file containing the data, and a mapping of key ranges (chunks) to offsets in the file.
- The `Builder` that outputs `DiskBatch` collects and sorts the data in memory and then writes it out to disk in `Builder::done`.
- `DiskCursor` that implements `Cursor` and uses `DiskBatch` as the `Storage` associated type, with `range_pos`, `key_pos` and `val_pos` fields (similar to `OrdValCursor`).

The problem I have run into is trying to add the caching layer - ideally an LRU cache of chunks. The `key` method of `Cursor` takes `&self` and a `&'a Self::Storage`, and returns a `Self::Key<'a>`.

Let's say for example that the `Key<'a>` associated type is hard-coded to `&'a String`. The lifetime on `&'a Self::Storage` and the return type mean that the "cache" must belong to the storage. Since there isn't a mutable reference to storage, I believe the only option is to use a `RefCell` (maybe around a `HashMap`), but I can't return a `&'a` reference to content inside a `RefCell` - `RefCell::borrow()` returns a `Ref` with a lifetime that only lasts until the end of the current block.

I tried changing `Key<'a>` to an owned `String` or `Ref<String>`, but this is not possible because these don't implement `Copy`.

Ideas:
- Change `&self` to `&mut self` on `key` and other related `Cursor` methods, and update the lifetimes to allow the `Key<'a>` to be returned from the lifetime attached to `self` / `Cursor`. Or maybe just update the lifetimes without making the reference `mut` - mutation can occur during step / seek / rewind. I instinctively thought that the `Cursor` would be the most appropriate place for a cache, since it already holds other mutable values, while keeping `Storage` immutable at all times.
- Change `&'a Self::Storage` to `&'a mut Self::Storage`, allowing the cache to be implemented with `HashMap` or `lru::LruCache`, without the `RefCell` wrapper.

Please let me know if I've missed some other way to achieve this. This worked really nicely for `Copy` types like `u32`, but I've had no luck with other types.