westonpace
left a comment
A few questions to help me understand how this operation works
let mut indices = Vec::with_capacity(old_indices.len());
for idx in old_indices {
    let index = dataset
        .open_generic_index(&column.name, &idx.uuid.to_string())
        .await?;
    indices.push(index);
}
Again, could maybe use Vec::from_iter here?
When I change to
let indices = stream::iter(old_indices.iter())
    .zip(repeat((dataset.clone(), col_name.clone())))
    .map(|(meta, (ds, col_name))| async move {
        ds.open_generic_index(&col_name, &meta.uuid.to_string())
            .await
    })
    .buffered(10)
    .try_collect::<Vec<_>>()
    .await?;
it complains:
error: higher-ranked lifetime error
--> lance/src/index.rs:300:5
|
300 | #[instrument(skip_all)]
| ^^^^^^^^^^^^^^^^^^^^^^^
|
= note: could not prove `Pin<Box<{async block@lance/src/index.rs:300:5: 300:28}>>: CoerceUnsized<Pin<Box<(dyn futures::Future<Output = Result<(), lance_core::Error>> + std::marker::Send + 'i)>>>`
= note: this error originates in the attribute macro `instrument` (in Nightly builds, run with -Z macro-backtrace for more info)
error: could not compile `lance` (lib) due to previous error
Ah, the return of the dreaded rust-lang/rust#102211
Don't worry too much about it, but I've had luck with something like this...
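// Build the futures first, without driving them.
let indices = stream::iter(old_indices.iter())
    .zip(repeat((dataset.clone(), col_name.clone())))
    .map(|(meta, (ds, col_name))| async move {
        ds.open_generic_index(&col_name, &meta.uuid.to_string())
            .await
    })
    .collect::<Vec<_>>()
    .await; // `collect` on a stream is itself a future
// Then run them with bounded concurrency in a separate stream.
let indices = stream::iter(indices)
    .buffered(10)
    .try_collect::<Vec<_>>()
    .await?;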
...or you can just remove the call to buffered (that's the method that usually introduces the bogus error).
/// If `num_indices_to_merge` is 0, a new delta index will be created.
/// If `num_indices_to_merge` is 1, the delta updates will be merged into the latest index.
/// If `num_indices_to_merge` is more than 1, the delta updates and latest N indices
/// will be merged into one single index.
I think we need to explain to the user why they would change this parameter.
Don't worry too much. We can tweak this if/when the parameter becomes public. I'm guessing that there is some kind of cost-to-compute/accuracy-of-index tradeoff here? Or is it a cost-to-compute/cost-to-search tradeoff?
In other words, "why wouldn't I merge the indices into one big index every time?" or "why wouldn't I make every index a delta index?"
I'm still not sure it is clear from the comment.
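To make the three cases in the doc comment concrete, here is a minimal sketch of how the parameter might select which existing indices get folded into the merge. `split_for_merge` is a hypothetical helper written for illustration, not a function in Lance, and clamping the count to the number of existing indices is an assumption; the doc comment does not say what happens when N exceeds that number.

```rust
/// Hypothetical helper (not part of the Lance API).
///
/// `existing` holds index identifiers ordered oldest to newest. The result is
/// `(kept, merged)`: `kept` are left untouched, while `merged` are combined
/// with the new delta updates into one single index.
fn split_for_merge<T: Clone>(existing: &[T], num_indices_to_merge: usize) -> (Vec<T>, Vec<T>) {
    // Assumption: asking for more indices than exist merges all of them.
    let n = num_indices_to_merge.min(existing.len());
    // The latest `n` indices sit at the end of the slice.
    let (kept, merged) = existing.split_at(existing.len() - n);
    (kept.to_vec(), merged.to_vec())
}
```

With three existing indices `["a", "b", "c"]`, passing 0 keeps all three and the delta updates become a new delta index; passing 1 merges the updates into `"c"`; passing 2 merges the updates with `"b"` and `"c"` into one index.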
Allow users to specify how many delta indices to merge