Add subarray range API wrappers #72
@rroelke Can you give this a quick skim to see if anything jumps out at you? Your
d66ed61
to
7a98ae0
Compare
I've rewritten the range.rs module to use the better macro rules patterns that I stole from @rroelke, and I think it's a lot more reasonable. I've also improved the types a bit since we have three distinct cases to worry about:
The reason 2 is an issue is that it's never valid for dimension ranges. This also cleaned up the code quite nicely in places where things weren't super awesome. For example, ranges in case 2 weren't getting checked for equal length because we had to cover case 3, where values of different lengths are valid.
I'm currently porting over a couple of examples that exercise these APIs. I'll flip to a review when I finish those and add some tests for the ranges module. Hopefully that'll happen before sync tomorrow.
79b9efd
to
f782ad6
Compare
@rroelke This is ready for review. I managed to get fairly decent test coverage on the range.rs module last night.
Will come back to this Monday morning...
@@ -381,6 +386,56 @@ impl Datatype {
            || self.is_datetime_type()
            || self.is_time_type()
    }

    #[cfg(test)]
Huh, I don't think I've seen this except for a `mod tests` item before, but this makes sense. Neat.
I don't remember what crate I was looking through ages ago, but they were using a similar pattern for cross platform code that was so awesome that it just stuck.
#[cfg(unix)]
fn do_thing(&self) {
....
}
#[cfg(windows)]
fn do_thing(&self) {
...
}
Having just spent a year working on a cross platform C++ library for a living, the simplicity of that made it stick pretty thoroughly.
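For anyone skimming the thread later, here is a minimal sketch of the pattern being discussed: `#[cfg(...)]` attached to individual methods rather than to a whole module. The `Platform` type and its method names are hypothetical and exist only for illustration; they are not part of this PR.

```rust
/// Hypothetical type used only to show where the attributes go.
pub struct Platform;

impl Platform {
    /// Exactly one of the three `family` definitions is compiled,
    /// depending on the target the crate is built for.
    #[cfg(unix)]
    pub fn family(&self) -> &'static str {
        "unix"
    }

    #[cfg(windows)]
    pub fn family(&self) -> &'static str {
        "windows"
    }

    #[cfg(not(any(unix, windows)))]
    pub fn family(&self) -> &'static str {
        "other"
    }

    /// Only exists when the crate is compiled with `cfg(test)`,
    /// so it never ships in a normal build.
    #[cfg(test)]
    pub fn is_test_build(&self) -> bool {
        true
    }
}
```

Callers just write `Platform.family()` and never see the conditional compilation, which is most of the appeal.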
tiledb/api/src/datatype/mod.rs
Outdated
Datatype::StringUtf32 => unimplemented!(),
Datatype::StringUcs2 => unimplemented!(),
Datatype::StringUcs4 => unimplemented!(),
Datatype::Char => { type $typename = i8; $then },
Seems fine for now but I sorta wonder if in the future we should use newtypes for these so that we can do custom display, etc.
And also now that I'm playing more with the Arrow datatypes I wonder if we should draw some inspiration from them too.
Just musing. Looks good for now
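A rough sketch of what the newtype idea could look like, assuming a wrapper around the `i8` that `Datatype::Char` maps to above. `CharValue` is a hypothetical name, and the choice to render the value as a character is only one possible display policy.

```rust
use std::fmt;

/// Hypothetical newtype over the physical `i8` used for `Datatype::Char`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CharValue(pub i8);

impl fmt::Display for CharValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Reinterpret the byte and print it as a character instead of
        // as a signed integer, which is what a bare `i8` would do.
        write!(f, "{}", self.0 as u8 as char)
    }
}
```

With this, `CharValue(65).to_string()` gives `"A"`, whereas a bare `65i8` formats as `"65"`.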
Yeah. In an ideal world our type system would get split between concrete and logical types though that's apparently contentious.
tiledb/api/src/range.rs
Outdated
type Error = crate::error::Error;
fn try_from(value: (u32, Box<[$U]>, Box<[$U]>)) ->
    TileDBResult<MultiValueRange> {
    if value.0 < 2 || value.0 == u32::MAX {
Checking that I follow - `value.0` is meant to be the length? Is the idea with `value.0 < 2` that you should (must) use a `SingleValueRange` instead?
How come you don't just check `value.1.len()` and `value.2.len()` against 2 and against each other? That sounds more user-friendly. Is it because the `u32` here holds a fixed cell val num?
This would probably be clearer if that were correctly typed as a CellValNum. The logic here is saying "If you have a CellValNum::Fixed(1) or CellValNum::Var, go use the correct range type."
Fun fact, this isn't even used for domains because core doesn't support them, but it felt reasonable to include since it'd be kinda weird to not be able to create them at all.
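To make the check under discussion concrete, here is a self-contained sketch of the validation. It is not the code in range.rs: the real implementation is generated by a macro over `$U` and returns `TileDBResult`, while this version fixes the element type to `i32` and uses a hypothetical `RangeError`. Treating `u32::MAX` as the var-sized sentinel is inferred from the hunk above.

```rust
use std::convert::TryFrom;

/// Hypothetical error type standing in for the crate's error.
#[derive(Debug, PartialEq)]
pub enum RangeError {
    InvalidCellValNum(u32),
    LengthMismatch { expected: usize, start: usize, end: usize },
}

/// Simplified stand-in for a fixed multi-value range.
#[derive(Debug, PartialEq)]
pub struct MultiValueRange {
    pub cell_val_num: u32,
    pub start: Box<[i32]>,
    pub end: Box<[i32]>,
}

impl TryFrom<(u32, Box<[i32]>, Box<[i32]>)> for MultiValueRange {
    type Error = RangeError;

    fn try_from(value: (u32, Box<[i32]>, Box<[i32]>)) -> Result<Self, Self::Error> {
        let (cell_val_num, start, end) = value;
        // 0 is never valid, 1 belongs to the single-value range type,
        // and u32::MAX (assumed sentinel) belongs to the var-sized type.
        if cell_val_num < 2 || cell_val_num == u32::MAX {
            return Err(RangeError::InvalidCellValNum(cell_val_num));
        }
        // Both bounds must hold exactly `cell_val_num` values.
        let expected = cell_val_num as usize;
        if start.len() != expected || end.len() != expected {
            return Err(RangeError::LengthMismatch {
                expected,
                start: start.len(),
                end: end.len(),
            });
        }
        Ok(MultiValueRange { cell_val_num, start, end })
    }
}
```

Checking the slice lengths against the cell val num, rather than only against each other, is what separates this case from the var-sized one.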
How come you didn't just use `CellValNum`? That would be a lot clearer.
Totes just saw this in my email so I'm un-resolving this thread. We should probably codify a better protocol for when to resolve an open thread like this rather than relying on timeouts like I did here.
To answer the question though, I basically hadn't internalized the `CellValNum` type at the time I wrote this. I was relying on my knowledge of internals rather than preventing the edge cases. As in, I was more worried about user experience with syntax than forcing correctness by construction.
I'll go back and change this to something better.
Also, it occurs to me that `CellValNum` only has the `Fixed` and `Var` variants. I'm pretty sure it should have `Single`, `Multi`, and `Var` variants to correctly express the current constraints. That'll probably be a decent change worthy of a separate PR.
Reasons favoring `CellValNum::Single`:
- clarifies some pattern matching, `CellValNum::Fixed(nz) if nz.get() == 1` could be changed to `CellValNum::Single`
- construction of `CellValNum::Fixed(1)` is less work
Reasons against:
- Some code doesn't care whether `nz.get() == 1` and this code will get slightly worse
- To prevent overlap between `CellValNum::Single` and `CellValNum::Multi(1)` we would probably need to invent a new type `BoundedInt<2, u32::MAX - 1>` or something. I don't hate that idea and of course there is already a crate for it.
At some point in a branch I added `fn single() -> Self` which addresses the construction problem. So I feel sort of neutrally about this notion.
But regardless I look forward to seeing this changed to `CellValNum`
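For reference, a minimal sketch of the two-variant shape being discussed, with the `single()` constructor and the guard-based match. The use of `NonZeroU32` as the payload is an assumption based on the `nz.get()` calls above, and `range_kind` is a hypothetical helper, not an API in the crate.

```rust
use std::num::NonZeroU32;

/// Sketch of the existing two-variant enum (payload type assumed).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CellValNum {
    Fixed(NonZeroU32),
    Var,
}

impl CellValNum {
    /// Convenience constructor for the "one value per cell" case.
    pub fn single() -> Self {
        CellValNum::Fixed(NonZeroU32::new(1).unwrap())
    }
}

/// Hypothetical helper: picks which of the three range cases applies.
pub fn range_kind(cell_val_num: CellValNum) -> &'static str {
    match cell_val_num {
        // With a dedicated `Single` variant this guard would go away.
        CellValNum::Fixed(nz) if nz.get() == 1 => "single",
        CellValNum::Fixed(_) => "multi",
        CellValNum::Var => "var",
    }
}
```

Note that `NonZeroU32` alone still admits `u32::MAX` inside `Fixed`, which is the overlap a bounded integer type would rule out.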
c09ce32
to
6df6f31
Compare
f4cbf19
to
5f52681
Compare
tiledb/api/src/query/mod.rs
Outdated
@@ -46,6 +47,20 @@ pub trait Query<'ctx> {
    fn finalize(self) -> TileDBResult<Array<'ctx>>
    where
        Self: Sized;

    fn subarray(&'ctx self) -> TileDBResult<Subarray<'ctx>> {
I don't love the overload of the `'ctx` lifetime name, I'd suggest `'query`
Actually, it turns out to be not fine. I was pretty sure that `'ctx` was more correct here when I started, but who cares about naming. But now I've had rustc tell me "No" enough times that I'm pretty sure I understand specifically why (except one particular detail I need to go find an answer for).
Regardless, I believe the `'ctx` is correct here because it's referring to the Schema reference in the Subarray struct, which flows from the same `'ctx` lifetime that parameterizes the Query instance. What you might be thinking of is the lifetime of the Subarray struct instance itself, which is the elided lifetime of the Query instance. This can be confirmed if you insert a `drop(query)` after getting the Subarray and then attempt to use the subarray.
The only thing I couldn't figure out is why a restricted lifetime wouldn't work. I couldn't decide if this was because the compiler tracks enough to realize that the lifetime parameter for the Query must match the lifetime parameter of the Schema (regardless of what they're called) or if there's a weird issue with restricted lifetimes on generic trait methods. But all I could get in the attempt was for rustc to tell me that `'ctx: 'query` is shadowing an already defined lifetime `'ctx`. I'm still reading on that one.
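On the shadowing message: rustc reports error E0496 when a method's generic parameter list redeclares a lifetime name that is already in scope from the enclosing trait or impl. One way to express the bound without redeclaring `'ctx` is a `where` clause on the method. The sketch below uses hypothetical stand-in types (no `TileDBResult`, and a `Subarray` that only holds a context reference), so it shows the language mechanics rather than the crate's actual design.

```rust
pub struct Context {
    pub name: String,
}

/// Stand-in: the real Subarray holds more than a context reference.
pub struct Subarray<'a> {
    pub context: &'a Context,
}

pub trait Query<'ctx> {
    fn context(&self) -> &'ctx Context;

    // Writing `fn subarray<'ctx: 'query, 'query>(...)` here would
    // redeclare `'ctx`, which is already bound by the trait (E0496).
    // Declaring only `'query` and stating the bound in a `where`
    // clause avoids the shadowing.
    fn subarray<'query>(&'query self) -> Subarray<'query>
    where
        'ctx: 'query,
    {
        // `&'ctx Context` shortens to `&'query Context` because 'ctx: 'query.
        Subarray { context: self.context() }
    }
}

/// Hypothetical implementor used to exercise the default method.
pub struct ReadQuery<'ctx> {
    pub context: &'ctx Context,
}

impl<'ctx> Query<'ctx> for ReadQuery<'ctx> {
    fn context(&self) -> &'ctx Context {
        self.context
    }
}
```

With this signature the returned `Subarray` borrows from the query, so dropping the query while the subarray is still in use is rejected at compile time.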
Good stuff, I'll approve after the last two tweaks :)
I assume one is the 'ctx -> 'query which is fine, but what's the other tweak?
Using |
The `&'ctx self` in `Query::subarray` was required because `Array::schema(&self) -> TileDBResult<Schema>` was incorrect. It should have been `TileDBResult<Schema<'ctx>>`.
@rroelke discovered that it is in fact not safe to keep a reference to a Subarray after a Query is dropped. This ties the lifetime of Subarray to the lifetime of Query to prevent the issue.
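The difference between the two `Array::schema` signatures comes down to lifetime elision: a return type written without its lifetime argument picks up the lifetime of the `&self` borrow, not the `'ctx` of the impl. A small sketch with hypothetical stand-in types (no `TileDBResult`, and `schema_elided` is an illustrative name for the old signature):

```rust
pub struct Context {
    pub name: String,
}

pub struct Schema<'ctx> {
    pub context: &'ctx Context,
}

pub struct Array<'ctx> {
    pub context: &'ctx Context,
}

impl<'ctx> Array<'ctx> {
    /// Elided form: `Schema<'_>` is tied to the borrow of `self`,
    /// so the schema cannot outlive the array it came from.
    pub fn schema_elided(&self) -> Schema<'_> {
        Schema { context: self.context }
    }

    /// Explicit form: the schema is tied to the context lifetime only.
    pub fn schema(&self) -> Schema<'ctx> {
        Schema { context: self.context }
    }
}

/// Compiles only with the explicit form: `array` is a local that is
/// dropped at the end of the function, yet the schema is returned.
pub fn open_schema<'ctx>(context: &'ctx Context) -> Schema<'ctx> {
    let array = Array { context };
    array.schema()
}
```

Swapping `array.schema()` for `array.schema_elided()` in `open_schema` fails to compile because the result would borrow from the local `array`.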
thanks for all the good discussions!
This is a draft PR to show the current approach to how I'm handling the range APIs. I'm gonna ask for a quick PR review to make sure that I'm not terribly off base.
I did find a few new macro patterns in @rroelke's latest PR that could reduce some of the repetitiveness in range.rs that I'll think harder about doing tomorrow with fresh eyes.