Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add subarray range API wrappers #72

Merged
merged 19 commits into from
May 6, 2024
Merged

Add subarray range API wrappers #72

merged 19 commits into from
May 6, 2024

Conversation

davisp
Copy link
Collaborator

@davisp davisp commented Apr 23, 2024

This is a draft PR to show the current approach to how I'm handling the range APIs. I'm gonna ask for a quick PR review to make sure that I'm not terribly off base.

I did find a few new macro patterns in @rroelke's latest PR that could reduce some of the repetitiveness in range.rs that I'll think harder about doing tomorrow with fresh eyes.

@davisp davisp requested a review from rroelke April 23, 2024 22:18
@davisp
Copy link
Collaborator Author

davisp commented Apr 23, 2024

@rroelke Can you give this a quick skim to see if anything jumps out at you?

Your Foo: Type macro thing is something I'm going to go back and apply all over in range.rs. I'm also contemplating replacing my Range::as_u8_slices with your enum variant macro approach that is similar to fn_typed!.

@davisp davisp force-pushed the pd/sc-45895/wrap-range-apis branch 2 times, most recently from d66ed61 to 7a98ae0 Compare April 24, 2024 22:34
@davisp
Copy link
Collaborator Author

davisp commented Apr 24, 2024

I've rewritten the range.rs module to now use the better macro rules patterns that I stole from @rroelke and I think its a lot more reasonable. I've also improved the types a bit since we have three distinct cases to worry about:

  1. cell_val_num == 1
  2. cell_val_num > 1 && cell_val_num < u32::MAX
  3. cell_val_num == u32::MAX

The reason 2 is an issue is that its never valid for dimensions ranges. Also, this cleaned up the code quite nicely in a lot of ways where things weren't super awesome, like ranges in case 2 weren't getting checked for equal length because we have to cover case 3 where different length values are valid.

@davisp
Copy link
Collaborator Author

davisp commented Apr 24, 2024

I'm currently porting over a couple examples that exercise these APIs. I'll flip to a review when I finish those and add some tests for the ranges module. Hopefully that'll happen before sync tomorrow.

@davisp davisp force-pushed the pd/sc-45895/wrap-range-apis branch 4 times, most recently from 79b9efd to f782ad6 Compare April 25, 2024 23:47
@davisp davisp marked this pull request as ready for review April 26, 2024 17:55
@davisp
Copy link
Collaborator Author

davisp commented Apr 26, 2024

@rroelke This is ready for review. I've managed to get fairly decent test coverage on the range.rs module last night.

Copy link
Member

@rroelke rroelke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will come back to this Monday morning...

@@ -381,6 +386,56 @@ impl Datatype {
|| self.is_datetime_type()
|| self.is_time_type()
}

#[cfg(test)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I don't think I've seen this except for a mod tests item before, but this makes sense. Neat.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember what crate I was looking through ages ago, but they were using a similar pattern for cross platform code that was so awesome that it just stuck.

#[cfg(unix)]
fn do_thing(&self) {
    ....
}

#[cfg(windows)]
fn do_thing(&self) {
   ...
}

Having just spent a year working on a cross platform C++ library for a living, the simplicity of that made it stick pretty thoroughly.

Datatype::StringUtf32 => unimplemented!(),
Datatype::StringUcs2 => unimplemented!(),
Datatype::StringUcs4 => unimplemented!(),
Datatype::Char => { type $typename = i8; $then },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine for now but I sorta wonder if in the future we should use newtypes for these so that we can do custom display, and etc.

And also now that I'm playing more with the Arrow datatypes I wonder if we should draw some inspiration from them too.

Just musing. Looks good for now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. In an ideal world our type system would get split between concrete and logical types though that's apparently contentious.

type Error = crate::error::Error;
fn try_from(value: (u32, Box<[$U]>, Box<[$U]>)) ->
TileDBResult<MultiValueRange> {
if value.0 < 2 || value.0 == u32::MAX {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking that I follow - value.0 is meant to be the length? Is the idea with value.0 < 2 that you should (must) use a SingleValueRange instead?

How come you don't just check value.1.len() and value.2.len() against 2 and against each other? That sounds more user-friendly. Is it because the u32 here holds a fixed cell val num?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would probably be more clear if that were correctly typed as a CellValNum. The logic here is saying "If you have a CellValNum::Fixed(1) or CellValNum::Var, go use the correct range type.

Fun fact, this isn't even used for domains because core doesn't support them, but it felt reasonable to include since it'd be kinda weird to not be able to create them at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come you didn't just use CellValNum? That would be a lot clearer

Copy link
Collaborator Author

@davisp davisp May 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totes just saw this in my email so I'm un-resolving this thread. We should probably codify a better protocol for when to resolve an open thread like this rather than relying on timeouts like I did here.

To answer the question though, I basically hadn't internalized the CellValNum type at the time I wrote this. I was relying on my knowledge of internals rather than preventing the edge cases. As in, I was more worried about user experience with syntax than forcing correctness by construction.

I'll go back and change this to something better.

Also, it occurs to me that CellValNum only has the Fixed and Var variants. I'm pretty sure it should have Single Multi and Var variants to correctly express the current constraints. That'll probably be a decent change worthy of a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasons favoring CellValNum::Single:

  • clarifies some pattern matching,CellValNum::Fixed(nz) if nz.get() == 1 could be changed to CellValNum::Single
  • construction of CellValNum::Fixed(1) is less work

Reasons against:

  • Some code doesn't care whether nz.get() == 1 and this code will get slightly worse
  • To prevent overlap between CellValNum::Single and CellValNum::Multi(1) we would probably need to invent a new type BoundedInt<2, u32::MAX - 1> or something. I don't hate that idea and of course there is already a crate for it.

At some point in a branch I added fn single() -> Self which addresses the construction problem. So I feel sort of neutrally about this notion.

But regardless I look forward to seeing this changed to CellValNum

@ihnorton ihnorton changed the title Add subarrray range API wrappers Add subarray range API wrappers May 2, 2024
@davisp davisp force-pushed the pd/sc-45895/wrap-range-apis branch 2 times, most recently from c09ce32 to 6df6f31 Compare May 3, 2024 19:19
@davisp davisp force-pushed the pd/sc-45895/wrap-range-apis branch from f4cbf19 to 5f52681 Compare May 3, 2024 20:18
@@ -46,6 +47,20 @@ pub trait Query<'ctx> {
fn finalize(self) -> TileDBResult<Array<'ctx>>
where
Self: Sized;

fn subarray(&'ctx self) -> TileDBResult<Subarray<'ctx>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the overload of the 'ctx lifetime name, I'd suggest 'query

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it turns out to be not fine. I was pretty sure that 'ctx was more correct here when I started, but who cares about naming. But now I've had rustc tell me "No" enough times that I'm pretty sure I understand specifically why (except one particular detail I need to go find an answer for).

Regardless, I believe the 'ctx is correct here because its referring to the Schema reference in the Subarray struct which flows from the same 'ctx lifetime that parameterizes the Query instance. What you might be thinking of is the lifetime of the Subarray struct instance itself, which is the elided lifetime of the Query instance. This can be confirmed you insert a drop(query) after getting the Subarray and then attempting to use the subarray.

The only thing I couldn't figure out is why a restricted lifetime wouldn't work. I couldn't decide if this was because the compiler tracks enough to realize that the lifetime parameter for the Query must match the lifetime parameter of the Schema (regardless of what they're called) or if there's a weird issue with restricted lifetimes on generic trait methods. But all I could get in the attempt was for the rustc to tell me that 'ctx: 'query is shadowing an already defined lifetime 'ctx. I'm still reading on that one.

@rroelke
Copy link
Member

rroelke commented May 6, 2024

Good stuff, I'll approve after the last two tweaks :)

@davisp
Copy link
Collaborator Author

davisp commented May 6, 2024

I assume one is the 'ctx -> 'query which is fine, but what's the other tweak?

@rroelke
Copy link
Member

rroelke commented May 6, 2024

I assume one is the 'ctx -> 'query which is fine, but what's the other tweak?

Using CellValNum

davisp added 5 commits May 6, 2024 10:21
The `&'ctc self` in `Query::subarray` was required because
`Array::schema(&self) -> TileDBResult<Schema>` was incorrect. It should
have ben `TileDBResult<Schema<'ctx>>`.
@rroelke discovered that it is in fact not safe to keep a reference to a
Subarray after a Query is dropped. This ties the lifetime of Subarray to
the lifetime of Query to prevent the issue.
Copy link
Member

@rroelke rroelke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for all the good discussions!

@davisp davisp merged commit a7bd02b into main May 6, 2024
2 checks passed
@davisp davisp deleted the pd/sc-45895/wrap-range-apis branch May 6, 2024 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants