feat: [sc-50382] [tables] or [rs] implement accumulating a subarray from the intersection of predicate ranges #133

Merged

Member

@rroelke rroelke commented Jul 5, 2024

Story details: https://app.shortcut.com/tiledb-inc/story/50382

In tables we might see a query with multiple predicates on dimensions. Each of these predicates can restrict the subarray we want to select (using range analysis; see technical discussion 49433). As we run over the predicates, we want to accumulate the minimal subarray that satisfies all of them.

If we have "a AND b", then we want to compute the intersection of the range which can satisfy "a" and the range which can satisfy "b" for our subarray.
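The "a AND b" case can be sketched as plain closed-interval intersection. This is a minimal illustration only: it uses `std::ops::RangeInclusive<i64>` as a stand-in for the crate's range types, and the function name `intersect` is an assumption, not the API added by this pull request.

```rust
use std::ops::RangeInclusive;

/// Returns the overlap of two closed ranges, or `None` when they are disjoint.
/// `RangeInclusive<i64>` is a hypothetical stand-in for a single dimension range.
pub fn intersect(
    a: &RangeInclusive<i64>,
    b: &RangeInclusive<i64>,
) -> Option<RangeInclusive<i64>> {
    // Both inputs are assumed to be well-formed (lower <= upper).
    debug_assert!(a.start() <= a.end() && b.start() <= b.end());

    // Disjoint when one range ends strictly before the other begins.
    if a.end() < b.start() || b.end() < a.start() {
        return None;
    }

    // Otherwise the overlap runs from the larger lower bound
    // to the smaller upper bound.
    let lower = *a.start().max(b.start());
    let upper = *a.end().min(b.end());
    Some(lower..=upper)
}
```

With this sketch, a predicate that admits `0..=10` combined with one that admits `5..=20` narrows the range to `5..=10`, and two ranges that only touch at a bound still share that single cell.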

If we have "a OR b", then we want to append multiple ranges to the subarray - the range which can satisfy "a" plus the range which can satisfy "b" (and happily, core will consolidate them for us).
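For the "a OR b" case, a sketch under the same assumptions (`RangeInclusive<i64>` as a stand-in range type, a hypothetical function name) is just concatenation of the two range lists. It deliberately does not merge overlapping ranges, since the description leaves consolidation to core.

```rust
use std::ops::RangeInclusive;

/// Appends the ranges that can satisfy `b` to those that can satisfy `a`.
/// Overlapping ranges are kept as-is; merging them is assumed to happen later
/// (the description says core consolidates them).
pub fn append_ranges(
    a: &[RangeInclusive<i64>],
    b: &[RangeInclusive<i64>],
) -> Vec<RangeInclusive<i64>> {
    a.iter().chain(b.iter()).cloned().collect()
}
```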

If we have "(a OR b) AND c", then we must distribute the intersection over the disjunction, i.e. "(a AND c) OR (b AND c)".
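Distribution can be sketched as a pairwise intersection between two range lists, where each list stands for a disjunction of ranges on one dimension. The names `overlap` and `intersect_sets` and the use of `RangeInclusive<i64>` are assumptions made for illustration; the actual implementation may be structured differently.

```rust
use std::ops::RangeInclusive;

type R = RangeInclusive<i64>;

/// Overlap of two closed ranges, or `None` when they are disjoint.
fn overlap(a: &R, b: &R) -> Option<R> {
    if a.end() < b.start() || b.end() < a.start() {
        return None;
    }
    Some(*a.start().max(b.start())..=*a.end().min(b.end()))
}

/// Treats each slice as a disjunction of ranges and intersects the two:
/// (l1 OR l2 ...) AND (r1 OR r2 ...) becomes the OR of every non-empty
/// pairwise overlap (l_i AND r_j).
pub fn intersect_sets(left: &[R], right: &[R]) -> Vec<R> {
    left.iter()
        .flat_map(|l| right.iter().filter_map(move |r| overlap(l, r)))
        .collect()
}
```

For example, with `a = 0..=5`, `b = 10..=15`, and `c = 3..=12`, intersecting `[a, b]` with `[c]` yields `3..=5` and `10..=12`. Pairs that do not overlap simply contribute nothing.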

If we have "(a AND b) OR c" then... probably we don't really have to do anything.

The scope of this story is to write the interval arithmetic intersection with property-based testing, and to write the function which distributes an intersection to each clause of a disjunction (and validate the implementation with property-based testing).
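The kind of property worth checking can be sketched without any external crate. The version below uses a small hand-rolled deterministic generator instead of a real property-testing framework, which is a simplification; the properties checked (commutativity, and "a cell is in the result exactly when it is in both inputs") and all names are assumptions about what such a test might cover, not a description of the tests in this pull request.

```rust
use std::ops::RangeInclusive;

type R = RangeInclusive<i64>;

/// Overlap of two closed ranges, or `None` when they are disjoint.
fn intersect(a: &R, b: &R) -> Option<R> {
    if a.end() < b.start() || b.end() < a.start() {
        return None;
    }
    Some(*a.start().max(b.start())..=*a.end().min(b.end()))
}

/// Tiny linear congruential generator so the sketch has no dependencies.
struct Lcg(u64);

impl Lcg {
    /// Produces a value in -20..=20.
    fn next_i64(&mut self) -> i64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) % 41) as i64 - 20
    }

    /// Produces a well-formed range with both bounds in -20..=20.
    fn next_range(&mut self) -> R {
        let (x, y) = (self.next_i64(), self.next_i64());
        x.min(y)..=x.max(y)
    }
}

/// Returns true if every generated case satisfies both properties.
pub fn check_intersection_properties(cases: usize) -> bool {
    let mut rng = Lcg(0x5eed);
    for _ in 0..cases {
        let (a, b) = (rng.next_range(), rng.next_range());
        let ab = intersect(&a, &b);

        // Property 1: intersection is commutative.
        if ab != intersect(&b, &a) {
            return false;
        }

        // Property 2: a cell is in the result iff it is in both inputs.
        for x in -20..=20 {
            let in_both = a.contains(&x) && b.contains(&x);
            let in_result = ab.as_ref().map_or(false, |r| r.contains(&x));
            if in_both != in_result {
                return false;
            }
        }
    }
    true
}
```

A real property-based test would also shrink failing inputs and cover the other datatypes; this sketch only shows the shape of the assertions.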

I chose to implement the functionality in rs since it is sort of generic and it seems plausible that someone somewhere someday might also want it. Side note, I am thinking maybe we should break the range stuff out into a separate crate in the workspace.

Anyway...

This pull request implements what is described above, along with a little extra arithmetic (MultiValueRange::num_cells is what comes to mind). A new (possibly redundant?) struct, SubarrayData, holds the range set for each dimension and is where the critical logic is implemented.
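A struct that holds a range set per dimension might look roughly like the sketch below. `SubarraySketch` and its field are hypothetical and are not the real SubarrayData definition. The sketch also assumes that an empty range list on a dimension means "selects nothing", which may differ from how an unconstrained dimension is actually represented.

```rust
use std::ops::RangeInclusive;

type R = RangeInclusive<i64>;

/// Overlap of two closed ranges, or `None` when they are disjoint.
fn overlap(a: &R, b: &R) -> Option<R> {
    if a.end() < b.start() || b.end() < a.start() {
        return None;
    }
    Some(*a.start().max(b.start())..=*a.end().min(b.end()))
}

/// Hypothetical stand-in for a per-dimension range container.
#[derive(Clone, Debug, PartialEq)]
pub struct SubarraySketch {
    /// `dimension_ranges[d]` holds the ranges selected on dimension `d`.
    pub dimension_ranges: Vec<Vec<R>>,
}

impl SubarraySketch {
    /// Intersects two subarrays dimension by dimension, distributing the
    /// intersection over every pair of ranges on each dimension.
    pub fn intersect(&self, other: &Self) -> Self {
        assert_eq!(
            self.dimension_ranges.len(),
            other.dimension_ranges.len(),
            "subarrays must have the same number of dimensions"
        );
        let dimension_ranges = self
            .dimension_ranges
            .iter()
            .zip(&other.dimension_ranges)
            .map(|(left, right)| {
                left.iter()
                    .flat_map(|l| right.iter().filter_map(move |r| overlap(l, r)))
                    .collect()
            })
            .collect();
        Self { dimension_ranges }
    }
}
```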

@rroelke rroelke requested a review from davisp July 5, 2024 13:41
Collaborator

@davisp davisp left a comment

This all looks fine. The only question I have is why we implemented MultiValue range stuff? I know that I even originally wrote the MultiValueRange and you've just extended this implementation to cover it, but I don't think core supports them?

    B: BitsOrd + ?Sized,
{
    if matches!(left_upper.bits_cmp(right_lower), Ordering::Less)
        || matches!(right_upper.bits_cmp(left_lower), Ordering::Less)
Collaborator

Should we add an assert for left_lower <= left_upper && right_lower <= right_upper?

Member Author

Done.
Added bits_le, bits_lt, etc., and then used those here.
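As a rough illustration of the suggestion and the reply, comparison helpers built on top of `bits_cmp` together with the requested precondition asserts might look like this. The trait `BitsOrdSketch`, its `f64` implementation via `total_cmp`, and the function `ranges_disjoint` are all assumptions for the sake of the example; the real BitsOrd trait and its helpers may differ.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a bitwise total ordering trait.
pub trait BitsOrdSketch {
    fn bits_cmp(&self, other: &Self) -> Ordering;

    /// True when `self` orders strictly before `other`.
    fn bits_lt(&self, other: &Self) -> bool {
        matches!(self.bits_cmp(other), Ordering::Less)
    }

    /// True when `self` orders before or equal to `other`.
    fn bits_le(&self, other: &Self) -> bool {
        !matches!(self.bits_cmp(other), Ordering::Greater)
    }
}

impl BitsOrdSketch for f64 {
    // Assumption: use the IEEE 754 total order, so -0.0 sorts before +0.0.
    fn bits_cmp(&self, other: &Self) -> Ordering {
        self.total_cmp(other)
    }
}

/// Returns true when the two closed ranges share no value.
pub fn ranges_disjoint<B>(
    left_lower: &B,
    left_upper: &B,
    right_lower: &B,
    right_upper: &B,
) -> bool
where
    B: BitsOrdSketch + ?Sized,
{
    // Preconditions suggested in review: each range must be well-formed.
    assert!(left_lower.bits_le(left_upper));
    assert!(right_lower.bits_le(right_upper));

    left_upper.bits_lt(right_lower) || right_upper.bits_lt(left_lower)
}
```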

    } else {
        left_upper
    };

Collaborator

Might also be a good idea to assert lower <= upper here as well.

Member Author

Done

@rroelke
Member Author

rroelke commented Jul 9, 2024

> The only question I have is why we implemented MultiValue range stuff? I know that I even originally wrote the MultiValueRange and you've just extended this implementation to cover it, but I don't think core supports them?

At the time of writing I don't think core does, but there may eventually come a time when it does... and we'll be ready.

That's not a great reason to support it now, so I looked back at #72 to see why you added it. As best I can tell, it was because one of the functions (TypedRange::from_slices) needed to fill in a case for CellValNum between 1 and Var. You mention there that "The reason [this case] is an issue is that its never valid for dimensions ranges", so at no point were we particularly happy with it.

So I guess it just exists to fill a gap in the API which doesn't show up in practice.

@rroelke rroelke merged commit 7a725e9 into main Jul 9, 2024
2 checks passed
@rroelke rroelke deleted the rr/sc-50382-subarray-intersection-arithmetic-and-distribution branch July 9, 2024 09:07