Test which writes data and then reads it back for an arbitrary dense array schema and subarray #98

Merged
rroelke merged 62 commits into main from rr/sc-45982-write-query-proptest on May 28, 2024

Conversation

@rroelke rroelke (Member) commented May 21, 2024

This ended up being quite a doozy!

A dense write requires data input for each of the attributes as well as the subarray that is being written to. The strategy which generates a dense write therefore generates an initial set of cells (query::strategy::Cells) for some "bounding subarray", and the shrinking strategy is to narrow the range of cells covered by the subarray. This is done by computing the offset of the "current" subarray compared to the "bounding" subarray and then slicing the cells.
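To make the offset-and-slice step concrete, here is a minimal, self-contained sketch (hypothetical types and helper name, not the PR's actual query::strategy code, and assuming cells are generated in the row-major order of the bounding subarray): it maps a shrunken "current" subarray onto the cell order of the "bounding" subarray, and slicing the generated cells by the returned indices yields the data for the shrunken write.

type Subarray = Vec<(i64, i64)>; // one inclusive (lower, upper) range per dimension

/// Indices, in the bounding subarray's row-major cell order, of the cells that
/// fall inside `current`. Assumes `current` lies within `bounding` and both use
/// the same dimension order.
fn cell_indices_within(bounding: &Subarray, current: &Subarray) -> Vec<usize> {
    // row-major strides of the bounding subarray
    let mut strides = vec![1usize; bounding.len()];
    for d in (0..bounding.len().saturating_sub(1)).rev() {
        let extent = (bounding[d + 1].1 - bounding[d + 1].0 + 1) as usize;
        strides[d] = strides[d + 1] * extent;
    }

    // Cartesian product of the per-dimension offsets, combined via the strides
    current.iter().zip(bounding).zip(&strides).fold(
        vec![0usize],
        |acc, ((&(clo, chi), &(blo, _)), &stride)| {
            acc.iter()
                .flat_map(|&base| (clo..=chi).map(move |x| base + (x - blo) as usize * stride))
                .collect()
        },
    )
}

fn main() {
    let bounding: Subarray = vec![(0, 3), (0, 3)];
    let current: Subarray = vec![(1, 2), (1, 2)];
    // the 2x2 current subarray occupies these positions of the 4x4 bounding subarray
    assert_eq!(cell_indices_within(&bounding, &current), vec![5, 6, 9, 10]);
}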

There is also a shrinking strategy for the Cells themselves. Note that this is not used by this PR but will be used for sparse writes, where the dimension coordinates are part of the cell data. This strategy explores the space by dividing the cells into N pieces and running the test N times, once with each chunk removed. Each of these iterations tells us whether a chunk was necessary for the failure. Then we recur down to just the pieces which were needed for failure.
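As a rough, standalone illustration of that exploration (hypothetical helper; the actual strategy is implemented against proptest's shrinking machinery and differs in detail):

/// Split `records` into `n_pieces` chunks, re-run the failing predicate with
/// each chunk removed, keep only the chunks whose removal makes the failure
/// disappear (i.e. the chunks that were necessary), and recurse.
fn shrink_by_chunks<T: Clone>(
    records: &[T],
    n_pieces: usize,
    still_fails: &dyn Fn(&[T]) -> bool,
) -> Vec<T> {
    if records.len() <= 1 || n_pieces < 2 {
        return records.to_vec();
    }
    let chunk_len = (records.len() + n_pieces - 1) / n_pieces; // ceiling division
    let pieces: Vec<&[T]> = records.chunks(chunk_len).collect();

    let mut necessary: Vec<T> = Vec::new();
    for i in 0..pieces.len() {
        // run the test with piece `i` missing
        let without_i: Vec<T> = pieces
            .iter()
            .enumerate()
            .filter(|&(j, _)| j != i)
            .flat_map(|(_, piece)| piece.iter().cloned())
            .collect();
        if !still_fails(without_i.as_slice()) {
            // removing piece `i` made the failure disappear, so piece `i` is necessary
            necessary.extend_from_slice(pieces[i]);
        }
    }

    if necessary.len() < records.len() && still_fails(necessary.as_slice()) {
        // recurse on the smaller, still-failing input
        shrink_by_chunks(&necessary, n_pieces, still_fails)
    } else {
        records.to_vec()
    }
}

fn main() {
    // a "failure" that reproduces whenever both 3 and 7 are present
    let fails = |cells: &[u64]| cells.contains(&3) && cells.contains(&7);
    let input: Vec<u64> = (0..16).collect();
    assert_eq!(shrink_by_chunks(&input[..], 4, &fails), vec![3, 7]);
}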

This pull request exposed several bugs with our filter pipeline strategy specifically, but also some arithmetic issues with other parts. There are lots of incidental fixes and changes to strategies to thread around more requirements.

rroelke added 30 commits May 9, 2024 14:48
@rroelke rroelke requested a review from davisp May 21, 2024 15:30
pub fn check_datatype(&self, datatype: Datatype) -> TileDBResult<()> {
check_datatype!(self, datatype);
Ok(())
}

/// Merges this range into the `other` list of sorted, non-overlapping ranges and returns the result.
rroelke (Member Author)

I added this stub while thinking about "do multiple writes in sequence and read back to make sure everything checks out" but decided to restrict the scope of this PR. Will remove it.

@davisp davisp (Collaborator) left a comment

Looks fine to me. +1 to merge once that test is fixed.

let dimension_params = DimensionRequirements {
datatype: Some(dimension_type),
extent_limit: {
// Dth root of T is the same as T^(1/D)
davisp (Collaborator)
Nice!
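For reference, the arithmetic the comment above describes, as a hedged standalone sketch (hypothetical helper name and budget, not the PR's code): with D dimensions and a total cell budget T, capping each dimension's extent at T^(1/D) keeps the product of the extents at or below T.

// Hypothetical helper, not the PR's code: cap each dimension's extent so that
// the product of D extents stays at or below `total_cells_limit`.
fn per_dimension_extent_limit(total_cells_limit: usize, num_dimensions: u32) -> usize {
    (total_cells_limit as f64)
        .powf(1.0 / num_dimensions as f64)
        .floor() as usize
}

fn main() {
    // e.g. a 2-D schema with a 10_000-cell budget is capped at roughly 100 per dimension
    println!("{}", per_dimension_extent_limit(10_000, 2));
}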

($field:expr, $data:pat, $then:expr) => {
typed_field_data_go!($field, _DT, $data, $then, $then)
};
($lexpr:expr, $rexpr:expr, $DT:ident, $lpat:pat, $rpat:pat, $same_type:expr, $else:expr) => {{
davisp (Collaborator)
Not insisting you change this, but I'm not very fond of how these are all smashed together in one macro. I've been staring at this for a good five minutes trying to match clauses up to follow the recursive bits and invocation logic, and then I realized this third variant isn't related to the other two? Personally I'd just make this its own macro with a more descriptive name.

Also, if possible, I'd move the helper variant to the top so that it's easier to compare their argument lists rather than attempting to memorize them while scrolling back and forth.

davisp (Collaborator)

It kinda reminds me of the proptest! macro, which has like 20 variations whose intended use cases are impossible to deduce; you just have to find examples to suss out how to use it properly.

rroelke (Member Author)

It's pretty odd that doc comments are not even allowed inside of macro_rules!. I'm quite happy to shuffle some pieces around to make it easier to understand.
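For illustration only, one possible shuffle along the lines suggested above (hypothetical FieldData variants and macro names, deliberately different from the PR's real typed_field_data_go!): give the unary and binary cases separately named macros so their argument lists can be read side by side.

// A self-contained sketch; the real FieldData enum has many more variants.
enum FieldData {
    UInt64(Vec<u64>),
    Float64(Vec<f64>),
}

// unary: bind the payload of one field, with $DT aliased to its element type
macro_rules! field_data_go {
    ($field:expr, $DT:ident, $data:pat, $then:expr) => {
        match $field {
            FieldData::UInt64($data) => {
                type $DT = u64;
                $then
            }
            FieldData::Float64($data) => {
                type $DT = f64;
                $then
            }
        }
    };
}

// binary: run $same_type only when both fields hold the same variant
macro_rules! field_data_zip_go {
    ($lexpr:expr, $rexpr:expr, $DT:ident, $lpat:pat, $rpat:pat, $same_type:expr, $else:expr) => {
        match ($lexpr, $rexpr) {
            (FieldData::UInt64($lpat), FieldData::UInt64($rpat)) => {
                type $DT = u64;
                $same_type
            }
            (FieldData::Float64($lpat), FieldData::Float64($rpat)) => {
                type $DT = f64;
                $same_type
            }
            _ => $else,
        }
    };
}

fn main() {
    let a = FieldData::UInt64(vec![1, 2, 3]);
    let b = FieldData::UInt64(vec![4, 5, 6]);
    let c = FieldData::Float64(vec![1.0, 2.0]);

    // byte size of a field, using the element type bound to DT
    let a_bytes = field_data_go!(&a, DT, data, data.len() * std::mem::size_of::<DT>());
    let c_bytes = field_data_go!(&c, DT, data, data.len() * std::mem::size_of::<DT>());

    // compare two fields only when they hold the same variant
    let same_size = field_data_zip_go!(
        &a, &b, DT, l, r,
        l.len() * std::mem::size_of::<DT>() == r.len() * std::mem::size_of::<DT>(),
        false
    );

    assert!(a_bytes == 24 && c_bytes == 16 && same_size);
}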

@rroelke rroelke (Member Author) commented May 23, 2024

+1 to merge once that test is fixed

This failure is caused by our use of tiledb 2.21, whose datatype_is_byte differs in a way that causes our filter pipelines to fail. For example, we see the following error in the logs:

FilterPipeline: Filter CHECKSUM_SHA256 produces BLOB but second filter BIT_WIDTH_REDUCTION does not accept this type.

Here's BitWidthReductionFilter::accepts_input_datatype in core:

bool BitWidthReductionFilter::accepts_input_datatype(Datatype datatype) const {
  if (datatype_is_integer(datatype) || datatype_is_datetime(datatype) ||
      datatype_is_time(datatype) || datatype_is_byte(datatype)) {
    return true;
  }
  return false;
}

Our filter strategy sets things up according to the newer datatype_is_byte behavior, so we fail against 2.21.
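For context, a hedged, self-contained illustration of the kind of check that emits that log line (hypothetical Rust types standing in for core's C++ FilterPipeline; not the actual implementation): each filter's reported output datatype must be accepted as input by the filter that follows it.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Datatype {
    UInt64,
    Blob,
}

// a filter described by its name, the datatypes it accepts, and the datatype it reports producing
struct FilterSpec {
    name: &'static str,
    accepts: fn(Datatype) -> bool,
    output: fn(Datatype) -> Datatype,
}

fn check_pipeline(filters: &[FilterSpec], mut dt: Datatype) -> Result<(), String> {
    for pair in filters.windows(2) {
        dt = (pair[0].output)(dt);
        if !(pair[1].accepts)(dt) {
            return Err(format!(
                "FilterPipeline: Filter {} produces {:?} but second filter {} does not accept this type.",
                pair[0].name, dt, pair[1].name
            ));
        }
    }
    Ok(())
}

fn main() {
    // Under 2.21 semantics BIT_WIDTH_REDUCTION rejects BLOB, so a checksum
    // filter that reports BLOB output ahead of it fails the check, as in the log above.
    let pipeline = [
        FilterSpec {
            name: "CHECKSUM_SHA256",
            accepts: |_: Datatype| true,
            output: |_: Datatype| Datatype::Blob,
        },
        FilterSpec {
            name: "BIT_WIDTH_REDUCTION",
            accepts: |dt: Datatype| dt == Datatype::UInt64,
            output: |dt: Datatype| dt,
        },
    ];
    assert!(check_pipeline(&pipeline, Datatype::UInt64).is_err());
}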

@davisp davisp (Collaborator) commented May 23, 2024

Ah right right. I'll get the CI updated to use core nightlies and we should be able to get this fixed and merged today.

@rroelke rroelke (Member Author) commented May 23, 2024

I need to not forget - add the "non-empty domain" checks to the test.

@rroelke rroelke merged commit 4c78e62 into main May 28, 2024
2 checks passed
@rroelke rroelke deleted the rr/sc-45982-write-query-proptest branch May 28, 2024 15:21