Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement conversion between TIleDB Schema and Arrow Schema #31

Merged
merged 15 commits into from
Mar 27, 2024

Conversation

rroelke
Copy link
Member

@rroelke rroelke commented Mar 27, 2024

This pull request implements functions to convert between TileDB schema and Arrow schema.

As in #28, we use a serializable objects to store details about the TileDB schema and/or dimension which do not have a natural field in the Arrow Schema.

There aren't any notable new designs here, just lots of details which were missing including several PartialEq implementations, some fields from Debug, some ffi methods which were not implemented yet, and some updates to the proptest strategies. Among the missing methods were the Schema's filter list setters and getters - these depend on the Domain to generate sound values, so I was unable to add them to proptest as we have discussed as length in Slack.

Resolves sc#43826.

Copy link

@rroelke rroelke requested review from davisp and abigalekim March 27, 2024 14:50
use tiledb::{error::Error as TileDBError, fn_typed, Result as TileDBResult};

use crate::datatype::{arrow_type_physical, tiledb_type_physical};

/// Encapsulates TileDB filter data for storage in Arrow Field metadata
#[derive(Deserialize, Serialize)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this to its own module arrow/src/filter.rs.

use proptest::prelude::*;

#[test]
fn test_tiledb_arrow_tiledb() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the attribute module we also have test_arrow_tiledb_arrow. For attribute the tiledb metadata is optional, but for dimension it is required, so it's much harder to write that test.

Copy link
Collaborator

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines +94 to +95
DimensionKey::Index(idx) => Ok(idx < self.ndim()),
DimensionKey::Name(name) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh! Nice, this is a pattern to remember to avoid having to have multiple functions in a lot of the places where we allow lookups by index or name.

@rroelke rroelke merged commit a4c5e46 into main Mar 27, 2024
1 check passed
@rroelke rroelke deleted the rr/sc-43826-arrow-schema-conversion branch March 27, 2024 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants