-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[Variant] VariantMetadata is allowed to contain the empty string #7956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
FYI @codephage2020 |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @scovich
I also pushed another test to this PR that fails without this change:
#[test]
fn test_variant_object_empty_fields() {
let mut builder = VariantBuilder::new();
builder.new_object()
.with_field("", 42)
.finish().unwrap();
let (metadata, value) = builder.finish();
// Resulting object is valid and has a single empty field
let variant = Variant::try_new(&metadata, &value).unwrap();
let variant_obj = variant.as_object().unwrap();
assert_eq!(variant_obj.len(), 1);
assert_eq!(variant_obj.get(""), Some(Variant::from(42)));
}| // Ensure the StructArray has a metadata field of BinaryView | ||
|
|
||
| let Some(metadata_field) = VariantArray::find_metadata_field(&inner) else { | ||
| let Some(metadata_field) = VariantArray::find_metadata_field(inner) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clippy was complaining about this locally so I fixed it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a gap in CI, I have a PR to fix it here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Great work on this👍!
|
|
||
| let mut offsets_iter = map_bytes_to_offsets(offset_bytes, self.header.offset_size); | ||
| let mut current_offset = offsets_iter.next().unwrap_or(0); | ||
| let mut offsets = map_bytes_to_offsets(offset_bytes, self.header.offset_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An insignificant point. I named it *_iter, which exists in both metadata and object. If you want to make modifications, they should be consistent.
# Which issue does this PR close? - Related to #6736 # Rationale for this change I noticed in #7956 that some Clippy errors were introduced but not caught by CI. # What changes are included in this PR? Add `parquet-variant-compute` to the CI for parqet-variant related PRs # Are these changes tested? It is only tests # Are there any user-facing changes? No
Which issue does this PR close?
Rationale for this change
Introduced a minor regression, in (accidentally?) forbidding the empty string as a dictionary key. Fix the bug and simplify the code a bit further while we're at it.
What changes are included in this PR?
Revert the unsorted dictionary check back to what it had been (it just uses
Iterator::is_sorted_bynow, instead ofprimitive.slice::is_sorted_by).Remove the redundant offset monotonicity check from the ordered dictionary path, relying on the fact that string slice extraction will anyway fail if the offsets are not monotonic. Improve the error message now that it does double duty.
Are these changes tested?
New unit tests for dictionaries containing the empty string. As a side effect, we now have at least a little coverage for sorted dictionaries -- somehow, I couldn't find any existing unit test that creates a sorted dictionary??
Are there any user-facing changes?
No