
@scovich commented Jul 17, 2025

Which issue does this PR close?

Rationale for this change

A previous change introduced a minor regression by (accidentally?) forbidding the empty string as a dictionary key. This PR fixes the bug and simplifies the code a bit further while we're at it.
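Why the empty string is special becomes clear from how a dictionary is laid out. The sketch below assumes, as in the Variant metadata encoding, that keys are stored as one concatenated byte buffer plus an offsets array in which key `i` spans `offsets[i]..offsets[i + 1]`. `encode_dictionary` and `offsets_strictly_increasing` are hypothetical helpers for illustration, not the crate's API. Because an empty key adds no bytes, it repeats the previous offset, so a check that demands strictly increasing offsets (one plausible shape of the regression) rejects it.

```rust
/// Hypothetical helper: lays out dictionary keys as one concatenated UTF-8
/// buffer plus `keys.len() + 1` offsets, where key `i` occupies
/// `bytes[offsets[i]..offsets[i + 1]]`.
fn encode_dictionary(keys: &[&str]) -> (Vec<usize>, Vec<u8>) {
    let mut offsets = Vec::with_capacity(keys.len() + 1);
    let mut bytes = Vec::new();
    offsets.push(0);
    for key in keys {
        bytes.extend_from_slice(key.as_bytes());
        // An empty key adds no bytes, so this offset repeats the previous one.
        offsets.push(bytes.len());
    }
    (offsets, bytes)
}

/// Strict check: every offset must be greater than the one before it.
/// This rejects any dictionary that contains the empty string.
fn offsets_strictly_increasing(offsets: &[usize]) -> bool {
    offsets.windows(2).all(|w| w[0] < w[1])
}
```

For example, `encode_dictionary(&["", "a"])` yields offsets `[0, 0, 1]`, which the strict check rejects even though the dictionary is perfectly valid.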

What changes are included in this PR?

Revert the unsorted-dictionary check to what it had been, except that it now uses `Iterator::is_sorted_by` instead of `slice::is_sorted_by`.
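A minimal sketch of such a check, assuming offsets are stored as little-endian unsigned integers `offset_size` bytes wide (1 to 4). `decode_offsets` is a hypothetical stand-in for the crate's offset decoder, and `unsorted_dictionary_offsets_valid` is likewise illustrative. Requiring only non-decreasing offsets (`<=`) lets an empty key through, and `Iterator::is_sorted_by` (stable since Rust 1.82) consumes the lazy decoder directly, without first collecting the offsets into a slice.

```rust
/// Hypothetical stand-in for an offset decoder: reads little-endian unsigned
/// integers that are `offset_size` bytes wide (1 to 4) from `bytes`.
fn decode_offsets(bytes: &[u8], offset_size: usize) -> impl Iterator<Item = usize> + '_ {
    assert!((1..=4).contains(&offset_size), "offset_size must be 1..=4");
    bytes.chunks_exact(offset_size).map(|chunk| {
        // Little-endian: byte i contributes chunk[i] << (8 * i).
        chunk
            .iter()
            .enumerate()
            .fold(0usize, |acc, (i, &b)| acc | ((b as usize) << (8 * i)))
    })
}

/// Hypothetical unsorted-dictionary check: offsets must be non-decreasing.
/// Using `<=` rather than `<` lets consecutive offsets repeat, so empty keys
/// pass. `Iterator::is_sorted_by` needs Rust 1.82 or later.
fn unsorted_dictionary_offsets_valid(offset_bytes: &[u8], offset_size: usize) -> bool {
    decode_offsets(offset_bytes, offset_size).is_sorted_by(|a, b| a <= b)
}
```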

Remove the redundant offset-monotonicity check from the ordered-dictionary path, relying on the fact that string slice extraction fails anyway if the offsets are not monotonic. Improve the error message, now that it does double duty.
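The ordered-dictionary path can lean on slice extraction because `str::get` returns `None` whenever a range has `start > end`, in addition to ranges that are out of bounds or not on a character boundary. Any decrease in the offsets shows up as some consecutive pair with `start > end`, so the extraction error already covers it. A sketch under those assumptions; `validate_sorted_dictionary` and its error strings are hypothetical, not the crate's code:

```rust
/// Hypothetical sorted-dictionary validator. `offsets` has one more entry than
/// there are keys, and `buffer` holds the concatenated key bytes (assumed to be
/// already validated as UTF-8). Keys must be unique and strictly ascending.
fn validate_sorted_dictionary(offsets: &[usize], buffer: &str) -> Result<(), String> {
    let mut prev: Option<&str> = None;
    for (i, pair) in offsets.windows(2).enumerate() {
        let (start, end) = (pair[0], pair[1]);
        // `str::get` returns `None` for out-of-bounds ranges, ranges that split
        // a character, and ranges with `start > end`. The last case is what
        // makes a separate monotonicity check redundant, so one error message
        // has to describe all of them.
        let key = buffer.get(start..end).ok_or_else(|| {
            format!("key {i}: range {start}..{end} is out of bounds, decreasing, or not on a char boundary")
        })?;
        if let Some(prev_key) = prev {
            // Strict ordering also rejects duplicate keys.
            if prev_key >= key {
                return Err(format!("key {i}: {key:?} is not greater than {prev_key:?}"));
            }
        }
        prev = Some(key);
    }
    Ok(())
}
```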

Are these changes tested?

New unit tests for dictionaries containing the empty string. As a side effect, we now have at least a little coverage for sorted dictionaries; somehow, I couldn't find any existing unit test that creates a sorted dictionary.
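One way a test could build a sorted dictionary that contains the empty string is to sort and deduplicate the field names first. A minimal sketch with hypothetical `sorted_dictionary_keys` and `is_strictly_sorted` helpers; it relies only on Rust's `str` ordering, which compares bytes lexicographically and ranks the empty string below every non-empty string, so `""` always ends up as key 0.

```rust
/// Hypothetical test helper: returns field names in the order a sorted
/// dictionary requires (strictly ascending, no duplicates).
fn sorted_dictionary_keys<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut keys = names.to_vec();
    // `str` ordering is byte-wise lexicographic; "" is a prefix of every
    // string, so it sorts before all non-empty keys.
    keys.sort_unstable();
    // Duplicates would violate the strict ordering a sorted dictionary needs.
    keys.dedup();
    keys
}

/// True when every key is strictly greater than the one before it.
fn is_strictly_sorted(keys: &[&str]) -> bool {
    keys.windows(2).all(|w| w[0] < w[1])
}
```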

Are there any user-facing changes?

No

@github-actions bot added the parquet label (Changes to the parquet crate) Jul 17, 2025
@alamb commented Jul 18, 2025

FYI @codephage2020

@alamb left a comment

Thanks @scovich

I also pushed another test to this PR that fails without this change:

    #[test]
    fn test_variant_object_empty_fields() {
        let mut builder = VariantBuilder::new();
        builder.new_object()
            .with_field("", 42)
            .finish().unwrap();
        let (metadata, value) = builder.finish();

        // Resulting object is valid and has a single empty field
        let variant = Variant::try_new(&metadata, &value).unwrap();
        let variant_obj = variant.as_object().unwrap();
        assert_eq!(variant_obj.len(), 1);
        assert_eq!(variant_obj.get(""), Some(Variant::from(42)));
    }

     // Ensure the StructArray has a metadata field of BinaryView
    -let Some(metadata_field) = VariantArray::find_metadata_field(&inner) else {
    +let Some(metadata_field) = VariantArray::find_metadata_field(inner) else {
Clippy was complaining about this locally, so I fixed it.
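The comment does not name the lint; presumably it is one of Clippy's needless-borrow lints (such as `needless_borrow`), which fire when an expression that is already a reference is borrowed again. A minimal sketch of that pattern, with hypothetical `Inner`, `find_field`, and `lookup` standing in for the real types:

```rust
/// Hypothetical stand-in for the real struct array type.
struct Inner {
    fields: Vec<String>,
}

/// Hypothetical stand-in for a lookup that takes the array by reference.
fn find_field<'a>(inner: &'a Inner, name: &str) -> Option<&'a String> {
    inner.fields.iter().find(|f| f.as_str() == name)
}

fn lookup(inner: &Inner) -> Option<&String> {
    // `inner` is already `&Inner`. Writing `find_field(&inner, ...)` would pass
    // an `&&Inner` that the compiler auto-derefs back to `&Inner`: it compiles,
    // but Clippy flags the extra `&` as a needless borrow.
    find_field(inner, "metadata")
}
```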

There was a gap in CI; I have a PR to fix it.

@alamb merged commit a5afda2 into apache:main Jul 18, 2025 (12 checks passed)
@codephage2020 left a comment

Thanks! Great work on this👍!


    -let mut offsets_iter = map_bytes_to_offsets(offset_bytes, self.header.offset_size);
    -let mut current_offset = offsets_iter.next().unwrap_or(0);
    +let mut offsets = map_bytes_to_offsets(offset_bytes, self.header.offset_size);
A minor point: I named it `*_iter`, a naming used in both the metadata and object code. If you want to rename it, please keep the two consistent.

alamb added a commit that referenced this pull request Jul 18, 2025
# Which issue does this PR close?

- Related to #6736

# Rationale for this change

I noticed in #7956 that some Clippy errors were introduced but not caught by CI.

# What changes are included in this PR?

Add `parquet-variant-compute` to the CI that runs for parquet-variant-related PRs.

# Are these changes tested?

It only changes CI tests.


# Are there any user-facing changes?
No
