Merged
arrow-row/src/lib.rs (41 additions, 0 deletions)
@@ -415,6 +415,41 @@ mod variable;
///
///```
///
/// ## Union Encoding
///
/// A union value is encoded as a single type-id byte followed by the row encoding of the selected child value.
/// The type-id byte is always present; union arrays have no top-level null marker, so nulls are represented by the child encoding.
///
/// For example, given a union of Int32 (type_id = 0) and Utf8 (type_id = 1):
///
/// ```text
/// ┌──┬──────────────┐
/// 3 │00│01│80│00│00│03│
/// └──┴──────────────┘
/// │ └─ signed integer encoding (non-null)
/// └──── type_id
///
/// ┌──┬────────────────────────────────┐
/// "abc" │01│02│'a'│'b'│'c'│00│00│00│00│00│03│
/// └──┴────────────────────────────────┘
/// │ └─ string encoding (non-null)
/// └──── type_id
///
/// ┌──┬──────────────┐
/// null Int32 │00│00│00│00│00│00│
/// └──┴──────────────┘
/// │ └─ signed integer encoding (null)
/// └──── type_id
///
/// ┌──┬──┐
/// null Utf8 │01│00│
/// └──┴──┘
/// │ └─ string encoding (null)
/// └──── type_id
/// ```
///
/// See [`UnionArray`] for more details on union types.
///
/// # Ordering
///
/// ## Float Ordering
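The byte layouts in the Union Encoding diagrams can be reproduced by hand. The following is a minimal sketch, not arrow-row code: `encode_i32`, `encode_short_str` and `encode_union` are hypothetical helpers that assume ascending order with nulls first, and the string helper only handles values that fit in a single 8-byte mini block. The test checks the output against the four diagrams.

```rust
// Hypothetical helpers that rebuild the documented row layout by hand.
// They are NOT the arrow-row API; they assume ascending order, nulls first.

/// Int32: validity byte (0x01 valid, 0x00 null), then the big-endian value
/// with the sign bit flipped. A null writes zeroed value bytes.
fn encode_i32(v: Option<i32>) -> Vec<u8> {
    match v {
        Some(x) => {
            let mut out = vec![0x01];
            // Flipping the sign bit makes negative values sort before positive ones.
            out.extend_from_slice(&((x as u32) ^ 0x8000_0000).to_be_bytes());
            out
        }
        None => vec![0x00; 5],
    }
}

/// Utf8 restricted to values of at most 8 bytes (one mini block):
/// null -> 0x00, empty -> 0x01, otherwise 0x02 + zero-padded block + length byte.
fn encode_short_str(v: Option<&str>) -> Vec<u8> {
    match v {
        None => vec![0x00],
        Some("") => vec![0x01],
        Some(s) => {
            let bytes = s.as_bytes();
            assert!(bytes.len() <= 8, "sketch only handles a single 8-byte block");
            let mut block = [0u8; 8];
            block[..bytes.len()].copy_from_slice(bytes);
            let mut out = vec![0x02];
            out.extend_from_slice(&block);
            // The final block is followed by the number of bytes it actually uses.
            out.push(bytes.len() as u8);
            out
        }
    }
}

/// Union: the type-id byte is always written, followed by the child's row bytes.
fn encode_union(type_id: i8, child: Vec<u8>) -> Vec<u8> {
    let mut out = vec![type_id as u8];
    out.extend(child);
    out
}

#[test]
fn union_rows_match_diagrams() {
    assert_eq!(encode_union(0, encode_i32(Some(3))), [0x00, 0x01, 0x80, 0x00, 0x00, 0x03]);
    assert_eq!(
        encode_union(1, encode_short_str(Some("abc"))),
        [0x01, 0x02, b'a', b'b', b'c', 0, 0, 0, 0, 0, 0x03]
    );
    assert_eq!(encode_union(0, encode_i32(None)), [0u8; 6]);
    assert_eq!(encode_union(1, encode_short_str(None)), [0x01, 0x00]);
}
```

Because the null case is carried entirely by the child encoding, the union helper never needs its own validity byte, matching the note that union arrays have no top-level null marker.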
@@ -431,6 +466,12 @@ mod variable;
/// The encoding described above will order nulls first; this can be inverted by representing
/// nulls as `0xFF_u8` instead of `0_u8`.
///
/// ## Union Ordering
///
/// Values with the same type id are ordered according to the ordering of the corresponding child type.
/// Values with different type ids are ordered by their type id, since the type-id byte is compared first.
Comment on lines +469 to +472

Would be good to mention reversing via negating the type_id here:

arrow-rs/arrow-row/src/lib.rs

Lines 1742 to 1747 in 2507946

let type_id_byte = if opts.descending {
!(type_id as u8)
} else {
type_id as u8
};
data[*offset] = type_id_byte;

Since I am trying to prepare the 57.2.0 release, I took the liberty of making this change directly to this PR in 08efd42. If I got it wrong or you would like other changes, I will be happy to make a follow-on PR.

/// The type-id byte is bitwise-negated (`!type_id`) when descending order is specified, which reverses the order of values with different type ids.
///
/// ## Reverse Column Ordering
///
/// The order of a given column can be reversed by negating the encoded bytes of non-null values
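The Union Ordering rules, including the descending type-id rule from the quoted snippet, can be checked with a small sketch. It assumes a hypothetical union with two Int32 children (type ids 0 and 1), so only one child encoding is needed, and it compares rows as byte vectors, since `Vec<u8>` compares lexicographically. `type_id_byte` mirrors the quoted snippet; `encode_i32_row` is a hypothetical helper that, as an assumption following the Reverse Column Ordering note, negates only the value bytes of non-null values for descending order.

```rust
// Hypothetical sketch of union row ordering; not the arrow-row implementation.

/// Mirrors the reviewed snippet: the type-id byte is bitwise-negated
/// when the column is sorted in descending order.
fn type_id_byte(type_id: i8, descending: bool) -> u8 {
    if descending {
        !(type_id as u8)
    } else {
        type_id as u8
    }
}

/// Row for a non-null Int32 child: type-id byte, validity byte 0x01, then the
/// big-endian value with the sign bit flipped. For descending order only the
/// value bytes are negated; the validity byte is left as-is.
fn encode_i32_row(type_id: i8, v: i32, descending: bool) -> Vec<u8> {
    let mut value = ((v as u32) ^ 0x8000_0000).to_be_bytes();
    if descending {
        for b in value.iter_mut() {
            *b = !*b;
        }
    }
    let mut out = vec![type_id_byte(type_id, descending), 0x01];
    out.extend_from_slice(&value);
    out
}

#[test]
fn union_ordering_sketch() {
    // Different type ids: the first byte decides, regardless of the values.
    assert!(encode_i32_row(0, 100, false) < encode_i32_row(1, -5, false));
    assert!(encode_i32_row(0, 100, true) > encode_i32_row(1, -5, true));
    // Same type id: the child ordering applies, reversed when descending.
    assert!(encode_i32_row(0, 3, false) < encode_i32_row(0, 7, false));
    assert!(encode_i32_row(0, 3, true) > encode_i32_row(0, 7, true));
}
```

Because `!` maps type ids 0, 1, 2 to 0xFF, 0xFE, 0xFD, the relative order of type ids is exactly reversed, so negating that single byte is enough to flip the ordering between different children.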