Skip to content

Conversation

@jorgecarleitao
Copy link
Member

This PR removes PrimitiveArray::new. PrimitiveArray::new is pub, but it is dangerous because:

  • when the buffer's content is not aligned with T, many of its methods cause UB
  • when used with null_count > 0, many calls panic as the null bitmap is None, but the null_count != 0
  • when used with null_count > 0, it creates an array out of spec (as the buffer for the null bitmap is None but the null count is not zero)

Since:

  • a change in this method's signature (to either add the bitmap or remove null_count) requires a backward incompatible change
  • it is only used in tests
  • we have good offers to create primitive arrays:
    • from an ArrayData,
    • from a vector or vector of optionals
    • from an iterator

This PR removes it.

@jorgecarleitao jorgecarleitao changed the title ARROW-10448: [Rust] Remove PrimitiveArray::new. ARROW-10448: [Rust] Remove PrimitiveArray::new that is out of spec Oct 31, 2020
@jorgecarleitao jorgecarleitao changed the title ARROW-10448: [Rust] Remove PrimitiveArray::new that is out of spec ARROW-10448: [Rust] Remove PrimitiveArray::new that causes UB Oct 31, 2020
@github-actions
Copy link

@jorgecarleitao jorgecarleitao changed the title ARROW-10448: [Rust] Remove PrimitiveArray::new that causes UB ARROW-10448: [Rust] Remove PrimitiveArray::new that can cause UB Nov 1, 2020
@nevi-me nevi-me self-requested a review November 3, 2020 22:24
@nevi-me
Copy link
Contributor

nevi-me commented Nov 3, 2020

I think we have a similar problem with building arrays from ArrayDataBuilder because when building ArrayData, we don't check if the length of the array corresponds with the buffer's length (or if it's even specified).

So something like the below gets created, but ends up as a 0-len array

// try build array data without specifying length
ArrayData::builder(DataType::_).buffers(vec![buffer1, buffer2]).build()

@jorgecarleitao
Copy link
Member Author

I think we have a similar problem with building arrays from ArrayDataBuilder because when building ArrayData, we don't check if the length of the array corresponds with the buffer's length (or if it's even specified).

So something like the below gets created, but ends up as a 0-len array

// try build array data without specifying length
ArrayData::builder(DataType::_).buffers(vec![buffer1, buffer2]).build()

That is a very good point.

I think that we have a trade-off here: do we check that the ArrayData is according to spec (and incur the testing cost), or do we not check and live with potential UB? We can have APIs for a safe and unsafe, but we still need to decide what is public and what is not :P

This could be a good topic for the mailing list, as it is a general question, specially for IPC and the c data interface.

@alamb
Copy link
Contributor

alamb commented Nov 6, 2020

Is it the case that "UB" means "Undefined Behavior"?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good improvement to me. 👍 It also makes it clearer to people how to construct arrays (aka use the Builders) rather than also potentially using PrimitiveArray::new directly

Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants