Skip to content

Seal Array trait#9092

Merged
alamb merged 3 commits intoapache:mainfrom
tustvold:seal-arrow-array
Jan 7, 2026
Merged

Seal Array trait#9092
alamb merged 3 commits intoapache:mainfrom
tustvold:seal-arrow-array

Conversation

@tustvold
Copy link
Contributor

@tustvold tustvold commented Jan 2, 2026

Which issue does this PR close?

Rationale for this change

This trait is not meant to be overridden, and doing so will break many kernels in sometimes subtle ways.

What changes are included in this PR?

Seals the Array trait to prevent implementation outside of arrow-array.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 2, 2026
@tustvold tustvold added the api-change Changes to the arrow API label Jan 2, 2026
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tustvold @viirya and @Jefffrey

@alamb
Copy link
Contributor

alamb commented Jan 3, 2026

BTW while this is technically an API change and should wait for the next major release, I think it is important enough to merge for 57.2.0 (a minor release) and will not be disruptive (as I don't think anyone has implemented this)

I had a brief look around geoarrow as it is well designed in my opinion and a non trivial extension on top of Arrayss - https://github.com/geoarrow/geoarrow-rs/blob/main/rust/geoarrow-array/src/trait_.rs

I didn't see any impl Array but maybe @kylebarron / @paleolimbot could confirm that that they don't know of any potential downstream issues of making it impossible to impl Array.

Copy link
Member

@kylebarron kylebarron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to implement Array for external types but was never able to get the downcasting working because of the closed nature of DataType. So I don't think it was previously possible to implement Array externally: sealing the array just makes it explicit

@gabotechs
Copy link
Contributor

Such a same getting this trait sealed. I was finding it very convenient for implementing a GPU-based dyn Array baked by https://github.com/rapidsai/cudf in https://github.com/gabotechs/libcudf-rs.

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@tustvold
Copy link
Contributor Author

tustvold commented Jan 5, 2026

I presume you're referring to https://github.com/gabotechs/libcudf-rs/blob/main/src/column_view.rs#L125

If so why not just define your own trait, it looks like you're only using a very limited subset of Array anyway, and passing that array into any arrow-rs kernel will cause it at best panic (as downcast won't work correctly).

@gabotechs
Copy link
Contributor

@tustvold It's mainly for wiring it up with https://github.com/apache/datafusion. As DataFusion moves RecordBatches between nodes, which contain dynamic dyn Arrays, that was my means of transportation.

Not saying that the change in this PR does not make sense though, I believe it does, but I wonder what could be the alternative. Maybe letting DataFusion be the one that exposes a customizable trait for transporting data?

@scovich
Copy link
Contributor

scovich commented Jan 5, 2026

I tried to implement Array for external types but was never able to get the downcasting working because of the closed nature of DataType. So I don't think it was previously possible to implement Array externally: sealing the array just makes it explicit

When implementing the variant extension type, we also tried to useArray, but we hit the same 1:1 constraint -- every concrete Array must have a corresponding DataType enum variant in order for downcasting to work properly, and every DataType enum variant must have its own concrete type implementing Array for the same reason. In our case, the (proposed) Array::data_type method for VariantArray had to return DataType::Struct, but attempting to downcast it as StructArray would fail/panic.

So the DataType enum already logically seals the Array trait, but the compiler cannot enforce that. Actually sealing the trait just closes the compiler loophole that currently allows (always provably and unfixably incorrect) third-party implementations of the trait.

Aside: the same 1:1 constrait is also why Array cannot (and must never) require Any, even tho it makes casting a lot harder. Because &dyn Array and Arc<dyn Array> both impl Array (as does &Arc<Arc<dyn Array>> by transitivity) attempting any kind of casting via Array: Any would wreak havoc whenever those convenience wrapper impl are involved.

@scovich
Copy link
Contributor

scovich commented Jan 5, 2026

DataFusion moves RecordBatches between nodes, which contain dynamic dyn Arrays, that was my means of transportation.

Not saying that the change in this PR does not make sense though, I believe it does, but I wonder what could be the alternative. Maybe letting DataFusion be the one that exposes a customizable trait for transporting data?

Could it pass ArrayData instead? Should be easy enough to do e.g. StructArray::from(record_batch).into_data() and then RecordBatch::from(StructArray::from(array_data))?

@alamb
Copy link
Contributor

alamb commented Jan 6, 2026

I am expecting an issue shortly for this PR, and then I think we should merge it to avoid confusion.

I realize this will cause @gabotechs some pain downstream, but let's figure out a better API for backing arrays with GPU memory as a follow on

Copy link
Contributor

@gabotechs gabotechs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it pass ArrayData instead? Should be easy enough to do e.g. StructArray::from(record_batch).into_data() and then RecordBatch::from(StructArray::from(array_data))?

Yeah, I think I can do either this or something similar, thanks for the options! this change makes sense to me 👍

@alamb
Copy link
Contributor

alamb commented Jan 7, 2026

Thanks again for everyone's thoughts. Also thanks to @shinmao for the report

@alamb alamb merged commit 721f373 into apache:main Jan 7, 2026
27 checks passed
@waynexia
Copy link
Member

Another one impl Array for T example here https://github.com/GreptimeTeam/greptimedb/blob/e0285209cb62b398ce89a033a1101991d4a28b27/src/promql/src/range_array.rs#L228 . It wraps a dictionary array into a matrix-like structure.

Despite whether this impl is reasonable, or has a better alternative, Array is one of the biggest abstractions exposed from arrow. Many downstream lib like datafusion only accepts Array (record batch), this sealing also removes a large hack point to play around with data representation and computation. And adding a new implementation of Array to this arrow library is a hard long journey (like variant array) or not even acceptable (like range array for a specific usage). IMHO, this breaking is shipped too fast, and on a minor version.

@alamb
Copy link
Contributor

alamb commented Jan 15, 2026

@waynexia I believe the concern and speed was related to the soundness report here

Given your usecase I believe we should find some other way to sort this out. I filed a ticket to discuss the idea here

I am happy to run another arrow release on the 57 line to resolve this issue once we have consensus

gabotechs added a commit to gabotechs/arrow-rs that referenced this pull request Jan 21, 2026
alamb pushed a commit that referenced this pull request Jan 26, 2026
This reverts commit 721f373.

# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Discussed in #9184
- Closes #9184
 
# Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

As discussed in Discussed in
#9184, rather than sealing the
`Array` trait, it's going to potentially be marked `unsafe` so that
projects have time to find alternatives for their use-cases that are now
satisfied by a custom `Array` implementation.


# What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Clean revert of #9092

# Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Not applicable here

# Are there any user-facing changes?

Yes, users can no implement the `Array` trait in their codebases again.

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->
alamb pushed a commit to alamb/arrow-rs that referenced this pull request Jan 30, 2026
…ache#9234)

This reverts commit 721f373.

# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Discussed in apache#9184
- Closes apache#9184
 
# Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

As discussed in Discussed in
apache#9184, rather than sealing the
`Array` trait, it's going to potentially be marked `unsafe` so that
projects have time to find alternatives for their use-cases that are now
satisfied by a custom `Array` implementation.


# What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Clean revert of apache#9092

# Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Not applicable here

# Are there any user-facing changes?

Yes, users can no implement the `Array` trait in their codebases again.

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->
alamb added a commit that referenced this pull request Feb 2, 2026
…unsafe` (#9234) (#9313)

- Part of #9240
- Related to #9184

This is a backport of the following PR to the 57 line
- #9234 from @gabotechs  

As discussed in #9184, rather
than sealing the Array trait which broke several downstream crates, this
PR markes the trait as `unsafe` so that projects have time to find
alternatives for their use-cases that are now satisfied by a custom
Array implementation.

Co-authored-by: Gabriel <45515538+gabotechs@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-change Changes to the arrow API arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Soundness Bug in try_binary when Array is implemented incorrectly in external crate

9 participants