Skip to content

Conversation

@lidavidm
Copy link
Member

@lidavidm lidavidm commented Jul 28, 2024

Rationale for this change

Add the newly ratified extension type.

What changes are included in this PR?

The C++/Python implementation only.

Are these changes tested?

Yes

Are there any user-facing changes?

No.

@github-actions
Copy link

⚠️ GitHub issue #43454 has been automatically assigned in GitHub to PR creator.

@lidavidm lidavidm marked this pull request as ready for review July 29, 2024 05:12
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying my review comments from #41823 (review)

For the Python bindings, looks good as well, just a few minor comments:

  • We probably want to add OpaqueScalar to pyarrow/__init__.py as well? (although I see we forgot this for the tensor type, which I assume you have been imitating)
  • Can you add the three new classes to test_extension_type_constructor_errors in test_misc.py?
  • Can you list the new type in the python API docs (docs/source/python/api/datatypes.rst, docs/source/python/api/arrays.rst
    )? See eg #39652 for a previous PR adding a new type where this is added to the docs.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jul 30, 2024
@lidavidm
Copy link
Member Author

lidavidm commented Aug 2, 2024

I believe I've fixed everything here, thanks for the reviews!

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked in detail at the C++ side again, but the Python changes look good!

@lidavidm
Copy link
Member Author

lidavidm commented Aug 6, 2024

Any further comments here?

@joellubi
Copy link
Member

joellubi commented Aug 6, 2024

Any further comments here?

None for me. I based some of my work on Bool8 on the implementation here, so ideally we could get another C++ reviewer to approve as a third party.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a couple minor comments

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: checked_cast?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also compare with an extension type that is not Opaque?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Aug 8, 2024
@github-actions github-actions bot removed the awaiting changes Awaiting changes label Aug 12, 2024
@github-actions github-actions bot added the awaiting change review Awaiting change review label Aug 12, 2024
@lidavidm
Copy link
Member Author

I'm going to merge this in a bit given the radio silence

@lidavidm lidavidm merged commit 6e7125b into apache:main Aug 14, 2024
@lidavidm lidavidm removed the awaiting change review Awaiting change review label Aug 14, 2024
@lidavidm lidavidm deleted the gh-43454 branch August 14, 2024 01:41
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 6e7125b.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 38 possible false positives for unstable benchmarks that are known to sometimes produce them.

felipecrv pushed a commit that referenced this pull request Aug 21, 2024
### Rationale for this change

C++ and Python implementations of #43234

### What changes are included in this PR?

- Implement C++ `Bool8Type`, `Bool8Array`, `Bool8Scalar`, and tests
- Implement Python bindings to C++, as well as zero-copy numpy conversion methods
- TODO: docs waiting for rebase on #43458

### Are these changes tested?

Yes

### Are there any user-facing changes?

Bool8 extension type will be available in C++ and Python libraries

* GitHub Issue: #17682

Authored-by: Joel Lubinitsky <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
@rok
Copy link
Member

rok commented Aug 21, 2024

Apologies for the late question. Is opaque type intentionally not registered?

static void CreateGlobalRegistry() {
g_registry = std::make_shared<ExtensionTypeRegistryImpl>();
#ifdef ARROW_JSON
// Register canonical extension types
auto fst_ext_type =
checked_pointer_cast<ExtensionType>(extension::fixed_shape_tensor(int64(), {}));
ARROW_CHECK_OK(g_registry->RegisterType(fst_ext_type));
auto bool8_ext_type = checked_pointer_cast<ExtensionType>(extension::bool8());
ARROW_CHECK_OK(g_registry->RegisterType(bool8_ext_type));
#endif
}

If it's an oversight I can add registration in #37298.

@lidavidm
Copy link
Member Author

It was an oversight, thanks for spotting it

@jorisvandenbossche
Copy link
Member

If it's an oversight I can add registration in #37298.

Let's do that as a separate PR (it should also require to update the tests slightly, and then it doesn't depend on the timeline of merging the other PR)

@jorisvandenbossche
Copy link
Member

Opened #43787 to keep track of it.

rok added a commit that referenced this pull request Aug 22, 2024
…3788)

This is to resolve #43787

> The Opaque extension type implementation for C++ (plus python bindings) was added in #43458, but it was not registered by default, which we should do for canonical extension types (see #43458 (comment))

Additionally, this adds `bool8` extension type builds with `ARROW_JSON=false` as discussed [here](5258819#r145613657)

### Rationale for this change

Canonical types should be registered by default if possible (except e.g. if they can't be compiled due to `ARROW_JSON=false`).

### What changes are included in this PR?

This adds default registration for `opaque`, changes when `bool8` is built and moves all canonical tests under the same test target.

### Are these changes tested?

Changes are tested by previously existing tests.

### Are there any user-facing changes?

`opaue` will now be registered by default and `bool8` will be present in case `ARROW_JSON=false` at build time.
* GitHub Issue: #43787

Authored-by: Rok Mihevc <[email protected]>
Signed-off-by: Rok Mihevc <[email protected]>
QuietCraftsmanship pushed a commit to QuietCraftsmanship/arrow that referenced this pull request Jul 7, 2025
…3788)

This is to resolve #43787

> The Opaque extension type implementation for C++ (plus python bindings) was added in apache/arrow#43458, but it was not registered by default, which we should do for canonical extension types (see apache/arrow#43458 (comment))

Additionally, this adds `bool8` extension type builds with `ARROW_JSON=false` as discussed [here](apache/arrow@441ecb0#r145613657)

### Rationale for this change

Canonical types should be registered by default if possible (except e.g. if they can't be compiled due to `ARROW_JSON=false`).

### What changes are included in this PR?

This adds default registration for `opaque`, changes when `bool8` is built and moves all canonical tests under the same test target.

### Are these changes tested?

Changes are tested by previously existing tests.

### Are there any user-facing changes?

`opaue` will now be registered by default and `bool8` will be present in case `ARROW_JSON=false` at build time.
* GitHub Issue: #43787

Authored-by: Rok Mihevc <[email protected]>
Signed-off-by: Rok Mihevc <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants