GH-37582: [Go][Parquet] Implement Float16 logical type #37599

benibus · 2023-09-06T23:08:01Z

Rationale for this change

There is an active proposal for a Float16 logical type in Parquet (apache/parquet-format#184) with C++/Python implementations in progress (#36073), so we should add one for Go as well.

What changes are included in this PR?

Adds LogicalType definitions and methods for Float16
Adds support for Float16 column statistics and comparators
Adds support for interchange between Parquet and Arrow's half-precision float

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

Closes: [Go][Parquet] Implement Float16 logical type #37582

github-actions · 2023-09-06T23:08:32Z

⚠️ GitHub issue #37582 has been automatically assigned in GitHub to PR creator.

benibus

This should cover the base requirements, but LMK if I missed anything. There's a bit of weird special-casing involved with the column statistics since float16 requires an independent Statistics type despite not being physical - but maybe there's a better way.
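To illustrate why a dedicated comparator is needed for the statistics, here is a minimal stdlib-only sketch. It assumes, as in the parquet-format proposal, that Float16 values are stored as 2-byte little-endian FIXED_LEN_BYTE_ARRAY values. `halfToFloat32` and `lessFloat16` are hypothetical names for illustration, not the functions added in this PR, and NaN and signed-zero ordering are deliberately left out.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
)

// halfToFloat32 decodes an IEEE 754 binary16 bit pattern into a float32.
// Hypothetical helper for this sketch only.
func halfToFloat32(h uint16) float32 {
	sign := uint32(h>>15) << 31
	exp := uint32(h>>10) & 0x1F
	frac := uint32(h) & 0x3FF
	switch exp {
	case 0:
		if frac == 0 {
			return math.Float32frombits(sign) // signed zero
		}
		// Subnormal: value is frac * 2^-24.
		f := float32(frac) * float32(math.Ldexp(1, -24))
		if sign != 0 {
			return -f
		}
		return f
	case 0x1F:
		// Infinity or NaN: keep the payload bits.
		return math.Float32frombits(sign | 0x7F800000 | frac<<13)
	default:
		// Normal number: rebias the exponent from 15 to 127.
		return math.Float32frombits(sign | (exp+112)<<23 | frac<<13)
	}
}

// lessFloat16 compares two 2-byte little-endian half-floats numerically.
// It ignores the NaN and signed-zero rules a real comparator would need.
func lessFloat16(a, b []byte) bool {
	fa := halfToFloat32(binary.LittleEndian.Uint16(a))
	fb := halfToFloat32(binary.LittleEndian.Uint16(b))
	return fa < fb
}

func main() {
	negOne := []byte{0x00, 0xBC} // -1.0 (bits 0xBC00)
	two := []byte{0x00, 0x40}    // 2.0 (bits 0x4000)

	// Plain byte-wise ordering puts -1.0 after 2.0.
	fmt.Println(bytes.Compare(negOne, two) < 0) // false
	// Numeric ordering gives the expected result.
	fmt.Println(lessFloat16(negOne, two)) // true
}
```

Because a byte-wise comparison of the raw values disagrees with numeric order, min/max statistics for this logical type cannot reuse the generic byte-array comparison.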

go/parquet/metadata/statistics_types.tmpldata

benibus · 2023-09-14T17:54:32Z

go/parquet/internal/testutils/random.go

+	for {
+		f16 := float16.FromBits(uint16(r.Uint64n(math.MaxUint16 + 1)))
+		f64 := float64(f16.Float32())
+		if !math.IsNaN(f64) && !math.IsInf(f64, 0) {


The other randFloat functions don't exclude infinities. Should we change that? The checks for approximate equality will fail when comparing infinities of the same sign.
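A small sketch of the failure mode described above. `approxEqual` is a hypothetical tolerance check, not the helper used in the test suite; any check of the form |a - b| <= eps behaves the same way.

```go
package main

import (
	"fmt"
	"math"
)

// approxEqual is a hypothetical absolute-tolerance comparison.
func approxEqual(a, b, eps float64) bool {
	return math.Abs(a-b) <= eps
}

func main() {
	inf := math.Inf(1)

	// Finite values within the tolerance compare as equal.
	fmt.Println(approxEqual(1.0, 1.0+1e-9, 1e-6)) // true

	// Inf - Inf is NaN, and every ordered comparison with NaN is false,
	// so two identical infinities are reported as unequal.
	fmt.Println(approxEqual(inf, inf, 1e-6)) // false
}
```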

zeroshade

Overall this looks great, just a few nitpicks.

go/arrow/float16/float16.go

go/parquet/file/column_writer_types.gen.go.tmpl

zeroshade · 2023-09-27T18:14:43Z

go/parquet/internal/testutils/random.go

+	for {
+		f16 := float16.FromBits(uint16(r.Uint64n(math.MaxUint16 + 1)))
+		f64 := float64(f16.Float32())
+		if !math.IsNaN(f64) && !math.IsInf(f64, 0) {


It might make more sense to keep the NaNs and infinities so that testing covers those cases. Alternatively, we could just make sure we test those cases separately.
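For reference, a stdlib-only sketch of the two options. The diff above uses `float16.FromBits` and `r.Uint64n`; this version works on raw bit patterns with `math/rand` instead, and `isFiniteHalf`, `randFiniteHalfBits`, and `sampleFiniteHalfBits` are hypothetical names.

```go
package main

import (
	"fmt"
	"math/rand"
)

// isFiniteHalf reports whether a raw IEEE 754 binary16 bit pattern is finite.
// An exponent field (bits 10-14) of all ones encodes either Inf or NaN.
func isFiniteHalf(bits uint16) bool {
	return bits&0x7C00 != 0x7C00
}

// randFiniteHalfBits draws random 16-bit patterns until one is finite
// (rejection sampling, as in the loop under review).
func randFiniteHalfBits(r *rand.Rand) uint16 {
	for {
		bits := uint16(r.Intn(1 << 16))
		if isFiniteHalf(bits) {
			return bits
		}
	}
}

// sampleFiniteHalfBits returns n finite bit patterns from a seeded source.
func sampleFiniteHalfBits(n int, seed int64) []uint16 {
	r := rand.New(rand.NewSource(seed))
	out := make([]uint16, n)
	for i := range out {
		out[i] = randFiniteHalfBits(r)
	}
	return out
}

func main() {
	// Option 1: random data with NaN and Inf filtered out.
	for _, bits := range sampleFiniteHalfBits(3, 42) {
		fmt.Printf("%#x\n", bits)
	}

	// Option 2: cover the special values explicitly in separate cases.
	specials := []uint16{0x7C00, 0xFC00, 0x7E00} // +Inf, -Inf, a NaN
	for _, bits := range specials {
		fmt.Println(isFiniteHalf(bits)) // false for each
	}
}
```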

go/arrow/float16/float16.go

go/parquet/schema/reflection_test.go

benibus · 2023-10-18T19:07:27Z

Note that the updated parquet.thrift file that this uses is currently contained in the C++ PR #36073, which will need to be synced with apache/parquet-format#184 once it's finalized.

So, fair warning: regenerating the thrift files from this branch alone will basically break everything (for now).

zeroshade · 2023-10-18T19:38:00Z

Overall this is looking good to me, though I won't merge this until after the format PR with the thrift changes gets finalized and merged.

zeroshade · 2023-10-30T15:17:48Z

@benibus the fix for the failing CIs has been merged. Can you rebase and update this so that we can get a clean CI run? I believe the float16 format change has been merged, so once you rebase and everything is green I'll be happy to merge this!

benibus · 2023-10-31T18:16:35Z

Rebased. A couple CI failures still, but they look unrelated.

zeroshade · 2023-11-13T18:30:12Z

Can you please resolve the conflicts (and potentially rebase if necessary)? We can check the CI afterwards

benibus · 2023-11-13T20:54:12Z

Alright, I've fixed the conflicts and rebased again. It looks like everything's green now.

zeroshade

LGTM

…37599)

### Rationale for this change

There is an active proposal for a Float16 logical type in Parquet (apache/parquet-format#184) with C++/Python implementations in progress (apache#36073), so we should add one for Go as well.

### What changes are included in this PR?

- [x] Adds `LogicalType` definitions and methods for `Float16`
- [x] Adds support for `Float16` column statistics and comparators
- [x] Adds support for interchange between Parquet and Arrow's half-precision float

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes

* Closes: apache#37582

Authored-by: benibus <[email protected]>
Signed-off-by: Matt Topol <[email protected]>

conbench-apache-arrow · 2023-11-13T23:04:01Z

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit bff5fb9.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

github-actions bot added Component: Go awaiting review Awaiting review labels Sep 6, 2023

benibus force-pushed the GH-37582-parquet-float16 branch from 88e0c88 to 07eae6f Compare September 14, 2023 16:30

benibus commented Sep 14, 2023

benibus marked this pull request as ready for review September 14, 2023 18:21

benibus requested a review from zeroshade as a code owner September 14, 2023 18:21

github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Sep 14, 2023

zeroshade requested changes Sep 27, 2023

github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Sep 27, 2023

benibus force-pushed the GH-37582-parquet-float16 branch from 07eae6f to c0109db Compare October 18, 2023 18:35

github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 18, 2023

github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Oct 18, 2023

This was referenced Oct 24, 2023

Add Float16/Half-float logical type to Parquet apache/arrow-rs#4986

Closed

Add Float16/Half-float logical type to Parquet jorgecarleitao/arrow2#1585

Open

github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 26, 2023

benibus force-pushed the GH-37582-parquet-float16 branch from 3a1fe0f to f0ead7d Compare October 31, 2023 14:41

benibus added 3 commits November 13, 2023 14:19

Regenerate files from parquet.thrift

272493a

Add foundational type defs and methods

83afd36

Additions to arrow.float16 pkg

3098461

benibus added 8 commits November 13, 2023 14:32

Implement Float16Statistics

74e7843

Support float16 in RNG test utils

17b754a

Add pqarrow schema conversions

fa73b2b

Support read/write to/from arrow.FLOAT16

f07c116

More arrow.float16 additions/enhancements

6519b61

Address feedback on column writer statistics

316d390

Address testing feedback

f3dc4ce

NaNs are still excluded from `randFloat16`, as this matches the native float equivalents

Support reflection from float16.Num

a7922f9

benibus force-pushed the GH-37582-parquet-float16 branch from f0ead7d to a7922f9 Compare November 13, 2023 20:07

zeroshade approved these changes Nov 13, 2023

zeroshade merged commit bff5fb9 into apache:main Nov 13, 2023
24 checks passed

zeroshade removed the awaiting change review Awaiting change review label Nov 13, 2023

github-actions bot added the awaiting merge Awaiting merge label Nov 13, 2023

benibus mentioned this pull request Nov 13, 2023

GH-36036: [C++][Python][Parquet] Implement Float16 logical type #36073

Merged

zhangjiashen mentioned this pull request Nov 29, 2023

PARQUET-1647: [Java][Parquet] Implement FLOAT16 logical type apache/parquet-java#1142

Merged

4 tasks
