-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-43624: [Go] Add JSON/UUID extension types, extend arrow -> parquet logical type mapping #43679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for the thorough review @pitrou. The UUID extension was moved from a previously |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With my very basic knowledge of Go, this looks ok to me. I'd rather @zeroshade vet it though :-)
go/arrow/extensions/json.go
Outdated
|
|
||
| // GetOneForMarshal implements arrow.Array. | ||
| func (a *JSONArray) GetOneForMarshal(i int) interface{} { | ||
| return a.Value(i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this instead use ValueJSON and return the json.RawMessage instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that makes more sense!
go/arrow/extensions/json_test.go
Outdated
| expectedJSON := strings.ReplaceAll(` | ||
| {"json":"foobar"} | ||
| {"json":null} | ||
| {"json":{"foo":"bar"}} | ||
| {"json":42} | ||
| {"json":true} | ||
| {"json":[1,true,"3",null,{"five":5}]}`, | ||
| "\t", "") + "\n" // strip indentation, add trailing newline | ||
| require.Equal(t, expectedJSON, buf.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could probably use JSONEq instead of needing to do the ReplaceAll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RecordToJSON returns jsonlines rather than json itself, but it is still cleaner to do a line-by-line JSONEq comparison. I just added that.
| *array.ExtensionBuilder | ||
| } | ||
|
|
||
| func NewUUIDBuilder(mem memory.Allocator) *UUIDBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add a docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must have missed that, just added one!
zeroshade
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple nits to address first, otherwise I'm good with this
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit d47b305. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 13 possible false positives for unstable benchmarks that are known to sometimes produce them. |
#### Summary Goes with cloudquery/plugin-pb-go#445 Interesting bit is https://arrow.apache.org/blog/2024/10/23/arrow-go-18.0.0-release/#canonical-extension-types that we might be able to drop our UUID and JSON own extension types, but probably better to do that separately Had to do 3ae4778 to get the tests working, followed apache/arrow#43679 as a reference ---
Rationale for this change
JSONextension type implementation.What changes are included in this PR?
UUIDTypefrominternaltoarrow/extensionsJSONcanonical extension typeCustomParquetTypeinterface so extension types can specify their targetLogicalTypein ParquetfieldToNodeto split upPrimitiveNodetype mapping for leaves fromGroupNodecompositionLogicalTypeto use only value receiversAre these changes tested?
Yes
Are there any user-facing changes?
UUIDandJSONextension types are available to end users.CustomParquetTypeinterface to control Parquet conversion without needing to fork or upstream the change.