@zeroshade
Member

This supersedes #93 by writing out the fixed files that the original PR will be updated with.

@zeroshade
Member Author

@aihuaxu can you take a look at these please?

@aihuaxu
Contributor

aihuaxu commented Aug 17, 2025

Thanks @zeroshade. I also quickly did something similar from Go in #93. I have tested it out; see apache/arrow-go#476 for two issues I noticed there.

@zeroshade
Member Author

@aihuaxu I addressed those issues when I regenerated these here. I didn't have permission to push to your fork, so that's why I created this PR 😄

@aihuaxu
Contributor

aihuaxu commented Aug 18, 2025

> @aihuaxu I addressed those issues when I regenerated these here, I didn't have permission to push to your fork so that's why I created this PR 😄

Yeah, same for me.

I have verified that the new files address those issues. Thanks a lot.

@emkornfield

> This supercedes #93 by having written out the fixed files that the original PR will be updated with.

@zeroshade were these generated by reading in the Java files and then writing them out via Go? Does Go do any of its own shredding, or is it just round-tripping the Arrow representation (my main concern is what safeguards are in place if the Arrow representation fed in is not correct)?

@zeroshade
Member Author

It was simply quicker to read in the files generated by Java as Arrow and then write them back out to Parquet (properly marking Variant types etc.) than to put together something to generate the test cases from scratch.

Go has https://github.com/apache/arrow-go/blob/main/arrow/extensions/variant.go#L126, which allows creating your own shredded Variant array that can then be written to Parquet; tests for this were added in apache/arrow-go@2cf2b29.

> (my main concern are safe-guards in place if Arrow representation fed in is not correct)?

Tests were added in apache/arrow-go#455, which performs validation when constructing the Arrow representation and writing to Parquet. This is also why arrow-go can't generate the test cases that aren't valid: the incorrect constructions cause errors.

@zeroshade
Member Author

Now that #91 has been merged, can this get merged?

@aihuaxu
Contributor

aihuaxu commented Aug 21, 2025

@zeroshade Since these files are the same as those in #91, maybe we should not merge this?

@zeroshade
Member Author

These aren't the same; these are the ones generated by Go.

@aihuaxu
Contributor

aihuaxu commented Aug 22, 2025

> These aren't the same, these are the ones generated by go

Yeah. I understand these are generated from Go, but we should expect the same Parquet files and the same Variant files as in #91, right? As I understand it, the content should be the same, except that we are missing some invalid cases.

Let me know if I have misunderstood and there are any differences between the two sets.

@wgtmac
Member

wgtmac commented Aug 22, 2025

I agree with @aihuaxu that we shouldn't merge duplicate files unless there are uncovered cases.

@zeroshade
Member Author

That's fair. I'll close this, since we've confirmed that the files generated by Go are sufficiently readable by the Java implementation.

@zeroshade zeroshade closed this Aug 22, 2025