-
Notifications
You must be signed in to change notification settings - Fork 70
Shredded Variant Test files written from Go #94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@aihuaxu can you take a look at these please? |
|
Thanks @zeroshade I also did similar quickly from GO in #93. I have tested out and see apache/arrow-go#476 for two issues I notice there. |
|
@aihuaxu I addressed those issues when I regenerated these here, I didn't have permission to push to your fork so that's why I created this PR 😄 |
Yeah Same for me. I have verified the newly files addressed those issue. Thanks a lot. |
@zeroshade these were generated by reading in the Java files and then writing them out via Go? Does Go do any of its own shredding or is it just round-tripping the Arrow representation (my main concern are safe-guards in place if Arrow representation fed in is not correct)? |
|
It was simply quicker to read in the files generated by Java as Arrow and then write them back out to Parquet (properly marking Variant types etc.) than to put together something to generate the test cases from scratch. Go has https://github.com/apache/arrow-go/blob/main/arrow/extensions/variant.go#L126 which allows creating your own shredded Variant array which can get written to Parquet with tests added by apache/arrow-go@2cf2b29.
Tests are added by apache/arrow-go#455 which performs validation when constructing the Arrow representation and writing to Parquet. This is also why arrow-go can't generate the test cases which aren't valid, the incorrect constructions cause errors. |
|
Now that #91 was merged, can this get merged? |
|
@zeroshade Since these files are the same as what #91, maybe we should not merge this? |
|
These aren't the same, these are the ones generated by go |
Yeah. I understand it's generated from GO but we should expect the same parquet files and same variant files being generated as #91, right? As I understand, the content should be same except that we are missing some invalid cases. Let me know if I misunderstand and there are any differences between two sets. |
|
I agree with @aihuaxu that we shouldn't merge duplicate files unless there are uncovered cases. |
|
That's fair. I'll close this since we've confirmed that the files generated by Go are sufficiently readable by the java implementation |
This supercedes #93 by having written out the fixed files that the original PR will be updated with.