GH-735: Support write arrow record batch #904
base: main
@lidavidm Hi, could you please help review this PR when you are free? |
Hi @luoyuxia, I really appreciate that you have implemented this. However, when I tried to test this implementation of ParquetWriter, I hit an exception: error loading native lib arrow_dataset_jni/x86_64/arrow_dataset_jni.dll, FileNotFoundException. Is there any other way I can test this, or do I need to wait until it gets merged into the main branch? |
|
@V-Fenil Hi, thanks for your interest. I have already built the .so (for Linux) and the .dylib (for macOS), but I don't have a Windows environment, so I can't provide a .dll for you. To verify this PR, you'll need to build from source; see https://github.com/apache/arrow-java?tab=readme-ov-file#building-from-source. That's also how I verified my PR. |
Hi @luoyuxia, I'm testing your PR on Linux. Could you share the built libarrow_dataset_jni.so file? I can build the Java modules but need the native library. (More specifically, my build succeeded but I can't find the .so file.) Total build time was 49 minutes. |
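The path in the error message quoted earlier (arrow_dataset_jni/x86_64/arrow_dataset_jni.dll) suggests that the native library is looked up as a classpath resource laid out as library name, then architecture, then the platform-specific file name. The following is a minimal sketch under that assumption, inferred only from the error text and not taken from the loader's source. The class NativeLibPath and its methods are hypothetical helpers for checking whether a given build put the library where the JVM will look.

```java
import java.util.Locale;

/** Hypothetical helper: guesses the classpath location of a bundled JNI library. */
final class NativeLibPath {

    /** Maps common os.arch values to a directory name (assumed naming, two cases only). */
    static String normalizeArch(String osArch) {
        String a = osArch.toLowerCase(Locale.ROOT);
        if (a.equals("amd64") || a.equals("x86_64")) {
            return "x86_64";
        }
        return a; // other architectures are passed through unchanged in this sketch
    }

    /** Builds "<name>/<arch>/<platform file name>", e.g. ending in .dll, .so or .dylib. */
    static String resourcePath(String libName, String osArch) {
        // System.mapLibraryName adds the platform prefix/suffix (lib*.so, *.dll, lib*.dylib).
        return libName + "/" + normalizeArch(osArch) + "/" + System.mapLibraryName(libName);
    }

    public static void main(String[] args) {
        String path = resourcePath("arrow_dataset_jni", System.getProperty("os.arch"));
        boolean found = ClassLoader.getSystemResource(path) != null;
        System.out.println(path + (found ? " found on classpath" : " NOT found on classpath"));
    }
}
```

Running the main method with the same classpath as the failing test shows whether the expected file is missing entirely (as on Windows here, where no .dll was built) or present under a different path.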
Of course I can share it. I can share you the |
|
You can download the binaries built by CI: https://github.com/apache/arrow-java/actions/runs/19222857860?pr=904 |
Let me try what @lidavidm suggested; if that does not work, I will reach out to you separately. |
I'm testing PR #904 (native Parquet writer via JNI) on Ubuntu 22.04/WSL2 with Java 11.
Setup:
Error: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (128)
Analysis:
Questions:
The Java code is simply:
Any guidance would be appreciated. Thanks for this PR - looking forward to using |
Hi, what's your code? I used the following on my local macOS machine but can't reproduce it. I'll try to find time to reproduce it on Linux. |
Thanks for the quick response! Here's a minimal test case that reproduces the issue on
Test Code:
import java.io.File;
public class MinimalParquetTest { }
Environment:
Maven Dependencies:
Run Command:
Stack Trace:
java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (128)
Key Differences from macOS:
Things I've verified:
Would appreciate any Linux-specific setup steps I might be missing. Thanks! |
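For readers unfamiliar with the "Memory was leaked" error: Arrow's Java allocators account for every buffer they hand out and raise an IllegalStateException on close if any bytes are still outstanding. The sketch below is a toy model of that accounting, not Arrow's implementation. ToyAllocator and Buffer are hypothetical names, and the 128-byte figure is borrowed from the report above purely for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy stand-in for an accounting allocator; this is NOT Arrow's BufferAllocator. */
final class ToyAllocator implements AutoCloseable {
    private final AtomicLong outstanding = new AtomicLong();

    /** A handle whose close() returns its bytes to the owning allocator. */
    final class Buffer implements AutoCloseable {
        private final long size;
        private boolean released;

        Buffer(long size) {
            this.size = size;
        }

        @Override
        public void close() {
            if (!released) { // releasing twice must not double-count
                released = true;
                outstanding.addAndGet(-size);
            }
        }
    }

    /** Records the allocation and hands back a releasable handle. */
    Buffer allocate(long size) {
        outstanding.addAndGet(size);
        return new Buffer(size);
    }

    long outstandingBytes() {
        return outstanding.get();
    }

    /** Fails if any buffer was never released, mirroring the shape of the reported error. */
    @Override
    public void close() {
        long leaked = outstanding.get();
        if (leaked != 0) {
            throw new IllegalStateException("Memory leaked: (" + leaked + ")");
        }
    }
}
```

In this model, a leak of exactly 128 bytes means one 128-byte buffer was allocated and never released before the allocator was closed. In the real case, the unreleased buffer could be held on either the Java or the native side of the JNI boundary.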
@V-Fenil Hi, I spent some time debugging in a Linux environment and did find a problem. The
|
@luoyuxia Sure, thanks. Can you email me the files? I will try to import the .so files first and run my test case. [email protected] |
What's Changed
New Features
Added Parquet Writer Support: Introduced ParquetWriter class to write Arrow VectorSchemaRoot to Parquet files via JNI
Implementation
C++ Side:
This contains breaking changes.
Closes #735