WIP: Use lz4_flex for wasm32
#91
Thanks a lot for the PR! Would it be possible to use a feature flag instead of a conditional on the target? Folks who are not targeting wasm may want to use lz4_flex regardless, so a feature could make more sense here. This also allows us to add CI runs of the tests with and without this flag, thereby covering both branches without having to run the tests under wasm.
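To illustrate the suggestion above, here is a minimal sketch of feature-based gating next to target-based gating. The feature name `lz4_flex` and both function names are hypothetical placeholders, not parquet2's actual code; the real implementation would dispatch to the respective crates rather than return a label.

```rust
// Hypothetical sketch: the feature name and the function bodies are
// placeholders chosen for illustration, not the crate's real API.

/// Compiled in only when the crate is built with `--features lz4_flex`.
#[cfg(feature = "lz4_flex")]
pub fn lz4_backend() -> &'static str {
    "lz4_flex"
}

/// Compiled in otherwise, e.g. when the default C-based `lz4` backend is used.
#[cfg(not(feature = "lz4_flex"))]
pub fn lz4_backend() -> &'static str {
    "lz4"
}

/// Target-based gating, for comparison: the choice is fixed by the
/// compilation target, so users on other targets cannot opt in.
pub fn target_based_backend() -> &'static str {
    if cfg!(target_arch = "wasm32") {
        "lz4_flex"
    } else {
        "lz4"
    }
}
```

With the feature-based variant, both branches can be exercised on an ordinary host by running the test suite once with default features and once with the feature enabled.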
Codecov Report
@@ Coverage Diff @@
## main #91 +/- ##
=======================================
Coverage 67.62% 67.62%
=======================================
Files 69 69
Lines 3830 3830
=======================================
Hits 2590 2590
Misses 1240 1240
Yes, that makes a ton of sense. I've tried to update this to use a feature flag instead, where if I'm not sure of the optimal way to add a test for this; I added a CI unit test call with no default features and with lz4_flex added instead of lz4. I'm also still not sure I'm using the
Sorry for the delay, I am focused on releasing arrow2 0.10, so this is a bit delayed on my end. What I would do here is to have a test where we have a pair This requires 1 test (with the pair) and a new entry in the ci / github workflows with the wasm/lz4_flex feature active, to show that the test runs on both cases. Does it make sense?
Sorry, I missed the test. It looks good to me. Thanks again!
I did previously add the line in But I'm still getting issues when trying to actually test reading a minimal parquet file. And it appears to not work either with the I created a simple pure-rust script here (https://github.com/kylebarron/parquet-wasm/pull/36/files#diff-42cb6807ad74b3e201c5a7ca98b911c5fa08380e942be6e4ac5807f8377f87fc). When I run with the
When I run with the
That's when reading this Parquet file created by this Python script. And it's readable in pyarrow:
In [2]: import pyarrow.parquet as pq
In [3]: pq.ParquetFile('1-partition-lz4.parquet').read()
Out[3]:
pyarrow.Table
str: string
uint8: uint8
int32: int32
bool: bool
Ok, main lz4 has been fixed and has tests for lz4 (against pyarrow). This should unblock us here :)
Let me know if you need any help or guidance. We may also ping the developer of lz4_flex for help here.
I haven't had time to look at this this week. That said, even if the CI were green, I'd want to test this against files created with the original LZ4 compression (and from pyarrow), to make sure we're using the API correctly in both cases. For example, the failing CI test looks to be a simple round trip of compression and decompression. The lz4_flex docs have a simple example of a block format round trip, but using those functions might satisfy this CI test without satisfying the actual Parquet spec. I guess it would be ideal to run the pyarrow integration test both with
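One possible reading of the WrongMagicNumber error reported in this PR is that a frame-format decoder was handed bytes that do not begin with an LZ4 frame header, for example raw block data. That diagnosis is an assumption and is not confirmed in this thread. The sketch below only shows how the LZ4 frame magic number (0x184D2204, stored little-endian) can be checked by hand; the function name is hypothetical and is not part of lz4_flex or parquet2.

```rust
/// Magic number that opens every LZ4 frame, stored little-endian on the wire.
pub const LZ4_FRAME_MAGIC: u32 = 0x184D_2204;

/// Hypothetical helper: returns true if `buf` begins with the LZ4 frame
/// magic number. Raw LZ4 block data carries no such header, so a frame
/// decoder would be expected to reject it.
pub fn starts_with_lz4_frame_magic(buf: &[u8]) -> bool {
    match buf.get(..4) {
        Some(head) => {
            // Interpret the first four bytes as a little-endian u32.
            u32::from_le_bytes([head[0], head[1], head[2], head[3]]) == LZ4_FRAME_MAGIC
        }
        // Fewer than four bytes: cannot contain the magic number.
        None => false,
    }
}
```

Running a check like this on the compressed page bytes of a pyarrow-written file would be one way to see which format a given file actually uses before picking a decoder.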
Rebased against main and fixed the issue here: #124. Could you take a look @kylebarron and see if it addresses your concerns?
Closed (done as part of #124)
Disclaimer: I'm new to Rust but motivated to get LZ4 Parquet decompression working in wasm. 🙂
I'm working on wasm bindings to arrow2/parquet2. So far it appears that all compressions other than LZ4 are working. I tried to switch to lz4_flex on this branch but I'm not sure I gated the wasm target correctly. Tests seem to pass on this branch, but when I try to load a Parquet file with LZ4 encoding on the web I get an error: Uncaught (in promise) External format error: underlying IO error: WrongMagicNumber. Maybe I'm using the wrong lz4_flex APIs.
Ref #85