In certain circumstances, the CLI fails to read old (perhaps ancient) Parquet files whose column metadata has an incorrect compressed_size field that does not include the dictionary page (at least according to the comment in the code). The code that is supposed to handle this case does not flip the byte buffer into which it reads the extra bytes. It appears to have been broken for a few years now.
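To illustrate why the missing flip matters, here is a minimal, self-contained sketch of general `java.nio.ByteBuffer` behavior. It is not the actual parquet-java code: the class `FlipSketch` and the method `fill` are hypothetical names chosen for this example, and `put` stands in for whatever read call fills the buffer in the real code path.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not taken from parquet-java.
public class FlipSketch {

    // Copies all of `extra` into a fresh buffer of the same size.
    // Writing advances the buffer's position, just as a read from a
    // stream or channel into the buffer would.
    static ByteBuffer fill(byte[] extra, boolean flip) {
        ByteBuffer buffer = ByteBuffer.allocate(extra.length);
        buffer.put(extra);
        if (flip) {
            // flip() sets limit = position and position = 0, which
            // makes the bytes just written visible to readers.
            buffer.flip();
        }
        return buffer;
    }

    public static void main(String[] args) {
        byte[] extra = {1, 2, 3, 4};

        ByteBuffer unflipped = fill(extra, false);
        ByteBuffer flipped = fill(extra, true);

        // Without flip(), position == limit, so nothing is readable.
        System.out.println("unflipped remaining: " + unflipped.remaining());
        // With flip(), all four bytes are readable from the start.
        System.out.println("flipped remaining: " + flipped.remaining());
    }
}
```

A consumer handed the unflipped buffer sees zero remaining bytes even though the data is present in the backing array, which is consistent with the read failure described above.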
I have written a PR that includes a defective Parquet file exhibiting this issue and a unit test that fails without the additional flip, and I have validated that the code works with the fix.
This is a minor issue that I found while learning the code rather than while addressing a production problem, so there is no urgency.
pyckle added a commit to pyckle/parquet-java that referenced this issue on Jun 23, 2024.