-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-10943: [Rust][Parquet] Always init new RleDecoder #8993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Part of the removal of specialisation makes the RleDecoder remain instanticated. This is done to lose some allocations but right now appears to leave a decoder in a semi configured state that can mis-read RLE packed data. As such we just init a new one each time the parent decoder has data set on it in a fashion to the pre refactored code.
Codecov Report
@@ Coverage Diff @@
## master #8993 +/- ##
=======================================
Coverage 82.64% 82.64%
=======================================
Files 200 200
Lines 49730 49731 +1
=======================================
+ Hits 41098 41099 +1
Misses 8632 8632
Continue to review full report at Codecov.
|
|
Looks good. Could we add a test for this? |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @GregBowyer.
This PR looks good and appears to fix at least part of the problem, but part of it still remains
I ran the shell loop both with and without this code
Note I could not provoke a reliable error on master, but going back to 02a3ad8 I could.
Here are the commands I used to check. While not 100% conclusive, it seems pretty good evidence to me that PR has improved the issue (there is only one type of panic now, not two)
git checkout 02a3ad81c40c9c0e2e419c56a0635221dd4f21f2
# Then running the shell loop panic'd:
bash ~/Documents/loop.sh 2>& 1 | grep panic
...
# there are two distinct panics after a few seconds
# thread 'encodings::encoding::tests::test_bool' panicked at 'Invalid byte when reading bool', parquet/src/util/bit_util.rs:73:18
#thread 'encodings::encoding::tests::test_bool' panicked at 'assertion failed: `(left == right)`
...
# now take the change in this PR and run it again
git cherry-pick 17055ca7a33bd1b0ba1ef244b56deb96994c4e1a
bash ~/Documents/loop.sh 2>& 1 | grep panic
# I let it run for 5 minutes and now I only see the 'invalid byte' error
thread 'encodings::encoding::tests::test_bool' panicked at 'Invalid byte when reading bool', parquet/src/util/bit_util.rs:73:18
thread 'encodings::encoding::tests::test_bool' panicked at 'Invalid byte when reading bool', parquet/src/util/bit_util.rs:73:18
(https://issues.apache.org/jira/browse/ARROW-10943?focusedCommentId=17252273&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17252273) against latest master I could no longer cause trigger reliably.
@Dandandan -- I agree a more deterministic test would be nice, though this code is certainly covered by CI (how we found the issue initially)
@GregBowyer you posted a test on the jira ticket. What would you think about including that in this PR?
I personally suggest we merge this in and then work on the next panic as a follow on PR
|
The test would need clearning up, and I wonder if it should be ported to proptest which would do the same thing. |
|
The panic I might need some help with. The instructions above dont give me any crash for several hours. Could it be OS, platform ,cpu related (see #8948 ? |
|
@GregBowyer -- yes I definitely think whatever is wrong here is very dependent on memory layout, os, whatever. I have filed https://issues.apache.org/jira/browse/ARROW-11027 to keep on tracking down the error. I'll try some of the code in #8498 to see if perhaps that is the final piece of the puzzle |
Part of the removal of specialisation makes the RleDecoder remain
instanticated. This is done to lose some allocations but right now
appears to leave a decoder in a semi configured state that can mis-read
RLE packed data.
As such we just init a new one each time the parent decoder has data set
on it in a fashion to the pre refactored code.