Skip to content

Conversation

@GregBowyer
Copy link
Contributor

Part of the removal of specialisation makes the RleDecoder remain
instanticated. This is done to lose some allocations but right now
appears to leave a decoder in a semi configured state that can mis-read
RLE packed data.

As such we just init a new one each time the parent decoder has data set
on it in a fashion to the pre refactored code.

Part of the removal of specialisation makes the RleDecoder remain
instanticated. This is done to lose some allocations but right now
appears to leave a decoder in a semi configured state that can mis-read
RLE packed data.

As such we just init a new one each time the parent decoder has data set
on it in a fashion to the pre refactored code.
@github-actions
Copy link

@codecov-io
Copy link

Codecov Report

Merging #8993 (17055ca) into master (0519c4c) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #8993   +/-   ##
=======================================
  Coverage   82.64%   82.64%           
=======================================
  Files         200      200           
  Lines       49730    49731    +1     
=======================================
+ Hits        41098    41099    +1     
  Misses       8632     8632           
Impacted Files Coverage Δ
rust/parquet/src/encodings/decoding.rs 92.85% <100.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 081728f...17055ca. Read the comment docs.

@Dandandan
Copy link
Contributor

Looks good. Could we add a test for this?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @GregBowyer.

This PR looks good and appears to fix at least part of the problem, but part of it still remains

I ran the shell loop both with and without this code

Note I could not provoke a reliable error on master, but going back to 02a3ad8 I could.

Here are the commands I used to check. While not 100% conclusive, it seems pretty good evidence to me that PR has improved the issue (there is only one type of panic now, not two)

git checkout  02a3ad81c40c9c0e2e419c56a0635221dd4f21f2
# Then running the shell loop panic'd:
bash ~/Documents/loop.sh 2>& 1 | grep panic
...
# there are two distinct panics after a few seconds
# thread 'encodings::encoding::tests::test_bool' panicked at 'Invalid byte when reading bool', parquet/src/util/bit_util.rs:73:18
#thread 'encodings::encoding::tests::test_bool' panicked at 'assertion failed: `(left == right)`
...
# now take the change in this PR and run it again
git cherry-pick 17055ca7a33bd1b0ba1ef244b56deb96994c4e1a
bash ~/Documents/loop.sh 2>& 1 | grep panic

# I let it run for 5 minutes and now I only see the 'invalid byte' error

thread 'encodings::encoding::tests::test_bool' panicked at 'Invalid byte when reading bool', parquet/src/util/bit_util.rs:73:18
thread 'encodings::encoding::tests::test_bool' panicked at 'Invalid byte when reading bool', parquet/src/util/bit_util.rs:73:18 

(https://issues.apache.org/jira/browse/ARROW-10943?focusedCommentId=17252273&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17252273) against latest master I could no longer cause trigger reliably.

@Dandandan -- I agree a more deterministic test would be nice, though this code is certainly covered by CI (how we found the issue initially)

@GregBowyer you posted a test on the jira ticket. What would you think about including that in this PR?

I personally suggest we merge this in and then work on the next panic as a follow on PR

@GregBowyer
Copy link
Contributor Author

The test would need clearning up, and I wonder if it should be ported to proptest which would do the same thing.

@GregBowyer
Copy link
Contributor Author

The panic I might need some help with. The instructions above dont give me any crash for several hours. Could it be OS, platform ,cpu related (see #8948 ?

@alamb
Copy link
Contributor

alamb commented Dec 24, 2020

@GregBowyer -- yes I definitely think whatever is wrong here is very dependent on memory layout, os, whatever.

I have filed https://issues.apache.org/jira/browse/ARROW-11027 to keep on tracking down the error.

I'll try some of the code in #8498 to see if perhaps that is the final piece of the puzzle

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants