ARROW-10943: [Rust][Parquet] Always init new RleDecoder #8993

GregBowyer · 2020-12-23T02:53:58Z

Part of the removal of specialisation leaves the RleDecoder permanently
instantiated. This was done to save some allocations, but right now it
appears to leave the decoder in a semi-configured state that can misread
RLE-packed data.

As such, we just init a new one each time the parent decoder has data set
on it, in a similar fashion to the pre-refactored code.
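
To illustrate the failure mode described above, here is a minimal sketch, not the actual parquet code. `ToyRleDecoder`, `ParentDecoder`, and `second_page_values` are hypothetical names, and the (count, value) byte-pair format is a simplification chosen for the example. It assumes the reused decoder keeps run state across `set_data` calls, which is one way a decoder could end up semi-configured.

```rust
/// Hypothetical toy run-length decoder: input is a sequence of
/// (count, value) byte pairs. Not the real parquet `RleDecoder`.
struct ToyRleDecoder {
    data: Vec<u8>,
    pos: usize,
    // State of the run currently being decoded.
    run_left: u8,
    run_value: u8,
}

impl ToyRleDecoder {
    fn new() -> Self {
        ToyRleDecoder { data: Vec::new(), pos: 0, run_left: 0, run_value: 0 }
    }

    /// Swaps in a new buffer. The run state is deliberately NOT reset here,
    /// to model a decoder left in a semi-configured state.
    fn set_data(&mut self, data: Vec<u8>) {
        self.data = data;
        self.pos = 0;
    }

    fn get(&mut self) -> Option<u8> {
        // Load the next (count, value) pair once the current run is used up.
        while self.run_left == 0 {
            if self.pos + 2 > self.data.len() {
                return None;
            }
            self.run_left = self.data[self.pos];
            self.run_value = self.data[self.pos + 1];
            self.pos += 2;
        }
        self.run_left -= 1;
        Some(self.run_value)
    }
}

/// Hypothetical parent decoder that owns the RLE decoder.
struct ParentDecoder {
    rle: ToyRleDecoder,
}

impl ParentDecoder {
    /// With `fresh_decoder` set, a new decoder is created for every buffer,
    /// which is the approach this PR takes; otherwise the old one is reused.
    fn set_data(&mut self, data: Vec<u8>, fresh_decoder: bool) {
        if fresh_decoder {
            self.rle = ToyRleDecoder::new();
        }
        self.rle.set_data(data);
    }

    fn read(&mut self, n: usize) -> Vec<u8> {
        let mut out = Vec::with_capacity(n);
        for _ in 0..n {
            match self.rle.get() {
                Some(v) => out.push(v),
                None => break,
            }
        }
        out
    }
}

/// Reads one value from a first buffer (leaving a run half consumed),
/// then sets a second buffer and returns the values read from it.
fn second_page_values(fresh_decoder: bool) -> Vec<u8> {
    let mut parent = ParentDecoder { rle: ToyRleDecoder::new() };
    parent.set_data(vec![3, 7], fresh_decoder); // three 7s
    let _ = parent.read(1); // consume only one of them
    parent.set_data(vec![2, 9], fresh_decoder); // two 9s
    parent.read(2)
}
```

In this toy, `second_page_values(false)` returns the two leftover 7s from the first buffer, while `second_page_values(true)` returns the two 9s that the second buffer actually encodes. Resetting the run state inside `set_data` would also work here; creating a new decoder simply rules out any forgotten field.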

github-actions · 2020-12-23T03:03:37Z

https://issues.apache.org/jira/browse/ARROW-10943

codecov-io · 2020-12-23T03:16:55Z

Codecov Report

Merging #8993 (17055ca) into master (0519c4c) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #8993   +/-   ##
=======================================
  Coverage   82.64%   82.64%           
=======================================
  Files         200      200           
  Lines       49730    49731    +1     
=======================================
+ Hits        41098    41099    +1     
  Misses       8632     8632

Impacted Files	Coverage Δ
rust/parquet/src/encodings/decoding.rs	`92.85% <100.00%> (+0.01%)`	⬆️

Dandandan · 2020-12-23T07:14:15Z

Looks good. Could we add a test for this?

alamb

Thank you very much @GregBowyer.

This PR looks good and appears to fix at least part of the problem, but part of it still remains.

I ran the shell loop both with and without this code.

Note I could not provoke a reliable error on master, but going back to 02a3ad8 I could.

Here are the commands I used to check. While not 100% conclusive, it seems pretty good evidence to me that this PR has improved the issue (there is only one type of panic now, not two):

git checkout 02a3ad81c40c9c0e2e419c56a0635221dd4f21f2
# Then running the shell loop panicked:
bash ~/Documents/loop.sh 2>&1 | grep panic
...
# there are two distinct panics after a few seconds
# thread 'encodings::encoding::tests::test_bool' panicked at 'Invalid byte when reading bool', parquet/src/util/bit_util.rs:73:18
# thread 'encodings::encoding::tests::test_bool' panicked at 'assertion failed: `(left == right)`
...
# now take the change in this PR and run it again
git cherry-pick 17055ca7a33bd1b0ba1ef244b56deb96994c4e1a
bash ~/Documents/loop.sh 2>&1 | grep panic

# I let it run for 5 minutes and now I only see the 'Invalid byte' error

thread 'encodings::encoding::tests::test_bool' panicked at 'Invalid byte when reading bool', parquet/src/util/bit_util.rs:73:18
thread 'encodings::encoding::tests::test_bool' panicked at 'Invalid byte when reading bool', parquet/src/util/bit_util.rs:73:18

When I ran the shell loop from the Jira comment (https://issues.apache.org/jira/browse/ARROW-10943?focusedCommentId=17252273&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17252273) against latest master, I could no longer trigger the panic reliably.

@Dandandan -- I agree a more deterministic test would be nice, though this code is certainly covered by CI (which is how we found the issue initially).

@GregBowyer you posted a test on the Jira ticket. What would you think about including that in this PR?

I personally suggest we merge this in and then work on the next panic as a follow-on PR.

GregBowyer · 2020-12-23T20:02:31Z

The test would need cleaning up, and I wonder if it should be ported to proptest, which would do the same thing.
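
As a rough illustration of the kind of randomized round-trip check discussed here, the following is a sketch only. It is not the test from the Jira ticket and does not use the proptest crate: a small seeded xorshift generator stands in for it so the example needs no dependencies. `pack_bools`, `unpack_bools`, `XorShift`, and `first_failing_seed` are hypothetical helpers, and the LSB-first bit packing is a simplified stand-in for the real bool encoding.

```rust
/// Hypothetical stand-in for an encoder: packs bools LSB-first into bytes.
fn pack_bools(values: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (values.len() + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        if v {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}

/// Hypothetical stand-in for a decoder: reads `len` bools back out.
fn unpack_bools(bytes: &[u8], len: usize) -> Vec<bool> {
    (0..len)
        .map(|i| (bytes[i / 8] >> (i % 8)) & 1 == 1)
        .collect()
}

/// Small xorshift PRNG so that every run is reproducible from its seed.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Encodes and decodes one random input per seed and returns the first
/// seed whose round trip does not reproduce the input, if any.
fn first_failing_seed(seeds: std::ops::Range<u64>) -> Option<u64> {
    for seed in seeds {
        // Force the state to be odd, since xorshift must not start at zero.
        let mut rng = XorShift(seed.wrapping_mul(2).wrapping_add(1));
        let len = (rng.next_u64() % 257) as usize;
        let values: Vec<bool> = (0..len).map(|_| rng.next_u64() & 1 == 1).collect();
        let decoded = unpack_bools(&pack_bools(&values), len);
        if decoded != values {
            return Some(seed);
        }
    }
    None
}
```

Returning the failing seed means a failure can be replayed exactly, which is the main thing a nondeterministic test loop lacks. A property-testing library would additionally shrink the failing input to a smaller one.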

GregBowyer · 2020-12-23T20:17:19Z

The panic I might need some help with. The instructions above don't give me any crash for several hours. Could it be OS, platform, or CPU related (see #8948)?

alamb · 2020-12-24T13:21:10Z

@GregBowyer -- yes, I definitely think whatever is wrong here is very dependent on memory layout, OS, and so on.

I have filed https://issues.apache.org/jira/browse/ARROW-11027 to keep tracking down the error.

I'll try some of the code in #8498 to see if perhaps that is the final piece of the puzzle.

github-actions bot added Component: Rust Component: Parquet labels Dec 23, 2020

alamb approved these changes Dec 23, 2020

jorgecarleitao closed this in 1ecef42 Dec 24, 2020

asfimport mentioned this pull request Feb 24, 2021

[Rust] Intermittent build failure in parquet encoding #26870

Closed
