
Conversation

@russelldb
Contributor

The kv679_dataloss_fb.erl test was part of the original kv679 suite, but at the time of 2.1
it was put in its own branch. See #719 for discussion and decision. In short, Riak Test is made up of passing tests,
and kv679_dataloss_fb.erl still failed. Adding this back to a
branch off develop as I now have a fix for this issue.

There is also a new test, kv679_dataloss_fb2.erl, that shows another dataloss edge case in riak, even in 2.1 with
per-key-actor-epochs enabled. The test is a little convoluted, and is
based on a quickcheck counter example, included in the riak_kv/test
directory. In short, both this test, and the other kv679_dataloss_fb
test, show that even with multiple replicas acking/storing, a single
disk error on a single replica is enough to cause acked writes to be
silently and permanently lost. For a replicated database, that is bad.
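
The sketch below is a hedged illustration (not the actual kv679_dataloss_fb.erl code) of the failure shape described above: an acked write, a single replica losing its backend data, and the key later coming back notfound. It uses common riak_test/riakc helpers; the cluster size, bucket/key names, and the way the disk error is simulated are assumptions for illustration only.

```erlang
%% Hedged sketch of the dataloss shape described above; not the real test.
-module(kv679_dataloss_sketch).
-export([confirm/0]).
-include_lib("eunit/include/eunit.hrl").

-define(BUCKET, <<"kv679">>).
-define(KEY, <<"test">>).

confirm() ->
    %% Build a small cluster and perform a write that the cluster acks.
    [Node1 | _] = rt:build_cluster(4),
    Client = rt:pbc(Node1),
    Obj = riakc_obj:new(?BUCKET, ?KEY, <<"acked-value">>),
    ok = riakc_pb_socket:put(Client, Obj, [{w, 3}]),

    %% Simulate a single disk error on one replica, e.g. by removing that
    %% vnode's backend data (the real tests do this far more surgically,
    %% driving fallbacks and handoff to hit the interleaving from the
    %% quickcheck counterexample).
    %% ... backend data for one partition deleted here ...

    %% Pre-fix, the acked write can end up notfound on every replica,
    %% i.e. silently and permanently lost. This assertion captures that.
    ?assertMatch({ok, _}, riakc_pb_socket:get(Client, ?BUCKET, ?KEY)),
    pass.
```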

@russelldb
Contributor Author

I'm going to open the PR against riak_kv that provides a fix for both these tests shortly, and will edit this to add the link when I do. In the meantime, feel free to run these against 2.1 and up.

@nickelization
Contributor

nickelization commented Mar 27, 2017

Both tests make sense, and I confirmed they both fail without basho/riak_kv#1643 and pass with it. Still reviewing the riak_kv PR, but the test PR looks good to me. Thanks, Russell!

@nickelization
Contributor

+1

The partition repair test deletes all the data at a partition, and then
repairs it from neighbouring partitions. The subset of repaired data
that was originally coordinated by the deleted partition's vnode showed
up as `notfound` since the latest kv679 changes in
basho/riak_kv#1643. The reason is that the fix
in the KV repo adds a new actor to the repaired key's vclock. Prior to
this change, `verify` in partition_repair.erl did a simple equality check
on the binary-encoded riak_object values. This change takes into account
that a new actor may be in the vclock at the repaired vnode, and uses a
semantic equality check based on riak_object merge and riak_object
equal.
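
A hedged sketch of the kind of semantic check described above, assuming the riak_object merge/equal API referenced in the comment (exact function names and arities are assumptions and may differ between riak_kv versions):

```erlang
%% Instead of comparing the binary-encoded objects byte-for-byte, decode
%% both sides, merge them, and compare the merge against the repaired
%% replica's object. An extra actor in the repaired vclock then no longer
%% shows up as a spurious mismatch.
semantically_equal(Bucket, Key, ExpectedBin, RepairedBin) ->
    Expected = riak_object:from_binary(Bucket, Key, ExpectedBin),
    Repaired = riak_object:from_binary(Bucket, Key, RepairedBin),
    Merged   = riak_object:merge(Expected, Repaired),
    riak_object:equal(Merged, Repaired).
```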
@JeetKunDoug
Contributor

+1 - merging to fix partition_repair before next Giddyup run. Still need to add the two new tests to the Giddyup database.

@JeetKunDoug JeetKunDoug merged commit 8170137 into basho:develop Apr 5, 2017
