Node crashes can lead to the loss of inserted triples #2290

Closed
aphyr opened this issue Mar 31, 2018 · 13 comments

aphyr commented Mar 31, 2018

In a five-node cluster, with a mix of crashes and restarts of both Zero and Alpha nodes, it appears that the crash of multiple Alpha nodes can cause the loss of documents successfully inserted just prior to the crash. Moreover, nodes can disagree on the records that should be returned by a query for all records of a given type, and some nodes can return objects which match the type predicate but are missing a value for the value field. This is on version

Dgraph version   : v1.0.4
Commit SHA-1     : 224b560
Commit timestamp : 2018-03-27 16:56:17 +0530
Branch           : fix/jepsen_delete

Our schema is

type: string @index(exact) .
value: int @index(int) .

We perform single mutations in transactions, inserting objects with type "element" and a unique integer value. In this test run, we lose three acknowledged inserts when Alpha nodes crash: 20180331T020118.000-0500.zip
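
For reference, each insert in this workload boils down to a Dgraph set mutation along these lines (a minimal sketch; the blank-node name and the particular value are illustrative, not taken from the test code):

{
  set {
    _:el <type> "element" .
    _:el <value> "641"^^<xs:int> .
  }
}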

From Jepsen's log just prior to the crash (read "process 8 invoked an add of element 641"...):

8	:invoke	:add	641
2	:ok	:add	632
0	:ok	:add	635
6	:ok	:add	636
2	:invoke	:add	642
7	:ok	:add	634
5	:ok	:add	638
4	:ok	:add	637
3	:invoke	:add	643
6	:invoke	:add	644
1	:ok	:add	639
7	:invoke	:add	645
0	:invoke	:add	646
5	:invoke	:add	647
4	:invoke	:add	648
8	:ok	:add	641
7	:fail	:add	645	:unavailable
7	:invoke	:add	649
1	:invoke	:add	650
3	:ok	:add	643
6	:ok	:add	644
8	:invoke	:add	651
3	:invoke	:add	652
6	:invoke	:add	653
2	:fail	:add	642	:unavailable
:nemesis	:info	:kill-alpha	{"n3" :stopped, "n5" :stopped, "n1" :stopped, "n2" :stopped}
...
336	:ok	:read	(nil nil nil nil nil 0 1 2 3 ... 633 634 635 636 637 638 639 1620 1703 ...)
222	:ok	:read	(0 1 2 3 ... 635 636 637 638 639 1620 1703 ...)
331	:ok	:read	(nil nil nil nil nil 0 1 2 3  ... 633 634 635 636 637 638 639 1620 1703 )
70	:ok	:read	(0 1 2 3 ... 637 638 639 1620 1703 ...)
338	:ok	:read	(0 1 2 3 ... 635 636 637 638 639 1620 1703 ...)
333	:ok	:read	(0 1 2 3 ... 635 636 637 638 639 1620 1703 ...)
...
 {:valid? false,
  :lost "#{641 643..644}",
  :recovered "#{2404..2408}",
  :ok
  "#{0..639 1620 1703 1728..1737 1739..1751 1753..1760 1762..1769 1771..1787 1789..1828 1830..2408 3618 3625 3631..3638 3640..3655 3657..3664 3666..3675 3678..3683 3685..3698 3700..3707 3709..3713 3716..3744 3747 3749..3768 3770..3791 3793..3808 3810 3812..3847 3849..3857 3859..3880 3882..3896 3898..3907}",
  :recovered-frac 1/1008,
  :unexpected-frac 0,
  :unexpected "#{}",
  :lost-frac 1/1680,
  :ok-frac 5/16}

Note that there are three missing inserts (641, 643, 644), which occurred just prior to the crash of n1, n2, n3, and n5. There are also five nil values that matched our query for all records matching eq(type, "element"), which suggests that perhaps UIDs were created for these objects (and perhaps two others which timed out or failed?), but those UIDs weren't linked to their :value fields correctly?
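
For context, the read of all records of a given type is roughly a query of this shape (a sketch assuming standard Dgraph query syntax; the block name all is illustrative):

{
  all(func: eq(type, "element")) {
    uid
    value
  }
}

An object that matches on the type index but has no value edge would then surface as a nil in the read sets above.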

Note that process numbers (the first column) are striped over nodes, so processes 0, 5, 10, ... talk to node n1. From this we can infer that those value-less objects are present on only one node (n2), whereas n1, n3, and n4 don't show them, which is also odd! n5 did not respond to either of its two clients for the final read.

You can reproduce this with Jepsen fd875fa9f3c248f6715e1056a504392bfa3963d3 by running:

lein run test --package-url https://transfer.sh/10WC2g/dgraph-linux-amd64.tar.gz --force-download -w set --time-limit 300 --concurrency 10 --test-count 10
manishrjain added the kind/bug label Mar 31, 2018
manishrjain added this to the Sprint-001 milestone Mar 31, 2018

janardhan1993 commented:

@aphyr Is the same client being used after restart here?

aphyr commented Apr 3, 2018

Yes, we continue to use the same clients through restart, at least until the client gives us an indeterminate error, at which point we generate a new one.

janardhan1993 self-assigned this Apr 3, 2018

janardhan1993 commented Apr 9, 2018

This can happen when using client-side sequential mode. If the server crashes before the data is persisted to disk, then on restart a new client can read before the WAL is replayed.

janardhan1993 commented:

It's not a loss, but with client-side sequential mode you would only see the data eventually. If we use server-side sequential mode, then the read would wait until everything is replayed.

aphyr commented Apr 9, 2018

Ahh, okay. Do you know how long you'd have to wait for recovery to complete?

janardhan1993 commented Apr 9, 2018

Depends on when the snapshot happened and how many entries need to be replayed.

aphyr commented Apr 9, 2018

It looks like it might be hundreds of times longer than that?

[latency-raw plot]

In this test I allow two minutes for the first recovery, but some records are still missing until a thousand seconds later, when other nodes finally come back online. And there's still that weird issue with returning records for an index query that don't have any associated value--that feels like it's definitely a consistency violation. Is that expected behavior?

janardhan1993 reopened this Apr 9, 2018

janardhan1993 commented:

@aphyr It's not expected behaviour. Are you using the latest nightly build?

aphyr commented Apr 9, 2018

I'm looking at 1.0.4--the nightlies were deadlocking in this test on Friday, but that may be fixed now!

janardhan1993 commented:

Server-side sequential mode is not in 1.0.4. Can you please try with this build?
https://transfer.sh/Z5CTJ/dgraph-linux-amd64.tar.gz

It has the fix for deadlocking.

aphyr commented Apr 9, 2018

So, uh, Z5CTJ with server-side sequencing did this incredibly interesting thing:
20180409T115148.000-0500.zip

[latency-raw plot]

After a series of Alpha and Zero crashes and restarts, it took about 400 seconds to recover, then returned only one of the 29662 acknowledged inserts. Instead, it gave a sequence of a whole boatload of value-less responses (that's the nils in the final reads), followed by the most recently inserted element.

190 :ok :read (nil nil nil nil ... nil nil nil 133143)

You can reproduce this with Jepsen 0ef6e711dfb07aad4afc84f7f9c3348961afa9d7:

lein run test --package-url https://transfer.sh/Z5CTJ/dgraph-linux-amd64.tar.gz --time-limit 300 --concurrency 100 --nemesis kill-alpha,fix-alpha,kill-zero --test-count 20 --workload set --final-recovery-time 200

Since we only insert records with a numeric :value, we should never be able to observe a record without one--yet here Dgraph's doing that for tens of thousands of records. Something's wonky here...

aphyr commented Apr 9, 2018

To make sure this wasn't something to do with the recovery process itself--like maybe servers could only expose these results immediately after coming back online--I ran another test with a 2000-second pause before we perform any read queries, and it exhibited the same losing-all-but-one-write behavior:

[latency-raw plot]

 {:valid? false,
  :lost
  "#{0..16982 16984..16987 16989..16999 17001 ... 63839..63840 63844}",
  :recovered "#{63845}",
  :ok "#{63845}",
  :recovered-frac 1/65624,
  :unexpected-frac 1/65624,
  :unexpected "#{nil}",
  :lost-frac 39115/65624,
  :ok-frac 1/65624},
 :valid? false}

janardhan1993 commented Apr 16, 2018

Test no longer fails with
lein run test --package-url https://transfer.sh/mOBTj/dgraph-linux-amd64.tar.gz --time-limit 300 --concurrency 10 --nemesis kill-alpha,fix-alpha,kill-zero --test-count 20 --workload set
