On the current Dgraph nightly (2018/02/20), in a healthy five-node cluster, without any faults, concurrently upserting, deleting, and reading records by a single indexed integer attribute, `key`, can result in queries for `key` returning long-lasting records with only a UID and no key attached, as if the UID were still present in that key's index even though the triple has been deleted. Moreover, reads may actually see one of these dangling deleted records concurrently with a new record for that key, and reads may even observe two complete records for the same key, even though one has been deleted.
This violates snapshot isolation, since reads are clearly not seeing a consistent point-in-time representation of state. It breaks internal consistency, since you can query for a record with `eq(key, 4)` and get back a record which doesn't have a `key` at all. Since a single process can actually see deleted data, it's definitely not sequential.
You can reproduce this on Jepsen 282623ca5b733097b9a524f70a451ca40151310e by running:

```
lein run test -w delete -f --package-url https://github.com/dgraph-io/dgraph/releases/download/nightly/dgraph-linux-amd64.tar.gz --time-limit 120 --concurrency 100
```
For future users: you'll need 1.0.3 or 1.0.4, since this is on the nightlies and I don't think those have stable URLs. Test code is here.
The schema for this test is `key: int @index(int) .`. We read keys by querying:
```
{ q(func: eq(key, $key)) {
    uid
    key
}}
```
... delete by querying for just the UID and issuing a delete for every attribute with `{uid: 0x123}`, and upsert by reading for an existing record with that key and, if none exists, inserting a new one. All operations occur within a transaction, as in the sketch below.
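For concreteness, here is a minimal sketch of those three operation shapes. This is not the actual test code (the Jepsen test is Clojure); it is written in Python against the pydgraph client, and the host/port, helper names, and error handling are assumptions. The query, the delete-everything-by-UID mutation, and the read-then-insert upsert follow the description above.

```python
# Illustrative sketch only -- the real test is Clojure/Jepsen; host, port,
# and helper names here are assumptions.
import json
import pydgraph

stub = pydgraph.DgraphClientStub('localhost:9080')   # assumed local alpha
client = pydgraph.DgraphClient(stub)

# Schema under test: a single indexed integer attribute.
client.alter(pydgraph.Operation(schema='key: int @index(int) .'))

READ = '''query q($key: int) {
  q(func: eq(key, $key)) {
    uid
    key
  }
}'''

def read(txn, key):
    """Query for all records matching this key within the given transaction."""
    res = txn.query(READ, variables={'$key': str(key)})
    return json.loads(res.json)['q']

def delete(client, key):
    """Query for matching UIDs, then delete every attribute of each entity
    by issuing a deletion for just {uid: ...}."""
    txn = client.txn()
    try:
        for record in read(txn, key):
            txn.mutate(del_obj={'uid': record['uid']})
        txn.commit()   # may raise AbortedError on a transaction conflict
    finally:
        txn.discard()

def upsert(client, key):
    """Read for an existing record with this key; insert a new one only if
    none exists. The read and the conditional write share one transaction."""
    txn = client.txn()
    try:
        if not read(txn, key):
            txn.mutate(set_obj={'key': key})
        txn.commit()
    finally:
        txn.discard()
```

Because the read and the conditional insert share a single transaction, under snapshot isolation a concurrent reader should never see a record stripped of its key, or both an old deleted record and a newly upserted one at once.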
In this run, for key 0, an entity 0x298 already exists for the key, but is deleted by process 2. Processes 0, 1, 6, and 8 confirm the deletion of that entity: they find no matches for that query. Process 2 upserts a new record, which is visible on reads by processes 5 and 6: `{:uid "0x2ab", :key 0}`. Concurrently with that upsert, process 4 deletes the key. Concurrent reads return dangling records: `{:uid "0x2ab"}`, with no key. Other processes continue trying to delete key 0; their queries observe uid 0x2ab and issue a delete for it, but apparently with no effect. This dangling record, 0x2ab, persists for the remainder of the test--just over two minutes.
In this diagram, time flows down; each process is a vertical track, and operations are colored bars. Blue means success; pink is a known failed operation, i.e. one that did not occur.
Moreover, queries may return multiple dangling records:
`[{:uid "0xcf"} {:uid "0x110"}]`
... a dangling record and a complete record:
`[{:uid "0xcf"} {:uid "0x177", :key 1}]`
... or multiple complete records for the same key:
For instance, in this run, shortly after an upsert, process 33 manages to observe two complete, coexisting records for the same key: 0x25fb and 0x26e0. Note that 0x25fb was already deleted, which means it went from dangling to non-dangling. Note also that in order for the upsert to take place, it must have failed to observe 0x25fb, or it wouldn't have inserted anything. Something really odd is going on here.
Even weirder, in 20180220T205917.000-0600.zip, 0x7e7 is a dangling record which process 36 can see and apparently successfully delete (although the deletion doesn't actually remove the dangling record). However, right after reading 0x7e7, process 36 attempts a delete and finds nothing to delete. Then 0x7e7 goes right back to being a dangling record and is visible to process 36 again--this time as a full record, in conjunction with the newly upserted 0x808. 0x808 turns out to be deletable. 0x7e7 goes back to being a full record, then a dangling record after that.