
Possible data loss -- fsync parent directories #6368

Closed
ramanala opened this issue Sep 7, 2016 · 4 comments


ramanala commented Sep 7, 2016

I am running a three-node etcd cluster. When I insert a new key-value pair into the store, I see the following sequence of system calls on the server.

1 creat("/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal.tmp")
2 append("/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal.tmp")
3 fdatasync("/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal.tmp")
4 rename("/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal.tmp", "/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal")
5 append("/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal")
6 append("/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal")
7 fdatasync("/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal")
8 append("/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal")
9 fdatasync("/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal")
==========Data is durable here -- Ack the user====================

For the 4th operation (rename of wal.tmp to wal) to be persisted to disk, the parent directory has to be fsync'd. If it is not, a crash just after acknowledging the user can result in data loss. Specifically, the rename can be reordered on some file systems and therefore not persisted immediately. In such a case, on recovery the server would see the file wal.tmp but not wal. On seeing this, I believe etcd just unlinks the tmp file and therefore can lose the user's data. If this happens on two nodes of a three-node cluster, a global data loss is possible. We have reproduced this particular data loss issue using our testing framework. As a fix, it would be safe to fsync the parent directory on creat or rename of files.
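A minimal sketch of the proposed fix (not etcd's actual code; the fsyncDir helper name and the hard-coded paths are illustrative assumptions): after the rename in step 4, open the WAL's parent directory and fsync it before acknowledging the user, so that the rename itself is durable.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fsyncDir makes directory metadata changes (a completed rename or creat)
// durable by fsyncing the directory itself.
func fsyncDir(dir string) error {
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync()
}

func main() {
	tmp := "/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal.tmp"
	wal := "/data/etcd/infra0.etcd/member/wal/0000000000000001-000000000000001b.wal"

	// ... steps 1-3: creat tmp, append records, fdatasync tmp ...

	// Step 4: rename the temporary WAL file into place.
	if err := os.Rename(tmp, wal); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// Proposed extra step: persist the rename by fsyncing the parent
	// directory before any acknowledgement is sent to the user.
	if err := fsyncDir(filepath.Dir(wal)); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
}
```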

xiang90 added this to the v3.1.0 milestone Sep 7, 2016

xiang90 commented Sep 7, 2016

@heyitsanthony This seems like a bug. I marked it as P2 since it is unlikely to happen in practice, but we should fix it soon. @ramanala Can you say more about your testing framework? We are VERY interested in automated testing to ensure etcd works reliably.


ramanala commented Sep 7, 2016

@xiang90 -- we have a testing framework that can test distributed storage systems for problems like data loss and unavailability in the presence of correlated crashes (i.e., all servers crashing together, with file-system-level problems like the one mentioned above on some of them). We have a couple more issues in etcd where the cluster can become unavailable. I will file those issues soon.

Thanks for your interest! This is a research tool, and the related paper will appear at OSDI this year. We will also make the tool publicly available. I will update you with more information about it in a few days.

heyitsanthony self-assigned this Sep 7, 2016
heyitsanthony pushed commits to heyitsanthony/etcd that referenced this issue Sep 7, 2016
heyitsanthony pushed commits to heyitsanthony/etcd that referenced this issue Sep 8, 2016

ramanala commented Sep 8, 2016

Thank you @heyitsanthony and @xiang90!

gyuho pushed a commit that referenced this issue Sep 9, 2016
LK4D4 added a commit to LK4D4/swarmkit that referenced this issue Sep 29, 2016
This fixes bug with wal handling etcd-io/etcd#6368

Signed-off-by: Alexander Morozov <[email protected]>
niau pushed a commit to vasil-yordanov/swarmkit that referenced this issue Oct 21, 2016
This fixes bug with wal handling etcd-io/etcd#6368

Signed-off-by: Alexander Morozov <[email protected]>
ashish-goswami pushed a commit to hypermodeinc/dgraph that referenced this issue Sep 19, 2019
(#3960)

In #3959, the bulk loader crashes when trying to move a directory into itself under a new name:
/dgraph/tmp/shards/shard_0
/dgraph/tmp/shards/shard_0/shard_0

The bulk loader logic is:

1. the mapper produces output as
   .../tmp/shards/000
   .../tmp/shards/001
2. read the list of shards under .../tmp/shards/
3. create the reducer shards as
   .../tmp/shards/shard_0
   .../tmp/shards/shard_1
4. move the list read in step 2 into the reducer shards created in step 3
Though I cannot reproduce the problem, it seems the creation of the reducer shard directory .../tmp/shards/shard_0 and the listing of the mapper shards in step 2 are re-ordered. Something similar is mentioned in etcd-io/etcd#6368

This PR avoids such possibilities by putting the mapper output into an independent directory
../tmp/map_output, so that the program works correctly even if the reordering happens.
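A minimal sketch of that layout change, with assumed directory names (this is not dgraph's actual code): mapper output lives under tmp/map_output while reducer shards are created under tmp/shards, so listing the mapper output can never pick up a freshly created reducer shard, and shard_0 is never moved into itself.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	tmp := "/dgraph/tmp"
	mapOut := filepath.Join(tmp, "map_output") // mapper shards: 000, 001, ...
	shards := filepath.Join(tmp, "shards")     // reducer shards: shard_0, shard_1, ...

	// Listing the mapper output directory cannot see reducer shard
	// directories, because they live in a separate tree.
	entries, err := os.ReadDir(mapOut)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for i, e := range entries {
		reducer := filepath.Join(shards, fmt.Sprintf("shard_%d", i))
		if err := os.MkdirAll(reducer, 0o755); err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		// Move the mapper shard under its reducer shard.
		src := filepath.Join(mapOut, e.Name())
		dst := filepath.Join(reducer, e.Name())
		if err := os.Rename(src, dst); err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
	}
}
```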

utkarshmani1997 commented Apr 2, 2020

Hi @ramanala, I am interested in knowing more about your testing framework. Is it publicly available now?
