Partial WAL entry set causes persistent failure #168

Open
ewencp opened this issue Jan 31, 2017 · 1 comment


ewencp commented Jan 31, 2017

See https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/confluent-platform/cHeTBYAZBH8/qgrKXjAmCwAJ for a description of the issue.

If a node dies after writing only 5 of the 10 entries in a WAL commit block for files, recovery fails persistently. The current behavior is correct in the sense that we do not try to apply the WAL entries until we hit the endMarker for a commit. But the FSWAL.apply() code simply bails if it hits EOF (or any IOException). Instead of bailing, it should recover: ignore the incomplete set of entries, give up on committing that data, truncate the log, and proceed from the last known safe point based on the offsets seen in files in HDFS.
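The proposed recovery can be illustrated with a minimal, self-contained sketch. This is not the connector's actual code: the record layout (`BEGIN`, `ENTRY`, `END` bytes followed by a UTF string), the class `WalRecoverySketch`, and the helpers `replay` and `sampleLog` are all hypothetical, and an in-memory byte array stands in for the WAL file in HDFS. It only shows the idea of buffering entries until the end marker, dropping an incomplete block on EOF or any other IOException, and remembering the length at which the log could safely be truncated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WalRecoverySketch {
    // Hypothetical record types; the real WAL format differs.
    static final byte BEGIN = 0, ENTRY = 1, END = 2;

    /** Entries from complete commit blocks, plus the length the log can be truncated to. */
    static final class Replay {
        final List<String> committed = new ArrayList<>();
        long safeLength = 0; // bytes up to and including the last END marker
    }

    /** Replays the log, ignoring a trailing block that never reached its END marker. */
    static Replay replay(byte[] log) {
        Replay result = new Replay();
        List<String> pending = new ArrayList<>();
        ByteArrayInputStream raw = new ByteArrayInputStream(log);
        DataInputStream in = new DataInputStream(raw);
        try {
            while (true) {
                byte type = in.readByte();
                if (type == BEGIN) {
                    pending.clear();
                } else if (type == ENTRY) {
                    pending.add(in.readUTF());
                } else if (type == END) {
                    // Only now is the block known to be complete.
                    result.committed.addAll(pending);
                    pending.clear();
                    result.safeLength = log.length - raw.available();
                } else {
                    break; // unknown record type: treat as corruption
                }
            }
        } catch (IOException e) {
            // EOF (clean end or torn write) or a malformed record:
            // pending entries are dropped, completed blocks are kept.
        }
        return result;
    }

    /** Builds one complete block followed by a block that is missing its END marker. */
    static byte[] sampleLog(int completeEntries, int partialEntries) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(BEGIN);
            for (int i = 0; i < completeEntries; i++) {
                out.writeByte(ENTRY);
                out.writeUTF("committed-" + i);
            }
            out.writeByte(END);
            out.writeByte(BEGIN);
            for (int i = 0; i < partialEntries; i++) {
                out.writeByte(ENTRY);
                out.writeUTF("partial-" + i);
            }
            // No END marker: simulates a node dying mid-commit.
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] log = sampleLog(2, 5);
        Replay r = replay(log);
        // Truncating to safeLength discards the partial block.
        byte[] truncated = Arrays.copyOf(log, (int) r.safeLength);
        System.out.println("applied=" + r.committed + " truncatedTo=" + truncated.length);
    }
}
```

The sketch leaves out the last step described above, resuming from the offsets seen in files already committed to HDFS, since that depends on connector state not modeled here.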

@lomignet

Wouldn't this commit Eneco@f038915 help?
