Corrupt winlogbeat.yml checkpoint file #2313

Closed
mtmcgrew opened this issue Aug 19, 2016 · 5 comments

@mtmcgrew

mtmcgrew commented Aug 19, 2016

For confirmed bugs, please report:

  • Version: 5.0.0-alpha4
  • Operating System: Windows 8
  • Steps to Reproduce: Unknown

I'm using 5.0.0-alpha4 and noticed that on some users' machines the service could not start. The log file contained the following error:

2016-08-18T18:22:56-07:00 CRIT Exiting: yaml: control characters are not allowed

I noticed that the C:\ProgramData\winlogbeat\winlogbeat.yml file contained nothing but zero bytes:

# xxd winlogbeat.yml
0000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

This is affecting tens of hosts out of a few hundred. Original forum post: https://discuss.elastic.co/t/corrupt-winlogbeat-yml-checkpoint-file/58417

@svmastersamurai

👍
I have noticed this as well. It originally started with a handful of hosts but lately it seems to be spreading to more of them over time.

To remediate, I just delete the file so the service comes back up, but eventually the hosts revert to this state. I can also confirm that this affects Windows 10 as well.

@andrewkroh
Member

I was able to reproduce this by powering off a Windows 2012 VM running in VirtualBox. It only occurred while I had lots of events being read, which causes the registry (checkpoint) file to be updated more often.

I also noticed that my log file exhibited similar behavior and was full of 0's at the end.

@andrewkroh
Member

andrewkroh commented Aug 31, 2016

After some brief investigation, I think the problem is caused by the file cache in Windows. The file cache does lazy writes unless specifically configured to write-through to the disk. So I think the problem is occurring when we lose power and the cache hasn't been flushed.

So when we create the file we need to use the FILE_FLAG_WRITE_THROUGH flag, but Go doesn't expose the flag so we'll have to do our own syscall.

File Caching in Windows
StackOverflow - cause of corrupted file contents

@andrewkroh
Member

I opened PR #2434 for 5.X to add the FILE_FLAG_WRITE_THROUGH flag. I think this should address the problem, but it's hard to say with 100% confidence. Hopefully once it's merged and released you guys can test it on your fleet of machines and provide feedback on whether the problem has been resolved.

@ruflin
Member

ruflin commented Sep 2, 2016

Closing this as #2434 was merged.

@ruflin ruflin closed this as completed Sep 2, 2016