scan: delete corrupted chunks #67
base: master
Before:
for 5 minutes
After:
The startup still took 5+ minutes, but the next startup was super fast :D
Hey @edsiper, is there a chance you can take a look at this PR soon, or propose an alternative way to address this bug?
I am thinking about what the ideal use case would be. Note that a corrupted chunk can have many valid records; what we do in Fluent Bit is process all valid records that exist in a corrupted chunk. If we just skip that functionality, we will be in a worse situation...
So I spent some time looking through this today, and I believe this isn't correct. As far as I can tell, the execution flow is the following: on startup, Fluent Bit calls
If
As a result, my belief is that we never actually load these files, since the only way we scan them is through
I'm wondering if there's something I'm missing here. Could you link me to some code?
Which line is that? What conditions would trigger that scenario?
I'm referring to these lines: In cio_file.c,
When we scan for chunks on startup, we load the list into memory but exclude every chunk that turns out to be corrupted. This means that library consumers are never informed about the corrupted chunks and, as a result, can't act on them to delete them.
This change just deletes corrupted chunks on scan, but I'm interested in whether we want to place this behind a config flag (and enable it in Fluent Bit), or want to do something more.
This does fix our immediate issue with 5 minutes of corrupted chunks on startup though :)
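
For discussion, here is a rough sketch of what "delete on scan, behind a flag" could look like. It is not the code in this PR and does not use the library's real API: `scan_chunks`, `chunk_is_valid`, the `delete_corrupted` flag, and the two-byte header check are all hypothetical stand-ins (the real check would be whatever validation the scan already performs).

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical integrity check: treats a file as valid if it starts with
 * two assumed magic bytes. A real check would also verify the checksum.
 */
static int chunk_is_valid(const char *path)
{
    unsigned char hdr[2];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (!fp) {
        return 0;
    }
    n = fread(hdr, 1, sizeof(hdr), fp);
    fclose(fp);
    return n == sizeof(hdr) && hdr[0] == 0xC1 && hdr[1] == 0x00;
}

/*
 * Scan 'dir' for chunk files. Returns the number of valid chunks found,
 * or -1 if the directory cannot be opened. When 'delete_corrupted' is
 * non-zero, files that fail validation are unlinked and counted in
 * '*removed'; otherwise they are only skipped.
 */
int scan_chunks(const char *dir, int delete_corrupted, int *removed)
{
    char path[4096];
    struct dirent *ent;
    struct stat st;
    int valid = 0;
    DIR *d;

    assert(dir != NULL && removed != NULL);
    *removed = 0;

    d = opendir(dir);
    if (!d) {
        return -1;
    }
    while ((ent = readdir(d)) != NULL) {
        snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
        /* Skip "." / ".." and anything that is not a regular file */
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode)) {
            continue;
        }
        if (chunk_is_valid(path)) {
            valid++;
            continue;
        }
        /* Corrupted chunk: remove it only if the caller opted in */
        if (delete_corrupted && unlink(path) == 0) {
            (*removed)++;
        }
    }
    closedir(d);
    return valid;
}
```

With the flag off, this behaves like the current scan (corrupted files are skipped and stay on disk); with it on, a second startup no longer has to revisit them. Note that this sketch drops the whole file, so it does not address the concern about salvaging valid records from a partially corrupted chunk.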