fix: Fix bugs when loki.source.file uses non-UTF-8 encoding #5210
For example, with UTF-16, spaces would sometimes appear between characters because the encoding settings were lost after a file was reopened. Deciding whether to reopen a file also wasn't working well: files were often reopened unnecessarily. UTF-16 uses wider code units than UTF-8 (at least two bytes per character), which confused the tailer into thinking that the file had shrunk and therefore must have been rotated.
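To illustrate how a size check can misfire, here is a minimal sketch, not the tailer's actual code. It assumes a hypothetical heuristic, `looksTruncated`, that treats a file as rotated when its size is smaller than the read offset. If the offset is counted in decoded UTF-8 bytes while the file size is in raw UTF-16 bytes, the two numbers are not comparable. `encodeUTF16LE` is a helper made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// encodeUTF16LE returns s encoded as UTF-16LE without a BOM.
// Hypothetical helper, used only to build sample data.
func encodeUTF16LE(s string) []byte {
	units := utf16.Encode([]rune(s))
	out := make([]byte, 2*len(units))
	for i, u := range units {
		binary.LittleEndian.PutUint16(out[2*i:], u)
	}
	return out
}

// looksTruncated is an assumed rotation heuristic: the file is considered
// truncated (and therefore rotated) when it is smaller than the read offset.
func looksTruncated(fileSize, offset int64) bool {
	return fileSize < offset
}

func main() {
	line := "日本語\n"
	raw := encodeUTF16LE(line)

	decodedLen := int64(len(line)) // UTF-8 bytes after decoding: 10
	rawLen := int64(len(raw))      // UTF-16LE bytes on disk: 8

	// Offset counted in decoded bytes: false positive.
	fmt.Println(looksTruncated(rawLen, decodedLen)) // true
	// Offset counted in raw bytes: no false positive.
	fmt.Println(looksTruncated(rawLen, rawLen)) // false
}
```

The check only behaves when the offset and the file size are both measured in raw on-disk bytes.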
blewis12 left a comment:
Thank you for looking into this 🙇
    if err != nil {
        return err
    }
    fileSize := fi.Size()
I am not sure this is correct. We used the offset because we want to receive events from the position where we last read.
If new data arrived between our attempt to read a line and the call to wait, we would miss it.
Thank you, I think you're right 🤔 I'll see if I can get offset() to only use raw bytes.
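As a rough illustration of an offset that only counts raw bytes, the sketch below decodes one UTF-16LE line at a time and reports the raw byte position just past it. It is not the tailer's implementation: `nextLineUTF16LE` is a hypothetical function, and it assumes little-endian input without a BOM. When only a partial line is available, the offset is left unchanged so a later read can retry from the last successful position.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// nextLineUTF16LE decodes one newline-terminated line from raw UTF-16LE
// bytes, starting at raw byte offset off. It returns the decoded line
// (without the newline) and the raw offset just past the newline.
// If no complete line is available, it returns ok=false and leaves the
// offset unchanged. Hypothetical helper for illustration only.
func nextLineUTF16LE(raw []byte, off int64) (line string, next int64, ok bool) {
	var units []uint16
	for i := off; i+1 < int64(len(raw)); i += 2 {
		u := binary.LittleEndian.Uint16(raw[i:])
		if u == '\n' {
			return string(utf16.Decode(units)), i + 2, true
		}
		units = append(units, u)
	}
	return "", off, false
}

func main() {
	// "ab\ncd" in UTF-16LE: one complete line followed by a partial one.
	raw := []byte{'a', 0, 'b', 0, '\n', 0, 'c', 0, 'd', 0}

	line, off, ok := nextLineUTF16LE(raw, 0)
	fmt.Printf("%q %d %v\n", line, off, ok) // "ab" 6 true

	line, off, ok = nextLineUTF16LE(raw, off)
	fmt.Printf("%q %d %v\n", line, off, ok) // "" 6 false
}
```

Because the returned offset is always a raw byte position, it can be compared directly with the file size or passed to a seek on the underlying file.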
    level.Debug(f.logger).Log("msg", "file modified")
    if partial {
        // We need to reset to last successful offset because we consumed a partial line.
        // TODO: Return error from Seek()?
It's fine to return an error here!
server_log: loki: Hi, I'm not sure whether this fix will solve the issue I'm currently facing. My server log file is encoded as UTF-16LE with a BOM, and I'm on version 1.12.0.
@williamhowf It's not fixing the issue. I am working on an alternative solution.
Closing this in favour of #5259