fix: Fix bugs when loki.source.file uses non-UTF-8 encoding #5210

Closed
ptodev wants to merge 1 commit into main from ptodev/encoding-space

@ptodev ptodev commented Jan 8, 2026

Pull Request Details

For example, with UTF-16, spaces would sometimes appear between characters because the encoding settings were lost after a file was reopened.

Also, the check for whether to reopen a file wasn't working well: files were often reopened unnecessarily. UTF-16 encodes characters in wider (16-bit) code units than UTF-8, which confused the tailer into thinking that the file had shrunk and therefore must have been rotated.
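
To illustrate both symptoms, here is a minimal sketch using only the Go standard library. It is not the tailer's code: `encodeUTF16LE` and `decodeUTF16LE` are hypothetical helpers, and it assumes that the stray "spaces" are the NUL bytes left over when UTF-16LE data is read as single-byte text. It also shows that the raw and decoded byte counts differ, which is the kind of mismatch that can mislead a size comparison.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// encodeUTF16LE is a hypothetical helper that encodes s as UTF-16LE without a BOM.
func encodeUTF16LE(s string) []byte {
	units := utf16.Encode([]rune(s))
	raw := make([]byte, 2*len(units))
	for i, u := range units {
		binary.LittleEndian.PutUint16(raw[2*i:], u)
	}
	return raw
}

// decodeUTF16LE is the matching hypothetical decoder.
// It assumes len(raw) is even and that raw carries no BOM.
func decodeUTF16LE(raw []byte) string {
	units := make([]uint16, len(raw)/2)
	for i := range units {
		units[i] = binary.LittleEndian.Uint16(raw[2*i:])
	}
	return string(utf16.Decode(units))
}

func main() {
	raw := encodeUTF16LE("hi\n")

	// Without the encoding setting, every ASCII character is followed by a NUL byte.
	fmt.Printf("undecoded: %q\n", string(raw))

	// With the encoding applied, the intended text comes back.
	decoded := decodeUTF16LE(raw)
	fmt.Printf("decoded:   %q\n", decoded)

	// The on-disk byte count and the decoded byte count are not the same number.
	fmt.Printf("raw bytes: %d, decoded bytes: %d\n", len(raw), len(decoded))
}
```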

PR Checklist

  • Documentation added
  • Tests updated
  • Config converters updated

@ptodev ptodev requested a review from a team as a code owner January 8, 2026 13:43
@ptodev ptodev changed the title fix: Fx bugs when loki.source.file uses non-UTF-8 encoding fix: Fix bugs when loki.source.file uses non-UTF-8 encoding Jan 8, 2026
@ptodev ptodev force-pushed the ptodev/encoding-space branch from 1c5f38e to 64addc7 Compare January 8, 2026 14:11
@ptodev ptodev force-pushed the ptodev/encoding-space branch from 64addc7 to b30cf77 Compare January 8, 2026 15:50
Comment thread internal/component/loki/source/file/internal/tail/file.go
Comment thread internal/component/loki/source/file/internal/tail/file.go
@ptodev ptodev requested a review from blewis12 January 9, 2026 13:21
@blewis12 blewis12 left a comment

Thank you for looking into this 🙇

if err != nil {
	return err
}
fileSize := fi.Size()
Contributor

I am not sure this is correct. We used the offset because we want to get events from where we last read.

If new data arrives between our attempt to read a line and the call to wait, we would miss it.
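
The concern can be sketched as follows. This is only an illustration under assumptions, not the tailer's implementation: `fileChange` and `classify` are hypothetical names, and the sketch assumes the baseline is a raw byte offset into the file.

```go
package main

import "fmt"

// fileChange is a hypothetical classification of what happened to a tailed file.
type fileChange int

const (
	unchanged fileChange = iota
	modified
	truncated
)

// classify compares the current on-disk size with a baseline position.
// The baseline is meant to be the raw byte offset of the last completed read.
func classify(baseline, currentSize int64) fileChange {
	switch {
	case currentSize < baseline:
		return truncated
	case currentSize > baseline:
		return modified
	default:
		return unchanged
	}
}

func main() {
	const lastReadOffset = 100 // where the last successful read ended
	const sizeAtWait = 120     // 20 bytes arrived before waiting started

	// Baseline = last read offset: the 20 new bytes are reported as a modification.
	fmt.Println(classify(lastReadOffset, sizeAtWait) == modified)

	// Baseline = size sampled when waiting starts: the same 20 bytes go unnoticed.
	fmt.Println(classify(sizeAtWait, sizeAtWait) == unchanged)

	// A size below the baseline is what suggests truncation or rotation.
	fmt.Println(classify(lastReadOffset, 40) == truncated)
}
```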

Contributor Author

Thank you, I think you're right 🤔 I'll see if I can get offset() to only use raw bytes.
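
One possible way to keep an offset in raw bytes, shown here purely as a sketch, is to advance it by the encoded size of each decoded line rather than by its UTF-8 length. `rawLenUTF16` is a hypothetical helper, not part of the PR, and the sketch assumes that the decoded text round-trips exactly (no line-ending normalization or replaced invalid sequences) and that the file starts with a 2-byte BOM.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// rawLenUTF16 returns the number of bytes s occupies when encoded as UTF-16.
// Each UTF-16 code unit is two bytes; characters outside the BMP take two units.
func rawLenUTF16(s string) int64 {
	return int64(2 * len(utf16.Encode([]rune(s))))
}

func main() {
	// Assumption for this sketch: the file begins with a 2-byte BOM.
	var offset int64 = 2

	lines := []string{"hi\n", "Bi\u00ean H\u00f2a\n"}
	for _, line := range lines {
		// Advance by the on-disk size of the line, not by len(line).
		offset += rawLenUTF16(line)
		fmt.Printf("decoded length %d, raw offset now %d\n", len(line), offset)
	}
}
```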

level.Debug(f.logger).Log("msg", "file modified")
if partial {
	// We need to reset to last successful offset because we consumed a partial line.
	// TODO: Return error from Seek()?
Contributor

It's fine to return an error here!

williamhowf commented Jan 13, 2026

server_log:
‘2000’: user ‘743004’ updated: Accounts - 743004 - Company: ‘Biên Hòa122’ -> ‘Biên Hòa 1’

loki:
‘2000’: user ‘743004’ updated: Accounts - 743004 - Company: ‘Bin Ha122’ -> ‘Bin Ha 1’

Hi guys, I'm not sure whether this fix will solve the issue I'm currently facing. The server log file is encoded as UTF-16LE with a BOM.

version 1.12.0

kalleep commented Jan 14, 2026

@williamhowf It's not fixing the issue. I am working on an alternative solution.

ptodev commented Jan 14, 2026

Closing this in favour of #5259

@ptodev ptodev closed this Jan 14, 2026
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Jan 29, 2026