
windows_eventlog2 invalid/corrupt output #79

Open
sutyak opened this issue Aug 2, 2021 · 6 comments

Comments

@sutyak

sutyak commented Aug 2, 2021

Describe the bug

Possible buffer overflow? The original issue, posted on the Fluentd Google Group, showed unexpected CJK characters in event logs. On further investigation, these are not CJK characters but garbled Unicode bytes appended to the original text after what looked like an "end of text" character (later identified as a carriage return). This leads me to believe the windows_eventlog2 plugin may be reading past the end of the intended buffer and picking up extra data from memory.

Here is how it looks:
"Description":"The resource loader failed to find MUI file.
㐳㈸‧獉畃牲湥㵴琧畲❥㸯਍⼼潂歯慭歲楌瑳>>䏐涔倀者䈼潯浫牡䱫獩㹴਍†䈼潯浫牡桃湡敮㵬洧捩潲潳瑦眭湩潤獷欭牥敮⵬湰⽰潣普杩牵瑡潩❮删捥牯䥤㵤㈧㐱✱䤠䍳牵敲瑮✽牴敵⼧ാ㰊䈯潯浫牡䱫獩㹴㸀",

To Reproduce

Configure Fluentd to read all event logs with "read_all_channels true". This does not occur on the top-level Application, System, and Security logs. Configure the match to dump all output to a local JSON file for convenience.
In the configuration below I had already narrowed it down to the wer-payloadhealth log, but that may not be consistent on every system, which is why I recommend using "read_all_channels true".

<source>
  @type windows_eventlog2
  @id windows_eventlog2
  channels "microsoft-windows-wer-payloadhealth/operational"
  preserve_qualifiers_on_hash true
  read_existing_events
  read_interval 10
  tag winevt.raw2
  render_as_xml false
  rate_limit 5000
  <storage>
    @type local
    persistent true
    path "C:/Program Files/appname/Fluentd/pos/winevt2.json"
  </storage>

</source>

<match winevt.raw2>
  @type file
  path "C:/Temp/${tag}.%Y%m%d%H%M"
  path_suffix ".json"
  append true
  <format>
    @type json
  </format>
  <buffer tag,time>
    timekey 1m
    timekey_use_utc true
    timekey_wait 1m
    chunk_limit_size 500MB
    flush_thread_count 2
  </buffer>
</match>

Expected behavior

The output JSON file will contain numerous Description elements with what appears to be CJK text. Many, if not all, will be attached to what should be an empty Description.
The corresponding Description in Windows will likely be "The Description for event ID xx .... cannot be found."

Grab a Description text from the log and run it through a converter, such as the C# below:

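using System;
using System.Text;

string originalString = "paste string here";

foreach (char c in originalString)
{
    // Get the two little-endian UTF-16 bytes behind this character...
    byte[] utf16Bytes = Encoding.Unicode.GetBytes(c.ToString());
    // ...and decode them as UTF-8 to reveal any single-byte text hidden in it.
    Console.WriteLine($"{(int)c} - {Encoding.UTF8.GetString(utf16Bytes)}");
}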

EDIT: the "3" printed below is actually "13" (carriage return).
Something that stands out is the "3", which is the "end of text" (ETX) character. For now I can add a check for it in my code to mark where the valid text ends.
You can see that once it gets past "10", the line feed character, everything goes a bit wonky.

Here is a snippet of the output.
Columns are: integer value - character

77 - M
85 - U
73 - I
32 -
102 - f
105 - i
108 - l
101 - e
46 - .
3 -
10 -

13363 - 34
12856 - 82
8231 - '
29513 - Is
30019 - Cu
29298 - rr
28261 - en
15732 - t=
29735 - 't
30066 - ru
10085 - e'
15919 - />
2573 -
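The converter loop and the output above can be reproduced with a short Python sketch (Python rather than C# so it runs without a .NET project). It assumes the hidden text is single-byte (ASCII/Latin-1) data whose bytes were read as UTF-16LE code units; the names utf16le_units_as_latin1, dump_units and GARBLED are made up for illustration. Each code unit is packed back into its two little-endian bytes and each byte is read as one character, which is what the C# loop does one character at a time (it decodes as UTF-8, which gives the same result for ASCII bytes).

```python
def utf16le_units_as_latin1(garbled: str) -> str:
    """Undo the mojibake: re-encode each UTF-16 code unit as its two
    little-endian bytes, then read every byte as a single character.

    Assumption: the hidden text is single-byte (ASCII/Latin-1), so each
    byte maps to exactly one character.
    """
    return garbled.encode("utf-16-le").decode("latin-1")


def dump_units(garbled: str) -> None:
    """Print 'code unit - byte pair', mirroring the C# output format."""
    for ch in garbled:
        pair = ch.encode("utf-16-le").decode("latin-1")
        print(f"{ord(ch)} - {pair!r}")


# Leading code units of the garbage appended to the Description in the report.
GARBLED = "\u3433\u3238\u2027\u7349\u7543\u7272\u6e65\u3d74\u7427\u7572\u2765\u3e2f\u0a0d"

if __name__ == "__main__":
    print(repr(utf16le_units_as_latin1(GARBLED)))  # "3482' IsCurrent='true'/>\r\n"
    dump_units(GARBLED)  # 13363 - '34', 12856 - '82', 8231 - "' ", ...
```

Decoding more of the garbage from the report the same way yields fragments such as "<BookmarkList>", i.e. unrelated text rather than part of the event description, which is consistent with the read-past-the-buffer hypothesis.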

Your Environment

- Fluentd version: 1.11.1 and 1.12.3
- TD Agent version: 3.8.1 and 4.1.1
- Operating system: Windows Server 2019 and Windows 10 Pro
- Kernel version:

Your Configuration

Same as the configuration under "To Reproduce" above.

Your Error Log

No errors.

Additional context

No response

@ashie ashie transferred this issue from fluent/fluentd Aug 3, 2021
@ashie
Member

ashie commented Aug 3, 2021

This is an issue in winevt_c or fluent-plugin-windows-eventlog.
I've transferred this issue to fluent-plugin-windows-eventlog (it may be forwarded to winevt_c later).

@kenhys
Contributor

kenhys commented Aug 3, 2021

I guess it may be solved by setting an appropriate from_encoding (see https://github.com/fluent/fluent-plugin-windows-eventlog#parameters).

@ashie
Member

ashie commented Aug 3, 2021

> I guess that it may be solved by appropriate https://github.com/fluent/fluent-plugin-windows-eventlog#parameters from_encoding.

I'm not sure, but I don't think so.
I think the trailing line is cut off by ETX (0x03) when converting to UTF-8 or to a Ruby string.

@ashie
Member

ashie commented Aug 3, 2021

https://github.com/fluent-plugins-nursery/winevt_c/blob/19ad48ac19d2bf1bf3a8d7cf781fc1872562233c/ext/winevt/winevt_utils.cpp#L8-L20

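VALUE
wstr_to_rb_str(UINT cp, const WCHAR* wstr, int clen)
{
  VALUE vstr;
  CHAR* ptr;
  // clen == -1: wstr must be NUL-terminated; the whole string up to and
  // including the terminator is converted, and len counts the terminator.
  int len = WideCharToMultiByte(cp, 0, wstr, clen, nullptr, 0, nullptr, nullptr);
  ptr = ALLOCV_N(CHAR, vstr, len);
  WideCharToMultiByte(cp, 0, wstr, clen, ptr, len, nullptr, nullptr);
  // rb_utf8_str_new_cstr() relies on ptr being NUL-terminated, which
  // WideCharToMultiByte only guarantees when clen == -1 (or when the
  // clen characters themselves include a terminator).
  VALUE str = rb_utf8_str_new_cstr(ptr);
  ALLOCV_END(vstr);

  return str;
}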

In winevt_c, the above function is probably always called with clen=-1. That may be the cause.
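To illustrate this hypothesis, here is a small Python simulation (not winevt_c code). It assumes the wide-character buffer holds the valid description followed by stale bytes, with no 0x0000 terminator right after the text. The hypothetical read_until_nul mimics what WideCharToMultiByte does with clen=-1 (consume code units until a NUL one), while read_with_length uses an explicit character count. The stale bytes are taken from the byte dump sutyak posts later in this thread.

```python
VALID_TEXT = "found.\r\n"

# Hypothetical buffer layout: valid UTF-16LE text, then stale single-byte
# data (the "kmarkList>" bytes from the dump), and only then a 0x0000 unit.
BUFFER = VALID_TEXT.encode("utf-16-le") + b'kmarkList>\x00t."' + b"\x00\x00"


def read_until_nul(buf: bytes) -> str:
    """Mimic clen=-1: consume 2-byte code units until one equals 0x0000."""
    end = 0
    while end + 1 < len(buf) and (buf[end] or buf[end + 1]):
        end += 2
    return buf[:end].decode("utf-16-le")


def read_with_length(buf: bytes, n_units: int) -> str:
    """Use an explicit length (in UTF-16 code units) and ignore what follows."""
    return buf[: 2 * n_units].decode("utf-16-le")


if __name__ == "__main__":
    print(repr(read_until_nul(BUFFER)))                     # valid text + CJK-looking garbage
    print(repr(read_with_length(BUFFER, len(VALID_TEXT))))  # 'found.\r\n'
```

Because the stale bytes were copied from the reporter's dump, read_until_nul returns the posted snippet (found.\r\n浫牡䱫獩㹴琀∮) by construction; the sketch only shows that the clen=-1 mechanism can produce this kind of output, not that it is what happens inside winevt_c. In C terms, the bounded variant corresponds to passing the real character count as clen. WideCharToMultiByte then does not append a NUL terminator unless one is among the counted characters, so the Ruby string would also need to be built with an explicit length.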

@sutyak
Author

sutyak commented Aug 3, 2021

More info: using C# to print integer representations of the characters led me astray. There is no 03 (ETX) present; instead it's a 13 (carriage return). I still don't know why it was printed to the screen as a 3.
What is still accurate is that it takes conversion to Unicode (UTF-16) bytes to see the actual characters.

Here is a string snippet: found.\r\n浫牡䱫獩㹴琀∮
UTF-8 segment: found.\r\n
Unicode segment: kmarkList> t.

Bytes as UTF-8 (the encoding changes after index 7):

  Index Value Type
  [0] 102 byte
  [1] 111 byte
  [2] 117 byte
  [3] 110 byte
  [4] 100 byte
  [5] 46 byte
  [6] 13 byte
  [7] 10 byte
  [8] 230 byte
  [9] 181 byte
  [10] 171 byte
  [11] 231 byte
  [12] 137 byte
  [13] 161 byte
  [14] 228 byte
  [15] 177 byte
  [16] 171 byte
  [17] 231 byte
  [18] 141 byte
  [19] 169 byte
  [20] 227 byte
  [21] 185 byte
  [22] 180 byte
  [23] 231 byte
  [24] 144 byte
  [25] 128 byte
  [26] 226 byte
  [27] 136 byte
  [28] 174 byte

Bytes as Unicode (UTF-16LE). After index 15, the bytes show the true readable values.

  Index Value Type
  [0] 102 byte
  [1] 0 byte
  [2] 111 byte
  [3] 0 byte
  [4] 117 byte
  [5] 0 byte
  [6] 110 byte
  [7] 0 byte
  [8] 100 byte
  [9] 0 byte
  [10] 46 byte
  [11] 0 byte
  [12] 13 byte
  [13] 0 byte
  [14] 10 byte
  [15] 0 byte
  [16] 107 byte
  [17] 109 byte
  [18] 97 byte
  [19] 114 byte
  [20] 107 byte
  [21] 76 byte
  [22] 105 byte
  [23] 115 byte
  [24] 116 byte
  [25] 62 byte
  [26] 0 byte
  [27] 116 byte
  [28] 46 byte
  [29] 34 byte
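The two byte listings above can be regenerated from the snippet with a short Python sketch: encoding as UTF-8 shows each CJK-looking character as a 3-byte sequence, while encoding as UTF-16LE (what C#'s Encoding.Unicode produces) shows that each of those characters is really two ASCII bytes. The helper name dump and the SNIPPET constant are made up for illustration.

```python
SNIPPET = "found.\r\n\u6d6b\u7261\u4c6b\u7369\u3e74\u7400\u222e"


def dump(label: str, data: bytes) -> None:
    """Print an index/value listing in the same layout as the comment above."""
    print(label)
    print("  Index Value Type")
    for i, b in enumerate(data):
        print(f"  [{i}] {b} byte")


utf8_bytes = SNIPPET.encode("utf-8")       # 29 bytes; CJK chars take 3 bytes each
utf16_bytes = SNIPPET.encode("utf-16-le")  # 30 bytes; 2 bytes per code unit

if __name__ == "__main__":
    dump("Bytes as UTF-8:", utf8_bytes)
    dump("Bytes as Unicode (UTF-16LE):", utf16_bytes)
    # From index 16 on, the UTF-16LE bytes are plain ASCII plus one NUL.
    print(repr(utf16_bytes[16:]))  # b'kmarkList>\x00t."'
```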

@cosmo0920
Contributor

@sutyak Could you try winevt_c master (fluent-plugins-nursery/winevt_c@bc89d44)?
This commit may fix your garbage character issue.


4 participants