Skip to content

Fix windows event channel #457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Dec 20, 2014
Merged

Conversation

awiddersheim
Copy link
Member

Quite a few changes/fixes/updates to the Windows event channel code. Many changes were done to help address issues in #224. Some testing was done by myself and @defensivedepth but more testing would certainly be appreciated. Need people who use Windows and event channel.

Tried to make the output from the eventlog format and the eventchannel format as similar as possible. Added a lot of error checking. Fixed issues where not all log messages were able to be displayed. Moved to using temporary files and atomic moves to update bookmarks.

Remove excess whitespace among other things. No code changes made.
Work toward fixing issue ossec#224. The event channel output was multiline
which was causing issues for users.

Found that the old event log code was getting subjected to some string
manipulation that was removing newlines and replacing tabs after
argument fields with spaces.

Moved this code to a central location so both the old and new can be
subjected to the same manipulation. Decided to call this function
win_format_event_string().

Despite this change, the output from the two event log gathering methods
still differs slightly. The old event log message seems to go through a
few other things that can change the string liek FormatMessage(), which
is a Windows system call.

During a small amount of testing it did not appear the old event log
gathering ever needed to have newlines removed and tabs replaced. It
appears that either the logs are coming out that way or the formatting
is being done somewhere previously in the code. Perhaps, FormatMessage()
is doing it.

That said, the event channel stuff certainly does get affected by
win_format_event_string() so it more closely matches it's counterpart
albeit not exactly.
Put parenthesis around 'no user' to match what the event log code does.
Instead of only pulling out certain events from the event logs with
event channel pull out everything. This is how nxlog does it and they
seems to have a good handle on eventlog and eventchannel. Good designers
copy, great designers steal. Much thanks goes to nxlog and keeping their
source open. It has been a great help. The people over at Zabbix also
seem to have a similar tool that also has good code to look at for
inspiration and examples.

Not sure if there is any benefit to only pulling out certain events
from the event log. Perhaps, there is some performance benefit. Seems
like nxlog finds it acceptable and their software is quite fast so I
doubt it is of major significance.

These changes again bring the output of the eventlog code and the new
eventchannel code closer together.
The code to get an event's user and domain was some what confusing so
additional comments were added to clear up why the code is the way it
is. Also, added additional error handling. Whether it was needed or not
is questionable but hpoefully it should help in debugging problems
should they arise.
There was pretty much no error checking in the eventchannel code. Adding
error checking obviously has the benefit of catching and logging errors
but it also helps in readability. It is now a bit easier to understand
what the code is doing and where things can go wrong and why.

A good example of this is many of the Windows system calls are done
twice. The first call is done to determine the buffer size necessary to
hold the data a proper call would produce. This allows for the right
amount of memory to be allocated. This first call actually fails but it
fails in a very predictable manner. Unexpected failures should be
accounted for.

Reworked get_messages() to be get_message() which is less code and a bit
more dynamic in what it can do. There are a number of events (shown
below) someone might want to access and this new function allows for
that pretty easily.

typedef enum _EVT_FORMAT_MESSAGE_FLAGS {
  EvtFormatMessageEvent     = 1,
  EvtFormatMessageLevel     = 2,
  EvtFormatMessageTask      = 3,
  EvtFormatMessageOpcode    = 4,
  EvtFormatMessageKeyword   = 5,
  EvtFormatMessageChannel   = 6,
  EvtFormatMessageProvider  = 7,
  EvtFormatMessageId        = 8,
  EvtFormatMessageXml       = 9
} EVT_FORMAT_MESSAGE_FLAGS;

Changed many of the functions to return errors when they are
encountered. Those errors might not be accounted for because they
frankly don't need to. The data being gathered just won't be displayed
in the final log message. The errors happening inside the functions will
also be logged for review later.

Made a lot of changes to the code format to improve readability. Things
like spacing and changing variable names went a long way. Some things
were a bit inconsistent.

Found that a few variables were not being declared NULL to begin with
which had the potential for errors later. Specifically, the os_event
structure in send_event_channel() seemed to be prone to this.

Created new functions to handle the opening, creating and updating of
the bookmarks. Also, made some improvements in how that is all handled
but there is still a lot that can be done which will come in a future
commit.

With the bookmark code, access() was used which has the common time of
check, time of use pitfalls. This has been removed in favor of just
opening the file and asking forgiveness later.

Finally, the last notable thing to discuss is that wcstombs_s() was not
converting many of the event log messages. This was able to be
successfully reproduced by generating the logs produced when a user
access the system using RDP. It's possible there was an issue with how
wcstombs_s() was being used by switching to WideCharToMultiByte() solved
the issue. Since this code does not need to be cross platform this seems
like the preferable choice when presented with both anyway.
Use GetLastError() when logging the error code.
Use temporary files and atomic move to update bookmarks when using
eventchannel. This method should be less error prone and easier to
understand.
The convert_windows_string() function no longer has a set limit on the
string size it is able to convert. Enough memory should be allocated to
hold the entire string.

The same can be said for the new convert_unix_string() function. This
was created to replace mbstowcs_s() with MultiByteToWideChar().
The eventlog code transforms the "level" using the EventType member from
the EventLogRecord struct into human readable strings. Ported this same
type of logic over to the eventchannel code using the nxlog source as a
template for how to do this. This brings the log format and output more
in line with one another.
Some of the bookmarks can have slashes in the name. For example,
"Microsoft-Windows-Sysmon/Operational". The slashes can get interpreted
as part of the path when creating both the bookmark and the temporary
files used to update the bookmarks which results in errors.

There was already some code to replace the last slash in the
eventchannel name with a space when creating the bookmark file name.
This code was turned into a function called replace_slash() and is now
also done when creating the bookmark temporary file names. The
temprorary file names were also changed slightly by adding a trailing
'-' before the random characters are added by mkstemp_ex().

Also, the slash was replaced with a space. A space in file names is sort
of ugly so that was changed to an underscore. It may have some affect
on anyone using eventchannel and future events but hopefully the impact
is negligible.
@jrossi
Copy link
Member

jrossi commented Dec 20, 2014

I am not able to test this but looks good from reading.

@jrossi
Copy link
Member

jrossi commented Dec 20, 2014

@ossec any one else tested this yet?

@defensivedepth
Copy link
Contributor

I have had this build running on a couple Server 2008R2 + 2012R2 boxes for a couple weeks now pulling eventchannel logs and all appears well.

@jrossi
Copy link
Member

jrossi commented Dec 20, 2014

@defensivedepth Great sounds like it's time for a larger pool of testers. THank you

jrossi added a commit that referenced this pull request Dec 20, 2014
@jrossi jrossi merged commit 3445b9f into ossec:master Dec 20, 2014
@defensivedepth
Copy link
Contributor

Just an update on this... I am noticing a much higher than average ossec client memory usage on the 2008 R2 servers using this build... A couple of them are 500mb up to 800mb, with only an up-time of 2 days...

@awiddersheim
Copy link
Member Author

Hrm, is the memory rising or does it stay at that level? What are "normal" levels when it comes to OSSEC on Windows?

Sounds like a memory leak. Question is, is the leak in the eventchannel code or somewhere else. The build you are using has not only my changes but quite a few others that MAY be causing issues. Any errors in the OSSEC log? If you disable eventchannel and restart the agent does the memory issue persist?

@defensivedepth
Copy link
Contributor

After looking into this further, It appears to gradually increase after a reboot until all memory is consumed on the box...

Normal levels that I have typically observed is under 10mb...

No errors in the OSSEC log.

I will disable event channel, restart the agent and get back to you on if the issue persists...

@awiddersheim
Copy link
Member Author

Definitely seems like a memory leak. I see a few issues with the event channel code that might fix the issue but might take me a bit before I get enough free time to work on it. Let me know how your testing goes with disabling event channel.

@defensivedepth
Copy link
Contributor

After disabling the eventchannel logs and restarting the agent, memory usage sat at under 10mb for the next 5 hours.

Once I enabled the eventchannel logs and restarted the agent, I saw an instant CPU & memory spike (which is to be somewhat expected, as it shipped off the 50k events that were generated and not sent to the OSSEC server during those 5 hours).. During this time when the new events were being processed, the memory usage of the agent went from <10mb to 127mb, and then the memory increase became much slower, correlating to when the agent stopped processing the backlog.

During the next hour, memory usage continued to increase, but more slowly... Currently at 155mb and increasing.

Based on this testing, I would agree that the leak appears to be related with the eventchannel code.

No worries on the timing... If I have to have this functionality right now, I can just set the agent to restart hourly...

@awiddersheim
Copy link
Member Author

Great info and testing notes. Thanks for testing and helping to narrow this down.

@defensivedepth
Copy link
Contributor

Let me know if I can do any more testing, etc

@awiddersheim awiddersheim deleted the fix_windows_event_channel branch January 5, 2015 16:54
@awiddersheim
Copy link
Member Author

Found that this buffer never has free() called.

https://github.com/ossec/ossec-hids/blob/master/src/logcollector/read_win_event_channel.c#L572

This is probably the biggest source of memory leakage that I created.

@jrossi also pointed out several spots where I should be doing an EvtClose() on event handles. Should have something shortly that will hopefully fix things.

@awiddersheim
Copy link
Member Author

Most of the memory leaks were fixed in #496. Turns out I didn't introduce the memory leaks after all. Most were already there I just didn't see them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants