-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Fix windows event channel #457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Remove excess whitespace among other things. No code changes made.
Work toward fixing issue ossec#224. The event channel output was multiline which was causing issues for users. Found that the old event log code was getting subjected to some string manipulation that was removing newlines and replacing tabs after argument fields with spaces. Moved this code to a central location so both the old and new can be subjected to the same manipulation. Decided to call this function win_format_event_string(). Despite this change, the output from the two event log gathering methods still differs slightly. The old event log message seems to go through a few other things that can change the string liek FormatMessage(), which is a Windows system call. During a small amount of testing it did not appear the old event log gathering ever needed to have newlines removed and tabs replaced. It appears that either the logs are coming out that way or the formatting is being done somewhere previously in the code. Perhaps, FormatMessage() is doing it. That said, the event channel stuff certainly does get affected by win_format_event_string() so it more closely matches it's counterpart albeit not exactly.
Put parenthesis around 'no user' to match what the event log code does.
Instead of only pulling out certain events from the event logs with event channel pull out everything. This is how nxlog does it and they seems to have a good handle on eventlog and eventchannel. Good designers copy, great designers steal. Much thanks goes to nxlog and keeping their source open. It has been a great help. The people over at Zabbix also seem to have a similar tool that also has good code to look at for inspiration and examples. Not sure if there is any benefit to only pulling out certain events from the event log. Perhaps, there is some performance benefit. Seems like nxlog finds it acceptable and their software is quite fast so I doubt it is of major significance. These changes again bring the output of the eventlog code and the new eventchannel code closer together.
The code to get an event's user and domain was some what confusing so additional comments were added to clear up why the code is the way it is. Also, added additional error handling. Whether it was needed or not is questionable but hpoefully it should help in debugging problems should they arise.
There was pretty much no error checking in the eventchannel code. Adding error checking obviously has the benefit of catching and logging errors but it also helps in readability. It is now a bit easier to understand what the code is doing and where things can go wrong and why. A good example of this is many of the Windows system calls are done twice. The first call is done to determine the buffer size necessary to hold the data a proper call would produce. This allows for the right amount of memory to be allocated. This first call actually fails but it fails in a very predictable manner. Unexpected failures should be accounted for. Reworked get_messages() to be get_message() which is less code and a bit more dynamic in what it can do. There are a number of events (shown below) someone might want to access and this new function allows for that pretty easily. typedef enum _EVT_FORMAT_MESSAGE_FLAGS { EvtFormatMessageEvent = 1, EvtFormatMessageLevel = 2, EvtFormatMessageTask = 3, EvtFormatMessageOpcode = 4, EvtFormatMessageKeyword = 5, EvtFormatMessageChannel = 6, EvtFormatMessageProvider = 7, EvtFormatMessageId = 8, EvtFormatMessageXml = 9 } EVT_FORMAT_MESSAGE_FLAGS; Changed many of the functions to return errors when they are encountered. Those errors might not be accounted for because they frankly don't need to. The data being gathered just won't be displayed in the final log message. The errors happening inside the functions will also be logged for review later. Made a lot of changes to the code format to improve readability. Things like spacing and changing variable names went a long way. Some things were a bit inconsistent. Found that a few variables were not being declared NULL to begin with which had the potential for errors later. Specifically, the os_event structure in send_event_channel() seemed to be prone to this. Created new functions to handle the opening, creating and updating of the bookmarks. Also, made some improvements in how that is all handled but there is still a lot that can be done which will come in a future commit. With the bookmark code, access() was used which has the common time of check, time of use pitfalls. This has been removed in favor of just opening the file and asking forgiveness later. Finally, the last notable thing to discuss is that wcstombs_s() was not converting many of the event log messages. This was able to be successfully reproduced by generating the logs produced when a user access the system using RDP. It's possible there was an issue with how wcstombs_s() was being used by switching to WideCharToMultiByte() solved the issue. Since this code does not need to be cross platform this seems like the preferable choice when presented with both anyway.
Use GetLastError() when logging the error code.
Use temporary files and atomic move to update bookmarks when using eventchannel. This method should be less error prone and easier to understand.
The convert_windows_string() function no longer has a set limit on the string size it is able to convert. Enough memory should be allocated to hold the entire string. The same can be said for the new convert_unix_string() function. This was created to replace mbstowcs_s() with MultiByteToWideChar().
The eventlog code transforms the "level" using the EventType member from the EventLogRecord struct into human readable strings. Ported this same type of logic over to the eventchannel code using the nxlog source as a template for how to do this. This brings the log format and output more in line with one another.
Some of the bookmarks can have slashes in the name. For example, "Microsoft-Windows-Sysmon/Operational". The slashes can get interpreted as part of the path when creating both the bookmark and the temporary files used to update the bookmarks which results in errors. There was already some code to replace the last slash in the eventchannel name with a space when creating the bookmark file name. This code was turned into a function called replace_slash() and is now also done when creating the bookmark temporary file names. The temprorary file names were also changed slightly by adding a trailing '-' before the random characters are added by mkstemp_ex(). Also, the slash was replaced with a space. A space in file names is sort of ugly so that was changed to an underscore. It may have some affect on anyone using eventchannel and future events but hopefully the impact is negligible.
I am not able to test this but looks good from reading. |
@ossec any one else tested this yet? |
I have had this build running on a couple Server 2008R2 + 2012R2 boxes for a couple weeks now pulling eventchannel logs and all appears well. |
@defensivedepth Great sounds like it's time for a larger pool of testers. THank you |
Just an update on this... I am noticing a much higher than average ossec client memory usage on the 2008 R2 servers using this build... A couple of them are 500mb up to 800mb, with only an up-time of 2 days... |
Hrm, is the memory rising or does it stay at that level? What are "normal" levels when it comes to OSSEC on Windows? Sounds like a memory leak. Question is, is the leak in the eventchannel code or somewhere else. The build you are using has not only my changes but quite a few others that MAY be causing issues. Any errors in the OSSEC log? If you disable eventchannel and restart the agent does the memory issue persist? |
After looking into this further, It appears to gradually increase after a reboot until all memory is consumed on the box... Normal levels that I have typically observed is under 10mb... No errors in the OSSEC log. I will disable event channel, restart the agent and get back to you on if the issue persists... |
Definitely seems like a memory leak. I see a few issues with the event channel code that might fix the issue but might take me a bit before I get enough free time to work on it. Let me know how your testing goes with disabling event channel. |
After disabling the eventchannel logs and restarting the agent, memory usage sat at under 10mb for the next 5 hours. Once I enabled the eventchannel logs and restarted the agent, I saw an instant CPU & memory spike (which is to be somewhat expected, as it shipped off the 50k events that were generated and not sent to the OSSEC server during those 5 hours).. During this time when the new events were being processed, the memory usage of the agent went from <10mb to 127mb, and then the memory increase became much slower, correlating to when the agent stopped processing the backlog. During the next hour, memory usage continued to increase, but more slowly... Currently at 155mb and increasing. Based on this testing, I would agree that the leak appears to be related with the eventchannel code. No worries on the timing... If I have to have this functionality right now, I can just set the agent to restart hourly... |
Great info and testing notes. Thanks for testing and helping to narrow this down. |
Let me know if I can do any more testing, etc |
Found that this buffer never has This is probably the biggest source of memory leakage that I created. @jrossi also pointed out several spots where I should be doing an |
Most of the memory leaks were fixed in #496. Turns out I didn't introduce the memory leaks after all. Most were already there I just didn't see them. |
Quite a few changes/fixes/updates to the Windows event channel code. Many changes were done to help address issues in #224. Some testing was done by myself and @defensivedepth but more testing would certainly be appreciated. Need people who use Windows and event channel.
Tried to make the output from the
eventlog
format and theeventchannel
format as similar as possible. Added a lot of error checking. Fixed issues where not all log messages were able to be displayed. Moved to using temporary files and atomic moves to update bookmarks.