Aggregated flows consume more space than collected and other strange things with 1.7 #427
There are a few things to explain, which may help you identify potential bottlenecks and understand why more CPU usage and bigger files result. Well, everything gets bigger these days - right :)

The file size:
Both files are lz4 compressed. In this case, the overhead is about 5%. It may of course differ for other compression algorithms and other types of flow records. Up to 10% seems reasonable; however, 25% is too much. You may verify the file with
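nfdump itself, for example along these lines (a sketch; the file name is a placeholder, `-v` verifies a flow file's blocks and `-I` prints its summary statistics):

```sh
# Verify the flow file's internal structure and data blocks
nfdump -v nfcapd.202301011200

# Print the file's summary statistics, including record counts and compression
nfdump -I -r nfcapd.202301011200
```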
The number of records differs because 1.6.x needs some internal extension records.

Size of aggregated flows:
If you aggregate flows, new fields need to be added representing the number of flows. This adds overhead; the aggregation itself reduces the size. Please make sure the resulting file is compressed the same way as the original file. If you have flows which cannot be aggregated, in theory the resulting file could be slightly bigger, although I have never come across this so far.

CPU:
nfdump-1.6.x is not threaded, whereas nfdump-1.7.x is. The parallel tasks are split between collecting flows and writing them into a memory buffer, and compressing those buffers and writing them to disk. Furthermore, nfdump-1.7.x got a more advanced decoder for IPFIX and NetFlow v9 in order to cope with the increased variety of formats and data. For v5 and v1, however, it is still the same code. If we speak about CPU, it's about the sum of %user + %system + %iowait, so you have to check carefully for a potential bottleneck. If the difference is 1% or 2%, I think this is not really the point; but if you have a loaded system, as you describe during a (D)DoS, things matter. nfcapd-1.6 has only one buffer - the system packet buffer (argh!). In nfcapd-1.7.x there are more buffers, and the multithreading allows it to draw more CPU from the system if required: flows are collected while data buffers are compressed and written. Therefore, under a heavily loaded packet stream, nfcapd-1.7 performs much better than 1.6, but needs the CPU to complete the tasks. This means nfcapd-1.6 is limited at a certain point and starts dropping packets, but uses less CPU, whereas 1.7 can process more packets at the cost of using more CPU.

Compare the following numbers: I converted the 28 Mio flows file back into a pcap data stream file, which results in:
So, it's about a 5.2 GB pcap file. In order to see how efficiently the collector works, you can feed it the pcap instead of reading from the network. This eliminates the system buffer, and the collector processes the packets in the pcap file as fast as possible. To process the pcap, I use these commands:
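The commands were roughly of this shape (a sketch, assuming both collectors were built with pcap-file support, i.e. the `-f` option; paths are placeholders):

```sh
# nfcapd-1.6: single-threaded, reads the pcap as fast as possible;
# /usr/bin/time reports elapsed time and CPU usage
/usr/bin/time ./nfcapd-1.6 -f flows.pcap -l /tmp/flows-1.6

# nfcapd-1.7: multithreaded collector, same input
/usr/bin/time ./nfcapd-1.7 -f flows.pcap -l /tmp/flows-1.7
```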
With nfcapd-1.6, the avg CPU was 93%; with nfcapd-1.7, the avg CPU was 112%. The tests ran on an older MacBook with an i7 and an SSD. If you compare the time needed to process the pcap as fast as possible, 1.7 is way faster than 1.6.x, but needs more CPU. As a result, nfcapd-1.6 will start dropping packets earlier if it cannot process them, due to limited resources - a single thread. Last but not least: the missing -T option.

I hope this sheds some light on a somewhat complex issue. Sorry for the long answer.
Thank you Peter, your answer definitely made things clear - and in fact I really like long answers with details :) However, my concerns about disk space are a bit different - I do collection into RAM (tmpfs) to avoid thrashing disks, as no disks (within my budget at least) are able to handle the peaks, while SSDs die quite fast when used for this purpose. This could probably be mitigated by highest-end enterprise models, but those are far beyond my budget; two Samsung Datacenter PM893s did not survive even one year (strangely, not because of the TBW limit). Due to all this, my current processing flow is as follows:
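The collection stage, at least, boils down to something like this (sizes and paths are illustrative):

```sh
# Mount a RAM-backed filesystem to keep the collector off the disks
mount -t tmpfs -o size=8g tmpfs /var/flows-ram

# Collect into RAM; post-processing moves the results elsewhere later
nfcapd -D -p 9995 -l /var/flows-ram
```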
Thus I am trying to reduce the space consumed as much as possible - RAM is very valuable and limited, unlike disks, and even 10% could matter a lot. Using bz2 is not an option, as compressing one file takes longer than producing it, even in threaded mode. The ability to filter out unnecessary and unused/uncollected fields would be really nice - while it looks negligible, it actually translates to a significant reduction in size, even after compression. As to aggregation... I did some experiments, and the results are interesting - it looks like the size increases only for aggregated and compressed data. I have a small log with only 157K records; the results are as follows:
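In short, the test was of this shape (file names are placeholders; `-a` aggregates on the default 5-tuple):

```sh
# Baseline: rewrite the file unchanged, same lz4 compression
nfdump -r nfcapd.src -w nfcapd.plain -z=lz4

# Aggregated: same compression, flows aggregated with -a
nfdump -r nfcapd.src -a -w nfcapd.aggr -z=lz4

# Compare the resulting sizes
ls -l nfcapd.plain nfcapd.aggr
```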
As you can see, this is quite a significant difference - 28%. I also did a similar test on a bigger data set (~4.8M flows), and the difference was similar. Maybe my use case is a bit off (RAM instead of disks), but nevertheless there are systems (maybe even embedded) where disk space is scarce, so it probably still makes sense to have an option to filter out fields which are unused in some scenarios (MAC addresses, MPLS labels, etc.).
Thanks for your answer. The aggregation does not add any unused fields. I add only a single extension, and only if the number of aggregated flows for that record is > 1. Therefore I do not really understand why that overhead is so big. Anyway, you can see which extension elements are used by each flow record by using the raw output format.
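For example (a sketch; the file name is a placeholder):

```sh
# Print records in raw format; each record includes the list of
# extension elements it carries
nfdump -r nfcapd.202301011200 -o raw | less
```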
The elements line shows which extensions were used to compile this record. Extension 1 is always required; all others are added from the template which is sent by the exporter. If flows are aggregated, extension 5 is added if the number of aggregations is > 1; otherwise no changes are made. Extension 2 or 3 is needed for IP addresses. All extensions can be reviewed in nfxV3.h. I would be interested to analyse such a file. If it would be possible for you to share such an nfcapd file (not aggregated) which shows this behaviour, you could send it to me by email, bzip2 compressed :), to my address in the AUTHORS file. I do not want to change the default behaviour of 1.7 for the collection of elements, nor back-port the old -T option.
The file should have arrived in your mailbox by now. Regarding the filtering of templates on output, I agree that you do not need to restore the previous behavior. A dedicated option would be much better. However, filtering could be done implicitly when aggregation is requested. In this case, we would know exactly which fields are meaningful, and everything else would not be needed by definition (except for obvious fields like start/stop time, counters, and maybe some other similar fields).
Let's do it in several steps - first I will implement the option.
Could you try the latest master, which implements the new option? Only the matching extensions are stored by the collector. The extension IDs correspond to the definitions in nfxV3.h. For the minimal required record, use the minimal set of extension IDs. Does this make a difference for you? At least the collected flow files are now smaller.
Yes, thank you - it does make a difference when there are millions of flows. The mystery of the increased size after aggregation remains unsolved, though. Did you have a chance to look at my file? I see that the number of extensions is reduced after aggregation (exactly as in the source), but the resulting size is still significantly larger; there must be something else which I am unable to figure out yet.
Please update to the master branch! There was a bug in the init functions!
I received the file, but could not yet figure out the mystery. All compression algorithms produce a larger file.
Well, maybe this output from valgrind could give you a hint (I am not yet familiar with the code):
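The run itself was along these lines (a sketch; file names are placeholders):

```sh
# Run the aggregation under valgrind/memcheck to surface
# uninitialised reads and invalid accesses
valgrind --tool=memcheck --track-origins=yes \
    ./nfdump -r nfcapd.src -a -w nfcapd.aggr
```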
There are many similar issues - and this only happens when I provide the aggregation option. Line numbers may be a bit off, though, as I am experimenting with my zstd support patch (it does not affect aggregation behavior anyway), but at least you know where to look :)
The compression code is taken from a library. I need to check.
I doubt that this is related to the compression itself (the same code works fine without aggregation).
I moved this into a new issue, as this is not related to the original topic.
I don't see a code bug behind the larger file size. I guess this happens as a result of the byte sequence. Otherwise, feel free to reopen.
Hi,
Recently I decided to try out 1.7.1, but it feels a bit... different.
In particular:
- `nfcapd` consumes 2x more CPU than 1.6 - for instance, during a (D)DoS, CPU usage with 1.6 was ~50-70%, but with 1.7 it is close to 100%. I didn't change anything in the options that were used with 1.6, except for removing `-T` and `-l` from `nfcapd`.
- `-T` in theory could have some effect on collected file sizes, as I don't need all the data which is sent, but now there is no option to get rid of it.

I am a bit confused and would like to know - did someone else also observe similar effects?