This issue started as part of an investigation into the causes of 100% CPU usage on Mikrotik boxes during DDoS attacks. This particular deployment had no firewall rules and was expected to handle all traffic using Fast Path. During a DDoS attack, the customer noticed CPU usage spike to 100%, which clearly led to network performance degradation.
After ruling out firewall rules (there were none), suspicion shifted to NetFlow. NetFlow is not sampled by default and may have caused significant CPU usage, because flow tracking is an extremely challenging task for hardware, especially on mostly software-based routers like Mikrotik.
We obtained a pcap dump from one of our customers that highlights an issue in Mikrotik's NetFlow v9 implementation when sampling is enabled.
The setup for this case was the following:
set active-flow-timeout=30s cache-entries=16M enabled=yes inactive-flow-timeout=30s packet-sampling=yes sampling-interval=2222 sampling-space=1111
The very first issue we noticed was that Mikrotik delivers the sampling rate inside data records, described by a regular data template, instead of using options templates.
As an engineer, I partially agree with their decision, because options template encoding is incredibly complicated. However, the majority of NetFlow collectors will struggle with this approach.
They use field 34, called samplingInterval, which the IPFIX RFC explains as follows: "When using sampled NetFlow, the rate at which packets are sampled -- e.g., a value of 100 indicates that one of every 100 packets is sampled."
So it is simply the sampling rate, as is.
Let's check what we see in Wireshark: the samplingInterval field shows 16777216.
This clearly has nothing to do with 1111, 2222 or 2 (the sampling rate) configured on the router.
What does 16777216 actually mean?
When I see such large numbers, I immediately blame endianness, and that was certainly the case in this particular scenario.
With the help of a small app, we can decode it:
./a.out
Data as is in big endian: 16777216
Data in host byte order: 1
test.cpp code:
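#include <iostream>
#include <cstdint>
#include <arpa/inet.h>  // ntohl()

int main() {
    // 16777216 (0x01000000): the value as Wireshark shows it when the wire bytes are read as big endian
    uint32_t sampling_value = 0x01000000;
    std::cout << "Data as is in big endian: " << sampling_value << std::endl;
    // On a little-endian host (e.g. x86), ntohl() swaps the bytes and recovers the intended value
    std::cout << "Data in host byte order: " << ntohl(sampling_value) << std::endl;
    return 0;
}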
What does it mean?
All fields in NetFlow must be encoded in network byte order, also known as big endian.
Instead of encoding it this way, Mikrotik stored it in little endian, which is completely wrong. That is why we see 16777216 instead of 1 in both FastNetMon and Wireshark.
Finally, why on Earth do we see 1? We could expect 1111, 2222 or 2 (the actual sampling rate in this setup), but none of them is 1.
You may guess that 1 means "sampling not enabled", but that would be a wrong assumption, because in that case we see the value 0.
Our current conclusion is that we cannot even try to add support for such a peculiar encoding because of these multiple issues.
We would be very grateful if active Mikrotik customers reported this issue to [email protected]
We have no data for RouterOS 7, and it would be very helpful if you could share some with us.
Thank you!
Good news: with great assistance from the community, we got a pcap from ROS 7.10, and it works just fine:
2023-06-20 17:17:28,862 [INFO] Got sampling date from data packet: 1001
2023-06-20 17:17:28,862 [INFO] Got sampling date from data packet: 1001
2023-06-20 17:17:28,862 [INFO] Got sampling date from data packet: 1001
Even better news: we have added support for the RouterOS v7 encoding format in FastNetMon Advanced, and it will be part of the next release. It will require setting the flag netflow_v9_read_sampling_rate_in_data_section.