Mikrotik RouterOS v6.49.6 encodes sampling rate using wrong byte order (endian-less) in Netflow v9 #985

Open
pavel-odintsov opened this issue Jun 20, 2023 · 3 comments


@pavel-odintsov
Owner

This issue started as part of an investigation into 100% CPU usage on Mikrotik boxes during a DDoS attack. The deployment in question had no firewall rules and was expected to handle all traffic via Fast Path. During the attack, the customer saw CPU usage spike to 100%, which clearly degraded network performance.

After confirming that there were no firewall rules, suspicion shifted to Netflow, which is unsampled by default. Unsampled Netflow may have caused the high CPU usage, since flow tracking is an extremely demanding task for hardware, especially on mostly software-based routers like Mikrotik.

We obtained a pcap dump from one of the customers, and it highlights an issue in Mikrotik's Netflow v9 implementation when sampling is enabled.

The setup for this case was the following:

set active-flow-timeout=30s cache-entries=16M enabled=yes inactive-flow-timeout=30s packet-sampling=yes sampling-interval=2222 sampling-space=1111

The very first issue we noticed was that Mikrotik delivers the sampling rate in data records (via data templates) instead of using options templates:

Screenshot from 2023-06-20 15-47-09

As an engineer, I partially agree with their decision, as options encoding is incredibly complicated, but the majority of Netflow collectors will struggle with such an approach.

They use field 34, called samplingInterval, which the IPFIX RFC explains as follows: "When using sampled NetFlow, the rate at which packets are sampled -- e.g., a value of 100 indicates that one of every 100 packets is sampled."

So it is simply the sampling rate, as is.

Let's check what we see in Wireshark:


It clearly has nothing to do with 1111, 2222, or 2 (the sampling rate) configured on the router.

What does 16777216 actually mean?

When I see such large numbers, I immediately blame endianness, and that was certainly the case in this particular scenario.

With the help of a small app, we can decode it:

./a.out 
Data as is in big endian: 16777216
Data in host byte order: 1

test.cpp code:

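#include <iostream>
#include <cstdint>

#include <arpa/inet.h>  // ntohl()

int main() {
    // 0x01000000 == 16777216: the value Wireshark and FastNetMon decode
    // from the four bytes 01 00 00 00 on the wire when read as big endian.
    uint32_t sampling_value = 0x01000000;

    std::cout << "Data as is in big endian: " << sampling_value << std::endl;

    // On a little-endian host (e.g. x86), ntohl() swaps the bytes and yields 1.
    // On a big-endian host ntohl() is a no-op, so this demo assumes little endian.
    std::cout << "Data in host byte order: " << ntohl(sampling_value) << std::endl;

    return 0;
}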

What does it mean?

All fields in Netflow must be in network byte order, also known as big endian.

Instead, Mikrotik stored the value in little endian, which is completely wrong, and that is why we see 16777216 instead of 1 in both FastNetMon and Wireshark.

Finally, why on Earth do we see 1? We would expect 1111, 2222, or 2 (the actual sampling rate in this setup), but none of them is 1.

You may guess that 1 means "sampling not enabled", but that would be a wrong assumption: in such cases we see the value 0.

Our current conclusion is that we cannot even attempt to support such a peculiar encoding, given how many issues it has.

We would be very grateful if active Mikrotik customers reported this issue to [email protected]

We have no data for RouterOS 7, so if you can share some with us, it would be very helpful.

Thank you!

@pavel-odintsov
Owner Author

Linking similar issues: netsampler/goflow2#113 and akvorado/akvorado#417

@pavel-odintsov
Owner Author

Good news: with great assistance from the community, we got a pcap from ROS 7.10, and it works just fine:

2023-06-20 17:17:28,862 [INFO] Got sampling date from data packet: 1001
2023-06-20 17:17:28,862 [INFO] Got sampling date from data packet: 1001
2023-06-20 17:17:28,862 [INFO] Got sampling date from data packet: 1001

Router's configuration:

/ip/traffic-flow/set packet-sampling=yes sampling-interval=1 sampling-space=1000

@pavel-odintsov
Owner Author

Even better news: we have added support for the RouterOS v7 encoding format in FastNetMon Advanced, and it will be part of the next release. It requires setting the flag netflow_v9_read_sampling_rate_in_data_section.
