[Feature] Granular Configuration Filters #205

Open
arTECRAD opened this issue Dec 9, 2024 · 0 comments
arTECRAD commented Dec 9, 2024

This may be a niche circumstance, but I've been having difficulty collecting interface traffic statistics for a handful of CCR2116 deployments with large configurations. If I enable fetch_routers_in_parallel and set max_worker_threads to 8, attempting to scrape more than two routers on a 15s scrape interval crashes or hangs MKTXP. From there, it appears to enter a loop: a connection is made to the router, then varying errors while parsing the output cause it to disconnect and retry.

On the Grafana dashboard, neither of the routers shows long collection times, averaging 200ms to 400ms total each. The metrics server appears to be the bottleneck, as scrapes take around 10 seconds according to the Prometheus target health page. Increasing MKTXP's scrape interval to 1m and scaling timeouts accordingly allows a third router to be added, after which the MKTXP scrape duration increases to ~20 seconds. Per-router collection times remain in the same 200ms to 400ms range.

The deployments I'm polling run ~1500 VLAN interfaces per router and average anywhere from 1K to 10K active DHCP leases. Eventually ~150 MikroTik routers will be polled, all running standardized configs. If the MKTXP config offered a way to restrict interface stats to specific VLAN IDs, or to query DHCP leases only for management/core infrastructure VLANs, Prometheus would pull a far more manageable number of time series from MKTXP. The production configuration would use a longer scrape interval to offset some of the load, but as more routers are added, keeping traffic data for critical interfaces such as physical uplink ports would be a useful feature. Resources could then be saved by polling just the small handful of critical interfaces on each router.
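To make the request concrete, here is a hypothetical sketch of what such filters could look like in a per-router mktxp.conf section. The `interface_vids` and `dhcp_lease_vids` option names are invented for illustration and do not exist in MKTXP today; the section/key layout follows MKTXP's existing config style:

```
[CCR2116-Site-A]
    hostname = 10.0.0.1
    interface = True
    dhcp = True

    # Hypothetical options, not currently supported by MKTXP:
    # export interface traffic stats only for these VLAN IDs
    interface_vids = 10, 20, 100
    # export DHCP lease metrics only for management/core VLANs
    dhcp_lease_vids = 10, 20
```

With a whitelist like this, each scrape would emit time series for a handful of critical interfaces instead of all ~1500 VLAN interfaces per router.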
