[Feature] Basic systemd service monitoring #1153
henrygd merged 3 commits into henrygd:1153-systemd-services from
|
Thanks, I'll pull and check it out 👍 |
|
Sorry, I was reorganizing things the past few days and had to force push to resolve the conflicts. |
|
After watching #510 with interest, it's great to see this 😀 As someone who's running a bunch of apps/services with systemd instead of Docker (on NixOS), I'd be interested to know if individual services can be shown/monitored in the same way as the containers shown in the screenshot at the start of the readme (with status, CPU usage, memory usage, etc.)? Some way of configuring particular services to be shown/monitored in the same way as containers currently are would be fantastic. |
Cool idea! This doesn't add resource monitoring for services, just an 'at a glance' for seeing if services have failed. It's definitely possible to add grabbing the memory and CPU usage since the agent here has access to the whole dbus properties context. However, I would be interested to know what @henrygd thinks about this and how to best configure specifying which services to do that for. I'll have some time next week to play around with that. |
Just my two cents—if you decide to add it (which I’d love!), maybe consider adding a separate systemd tab, similar to what I did for the Docker section in this PR. That way it’s separated out, and we don’t end up with an endless scrolling page of charts 😆 |
|
Agreed, I think we need to start splitting things up into different pages. Eventually we should have the following:
I definitely want to pull resource usage from the systemd services if possible, so it's more in line with the Docker monitoring. I also want to start putting the new data into their own tables / rows instead of using the JSON blobs. That way we can actually query the individual items effectively. For example, if you want to display the top 50 services across all systems using the most memory. This is something I'll change on my end, you don't need to do anything right now in the PRs. |
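To illustrate the point about rows versus JSON blobs, here is a minimal sketch using an in-memory SQLite database. The table name `systemd_services`, its columns, and the helper `top_services_by_memory` are all hypothetical and are not Beszel's actual schema; the sketch only shows why one row per service makes a "top N by memory across all systems" query straightforward.

```python
import sqlite3


def top_services_by_memory(conn, limit=50):
    """Return (system, name, mem_bytes) for the services using the most memory."""
    return conn.execute(
        "SELECT system, name, mem_bytes FROM systemd_services "
        "ORDER BY mem_bytes DESC LIMIT ?",
        (limit,),
    ).fetchall()


# Hypothetical schema: one row per (system, service) instead of a JSON blob.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE systemd_services ("
    " system TEXT NOT NULL,"
    " name TEXT NOT NULL,"
    " state TEXT NOT NULL,"
    " cpu_percent REAL,"
    " mem_bytes INTEGER,"
    " PRIMARY KEY (system, name))"
)

# Made-up sample data for illustration only.
conn.executemany(
    "INSERT INTO systemd_services VALUES (?, ?, ?, ?, ?)",
    [
        ("web1", "nginx.service", "active", 1.5, 120_000_000),
        ("web1", "cron.service", "active", 0.1, 4_000_000),
        ("db1", "postgresql.service", "active", 7.2, 900_000_000),
        ("db1", "backup.service", "failed", 0.0, 0),
    ],
)

for system, name, mem in top_services_by_memory(conn, limit=2):
    print(f"{system:5} {name:22} {mem / 1_000_000:.0f} MB")
```

With a JSON blob per system, the same question would require loading and decoding every blob before sorting, which is the cost the comment above wants to avoid.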
|
I am so happy to see SMART data on your upcoming feature list. That is awesome. |
@henrygd that sounds great. Would you consider also providing the option to hide unused/less relevant pages? For example, I'd want to hide the container and Proxmox/VMs pages, and I imagine some people who are running everything through containers might want to hide the systemd page. |
|
I've had a chance to spend some more time on this. While adding the service metrics, I refactored the implementation to better align with how container statistics are collected. Here’s a summary of the changes:
I'll likely wait until #928 is merged before doing any more frontend work on this. I'm looking forward to seeing how dividing the UI into separate pages fleshes things out and would like to make it easier for this to be consistent. |
|
Awesome, thanks @smtucker! Really appreciate your work. Just so you're aware of the timeline, I'm trying to wrap up with a big feature that includes some housekeeping to hopefully make the whole system a bit more flexible. It will likely be another week or two before I will be able to merge this and the other PRs mentioned above. @tecosaur Yes, we'll probably hide unused pages by default and add config options to exclude things you don't want to monitor. |
|
Thanks for the updates, Shelby and Hank. Now that #928 has been merged, I'm hoping that this PR is able to follow. It looks like this work is pretty much good to go, is that right? |
|
I'll get to this shortly, just have a few small things to wrap up. |
|
Would this add support for alerts when a systemd service goes down? |
|
@mufeedali Not quite yet. We will look into adding that after this is merged. |
|
Working on this now. For performance reasons I think I'm going to change this to collect every 10 min instead of every 1 min. Then I'm thinking we can have these columns for the services table: Name | Status | CPU (10m avg) | CPU (24h max) | Memory | Memory (24h max) | Updated. We can also add a page similar to |
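As a rough illustration of how the proposed columns could be derived from collected samples, here is a minimal sketch. The `Sample` type, the `summarize` function, and the choice to take "Memory" from the most recent sample are assumptions made for this example; the thread does not say how the hub actually computes these values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sample:
    """One hypothetical measurement for a single service."""
    taken_at: datetime
    cpu_percent: float
    mem_bytes: int


def summarize(samples, now):
    """Build the proposed table columns from samples no older than 24 hours."""
    day = [s for s in samples if now - s.taken_at <= timedelta(hours=24)]
    if not day:
        return None
    recent = [s for s in day if now - s.taken_at <= timedelta(minutes=10)]
    latest = max(day, key=lambda s: s.taken_at)
    return {
        # Average is undefined when nothing was collected in the last 10 minutes.
        "cpu_10m_avg": sum(s.cpu_percent for s in recent) / len(recent) if recent else None,
        "cpu_24h_max": max(s.cpu_percent for s in day),
        "memory": latest.mem_bytes,
        "memory_24h_max": max(s.mem_bytes for s in day),
        "updated": latest.taken_at,
    }


now = datetime(2025, 1, 1, 12, 0)
samples = [
    Sample(now - timedelta(minutes=5), 2.0, 100),
    Sample(now - timedelta(minutes=9), 4.0, 150),
    Sample(now - timedelta(hours=3), 50.0, 400),
    Sample(now - timedelta(hours=30), 99.0, 999),  # older than 24h, ignored
]
print(summarize(samples, now))
```

Note that with a 10-minute collection interval the "10m avg" window would typically hold a single sample, so the average is only meaningful if the agent itself averages usage over the interval.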
What about live monitoring using the Systemd dbus API? https://www.freedesktop.org/wiki/Software/systemd/dbus/ I see that Go even has conveniences around this in the dbus package: https://pkg.go.dev/github.com/coreos/go-systemd/dbus?utm_source=godoc#Conn.SubscribeUnits. I'd think this would be worth having for unit status at least. |
|
That's what we're using. I improved perf a bit so it may be fine to collect every minute. I'll deploy and see how she goes. |
Co-authored-by: Shelby Tucker <shelby.tucker@gmail.com>
|
This is amazing, thank you very much! |
|
I just realized I forgot to include the column in the 'All Systems' table. This will be added soon. |
|
Is there an environment variable we can use to hide the service we don't want to monitor? |
|
@FixNinja May I ask why you want to do this? In the next release we can let you supply your own patterns for which services you want to monitor, so you could do something like I think you could exclude something with |
I'd appreciate limited service monitoring too. Primarily so I can focus on things that represent issues for the services we run. I know it has been mentioned already, but alerts on failure for those services are going to be the most important part of this for us. This is excellent work so far though and I love how this product is shaping up. |
Perfect, that's all I was looking for. The idea is that a server may have dozens of services running, and only a few of those are really important. SERVICE_PATTERNS will give us the flexibility to choose which ones we want to monitor. One small suggestion: it looks like right now we are pulling all the services, including exited ones. It would be better if we only pull the running (auto start) ones to reduce clutter. |
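For readers wondering what pattern-based selection might look like, here is a minimal sketch. The exact syntax of SERVICE_PATTERNS is not shown in this thread, so the comma-separated shell-style globs used here are an assumption, and `parse_patterns` and `filter_services` are hypothetical helpers, not the agent's code.

```python
import os
from fnmatch import fnmatchcase


def parse_patterns(raw):
    """Split an assumed comma-separated pattern string, dropping empty entries."""
    return [p.strip() for p in raw.split(",") if p.strip()]


def filter_services(names, patterns):
    """Keep services matching any pattern; with no patterns, keep everything."""
    if not patterns:
        return list(names)
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


# Made-up unit names for illustration.
units = ["nginx.service", "docker.service", "cron.service", "docker-cleanup.service"]

# Read the variable if it is set; fall back to an example value otherwise.
patterns = parse_patterns(os.environ.get("SERVICE_PATTERNS", "nginx*, docker*"))
print(filter_services(units, patterns))
```

Treating an empty pattern list as "monitor everything" matches the default that several commenters here argue for, while still letting users narrow the list.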
I totally get why some people only want to monitor specific things, and for those users, having SERVICE_PATTERNS to specify which services to show in the frontend makes total sense for that use case. However, regarding the suggestion to only pull running units by default, I personally feel it should still default to showing all the data received from the systemd API, including non-running units. My personal thinking:
@henrygd Thank you very much for considering, improving, and merging this pull request! |
|
My 2 cents, monitor everything by default and implement a way to configure what services to monitor (could be done in 2 ways): Option 1: Option 2: |
It is showing blank in my dashboard; there is no ✅ sign. |
|
@Darkrock04 can confirm that. |
|
@Darkrock04 @jappi00 Are you on 0.16.1? I forgot to include the services column in 0.16.0. |
|
Yeah, I'm on 16.1. Maybe it is Docker related? The agent runs in Docker. |
|
I also have the problem that agents on version 0.16.1 don't send any systemd service information anymore. Agents still on 0.16.0 send systemd service data. My dashboard is already on 0.16.1, agent versions range from 0.15.2 to 0.16.1, and everything runs via Docker. |
|
Do the agent logs show any warnings or errors? |
|
Yes, clients that have been upgraded to Even if I downgrade to Docker Compose Configs are the same everywhere - only SSH Key and Token vary: I really can't explain this behavior :-\ |
|
Try changing the mount point to use volumes:
- /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket:ro |
|
Hello @henrygd, I added the security_opt and the volume and it works now. I would suggest adding that information to the docs. |
|
@jappi00 We have that documented on the Systemd Services page here: https://beszel.dev/guide/systemd |
|
Hello, using the env var SERVICE_PATTERNS to limit the services to monitor works like a charm. What I would like to know is whether an alarm/notification is triggered if a service stops? Thx, |
|
Service alerts have not been implemented yet. |
|
I am running agent v0.17.0. The logs don't show any errors, but I don't see any systemd information on my dashboard. |





This introduces a new feature to monitor systemd services on Linux hosts. The agent now collects service statuses and sends them to the hub, where they are displayed on the system detail page and summarized on the main dashboard. #722
Key Changes:
Agent:
System Detail Page:
Dashboard: