
[Feature] Basic systemd service monitoring#1153

Merged
henrygd merged 3 commits into henrygd:1153-systemd-services from smtucker:systemd
Nov 10, 2025

Conversation

@smtucker
Contributor

@smtucker smtucker commented Sep 8, 2025

This introduces a new feature to monitor systemd services on Linux hosts. The agent now collects service statuses and sends them to the hub, where they are displayed on the system detail page and summarized on the main dashboard. #722

Key Changes:

  • Agent:

    • The agent now collects systemd service names and their statuses (active, inactive, failed, etc.).
    • It gracefully handles permissions by attempting to connect to the system-wide systemd instance first, then falling back to the user-level instance if necessary. This ensures functionality whether the agent is run as root or a standard user.
  • System Detail Page:

    • A new "Systemd Services" card displays a detailed list of all services.
    • The view is collapsed by default, showing only failed services for immediate attention. An expander reveals the full list.
    • Services are sorted to prioritize failed ones, and each status is color-coded with an indicator dot for quick visual assessment.
    • A filter allows users to search for specific services by name.
  • Dashboard:

    • The main systems table now includes a "Services" column.
    • This column displays the number of failed services in red or a green checkmark if all services are running correctly, providing an at-a-glance health check.
(screenshots: system detail page and dashboard)
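The failed-first ordering described above can be sketched in Go. This is a minimal illustration, not the PR's actual code; the `Service` struct and status strings are assumptions for the example:

```go
package main

import (
	"fmt"
	"sort"
)

// Service is a hypothetical, simplified view of one systemd unit
// as collected by the agent (field names are assumptions).
type Service struct {
	Name   string
	Status string // "active", "inactive", "failed", ...
}

// sortServices orders failed units first so they surface at the top
// of the collapsed card, then sorts the rest alphabetically by name.
func sortServices(svcs []Service) {
	sort.SliceStable(svcs, func(i, j int) bool {
		fi, fj := svcs[i].Status == "failed", svcs[j].Status == "failed"
		if fi != fj {
			return fi // failed units sort before everything else
		}
		return svcs[i].Name < svcs[j].Name
	})
}

func main() {
	svcs := []Service{
		{"sshd", "active"},
		{"nginx", "failed"},
		{"cron", "inactive"},
	}
	sortServices(svcs)
	fmt.Println(svcs[0].Name) // nginx sorts first because it failed
}
```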

@henrygd
Owner

henrygd commented Sep 9, 2025

Thanks, I'll pull and check it out 👍

@henrygd
Owner

henrygd commented Sep 9, 2025

Sorry, I was reorganizing things the past few days and had to force push to resolve the conflicts.

@tecosaur

After watching #510 with interest, it's great to see this 😀

As someone who's running a bunch of apps/services with systemd instead of docker (on NixOS), I'd be interested to know if individual services can be shown/monitored in the same way as containers shown in the screenshot at the start of the readme (with status, CPU usage, memory usage, etc.)?

Some way of configuring particular services to be shown/monitored in the same way as containers currently are would be fantastic.

@smtucker
Contributor Author

> After watching #510 with interest, it's great to see this 😀
>
> As someone who's running a bunch of apps/services with systemd instead of docker (on NixOS), I'd be interested to know if individual services can be shown/monitored in the same way as containers shown in the screenshot at the start of the readme (with status, CPU usage, memory usage, etc.)?
>
> Some way of configuring particular services to be shown/monitored in the same way as containers currently are would be fantastic.

Cool idea! This doesn't add resource monitoring for services, just an 'at a glance' for seeing if services have failed. It's definitely possible to add grabbing the memory and CPU usage since the agent here has access to the whole dbus properties context. However, I would be interested to know what @henrygd thinks about this and how to best configure specifying which services to do that for.

I'll have some time next week to play around with that.

@svenvg93
Collaborator

> Cool idea! This doesn't add resource monitoring for services, just an 'at a glance' for seeing if services have failed. It's definitely possible to add grabbing the memory and CPU usage since the agent here has access to the whole dbus properties context. However, I would be interested to know what @henrygd thinks about this and how to best configure specifying which services to do that for.
>
> I'll have some time next week to play around with that.

Just my two cents—if you decide to add it (which I’d love!), maybe consider adding a separate systemd tab, similar to what I did for the Docker section in this PR. That way it’s separated out, and we don’t end up with an endless scrolling page of charts 😆

@henrygd
Owner

henrygd commented Sep 11, 2025

Agreed, I think we need to start splitting things up into different pages. Eventually we should have the following:

I definitely want to pull resource usage from the systemd services if possible, so it's more in line with the Docker monitoring.

I also want to start putting the new data into their own tables / rows instead of using the JSON blobs. That way we can actually query the individual items effectively. For example, if you want to display the top 50 services across all systems using the most memory. This is something I'll change on my end, you don't need to do anything right now in the PRs.

@chrisdeeming

Looking forward to this. Good work @smtucker and @henrygd

@christophdb

I am so happy to see SMART data on your upcoming feature list. That is awesome.

@tecosaur

> I think we need to start splitting things up into different pages. Eventually we should have the following…

@henrygd that sounds great. Would you consider also providing the option to hide unused/less relevant pages?

For example, I'd want to hide the container and proxomox/vms pages, and I imagine some people who are running everything through containers might want to hide the systemd page.

@smtucker
Contributor Author

smtucker commented Sep 18, 2025

I've had a chance to spend some more time on this. While adding the service metrics, I refactored the implementation to better align with how container statistics are collected.

Here’s a summary of the changes:

  • Collects CPU and memory usage for each systemd service.
  • Adds a systemdManager to the agent.
  • Sends service statistics in CombinedData instead of SystemData.
  • Stores the records in a dedicated systemd_stats collection in the database.
  • The systemd services table now displays and allows sorting by CPU and memory usage.
(screenshot attached)

I'll likely wait until #928 is merged before doing any more frontend work on this. I'm looking forward to seeing how dividing the UI into separate pages fleshes things out and would like to make it easier for this to be consistent.

@henrygd
Owner

henrygd commented Sep 18, 2025

Awesome, thanks @smtucker! Really appreciate your work.

Just so you're aware of the timeline, I'm trying to wrap up with a big feature that includes some housekeeping to hopefully make the whole system a bit more flexible.

It will likely be another week or two before I will be able to merge this and the other PRs mentioned above.

@tecosaur Yes, we'll probably hide unused pages by default and add config options to exclude things you don't want to monitor.

@tecosaur

tecosaur commented Nov 4, 2025

Thanks for the updates, Shelby and Hank. Now that #928 has been merged, I'm hoping this PR is able to follow.

It looks like this work is pretty much good to go, is that right?

@henrygd
Owner

henrygd commented Nov 4, 2025

I'll get to this shortly, just have a few small things to wrap up.

@mufeedali

Would this add support for alerts when a systemd service goes down?

@henrygd
Owner

henrygd commented Nov 6, 2025

@mufeedali Not quite yet. We will look into adding that after this is merged.

@henrygd
Owner

henrygd commented Nov 6, 2025

Working on this now.

For performance reasons I think I'm going to change this to collect every 10 min instead of every 1 min.

Then I'm thinking we can have these columns for the services table:

Name | Status | CPU (10m avg) | CPU (24h max) | Memory | Memory (24h max) | Updated

We can also add a page similar to /containers that shows all services from all systems, though this may be a huge number of services.

@tecosaur

tecosaur commented Nov 7, 2025

> For performance reasons I think I'm going to change this to collect every 10 min instead of every 1 min.

What about live monitoring using the systemd dbus API? https://www.freedesktop.org/wiki/Software/systemd/dbus/ I see that Go even has conveniences around this in the dbus package: https://pkg.go.dev/github.com/coreos/go-systemd/dbus#Conn.SubscribeUnits

I'd think this would be worth having for unit status at least.

@henrygd henrygd mentioned this pull request Nov 7, 2025
@henrygd
Owner

henrygd commented Nov 7, 2025

That's what we're using. I improved perf a bit so it may be fine to collect every minute. I'll deploy and see how she goes.

@henrygd henrygd changed the base branch from main to 1153-systemd-services November 10, 2025 20:29
@henrygd henrygd merged commit 40b3951 into henrygd:1153-systemd-services Nov 10, 2025
henrygd added a commit that referenced this pull request Nov 10, 2025
Co-authored-by: Shelby Tucker <shelby.tucker@gmail.com>
@henrygd
Owner

henrygd commented Nov 10, 2025

This is merged now, thanks very much!

I did change it to collect every 10 minutes, otherwise I was seeing around 5x CPU usage for the agent.

Should have a release out in the next few days, just need to write the docs and test a little further.

(screenshots attached)

@Justinzobel

This is amazing, thank you very much!

@henrygd
Owner

henrygd commented Nov 13, 2025

I just realized I forgot to include the column in the 'All Systems' table. This will be added soon.

@FixNinja

Is there an environment variable we can use to hide the service we don't want to monitor?

@henrygd
Owner

henrygd commented Nov 13, 2025

@FixNinja May I ask why you want to do this?

In the next release we can let you supply your own patterns for which services you want to monitor, so you could do something like SERVICE_PATTERNS=*foo*,*bar*.

I think you could exclude something with SERVICE_PATTERNS=[!foo]*. Would that work for you?
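A sketch of how comma-separated glob patterns like these could be applied on the agent side, using Go's standard `path.Match`. This is an illustration only, not the actual implementation; the real agent may use a different glob library, and note that `path.Match` negates character classes with `^` rather than the shell-style `!` shown above:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesAny reports whether a service name matches at least one of the
// comma-separated glob patterns (e.g. "*foo*,*bar*"). An empty pattern
// list matches everything, mirroring "monitor all" as the default.
func matchesAny(patterns, name string) bool {
	if patterns == "" {
		return true
	}
	for _, p := range strings.Split(patterns, ",") {
		if ok, _ := path.Match(strings.TrimSpace(p), name); ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(matchesAny("*foo*,*bar*", "myfoo.service")) // true
	fmt.Println(matchesAny("*foo*,*bar*", "sshd.service"))  // false
}
```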

henrygd added a commit that referenced this pull request Nov 13, 2025
@chrisdeeming

> @FixNinja May I ask why you want to do this?
>
> In the next release we can let you supply your own patterns for which services you want to monitor, so you could do something like SERVICE_PATTERNS=*foo*,*bar*.
>
> I think you could exclude something with SERVICE_PATTERNS=[!foo]*. Would that work for you?

I'd appreciate limited service monitoring too. Primarily so I can focus on things that represent issues for the services we run.

I know it has been mentioned already but alerts for those services on failure is going to be the most important part of this for us.

This is excellent work so far though and I love how this product is shaping up.

@OM-NATH

OM-NATH commented Nov 13, 2025

> @FixNinja May I ask why you want to do this?
>
> In the next release we can let you supply your own patterns for which services you want to monitor, so you could do something like SERVICE_PATTERNS=*foo*,*bar*.
>
> I think you could exclude something with SERVICE_PATTERNS=[!foo]*. Would that work for you?

Perfect, that's all I was looking for. The idea is that a server may have dozens of services running, and only a few of those are really important. SERVICE_PATTERNS will give us the flexibility to choose which ones we want to monitor. One small suggestion: it looks like right now we are pulling all the services, including exited ones. It would be better if we only pull the running (auto start) ones to reduce clutter.

@smtucker
Contributor Author

smtucker commented Nov 13, 2025

> Perfect, that's all I was looking for. The idea is that a server may have dozens of services running, and only a few of those are really important. SERVICE_PATTERNS will give us the flexibility to choose which ones we want to monitor. One small suggestion: it looks like right now we are pulling all the services, including exited ones. It would be better if we only pull the running (auto start) ones to reduce clutter.

I totally get why some people only want to monitor specific things, and for those users, having SERVICE_PATTERNS to specify which services to show in the frontend makes total sense for that use case.

However, regarding the suggestion to only pull running units by default, I personally feel it should still default to showing all the data received from the systemd API, including non-running units.

My personal thinking:

  • We only know if the unit is active or not after getting its status from the systemd API, so we already have the information. Filtering it in the backend at that point doesn't really save performance.
  • The frontend already allows you to filter if you want to reduce clutter.
  • It seems more fluid to manually filter if desired, than to require a user to manually enable everything if that's what they want.

@henrygd Thank you very much for considering, improving, and merging this pull request!
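The view-level filtering smtucker argues for (keep all collected data, narrow the display on demand) could look something like this at the data level. A hypothetical sketch, not the PR's code; the `Unit` struct and field names are assumptions:

```go
package main

import "fmt"

// Unit is a hypothetical, simplified record for one systemd unit.
type Unit struct {
	Name        string
	ActiveState string // "active", "inactive", "failed", ...
}

// onlyActive returns just the currently active units, the kind of
// optional display filter discussed above. The full slice is left
// untouched, so nothing collected by the agent is discarded.
func onlyActive(units []Unit) []Unit {
	out := make([]Unit, 0, len(units))
	for _, u := range units {
		if u.ActiveState == "active" {
			out = append(out, u)
		}
	}
	return out
}

func main() {
	units := []Unit{
		{"sshd", "active"},
		{"old-job", "inactive"},
		{"nginx", "failed"},
	}
	fmt.Println(len(onlyActive(units))) // 1
}
```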

@Justinzobel

Justinzobel commented Nov 14, 2025

My 2 cents, monitor everything by default and implement a way to configure what services to monitor (could be done in 2 ways):

Option 1:
Add it in Settings in the web UI
This would list every unique systemd service by name. Then the user can select which ones are important and they would be monitored on every instance. Basically a list of all names in one column on the left, then the user can click Add to move it to the monitored column (right).

Option 2:
Config file on the agent machines /etc/beszel-agent.conf with SYSTEMD_SERVICES='apache2;mysql;sshd'

@Darkrock04

> This introduces a new feature to monitor systemd services on Linux hosts. The agent now collects service statuses and sends them to the hub, where they are displayed on the system detail page and summarized on the main dashboard. […]

It is showing blank in my dashboard, no ✅ sign.

@jappi00

jappi00 commented Nov 17, 2025

@Darkrock04 can confirm that.

@henrygd
Owner

henrygd commented Nov 17, 2025

@Darkrock04 @jappi00 Are you on 0.16.1? I forgot to include the services column in 0.16.0.

@jappi00

jappi00 commented Nov 17, 2025

Yeah, I'm on 0.16.1.

Maybe it is Docker related? The agent runs in Docker.

@EvilBMP

EvilBMP commented Nov 20, 2025

I also have the problem, that agents with version 0.16.1 don't send any systemd service information anymore. Agents still on 0.16.0 send systemd service data.

My dashboard is already on 0.16.1 - agents differ from 0.15.2 to 0.16.1 - everything running via Docker.

@henrygd
Owner

henrygd commented Nov 24, 2025

Do the agent logs show any warnings or errors?

@EvilBMP

EvilBMP commented Nov 25, 2025

Yes, clients that have been upgraded to 0.16.1 show the known error/warning mentioned by other users above:

WARN Error connecting to systemd err="dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory" ref=https://beszel.dev/guide/systemd

Even if I downgrade to 0.16.0 again (with complete docker prune between), the warning persists! Clients that weren't upgraded yet and are on 0.16.0 from a former version 0.15.x are working as expected.

Docker Compose Configs are the same everywhere - only SSH Key and Token vary:

services:
  beszel-agent:
    image: "henrygd/beszel-agent"
    container_name: "beszel-agent"
    restart: unless-stopped
    network_mode: host
    security_opt:
      - apparmor:unconfined
    volumes:
      - ./beszel_agent_data:/var/lib/beszel-agent
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /run/dbus/system_bus_socket:/run/dbus/system_bus_socket:ro
      # monitor other disks / partitions by mounting a folder in /extra-filesystems
      # - /mnt/disk/.beszel:/extra-filesystems/sda1:ro
    environment:
      LISTEN: 45876
      KEY: "..."
      TOKEN: ...
      HUB_URL: ...

I really can't explain this behavior :-\

@henrygd
Owner

henrygd commented Nov 25, 2025

Try changing the mount point to use /var/run instead of /run:

volumes:
    - /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket:ro

@jappi00

jappi00 commented Nov 27, 2025

Hello @henrygd,

I added the security_opt and the volume and it works now. I would suggest adding that information to the docs.

@henrygd
Owner

henrygd commented Nov 27, 2025

@jappi00 We have that documented on the Systemd Services page here: https://beszel.dev/guide/systemd

@RalphPungaKronbergs

RalphPungaKronbergs commented Dec 11, 2025

Hello,

using the env var SERVICE_PATTERNS to limit the services to monitor works like a charm.

What I would like to know is whether an alarm/notification is triggered if a service stops?

Thx,
Ralph

@henrygd
Owner

henrygd commented Dec 12, 2025

Service alerts have not been implemented yet.

@xd003

xd003 commented Jan 7, 2026

I am running agent v0.17.0
The agent docker volume contains both of the following

- /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket:ro
- /var/run/systemd/private:/var/run/systemd/private:ro

Logs don't show any errors, but I don't see any systemd information on my dashboard.
