[21.05] extract osds into separate units #778

Merged (5 commits) on Sep 25, 2023

@ctheune (Member) commented Sep 5, 2023

We had hosts run out of memory, and because the OSDs were all subsumed in the same cgroup, they were killed together: systemd values consistency in a unit's state ...

There might be interesting cases here if NixOS decides to activate or deactivate anything, but hopefully the current setup will let the "group unit" catch them. I can imagine cases where this does not hold, but I can't show that any of them actually exist currently.

Re PL-131646
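A toy model of the failure mode described above may help. It assumes systemd's OOMPolicy=stop behaviour for service units (our understanding of the default in the systemd versions NixOS 21.05 ships): when the kernel OOM killer kills any process in a unit, systemd stops the whole unit, healthy processes included. The per-OSD unit names are hypothetical placeholders.

```python
# Toy model only. Assumption: OOMPolicy=stop, i.e. an OOM kill of any process
# in a unit makes systemd stop the entire unit.
from typing import Dict, List, Set


def surviving_osds(units: Dict[str, List[int]], oom_killed_osd: int) -> Set[int]:
    """Return the OSD ids still running after one OSD process is OOM-killed.

    `units` maps a systemd unit name to the OSD ids whose processes live in
    that unit's cgroup.
    """
    survivors: Set[int] = set()
    for osd_ids in units.values():
        if oom_killed_osd in osd_ids:
            # The whole unit is stopped, not just the OOM victim.
            continue
        survivors.update(osd_ids)
    return survivors


# Before: all OSDs of a host share one unit (and therefore one cgroup).
shared = {"fc-ceph-osds.service": [0, 1, 2, 3]}
# After: one unit per OSD. Hypothetical placeholder names.
separate = {f"osd-unit-{i}.service": [i] for i in range(4)}

print(sorted(surviving_osds(shared, oom_killed_osd=2)))    # []
print(sorted(surviving_osds(separate, oom_killed_osd=2)))  # [0, 1, 3]
```

With a shared unit, a single OOM kill takes every OSD on the host down at once; with separate units, the blast radius shrinks to the OSD that was actually killed.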

@flyingcircusio/release-managers

Release process

Impact: internal only

Changelog:

  • move Ceph OSD processes into individual systemd service units, one per OSD
    • adjust fc-ceph accordingly
  • rename the service that manages all OSDs from fc-ceph-osds.service to fc-ceph-osds-all.service

Security implications

  • Security requirements defined? (WHERE)
    • automatic transition of running OSD services at unattended upgrade time
    • availability: prevent systemd from stopping all OSDs when a single one fails, just because it considers all of them to belong together
  • Security requirements tested? (EVIDENCE)
    • automated tests still pass
    • manually tested and documented the proper automatic transition of running OSD services during an unattended system upgrade
    • extensive manual fc-ceph operations tests in dev cluster
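To gather similar evidence by hand, one could check which unit each running `ceph-osd` process belongs to. The helper below is a hypothetical sketch, not part of fc-ceph: it reads `/proc/<pid>/cgroup`, picks the systemd hierarchy (the `name=systemd` line on cgroup v1/hybrid setups, or the `0::` line on cgroup v2) and returns the innermost `.service` component of that path.

```python
# Hypothetical helper (not part of fc-ceph): map processes to the systemd unit
# whose cgroup they live in, by parsing /proc/<pid>/cgroup.
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def unit_from_cgroup(text: str) -> Optional[str]:
    """Return the innermost *.service component of the systemd cgroup path."""
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy_id, controllers, path = parts
        # cgroup v1/hybrid: "N:name=systemd:/..."; unified cgroup v2: "0::/..."
        if controllers == "name=systemd" or (hierarchy_id == "0" and controllers == ""):
            services = [p for p in path.split("/") if p.endswith(".service")]
            if services:
                return services[-1]
    return None


def group_pids_by_unit(pids: Iterable[int], proc: Path = Path("/proc")) -> Dict[str, List[int]]:
    """Group PIDs by the unit their cgroup belongs to ("<none>" if no service)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pid in pids:
        text = (proc / str(pid) / "cgroup").read_text()
        groups[unit_from_cgroup(text) or "<none>"].append(pid)
    return dict(groups)


def units_with_several_pids(groups: Dict[str, List[int]]) -> List[str]:
    """Units hosting more than one of the given processes (e.g. OSDs)."""
    return sorted(unit for unit, pids in groups.items() if len(pids) > 1)
```

Fed with the PIDs from `pgrep ceph-osd`, an empty result from `units_with_several_pids` would indicate that no two OSD processes share a unit any more.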

@osnyx force-pushed the PL-131646-extract-osds-into-separate-units branch 5 times, most recently from 53a52a4 to 3e9d5c9, September 19, 2023 16:39
@osnyx marked this pull request as ready for review September 19, 2023 16:55
@osnyx self-requested a review September 19, 2023 17:07
@osnyx (Member) left a comment

LGTM now, but to be fair, I did most of the fine-tuning and adjustment of the core approach, so someone else needs to review as well.

@osnyx changed the title from "PL-131646 extract osds into separate units" to "[21.05] extract osds into separate units" Sep 19, 2023
ctheune and others added 5 commits September 19, 2023 21:16
co-authored by [email protected]

Before individual OSDs can be moved to their own units, the old
`fc-ceph-osds` unit, which runs all OSDs, needs to be stopped so that the
OSDs can start again in their separate units.
This is achieved by renaming that service: at system update time, the
NixOS switch-to-configuration logic then stops the old service and starts
the new one.
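The effect of the rename can be illustrated with a deliberately simplified model of a configuration switch: units whose names disappear from the configuration are stopped, brand-new names are started, and changed definitions are restarted. The real NixOS switch-to-configuration script applies many more rules (for example `X-RestartIfChanged`, `X-StopIfChanged` and socket/target handling), and the per-OSD unit names below are hypothetical placeholders.

```python
# Simplified model of a NixOS configuration switch (NOT the real
# switch-to-configuration logic): decide per unit name what happens.
from typing import Dict, List, NamedTuple


class SwitchPlan(NamedTuple):
    stop: List[str]     # units that no longer exist in the new configuration
    restart: List[str]  # units whose definition changed
    start: List[str]    # units that are new in this configuration


def plan_switch(old: Dict[str, str], new: Dict[str, str]) -> SwitchPlan:
    """`old` and `new` map unit names to (abbreviated) unit file contents."""
    stop = sorted(u for u in old if u not in new)
    restart = sorted(u for u in old if u in new and old[u] != new[u])
    start = sorted(u for u in new if u not in old)
    return SwitchPlan(stop, restart, start)


old = {"fc-ceph-osds.service": "<runs all OSDs in one cgroup>"}
new = {
    "fc-ceph-osds-all.service": "<group unit>",
    # Hypothetical per-OSD unit names; the real ones are not shown on this page.
    "osd-unit-0.service": "<runs OSD 0>",
    "osd-unit-1.service": "<runs OSD 1>",
}
plan = plan_switch(old, new)
print(plan.stop)   # ['fc-ceph-osds.service']
print(plan.start)  # ['fc-ceph-osds-all.service', 'osd-unit-0.service', 'osd-unit-1.service']
```

Because the old name vanishes, the model stops `fc-ceph-osds.service` instead of treating it as a changed unit, which matches the behaviour the commit message relies on.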

This has been tested in our dev cluster:

stopping the following units: fc-blockdev.service, fc-ceph-mgr.service, fc-ceph-mon.service, fc-ceph-osds.service, fc-ceph-rgw.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-udevd.service, telegraf.service
NOT restarting the following changed units: systemd-fsck@dev-disk-by\x2dlabel-boot.service
activating the configuration...
setting up /etc...
reloading user units for os...
setting up tmpfiles
reloading the following units: dbus.service
restarting the following units: sensu-client.service
starting the following units: fc-blockdev.service, fc-ceph-mgr.service, fc-ceph-mon.service, fc-ceph-rgw.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, telegraf.service
the following new units were started: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], fc-ceph-osds-all.service, fc-update-channel.timer
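A log like this can also be checked mechanically after an unattended upgrade. The parser below is a sketch that only recognises the line prefixes visible in the output above; other NixOS versions may word these lines differently, and the action labels (`stopped`, `started`, ...) are our own naming.

```python
# Sketch: parse switch-to-configuration output into {action: [units]}.
# Only the prefixes seen in the log above are recognised (assumption).
from typing import Dict, List

_ACTIONS = {
    "stopping the following units": "stopped",
    "NOT restarting the following changed units": "not_restarted",
    "reloading the following units": "reloaded",
    "restarting the following units": "restarted",
    "starting the following units": "started",
    "the following new units were started": "started",
}


def parse_switch_log(log: str) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = {}
    for line in log.splitlines():
        prefix, sep, rest = line.partition(": ")
        action = _ACTIONS.get(prefix.strip())
        if sep and action:
            units = [u.strip() for u in rest.split(",") if u.strip()]
            result.setdefault(action, []).extend(units)
    return result


sample = """\
stopping the following units: fc-blockdev.service, fc-ceph-osds.service
activating the configuration...
starting the following units: fc-blockdev.service
the following new units were started: fc-ceph-osds-all.service, fc-update-channel.timer
"""
actions = parse_switch_log(sample)
# The transition worked if the old unit was stopped and not started again,
# and the new group unit was started.
assert "fc-ceph-osds.service" in actions["stopped"]
assert "fc-ceph-osds.service" not in actions.get("started", [])
assert "fc-ceph-osds-all.service" in actions["started"]
```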

Follow-up to 5d9e656, where both packages had only been removed as
references from the overlay but not as files from the repo.
@osnyx force-pushed the PL-131646-extract-osds-into-separate-units branch from fe798ae to f7c0f31, September 19, 2023 19:16
@ctheune merged commit 70a69ab into fc-21.05-dev Sep 25, 2023
1 check passed
@ctheune deleted the PL-131646-extract-osds-into-separate-units branch September 25, 2023 09:53