
libct/cg/sd/v1: Fix unnecessary freeze/thaw #3143

Closed
wants to merge 2 commits

Conversation

odinuge
Contributor

@odinuge odinuge commented Aug 10, 2021

My testing shows that #3082 (commit f2db879) doesn't work as expected. This causes big issues in k8s.

PTAL @kolyshkin 😄

Finally back again after having some time off, so haven't had time to look at it before now.

return
devAllow, e := getUnitTypeProperty(m.dbus, unitName, getUnitType(unitName), "DeviceAllow")
if e == nil {
rv := reflect.ValueOf(devAllow.Value.Value())
Contributor

Do you mean that devAllow.Value == dbus.MakeVariant([]deviceAllowEntry{}) did not work? Or that your version is more readable? (or both?)

Contributor Author

@odinuge odinuge Aug 10, 2021

Do you mean that devAllow.Value == dbus.MakeVariant([]deviceAllowEntry{}) did not work? Or that your version is more readable? (or both?)

Well, it doesn't seem to work for me. Does it work for you? In that case there is something wrong with my setup. 🤔

And, I do think my version is less readable. 😅

Contributor

@kolyshkin kolyshkin Aug 11, 2021

I think I improved the readability a notch in #3148:

devAllow, e := getUnitTypeProperty(m.dbus, unitName, unitType, "DeviceAllow")
if e == nil {
	if rv := reflect.ValueOf(devAllow.Value.Value()); rv.Kind() == reflect.Slice && rv.Len() == 0 {
		needsFreeze = false
		needsThaw = false
		return
	}
}

Contributor Author

Guess I can do that instead if the others agree, even though the line is a bit too long. I'll update the PR.

Contributor

@kolyshkin kolyshkin Aug 11, 2021

If that's too long, what about this:

if e == nil {
	rv := reflect.ValueOf(devAllow.Value.Value())
	if rv.Kind() == reflect.Slice && rv.Len() == 0 {

Contributor

To me it reads slightly better than the switch/case (although I agree that neither version we have come up with so far is super readable).

@kolyshkin
Contributor

I guess what we need first is a unit test that would create a unit (of various types) with a particular property, get the property, and check it's as expected.

Last commit of #3082 is an integration test (TestPodSkipDevicesUpdate) that checks that the freeze is not done. I look at all this and I still can't understand why that test does not catch the issue this PR is fixing...

@odinuge
Contributor Author

odinuge commented Aug 11, 2021

I guess what we need first is a unit test that would create a unit (of various types) with a particular property, get the property, and check it's as expected.

Yeah, some tests would be good, I guess. It is fairly hard to test though. But keeping a loop that reads freezer.state on the slice while constantly updating, and checking that it only ever reports THAWED, would work. I can look into doing it in Go, but the bats stuff is a bit complicated, so that would require some boilerplate.
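For illustration, a minimal Go sketch of that idea (the cgroup path, test name, and the update call are placeholders, not code from this PR):

package freezecheck

import (
	"os"
	"strings"
	"testing"
)

// TestSliceNeverFrozen polls freezer.state in the background and fails the
// test if the slice is ever reported as FROZEN while updates are running.
func TestSliceNeverFrozen(t *testing.T) {
	const state = "/sys/fs/cgroup/freezer/system.slice/my-test.slice/freezer.state" // placeholder path

	stop := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		for {
			select {
			case <-stop:
				return
			default:
			}
			data, err := os.ReadFile(state)
			if err == nil && strings.Contains(string(data), "FROZEN") {
				t.Error("slice was frozen during the update")
				return
			}
		}
	}()

	// ... run the cgroup updates under test here, e.g. manager.Set(resources) in a loop ...

	close(stop)
	<-done
}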

Last commit of #3082 is an integration test (TestPodSkipDevicesUpdate) that checks that the freeze is not done. I look at all this and I still can't understand why that test does not catch the issue this PR is fixing...

Well, it only verifies that there are no errors with the devices. It does not, however, check whether the slice is frozen during the update; it should not be frozen at all during the update. The test only checks the state after the update has finished.

To test this, one can simply verify it like this: keep the loop below running while running the tests with sudo go test -run=TestPodSkipDevicesUpdate -v -test.count=5:

# in this branch
$ while true ; do grep FROZEN -R /sys/fs/cgroup/freezer/system.slice/; done
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/test-PodSkipDevicesUpdate.scope/freezer.state:FROZEN
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/test-PodSkipDevicesUpdate.scope/freezer.state:FROZEN
# in master 
$ while true ; do grep FROZEN -R /sys/fs/cgroup/freezer/system.slice/; done             
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/test-PodSkipDevicesUpdate.scope/freezer.state:FROZEN                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/freezer.state:FROZEN                                                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/test-PodSkipDevicesUpdate.scope/freezer.state:FROZEN                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/freezer.state:FROZEN                                                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/test-PodSkipDevicesUpdate.scope/freezer.state:FROZEN                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/freezer.state:FROZEN                                                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/test-PodSkipDevicesUpdate.scope/freezer.state:FROZEN                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/freezer.state:FROZEN                                                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/test-PodSkipDevicesUpdate.scope/freezer.state:FROZEN                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/freezer.state:FROZEN                                                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/test-PodSkipDevicesUpdate.scope/freezer.state:FROZEN                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/freezer.state:FROZEN                                                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/test-PodSkipDevicesUpdate.scope/freezer.state:FROZEN                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/freezer.state:FROZEN                                                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/test-PodSkipDevicesUpdate.scope/freezer.state:FROZEN                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/freezer.state:FROZEN                                                     
/sys/fs/cgroup/freezer/system.slice/system-runc_test_podTestPodSkipDevicesUpdate.slice/freezer.state:FROZEN                        

As seen here, only the scope/container is frozen with this branch, while on master both the pod/slice and the container/scope are frozen. Does that make sense to you, @kolyshkin?

@odinuge
Contributor Author

odinuge commented Aug 11, 2021

Hmm. Is something wrong with CI, or?

@kolyshkin
Contributor

Hmm. Is something wrong with CI, or?

Yup; see https://githubstatus.com/.

Seems to be over now; do a force-push to re-trigger CI.

@kolyshkin
Contributor

Or we can close/reopen to re-kick CI (that usually works).

@kolyshkin kolyshkin closed this Aug 11, 2021
@kolyshkin kolyshkin reopened this Aug 11, 2021
@AkihiroSuda
Member

Does this fix kubernetes/kubernetes#104280?

@odinuge
Contributor Author

odinuge commented Aug 11, 2021

@AkihiroSuda: Does this fix kubernetes/kubernetes#104280?

Yes.

But as I have said a few times already, for k8s a flag for disabling this would be better and simpler. I guess we (in k8s, not here in runc) have to discuss whether we should avoid using runc/libcontainer for setting cgroup values, since things like this increase the overhead way more than necessary.

@odinuge odinuge force-pushed the freeze-cgroup-wip branch 2 times, most recently from 2644106 to a45ffdd Compare August 11, 2021 20:06
@odinuge odinuge changed the title [WIP] libct/cg/sd/v1: Fix unnecessary freeze/thaw libct/cg/sd/v1: Fix unnecessary freeze/thaw Aug 11, 2021
@kolyshkin
Contributor

@odinuge I was playing with the test case and ended up doing #3148.

The other thing I did was to slightly simplify the deviceAllow check (#3143 (comment)).

We can either use #3148, or cherry-pick the test case from it to here.

@@ -518,3 +518,10 @@ func addCpuset(cm *dbusConnManager, props *[]systemdDbus.Property, cpus, mems st
}
return nil
}

func getUnitType(unitName string) string {
Contributor

Can you please add something like

	// This code should be in sync with getUnitName.

Contributor Author

Ack 👍
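For reference, the helper under discussion boils down to mapping the unit name suffix to the systemd unit type, kept in sync with getUnitName. A minimal sketch, assuming the strings package is imported (the exact set of suffixes handled here is an assumption, not necessarily the final code):

// getUnitType derives the systemd unit type from the unit name suffix.
// This code should be in sync with getUnitName.
func getUnitType(unitName string) string {
	if strings.HasSuffix(unitName, ".slice") {
		return "Slice"
	}
	return "Scope"
}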

@odinuge
Contributor Author

odinuge commented Aug 11, 2021

We can either use #3148, or cherry-pick the test case from it to here.

I can cherry-pick the test commit here if they should be in the same PR. I prefer to keep this PR; otherwise I cannot update and change the PR you created based off this one, which makes it a bit cumbersome to work on. Having a separate PR with the test works as well.

@kolyshkin
Contributor

I can cherry-pick the test commit here if they should be in the same PR

Please do (otherwise the CI is semi-useless).

making it a bit cumbersome to work on it

Something like git rebase -i origin/master is a life saver when working on PRs with multiple commits. In the most complicated cases I do git stash; git format-patch -o <dir> origin/master; git reset --hard origin/master, then git am those commits one by one, making changes and doing git commit --amend when necessary.

@kolyshkin kolyshkin added area/systemd backport/1.0-todo A PR in main branch which needs to be backported to release-1.0 labels Aug 11, 2021
@kolyshkin kolyshkin mentioned this pull request Aug 11, 2021
@kolyshkin
Contributor

Once this is ready and merged, we'll backport it to 1.0 and will try to get 1.0.2 released ASAP.

@kolyshkin
Contributor

things like this will increase the overhead waaay more than necessary

@odinuge your concern about this code path being slow is valid, it is also quite complicated. I was looking into how to make it fast and simple -- PTAL #3151.

Surely a simple "SkipFreeze" flag will be faster, but this seems lightweight enough to not cause performance concerns.

@odinuge
Contributor Author

odinuge commented Aug 12, 2021

@odinuge your concern about this code path being slow is valid, it is also quite complicated. I was looking into how to make it fast and simple -- PTAL #3151.

Surely a simple "SkipFreeze" flag will be faster, but this seems lightweight enough to not cause performance concerns.

Yeah, #3151 would probably work, but I don't like going around systemd like that approach does. Systemd should be the source of truth in this case, as it is now.

(Again, I am mostly a fan of a flag for k8s; otherwise we have to look for alternatives. k8s does these calls very often, and with an increasing number of pods this will increase further. The same goes for short-lived containers. A flag, as SkipDevices previously was, is definitely the best approach for k8s, regardless of whether we use runc or implement it ourselves.)

Something like git rebase -i origin/master is a life saver when working on PRs with multiple commits. In the most complicated cases I do git stash; git format-patch -o <dir> origin/master; git reset --hard origin/master, then git am those commits one by one, making changes and doing git commit --amend when necessary.

git is always the easy part. GitHub is the hard one, since I don't have push or edit access on your PR, only mine.

@odinuge odinuge force-pushed the freeze-cgroup-wip branch from a45ffdd to 9dbf27b Compare August 12, 2021 08:49
This fixes the behavior intended to avoid freezing containers/control
groups when it is not necessary. This is important for end users of
libcontainer who rely on no freeze taking place.

The previous implementation would always get an error when trying to get
DevicePolicy from the Unit via dbus, since the Unit interface doesn't
contain DevicePolicy.

Signed-off-by: Odin Ugedal <[email protected]>
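To illustrate the point above: DevicePolicy lives on the type-specific D-Bus interface (org.freedesktop.systemd1.Scope / .Slice), not on the generic Unit interface, so it has to be fetched with the unit-type-aware call. A rough standalone sketch using go-systemd (the unit name is a placeholder):

package main

import (
	"context"
	"fmt"

	systemdDbus "github.com/coreos/go-systemd/v22/dbus"
)

func main() {
	ctx := context.Background()
	conn, err := systemdDbus.NewWithContext(ctx)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	unit := "test-PodSkipDevicesUpdate.scope" // placeholder unit name

	// The generic Unit interface does not carry DevicePolicy, so this errors out.
	if _, err := conn.GetUnitPropertyContext(ctx, unit, "DevicePolicy"); err != nil {
		fmt.Println("generic Unit interface:", err)
	}

	// The type-specific interface (Scope here) does carry it.
	prop, err := conn.GetUnitTypePropertyContext(ctx, unit, "Scope", "DevicePolicy")
	if err == nil {
		fmt.Println("DevicePolicy:", prop.Value.Value())
	}
}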
@odinuge odinuge force-pushed the freeze-cgroup-wip branch from 9dbf27b to efbf0ae Compare August 12, 2021 09:01
@harche

harche commented Aug 12, 2021

/hold

I am running node e2e + systemd + crio tests locally using the runc binary built from this PR. The first run was successful, but I will have more runs to verify that this PR indeed fixes kubernetes/kubernetes#104280 (comment)

@odinuge
Contributor Author

odinuge commented Aug 12, 2021

I am running node e2e + systemd + crio tests locally using the runc binary built from this PR. The first run was successful, but I will have more runs to verify that this PR indeed fixes kubernetes/kubernetes#104280 (comment)

It is the vendoring of runc into k8s that is important for fixing kubernetes/kubernetes#104280, not only the runc binary used by the CRI runtime.

@harche

harche commented Aug 12, 2021

I am running node e2e + systemd + crio tests locally using the runc binary built from this PR. The first run was successful, but I will have more runs to verify that this PR indeed fixes kubernetes/kubernetes#104280 (comment)

It is the vendoring of runc into k8s that is important for fixing kubernetes/kubernetes#104280, not only the runc binary used by the CRI runtime.

True, but the failures I was seeing in CI were specifically coming from the runc binary, indicating that CRI-O was able to start the container but it ended up in a paused state.

/hold cancel

@odinuge
Contributor Author

odinuge commented Aug 12, 2021

True, but the failures I was seeing in CI were specifically coming from the runc binary, indicating that CRI-O was able to start the container but it ended up in a paused state.

Yes, but that is due to kubelet (via this code) freezing kubepods.slice, burstable.slice, and pod-xyz.slice. Most of the issues in regards to the binary should already be fixed in runc v1.0.1.

@harche

harche commented Aug 12, 2021

True, but the failures I was seeing in CI were specifically coming from the runc binary, indicating that CRI-O was able to start the container but it ended up in a paused state.

Yes, but that is due to kubelet (via this code) freezing kubepods.slice, burstable.slice, and pod-xyz.slice. Most of the issues in regards to the binary should already be fixed in runc v1.0.1.

Oh ok, that means the success I am seeing in node e2e (2 out of 2 runs so far) is due to me updating runc from rc95 to your latest PR (which has v1.0.1).

Comment on lines 354 to 355
// a non-existent unit returns default properties,
// and settings in (2) are the defaults.
Contributor

The tests in this PR do not check whether this is true. I added those checks last night, plus another test case -- @odinuge can you please re-cherry-pick the last commit from https://github.com/kolyshkin/runc/tree/fix-freeze-before-set

Contributor Author

Ack 😄 Test looks good!

@kolyshkin
Contributor

Yeah, #3151 would probably work, but I don't like going around systemd like that approach does. Systemd should be the source of truth in this case, as it is now.

So, we have to admit that this is a kludge already. We need a way to figure out whether systemd is going to do a disruptive device rules update. Any way to figure that out is fine as long as it works. Checking unit properties is one way, checking devices.list is another. Sure, #3151 assumes that devices.list in the kernel and the systemd Device* properties are in sync. I think such an assumption is valid (if not, this would mean systemd is not doing its job, ignoring device property updates or some such).

(Again, I am mostly a fan of a flag for k8s; otherwise we have to look for alternatives. k8s does these calls very often, and with an increasing number of pods this will increase further. The same goes for short-lived containers. A flag, as SkipDevices previously was, is definitely the best approach for k8s, regardless of whether we use runc or implement it ourselves.)

Yes, this is also a way to figure out whether a freeze is needed or not, and it's fine as long as it works. The only thing I don't like about it is that it requires an additional knob to set. Generally speaking, if there is a way to achieve a goal without adding an explicit knob, and it works reliably and fast, I think such a way is better. Currently I see #3151 as such a way.
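For context, a rough sketch of the devices.list-based check described above (assuming cgroup v1 and that os, path/filepath, and strings are imported; the function name and exact semantics are illustrative, not the actual code of #3151):

// allowsAllDevices reports whether the cgroup v1 devices controller for the
// given cgroup currently allows everything, i.e. devices.list is exactly
// "a *:* rwm". In that case freezing before a device-rule update is pointless.
func allowsAllDevices(cgroupPath string) (bool, error) {
	data, err := os.ReadFile(filepath.Join(cgroupPath, "devices.list"))
	if err != nil {
		return false, err
	}
	return strings.TrimSpace(string(data)) == "a *:* rwm", nil
}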

@odinuge
Contributor Author

odinuge commented Aug 12, 2021

So, we have to admit that this is a kludge already. We need a way to figure out whether systemd is going to do a disruptive device rules update. Any way to figure that out is fine as long as it works. Checking unit properties is one way, checking devices.list is another. Sure, #3151 assumes that devices.list in the kernel and the systemd Device* properties are in sync. I think such an assumption is valid (if not, this would mean systemd is not doing its job, ignoring device property updates or some such).

Yeah, that solution also works, and yeah, this is all pretty kludgy, I agree about that 😅 k8s and the runc binary also work differently, with different purposes, making it even more kludgy. But then we could instead just look at the provided resources, and if the device rules allow access to everything, drop the freeze. If we don't want to rely on systemd, and don't want a separate knob, I think checking the devices would be better and simpler to understand, together with better performance.

E.g. something like this:

oldDevices := []specs.LinuxDeviceCgroup{
	{
		Allow:  true,
		Access: "rwm",
	},
}

newDevices := []specs.LinuxDeviceCgroup{
	{
		Allow:  true,
		Access: "rwm",
	},
}

if allowsAll(oldDevices) && allowsAll(newDevices) {
	// Skip freezing
}
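A minimal allowsAll helper for the sketch above could look like this (hypothetical; it is not defined in this PR, and specs here is github.com/opencontainers/runtime-spec/specs-go):

// allowsAll reports whether the rule set is a single wildcard
// "allow rwm for everything" entry, as in the example above.
func allowsAll(devices []specs.LinuxDeviceCgroup) bool {
	if len(devices) != 1 {
		return false
	}
	d := devices[0]
	wildcardType := d.Type == "" || d.Type == "a"
	wildcardNode := (d.Major == nil || *d.Major == -1) && (d.Minor == nil || *d.Minor == -1)
	return d.Allow && wildcardType && wildcardNode && d.Access == "rwm"
}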

(although we might need to poke systemd or cgroupfs in Apply to get the state in the beginning, if the slice/scope is already alive)

Any thoughts about that instead? Wouldn't that work for runc update as well, as long as people only manage the slice/scope via runc and don't set devices directly in systemd? Or do you think that is too fragile?

Yes, this is also a way to figure out whether a freeze is needed or not, and it's fine as long as it works. The only thing I don't like about it is that it requires an additional knob to set. Generally speaking, if there is a way to achieve a goal without adding an explicit knob, and it works reliably and fast, I think such a way is better. Currently I see #3151 as such a way.

SkipDevices is already a knob, and it worked properly for k8s until it was adapted for runc update. I don't really see an issue with this being a knob in the same way SkipDevices is, but I guess that is another discussion. 😅

@odinuge
Contributor Author

odinuge commented Aug 12, 2021

We can also change this to a single dbus call by using GetUnitTypePropertiesContext on the slice/scope, since both DeviceAllow and DevicePolicy are on the same interface.
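A rough sketch of what that single call could look like (assuming go-systemd's systemdDbus = github.com/coreos/go-systemd/v22/dbus, the standard context package, and the getUnitType helper from this PR; error handling and type assertions are simplified):

// fetchDeviceProps gets DevicePolicy and DeviceAllow with a single D-Bus call
// by fetching all type-specific properties of the unit at once.
func fetchDeviceProps(ctx context.Context, conn *systemdDbus.Conn, unitName string) (policy string, allow interface{}, err error) {
	props, err := conn.GetUnitTypePropertiesContext(ctx, unitName, getUnitType(unitName))
	if err != nil {
		return "", nil, err
	}
	policy, _ = props["DevicePolicy"].(string)
	allow = props["DeviceAllow"]
	return policy, allow, nil
}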

Add a test for freezeBeforeSet, checking various scenarios including
those that were failing before the fix in the previous commit.

[v2: add more cases, add a check before creating a unit.]

Signed-off-by: Kir Kolyshkin <[email protected]>
@odinuge odinuge force-pushed the freeze-cgroup-wip branch from efbf0ae to 987ed97 Compare August 12, 2021 19:56
@kolyshkin
Contributor

We can also change this to a single dbus call by using GetUnitTypePropertiesContext on the slice/scope, since both DeviceAllow and DevicePolicy are on the same interface.

It's an interesting idea, which I also considered but gave up on, since this returns all (tens, I guess) of the properties in a map[string]interface{}, of which we only need two. This results in more garbage to collect by the GC.

OTOH if the call itself is relatively heavy, the cost of the saved call may outweigh the GC overhead, and this might be worth doing.

I guess the way to check which way is better is to benchmark both implementations, using runtime.GC() inside the benchmarked code to make sure GC overhead is accounted for. IOW something like

func BenchmarkXXX(b *testing.B) {
	// ... any prep code
	runtime.GC()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// ... whatever we test
	}
	runtime.GC()
}

@kolyshkin
Contributor

if allowsAll(oldDevices) && allowsAll(newDevices) {
  // Skip freezing
}

(although we might need to poke systemd or cgroupfs in Apply to get the state in the beginning, if the slice/scope is already alive)

Any thoughts about that instead? Wouldn't that work for runc update as well, as long as people only manage the slice/scope via runc and don't set devices directly in systemd? Or do you think that is too fragile?

Well, this is conceptually the same as what we do here and/or in #3151. In terms of #3151, allowsAll(oldDevices) is equivalent to !hasDeviceRules(), and allowsAll(newDevices) is equivalent to r.SkipDevices.

@kolyshkin
Contributor

git is always the easy part. GitHub is the hard one, since I don't have push or edit access on your PR, only mine.

Ha! This is easy peasy -- become a runc maintainer, and you'll be able to shove your commits into contributors' PRs 🕺🏻

On a serious note, though, I think I misunderstood you and thought you were against cherry-picking a commit. 😀

@kolyshkin
Contributor

kolyshkin commented Aug 13, 2021

OK I ended up writing this benchmark for freezeBeforeSet() (based on the test code):

Here are the results (first line is this PR, second is #3151):

BenchmarkFreezeBeforeSet-8   	    3757	    286971 ns/op	   16632 B/op	     680 allocs/op
BenchmarkFreezeBeforeSet-8   	  131257	      9116 ns/op	    2344 B/op	       7 allocs/op

To have some reference points, I did a few quick benchmarks:

  1. start/stop a systemd scope:
BenchmarkApplyDestroy-8      	     808	   1390820 ns/op	  172188 B/op	    2130 allocs/op
  2. Set() with no freeze checking or freeze code (as if a SkipFreeze flag is provided):
BenchmarkSet-8               	    6412	    183464 ns/op	    7266 B/op	     283 allocs/op
  3. Set() with an unconditional freeze (no processes in the cgroup!):
BenchmarkSet-8               	    5127	    231167 ns/op	   10224 B/op	     298 allocs/op

From that we can estimate the Set() overhead for the kubelet case.

With current approach (#3082 (commit f2db879) + #3143): slowdown from 200 to 500 us (150% worse).

With devices.list approach (#3151): slowdown from 200 to 210 us (5% worse)

@kolyshkin
Contributor

Ah, here's the benchmarking code: https://gist.github.com/kolyshkin/20e8f43fedde4989033941be8bef9fc6

It needs some tweaks to measure Set(); I was using these placeholders for the Set() measurements:

func (m *legacyManager) freezeBeforeSetNo(unitName string, r *configs.Resources) (needsFreeze, needsThaw bool, err error) {
        return false, false, nil
}

func (m *legacyManager) freezeBeforeSetYes(unitName string, r *configs.Resources) (needsFreeze, needsThaw bool, err error) {
        return true, true, nil
}

@odinuge
Contributor Author

odinuge commented Aug 13, 2021

With current approach (#3082 (commit f2db879) + #3143): slowdown from 200 to 500 us (150% worse).

With devices.list approach (#3151): slowdown from 200 to 210 us (5% worse)

Nice. I guess that is more or less as predicted. When freezing with a lot of processes, e.g. 110 pods on 100 logical Linux CPUs, I assume the overhead will be quite noticeable, and probably quite large (so that people don't infer that an unconditional freeze works fine).

Well, this is conceptually the same as what we do here and/or in #3151. In terms of #3151, allowsAll(oldDevices) is equivalent to !hasDeviceRules(), and allowsAll(newDevices) is equivalent to r.SkipDevices.

Well, looking at it that way, all the solutions are conceptually the same. But if we only look at the OCI data to see whether we can avoid freezing, we need no external calls: allowsAll(oldDevices) would then just look at the old value, instead of checking with systemd or looking at cgroupfs to verify it.

@kolyshkin
Contributor

Well, looking at it that way, all the solutions are conceptually the same. But if we only look at the OCI data to see whether we can avoid freezing, we need no external calls: allowsAll(oldDevices) would then just look at the old value, instead of checking with systemd or looking at cgroupfs to verify it.

Problem is, the runc cgroup manager does not have persistent storage, and kubelet instantiates a new cgroup manager to call Set():

https://github.com/kubernetes/kubernetes/blob/b6d83f0ba3f155a4f6af9b37bf5511f66327cd5b/pkg/kubelet/cm/cgroup_manager_linux.go#L487

@odinuge
Contributor Author

odinuge commented Aug 13, 2021

Problem is, the runc cgroup manager does not have persistent storage, and kubelet instantiates a new cgroup manager to call Set():

https://github.com/kubernetes/kubernetes/blob/b6d83f0ba3f155a4f6af9b37bf5511f66327cd5b/pkg/kubelet/cm/cgroup_manager_linux.go#L487

Yeah, but we do supply cgroup config when creating the manager. But yeah, if the only way to avoid this "freeze dance" is for k8s to keep the manager around, I guess we can do that; but that will require some more code.

(not familiar with how the runc binary works in detail btw.)

@kolyshkin
Contributor

not familiar with how the runc binary works in detail btw.

It's (usually) a short-lived binary, so we also instantiate a new manager for e.g. runc update or runc destroy. runc has a state file though (which I plan to split into two parts), so it "knows" the current state (unless someone uses other tools to change it, that is).

@kolyshkin
Contributor

Yeah, but we do supply cgroup config when creating the manager.

You can supply anything, meaning this solution would be a skip-freeze flag in disguise.

Oh well, computers are tough. 😩

#3161

@odinuge
Contributor Author

odinuge commented Aug 14, 2021

You can supply anything, meaning this solution would be a skip-freeze flag in disguise.

And that is exactly what I want! 😄

Oh well, computers are tough.

Yeee, they are...

#3161

Ack. +1 for that one

@kolyshkin
Contributor

Superseded by the (already merged) #3166, which includes commits from this PR, thus closing. Thanks @odinuge for figuring this out!

@kolyshkin kolyshkin closed this Aug 23, 2021
@kolyshkin kolyshkin removed the backport/1.0-todo A PR in main branch which needs to be backported to release-1.0 label Aug 23, 2021