
Conversation

dschatzberg (Contributor):

If CPU 0 is not running any scx_mitosis tasks, we can miss configuration updates. This was due to a misunderstanding of how tick() works: the intended behavior is just to periodically run this logic on some core, and a bpf_timer is the appropriate way to do that, so this commit makes the change. There's an additional small fix to the freeing logic, ensuring that CPUs are given back to the root cgroup.

Most of the changes come from needing to cover more of the former tick() logic under a larger RCU lock as a result of migrating it to a sleepable timer.
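
For context, the overall shape of a periodic bpf_timer is sketched below. This is a minimal illustration rather than the patch itself: the map name, callback name, interval value, and init hook are made up.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

#define CLOCK_MONOTONIC 1 /* clockid macro; not emitted into vmlinux.h */
#define TIMER_INTERVAL_NS (100ULL * 1000 * 1000) /* assumed 100ms interval */

struct timer_wrapper {
	struct bpf_timer timer;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, u32);
	__type(value, struct timer_wrapper);
} reconfig_timer SEC(".maps");

static int reconfig_timer_cb(void *map, int *key, struct timer_wrapper *w)
{
	/* Re-arm first, mirroring the patch; see the review discussion
	 * below about moving this to the end of the callback instead. */
	bpf_timer_start(&w->timer, TIMER_INTERVAL_NS, 0);

	/* ... the former tick() reconfiguration logic runs here ... */
	return 0;
}

/* Called once from an init path to start the periodic work; from then on
 * the callback keeps itself scheduled on some core, independent of CPU 0. */
static int arm_reconfig_timer(void)
{
	u32 key = 0;
	struct timer_wrapper *w = bpf_map_lookup_elem(&reconfig_timer, &key);

	if (!w)
		return -1;

	bpf_timer_init(&w->timer, &reconfig_timer, CLOCK_MONOTONIC);
	bpf_timer_set_callback(&w->timer, reconfig_timer_cb);
	return bpf_timer_start(&w->timer, TIMER_INTERVAL_NS, 0);
}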

tommy-u (Contributor) left a comment:

Looks great. A few ideas / questions.

if (bpf_get_smp_processor_id())
return;
int ret;
if ((ret = bpf_timer_start(timer, TIMER_INTERVAL_NS, 0))) {
tommy-u (Contributor):

From the angle of reducing re-entrancy risk, would it make sense to re-arm the timer at the end of this function instead of the beginning?
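
Concretely, the suggested reorder would look something like this sketch (reusing the illustrative timer_wrapper from above; do_reconfig() is a stand-in for this function's body):

static int reconfig_timer_cb(void *map, int *key, struct timer_wrapper *w)
{
	int ret;

	/* Do the work first ... */
	do_reconfig();

	/* ... then re-arm, so a slow run can't overlap its own next firing. */
	if ((ret = bpf_timer_start(&w->timer, TIMER_INTERVAL_NS, 0)))
		bpf_printk("failed to re-arm reconfig timer: %d", ret);

	return 0;
}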

tommy-u (Contributor):

After reading, I think re-entrancy on the same core is impossible. This timer callback runs in softirq context: interrupts are enabled, but sleeping is not, and another softirq cannot run on that CPU until this one finishes.

Perhaps that implies we should try to keep reconfiguration short, since it can block other softirqs from being processed.

dschatzberg (Contributor, Author):

Yeah, let me read up on this; it's not clear whether there's any re-entrancy on a timer. It probably makes sense to re-arm at the end regardless.

bpf_rcu_read_unlock();
bpf_cgroup_release(root_cgrp_ref);
return 0;
out_root_cgrp:
bpf_cgroup_release(root_cgrp_ref);
tommy-u (Contributor):

How about using cleanup attributes for these (see the sketch below)? I find it difficult to keep all the gotos straight, and I think it's also an easier way to (not have to) reason about error propagation.
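
For illustration, a sketch of the cleanup-attribute pattern with hypothetical names; whether the BPF verifier accepts this on a given kernel is part of what would need checking:

extern struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
extern void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

/* Hypothetical helper invoked automatically when the variable goes
 * out of scope; bpf_cgroup_release() is the real kfunc. */
static void cgroup_release_cleanup(struct cgroup **cgrpp)
{
	if (*cgrpp)
		bpf_cgroup_release(*cgrpp);
}

#define __free_cgroup __attribute__((cleanup(cgroup_release_cleanup)))

static int example(u64 root_cgid)
{
	struct cgroup *root_cgrp __free_cgroup = bpf_cgroup_from_id(root_cgid);

	if (!root_cgrp)
		return -1;

	/* Early returns no longer need a goto: the cleanup handler runs
	 * and releases the reference on every exit path. */
	return 0;
}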

dschatzberg (Contributor, Author):

Let me look into this as a follow-up PR. The cleanup attributes are still kind of messy and may require reordering code in some cases, but in general I agree that they make the code cleaner.

tommy-u (Contributor):

Should we call bpf_timer_cancel() somewhere? mitosis_exit()?

tommy-u (Contributor):

What is protecting the core callbacks (enqueue, dispatch, select) from racing with reconfiguration? This question isn't new to this diff; I was confused about it in the tick() implementation as well.

I don't have a firm example in mind, but how much trouble could we get in if a cpumask changed in the middle of dispatch()? (Unfortunately I'm thinking in terms of the L3 aware version of dispatch() here.)

dschatzberg (Contributor, Author):

> Should we call bpf_timer_cancel() somewhere? mitosis_exit()?

I don't think we need to - the bpf program getting unloaded will kill the timer.

> What is protecting the core callbacks (enqueue, dispatch, select) from race with reconfiguration?

It's inherently racy. The only important properties are atomicity (the cpumask a reader sees is always something the timer work published, never some intermediate state) and that we never miss an update (i.e. eventually we can be sure all scheduled tasks are using, or will use on their next schedule, the latest published cpumask). You'll notice a common pattern to how we mutate cpumasks (see the sketch after this list):

  1. We operate on the tmp_cpumask associated with a cell. This is only accessed by the tick() or timer() work, so there's no worry about partial reads of this cpumask.
  2. Once we have completed the operation, we bpf_kptr_xchg() that tmp_cpumask into the real cpumask, which atomically "publishes" the new cpumask.
  3. Any concurrent enqueue, dispatch, init, select, etc. reads either the previously published cpumask or the newly published one, so atomicity is ensured.
  4. We sequence the reads and writes of applied_configuration_seq such that we always publish the cpumask first and then bump applied_configuration_seq, while on the read side we always read applied_configuration_seq first and then the cpumask, so it's not possible to miss a cpumask update.
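
A sketch of the writer side of this pattern, under stated assumptions: the struct cell layout, field names, and publish_cell_cpumask() below are illustrative stand-ins, and in the real scheduler cells live in map values (which is what lets bpf_kptr_xchg() operate on these fields):

extern void bpf_cpumask_release(struct bpf_cpumask *cpumask) __ksym;

struct cell {
	struct bpf_cpumask __kptr *cpumask;     /* published mask */
	struct bpf_cpumask __kptr *tmp_cpumask; /* timer-work scratch */
};

static u64 applied_configuration_seq;

static void publish_cell_cpumask(struct cell *cell)
{
	struct bpf_cpumask *tmp, *old;

	/* Step 1: take the scratch mask; only the timer work touches it,
	 * so it can be mutated without readers seeing partial state. */
	tmp = bpf_kptr_xchg(&cell->tmp_cpumask, NULL);
	if (!tmp)
		return;

	/* ... mutate tmp to reflect the new configuration ... */

	/* Step 2: atomically publish; concurrent readers now see either
	 * the old mask or the new one, never an intermediate. */
	old = bpf_kptr_xchg(&cell->cpumask, tmp);
	if (old)
		bpf_cpumask_release(old);

	/* Step 3 is the reader side: enqueue/dispatch/select read
	 * applied_configuration_seq first, then the published cpumask. */

	/* Step 4: bump the sequence only after publishing, so a reader
	 * that observes the new seq cannot miss the new mask. */
	__sync_fetch_and_add(&applied_configuration_seq, 1);
}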

dschatzberg force-pushed the mitosis-cell-recreation branch from f096d1a to 0e5aca7 on September 15, 2025 at 15:09
dschatzberg force-pushed the mitosis-cell-recreation branch from 0e5aca7 to b9f1706 on September 15, 2025 at 17:54