Implement Linux memory policy #3230
Conversation
saku3
left a comment
Thanks for the PR!
Would it be possible to add an integration test similar to runc’s?
https://github.com/opencontainers/runc/pull/4726/files#diff-eab28bb7cc15c949e829d02adfa5ad5869d5cea2dde63d0c2c5f4c52225967e6
In youki, we implement integration tests here:
https://github.com/youki-dev/youki/tree/main/tests/contest
As noted in the description, nodes can be empty in some cases. The runtime-spec has also been updated:
https://github.com/opencontainers/runtime-spec/pull/1294/files
Could you update oci-spec-rs accordingly as well?
const MPOL_F_STATIC_NODES: u32 = 1 << 15; // 0x8000

let base_mode = match policy.mode() {
oci_spec::runtime::MemoryPolicyModeType::MpolDefault => MPOL_DEFAULT,
import them like this:
use oci_spec::runtime::{ /* other */, LinuxMemoryPolicy, MemoryPolicyFlagType, MemoryPolicyModeType };
Please refer to line 11.
let highest_node = node_ids.iter().max().copied().unwrap_or(0) as usize;

// Calculate maxnode as highest_node + 1 (number of nodes to consider)
let maxnode = (highest_node + 1) as u64;
Could you explain the rationale for how maxnode is calculated?
When I run an example like the one in runc’s test code, youki fails with an error. I suspect the cause is a difference in how maxnode is implemented, but I don’t fully understand it yet.
In set_mempolicy(2), nodemask points to a bit mask of node IDs that contains up to maxnode bits, and the kernel uses bits only up to maxnode. Also, a maxnode value of zero specifies the empty set of nodes (in which case nodemask is ignored).
Therefore, to make the highest set bit in nodemask visible to the kernel, maxnode should be one greater than the highest node ID we want to include.
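For context, glibc does not wrap set_mempolicy(2) (the wrapper lives in libnuma), so a runtime typically issues the raw syscall. Below is a minimal sketch of such a call in Rust using the libc crate on Linux; it is illustrative only, not youki's actual wrapper, and the whole-word maxnode in the example anticipates the discussion further down:

fn set_mempolicy_raw(mode: i32, nodemask: &[libc::c_ulong], maxnode: libc::c_ulong) -> std::io::Result<()> {
    // SAFETY: nodemask is a valid slice; maxnode tells the kernel how many bits it may read.
    let ret = unsafe { libc::syscall(libc::SYS_set_mempolicy, mode, nodemask.as_ptr(), maxnode) };
    if ret == 0 {
        Ok(())
    } else {
        Err(std::io::Error::last_os_error())
    }
}

fn main() {
    const MPOL_INTERLEAVE: i32 = 3; // from linux/mempolicy.h
    let bits_per_ulong = std::mem::size_of::<libc::c_ulong>() * 8;
    let nodemask: [libc::c_ulong; 1] = [0b10]; // bit 1 set = node 1
    match set_mempolicy_raw(MPOL_INTERLEAVE, &nodemask, bits_per_ulong as libc::c_ulong) {
        Ok(()) => println!("interleave policy applied"),
        Err(e) => eprintln!("set_mempolicy failed: {e}"),
    }
}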
The failure you encountered is likely not caused by the maxnode calculation itself. Right now, oci-spec-rs has already merged support for Linux memory policy (including mode/nodes/flags), but no new crate release has been published yet. If your local environment is still pointing to the crates.io release instead of the git revision with that PR, you may hit errors when parsing or building the spec. That may explain the discrepancy you observed when comparing with runc’s tests.
Sorry for the long message.
Please let me know if my analysis is incorrect.
If it’s working on your local environment, then something might be wrong with mine.
I first confirmed the machine has two NUMA nodes and the kernel version:
cat /sys/devices/system/node/online
0-1
uname -a
Linux ip-172-31-11-219 6.14.0-1012-aws #12~24.04.1-Ubuntu SMP Fri Aug 15 00:16:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
I also switched oci-spec to a branch that includes memoryPolicy support:
[dependencies]
- oci-spec = { version = "0.8.2", features = ["runtime"] }
+ oci-spec = { git = "https://github.com/n4mlz/oci-spec-rs.git", branch = "fix/mempolicy", features = ["runtime"] }
Then I ran the contest tests, which failed:
just test-contest memory_policy
/home/ubuntu/workspace/youki/scripts/build.sh -o /home/ubuntu/workspace/youki -r -c youki
* FEATURES: <default>
* TARGET: x86_64-unknown-linux-gnu
Compiling libcontainer v0.5.5 (/home/ubuntu/workspace/youki/crates/libcontainer)
Compiling youki v0.5.5 (/home/ubuntu/workspace/youki/crates/youki)
Finished `release` profile [optimized] target(s) in 52.06s
/home/ubuntu/workspace/youki/scripts/build.sh -o /home/ubuntu/workspace/youki -r -c contest
* FEATURES: <default>
* TARGET: x86_64-unknown-linux-gnu
removed '/home/ubuntu/workspace/youki/contest'
Compiling contest v0.1.0 (/home/ubuntu/workspace/youki/tests/contest/contest)
Finished `release` profile [optimized] target(s) in 33.22s
removed '/home/ubuntu/workspace/youki/runtimetest'
Finished `release` profile [optimized] target(s) in 0.12s
sudo /home/ubuntu/workspace/youki/scripts/contest.sh /home/ubuntu/workspace/youki/youki memory_policy
# Start group memory_policy
1 / 10 : bind_static : not ok
container stderr was not empty, found : ERROR libcontainer::process::init::process: failed to set memory policy err=Nix(EINVAL)
ERROR libcontainer::process::container_intermediate_process: failed to initialize container process: failed syscall
ERROR libcontainer::process::container_main_process: failed to wait for init ready: exec process failed with error error in executing process : failed syscall
ERROR libcontainer::container::builder_impl: failed to run container process exec process failed with error error in executing process : failed syscall
ERROR youki: error in executing command: failed to create container: exec process failed with error error in executing process : failed syscall
error in executing command: failed to create container: exec process failed with error error in executing process : failed syscall
Error: failed to create container: exec process failed with error error in executing process : failed syscall
2 / 10 : bind_way_too_large_node_number : ok
3 / 10 : default_with_missing_nodes_ok : ok
4 / 10 : empty_memory_policy : ok
5 / 10 : interleave_without_flags : not ok
container stderr was not empty, found : ERROR libcontainer::process::init::process: failed to set memory policy err=Nix(EINVAL)
ERROR libcontainer::process::container_intermediate_process: failed to initialize container process: failed syscall
ERROR libcontainer::process::container_main_process: failed to wait for init ready: exec process failed with error error in executing process : failed syscall
ERROR libcontainer::container::builder_impl: failed to run container process exec process failed with error error in executing process : failed syscall
ERROR youki: error in executing command: failed to create container: exec process failed with error error in executing process : failed syscall
error in executing command: failed to create container: exec process failed with error error in executing process : failed syscall
Error: failed to create container: exec process failed with error error in executing process : failed syscall
6 / 10 : invalid_flag_string : ok
7 / 10 : invalid_mode_string : ok
8 / 10 : missing_mode_but_nodes_present : ok
9 / 10 : preferred_relative : not ok
container stderr was not empty, found : ERROR libcontainer::process::init::process: failed to set memory policy (MPOL_PREFERRED) err=Nix(EINVAL)
ERROR libcontainer::process::container_intermediate_process: failed to initialize container process: failed syscall
ERROR libcontainer::process::container_main_process: failed to wait for init ready: exec process failed with error error in executing process : failed syscall
ERROR libcontainer::container::builder_impl: failed to run container process exec process failed with error error in executing process : failed syscall
ERROR youki: error in executing command: failed to create container: exec process failed with error error in executing process : failed syscall
error in executing command: failed to create container: exec process failed with error error in executing process : failed syscall
Error: failed to create container: exec process failed with error error in executing process : failed syscall
10 / 10 : syscall_invalid_arguments : ok
# End group memory_policy
error: Recipe `test-contest` failed on line 62 with exit code 1
A simple manual run with the following OCI config also fails:
"linux": {
"memoryPolicy" : {
"mode": "MPOL_INTERLEAVE",
"nodes": "1"
}
}
sudo strace -f -tt -s 128 -e trace=set_mempolicy -o /tmp/youki.strace \
./youki run -b tutorial/ container
ERROR libcontainer::process::init::process: failed to set memory policy err=Nix(EINVAL)
ERROR libcontainer::process::container_intermediate_process: failed to initialize container process: failed syscall
ERROR libcontainer::process::container_main_process: failed to wait for init ready: exec process failed with error error in executing process : failed syscall
ERROR libcontainer::container::builder_impl: failed to run container process exec process failed with error error in executing process : failed syscall
INFO libcgroups::common: cgroup manager V2 will be used
DEBUG libcgroups::v2::manager: remove cgroup "/sys/fs/cgroup/:youki:container"
ERROR youki: error in executing command: failed to create container: exec process failed with error error in executing process : failed syscall
run failed : failed to create container: exec process failed with error error in executing process : failed syscall
trace result
50065 06:03:37.212423 set_mempolicy(MPOL_INTERLEAVE, [0x00000000000002], 2) = -1 EINVAL (Invalid argument)
50064 06:03:37.212710 +++ exited with 0 +++
50063 06:03:37.212746 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=50064, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
50065 06:03:37.213411 +++ exited with 255 +++
50063 06:03:37.213466 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=50065, si_uid=0, si_status=255, si_utime=0, si_stime=0} ---
50063 06:03:37.221851 +++ exited with 255 +++
The root cause is explained by the Linux source code:
- set_mempolicy
- kernel_set_mempolicy
- get_nodes
- get_bitmap
- do_set_mempolicy
- mpol_new
See mm/mempolicy.c around here: https://github.com/torvalds/linux/blob/v6.14/mm/mempolicy.c
Why set_mempolicy(MPOL_INTERLEAVE, [0x00000000000002], 2) fails:
In the kernel’s get_nodes(), maxnode is decremented first (--maxnode).
With maxnode=2 from userspace, it becomes 1 inside the kernel.
Then get_bitmap() applies a tail mask to keep only the lowest maxnode bits:
mask[nlongs - 1] &= (1UL << (maxnode % BITS_PER_LONG)) - 1;
With maxnode=1 and BITS_PER_LONG=64, the mask is 0x1.
nodemask word is 0x2 (bit1 set = node1).
0x2 & 0x1 = 0x0 → the effective nodemask becomes empty.
MPOL_INTERLEAVE does not allow an empty nodemask, so the kernel returns -EINVAL.
In mpol_new():
} else if (nodes_empty(*nodes))
        return ERR_PTR(-EINVAL);
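To make the failure concrete, here is a small standalone sketch (plain Rust mirroring the kernel arithmetic described above, not kernel or youki code) that reproduces the masking step for maxnode=2 and nodemask=0x2:

fn main() {
    const BITS_PER_LONG: u64 = 64;

    // Userspace passes maxnode = 2 and a single nodemask word with bit 1 set (node 1).
    let mut maxnode: u64 = 2;
    let mut mask: u64 = 0x2;

    // Kernel's get_nodes(): maxnode is decremented first.
    maxnode -= 1; // now 1

    // Kernel's get_bitmap(): keep only the lowest `maxnode` bits of the last word.
    if maxnode % BITS_PER_LONG != 0 {
        mask &= (1u64 << (maxnode % BITS_PER_LONG)) - 1; // 0x2 & 0x1
    }

    // The effective nodemask is now empty, which MPOL_INTERLEAVE rejects with EINVAL.
    assert_eq!(mask, 0);
    println!("effective nodemask: {mask:#x}");
}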
runc
With the above in mind, running runc’s code related to memory_policy succeeds: opencontainers/runc#4726
sudo strace -f -tt -s 128 -e trace=set_mempolicy -o /tmp/youki.strace runc run -b tutorial/ container
strace
38296 06:57:27.319222 set_mempolicy(MPOL_INTERLEAVE, [0x00000000000002], 64) = 0
Therefore, I think it’s preferable to implement the maxnode specification the same way as runc.
numactl
strace -f -tt -s 128 -e trace=set_mempolicy -o /tmp/numa.strace numactl --interleave=1 sh
cat /tmp/numa.strace
38676 06:59:11.357283 set_mempolicy(MPOL_INTERLEAVE, [0x00000000000002, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000], 1025) = 0
38676 06:59:14.501691 +++ exited with 0 +++
For your reference, when I run numactl, it uses a much larger value for maxnode.
@saku3
Thank you very much for this thorough response!
I realize now that my initial verification was insufficient. When I set up a virtual machine with two NUMA nodes and retested the functionality, I encountered exactly the same error you pointed out.
As you've correctly noted, the maxnode parameter of set_mempolicy(2) needs to cover the total number of bits in the nodemask array. I've made the required modifications to prepare the nodemask in the same format as runc. Could you please verify this again?
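For reference, the idea looks roughly like the sketch below. It is illustrative only (not the exact code in this PR): the nodemask is rounded up to whole libc::c_ulong words and maxnode is reported as the total bit count of that buffer, the same convention runc uses:

// Build a nodemask sized in whole c_ulong words; maxnode is the total number
// of bits in the buffer, so the kernel's internal `--maxnode` still covers
// every set bit.
fn build_nodemask(node_ids: &[u32]) -> (Vec<libc::c_ulong>, libc::c_ulong) {
    let bits_per_ulong = std::mem::size_of::<libc::c_ulong>() * 8;
    let highest = node_ids.iter().copied().max().unwrap_or(0) as usize;
    let words = highest / bits_per_ulong + 1;
    let mut mask = vec![0 as libc::c_ulong; words];
    for &node in node_ids {
        let n = node as usize;
        mask[n / bits_per_ulong] |= (1 as libc::c_ulong) << (n % bits_per_ulong);
    }
    let maxnode = (words * bits_per_ulong) as libc::c_ulong;
    (mask, maxnode)
}

fn main() {
    let (mask, maxnode) = build_nodemask(&[1]);
    // On x86_64 this prints: nodemask=[2] maxnode=64
    println!("nodemask={mask:?} maxnode={maxnode}");
}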
cf: In my environment, the test results looked like this:
$ cat /sys/devices/system/node/online
0-1
$ uname -a
Linux kernelhack 6.1.0-37-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.140-1 (2025-05-22) x86_64 GNU/Linux
$ just test-contest memory_policy
/home/n4mlz/youki/scripts/build.sh -o /home/n4mlz/youki -r -c youki
* FEATURES: <default>
* TARGET: x86_64-unknown-linux-gnu
Finished `release` profile [optimized] target(s) in 0.30s
/home/n4mlz/youki/scripts/build.sh -o /home/n4mlz/youki -r -c contest
* FEATURES: <default>
* TARGET: x86_64-unknown-linux-gnu
removed '/home/n4mlz/youki/contest'
Finished `release` profile [optimized] target(s) in 0.30s
removed '/home/n4mlz/youki/runtimetest'
Finished `release` profile [optimized] target(s) in 0.27s
sudo /home/n4mlz/youki/scripts/contest.sh /home/n4mlz/youki/youki memory_policy
Validation successful for runtime /home/n4mlz/youki/youki
Since the oci-spec-rs release that includes support for Linux memory policy hasn't been published yet, I couldn't run the tests via CI. However, I have added integration tests. By cloning the branch containing these changes in oci-spec-rs, editing youki's Cargo.toml to point to the local crate, and running the tests locally, I was able to verify that they pass.
Ok(())
}

fn setup_memory_policy(
(Before we proceed with the overall review.)
For maintainability, please consider moving setup_memory_policy, build_nodemask, and parse_node_string into a separate file like memorypolicy.rs.
saku3
left a comment
I’ve reviewed it, and the overall structure looks good.
I left comments on the finer details—when you have a moment, please consider addressing them. If anything is unclear, feel free to ask anytime.
memory_policy: &Option<oci_spec::runtime::LinuxMemoryPolicy>,
syscall: &dyn Syscall,
) -> Result<()> {
if let Some(policy) = memory_policy {
How about using early returns? The if nesting is deep.
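For example, a sketch of the early-return shape (signature simplified; this assumes the git revision of oci-spec-rs that defines LinuxMemoryPolicy):

use oci_spec::runtime::LinuxMemoryPolicy;

// Return early when no policy is configured, so the main body is not nested
// inside `if let Some(...)`.
fn setup_memory_policy(
    memory_policy: &Option<LinuxMemoryPolicy>,
) -> Result<(), Box<dyn std::error::Error>> {
    let Some(policy) = memory_policy else {
        return Ok(());
    };
    // ... validate and apply `policy` here, one indentation level shallower ...
    let _ = policy;
    Ok(())
}

fn main() {
    assert!(setup_memory_policy(&None).is_ok());
}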
for flag in flags {
match flag {
oci_spec::runtime::MemoryPolicyFlagType::MpolFNumaBalancing => {
Please add a use import so MemoryPolicyFlagType can be referenced directly.
}
flags_value |= MPOL_F_NUMA_BALANCING;
}
oci_spec::runtime::MemoryPolicyFlagType::MpolFRelativeNodes => {
Same here MemoryPolicyFlagType
has_relative = true;
flags_value |= MPOL_F_RELATIVE_NODES;
}
oci_spec::runtime::MemoryPolicyFlagType::MpolFStaticNodes => {
Same here MemoryPolicyFlagType
| "memory_policy_default" => { | ||
| tests::validate_memory_policy(&spec, Some(MemoryPolicyModeType::MpolDefault)) | ||
| } | ||
| "memory_policy_empty" => tests::validate_memory_policy(&spec, None), |
I think the memory_policy_empty test case is checking the same thing as hello_world, so it’s fine to omit it.
MemoryPolicyModeType::MpolWeightedInterleave => MPOL_WEIGHTED_INTERLEAVE,
};

let mut flags_value: u32 = 0;
Personally, I think it would be easier to read if the validation and the construction of flags_value were separated.
let mode_with_flags = base_mode | (flags_value as i32);

// Handle special cases for MPOL_DEFAULT and MPOL_LOCAL
match base_mode {
Personally, I think it would be clearer to separate validation from the application of set_mempolicy.
// Handle special cases for MPOL_DEFAULT and MPOL_LOCAL
match base_mode {
MPOL_DEFAULT => {
Can the validation for MPOL_DEFAULT and MPOL_LOCAL be consolidated?
// Double-check: if the parsed result is empty (e.g., whitespace-only input),
// treat as empty nodemask
if maxnode == 0 {
if flags_value & (MPOL_F_RELATIVE_NODES | MPOL_F_STATIC_NODES) != 0 {
memo: https://docs.kernel.org/admin-guide/mm/numa_memory_policy.html
It is possible for the user to specify that local allocation is always preferred by passing an empty nodemask with this mode. If an empty nodemask is passed, the policy cannot use the MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES flags described below.
assert_eq!(got_args[0].mode, 2); // MPOL_BIND (corrected value)
assert_eq!(got_args[0].nodemask.len(), 1); // One c_ulong needed
assert_eq!(got_args[0].nodemask[0], 3); // 2^0 + 2^1 = 1 + 2 = 3
assert_eq!(got_args[0].maxnode, 2); // highest node ID (1) + 1 = 2
Since maxnode is calculated as (num_u64s * u64_bits) as u64 above, the test may be incorrect.
Thank you for the thorough code review. On a machine with two NUMA nodes, I have verified that the tests pass as follows:
let highest_node = node_ids.iter().max().copied().unwrap_or(0) as usize;

// Calculate how many c_ulong values we need to store the bitmask
let bits_per_ulong = std::mem::size_of::<libc::c_ulong>() * 8;
c_ulong may have a different bit width depending on the platform, so it is treated as libc::c_ulong rather than u64.
saku3
left a comment
Thanks for updating the changes. I really appreciate your thoughtful handling. I left a few additional notes—when you have time, please check them.
InvalidNodes(String),

#[error("Invalid memory policy mode: {0}")]
InvalidMode(String),
Is this flag used anywhere?
use crate::namespaces::Namespaces;
use crate::process::args::{ContainerArgs, ContainerType};
use crate::process::channel;
use crate::process::memory_policy;
Running just lint shows the following:
- use crate::process::memory_policy;
+ use crate::process::{channel, memory_policy};
@@ -0,0 +1,514 @@
use oci_spec::runtime::MemoryPolicyFlagType;
Running just lint shows the following:
- use oci_spec::runtime::MemoryPolicyFlagType;
+ use oci_spec::runtime::{MemoryPolicyFlagType, MemoryPolicyModeType};
let nodes = match policy.nodes() {
None => {
return Err(MemoryPolicyError::InvalidNodes(format!(
"Mode {} requires non-empty node specification",
&& !numa_maps_indicates_node0(first_line, policy_field)
{
eprintln!(
"expected interleave including node0, but got: {}",
Should this say "bind"?
fn numa_maps_indicates_node0(first_line: &str, policy_field: &str) -> bool {
policy_field.contains("bind:0")
|| policy_field.contains("prefer:0")
|| first_line.contains("N0=")
I think this line is unnecessary.
|| first_line.contains("N0=")
}

let policy = memory_policy.as_ref().unwrap();
if policy.mode() != expected {
It’s unnecessary because it always matches.
let expected_mode = memory_policy.as_ref().map(|p| p.mode());
let expected = expected_mode.unwrap();
let policy = memory_policy.as_ref().unwrap();
policy.mode()
Thank you, as always, for the thorough reviews. The corrections have been completed. Please check them.
Please wait a little while until oci-spec-rs is released 🙏
Please resolve the conflict.
@saku3
@saku3
Sorry for the delay.
saku3
left a comment
@n4mlz
Sorry for the delay. It's a small thing, but I've left some comments.
Please take a look 🙏
if has_static_flag && !policy_field.contains("static") {
eprintln!("expected bind static, but found: {}", policy_field);
}
if expected_nodes.as_ref().is_some_and(|v| v.contains(&0))
I’d like it to support checking values other than 0, too.
Example:
const EXPECT_NODE: u32 = 1;
if expected_nodes.as_ref().is_some_and(|v| v.contains(&EXPECT_NODE))
    && !numa_maps_expected_node(policy_field) // check for EXPECT_NODE
I know we'll specify only 0 on GitHub Actions, but I want to be able to specify 1, 0-2, etc. locally and still run valid tests.
Since you've also implemented the parse_node_string function, I believe this is feasible. What do you think?
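For illustration, the parsing could look roughly like this sketch (the PR's actual parse_node_string may differ):

// Parse a node list string such as "1" or "0-2,4" into a sorted list of node IDs.
fn parse_node_string(s: &str) -> Result<Vec<u32>, String> {
    let mut nodes = Vec::new();
    for part in s.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        match part.split_once('-') {
            Some((start, end)) => {
                let start: u32 = start.trim().parse().map_err(|e| format!("{part}: {e}"))?;
                let end: u32 = end.trim().parse().map_err(|e| format!("{part}: {e}"))?;
                if start > end {
                    return Err(format!("invalid range: {part}"));
                }
                nodes.extend(start..=end);
            }
            None => nodes.push(part.parse().map_err(|e| format!("{part}: {e}"))?),
        }
    }
    nodes.sort_unstable();
    nodes.dedup();
    Ok(nodes)
}

fn main() {
    assert_eq!(parse_node_string("1,0-2").unwrap(), vec![0, 1, 2]);
}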
Thanks for the suggestion.
Supporting nodes other than 0 makes sense. However, I realized that the current test logic itself wasn’t actually correct; even the original implementation didn’t properly handle the intended semantics when MPOL_F_RELATIVE_NODES is used.
When this flag is set, the nodemask is interpreted relative to the process’s cpuset, not as absolute physical node IDs.
So a nodemask containing 0 doesn’t necessarily mean physical node 0 — it could mean “the first node in Mems_allowed_list”.
In that case, /proc/self/numa_maps will legitimately show a different node number, so our current assumption is not correct.
References:
- set_mempolicy(2): MPOL_F_RELATIVE_NODES / MPOL_F_STATIC_NODES
- Kernel docs: NUMA Memory Policy (flags section)
I’ll update the test so that it maps relative nodemasks to the actual physical nodes (using Mems_allowed_list from /proc/self/status) before comparing them with /proc/self/numa_maps.
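Roughly, that translation would look like the sketch below (the procfs paths are real; the helper and the index-based mapping are illustrative):

use std::fs;

// Read the cpuset's allowed memory nodes, e.g. "Mems_allowed_list:\t0-1",
// from /proc/self/status and expand the list into physical node IDs.
fn mems_allowed_list() -> Option<Vec<u32>> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    let line = status.lines().find(|l| l.starts_with("Mems_allowed_list:"))?;
    let list = line.split(':').nth(1)?.trim();
    let mut nodes = Vec::new();
    for part in list.split(',') {
        match part.split_once('-') {
            Some((a, b)) => nodes.extend(a.trim().parse::<u32>().ok()?..=b.trim().parse::<u32>().ok()?),
            None => nodes.push(part.trim().parse().ok()?),
        }
    }
    Some(nodes)
}

fn main() {
    // With MPOL_F_RELATIVE_NODES, relative index i maps to the i-th node in
    // Mems_allowed_list, so relative node 0 is allowed[0], not necessarily physical node 0.
    if let Some(allowed) = mems_allowed_list() {
        let relative_index = 0usize;
        if let Some(physical) = allowed.get(relative_index) {
            println!("relative node {relative_index} -> physical node {physical}");
        }
    }
}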
run: sudo add-apt-repository -y ppa:criu/ppa
- name: Install requirements
run: sudo env PATH=$PATH just ci-prepare
- name: Install runc 1.3.2
I’m sorry for the trouble.
validate-contest-runc is intended to verify the validity of newly added integration tests by running them with runc.
Thinking ahead to other integration tests we might add, I believe it’s best to use the stable version here.
If the memory policy test fails, we can skip it.
Please refer to the following.
}

fn expected_nodes_from_spec(spec: &Spec) -> Option<Vec<u32>> {
fn mems_allowed_list() -> Option<Vec<u32>> {
I’ve added logic to handle the MPOL_F_RELATIVE_NODES case by reading Mems_allowed_list from /proc/self/status and comparing the translated physical nodes against /proc/self/numa_maps.
This makes the test technically correct, but the logic became quite large and complex.
If you think this level of validation is too heavy for an integration test, I can remove or trim it down. What do you think?
In an environment with two NUMA nodes, I confirmed that the tests pass as follows:
$ cat /sys/devices/system/node/online
0-1
$ uname -a
Linux kernelhack 6.1.0-37-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.140-1 (2025-05-22) x86_64 GNU/Linux
$ just test-contest memory_policy
/home/n4mlz/youki/scripts/build.sh -o /home/n4mlz/youki -r -c youki
* FEATURES: <default>
* TARGET: x86_64-unknown-linux-gnu
Finished `release` profile [optimized] target(s) in 0.29s
/home/n4mlz/youki/scripts/build.sh -o /home/n4mlz/youki -r -c contest
* FEATURES: <default>
* TARGET: x86_64-unknown-linux-gnu
removed '/home/n4mlz/youki/contest'
Finished `release` profile [optimized] target(s) in 0.28s
removed '/home/n4mlz/youki/runtimetest'
Finished `release` profile [optimized] target(s) in 0.27s
sudo /home/n4mlz/youki/scripts/contest.sh /home/n4mlz/youki/youki memory_policy
Validation successful for runtime /home/n4mlz/youki/youki
$
Thank you for the thoughtful feedback.
When MPOL_F_RELATIVE_NODES or MPOL_F_STATIC_NODES is specified, I think it’s sufficient to verify that relative or static appears in /proc/self/numa_maps.
If we design tests to verify that, when multiple nodes are specified, the nodes are honored exactly as specified, it becomes quite complex—not only due to MPOL_F_RELATIVE_NODES but also features like MPOL_WEIGHTED_INTERLEAVE. I don’t think we need to account for that level of complexity.
For multiple nodes, I think we can stick to the simplest case. For example:
"mode": "MPOL_BIND",
"nodes": "0-1"
If we can verify that this is reflected in /proc/self/numa_maps, that should be sufficient to ensure Youki's handling of "multiple nodes passed to the syscall" works correctly for other modes and flags as well.
Rather than testing every (node (one-or-multiple) × mode × flag) combination, we’ll split coverage by dimension:
- Provide test cases for multiple nodes.
- Provide test cases for each mode.
- Provide test cases for each flag.
This gives us confidence in the behavior without enumerating all cross-product combinations.
So as long as the test covers a simple case where multiple nodes work correctly, there’s no problem.
If we have enough capacity, it might be good to add the following test cases as well:
{ "mode": "MPOL_LOCAL" }
{ "mode": "MPOL_WEIGHTED_INTERLEAVE", "nodes": "0" }
{ "mode": "MPOL_PREFERRED_MANY", "nodes": "0" }
{ "mode": "MPOL_BIND", "nodes": "0", "flags": ["MPOL_F_NUMA_BALANCING"] }Signed-off-by: n4mlz <[email protected]>
Signed-off-by: n4mlz <[email protected]>
@n4mlz
Please resolve the conflicts.
Overall, I think it's mostly fine from my side.
Thank you for patiently working through all the changes.
@utam0k
utam0k
left a comment
First review. I'll take a look another day 🙏
crates/libcgroups/Cargo.toml (outdated)
[package]
name = "libcgroups"
- version = "0.5.7" # MARK: Version
+ version = "0.5.7" # MARK: Version
I'm unsure about the reason for this change.
const MPOL_DEFAULT: i32 = 0;
const MPOL_PREFERRED: i32 = 1;
const MPOL_BIND: i32 = 2;
const MPOL_INTERLEAVE: i32 = 3;
const MPOL_LOCAL: i32 = 4;
const MPOL_PREFERRED_MANY: i32 = 5;
const MPOL_WEIGHTED_INTERLEAVE: i32 = 6;

// Memory policy flag constants from Linux UAPI
const MPOL_F_NUMA_BALANCING: u32 = 1 << 13; // 0x2000
const MPOL_F_RELATIVE_NODES: u32 = 1 << 14; // 0x4000
const MPOL_F_STATIC_NODES: u32 = 1 << 15; // 0x8000
An enum would be better suited for these two groups of constants.
let base_mode = match policy.mode() {
MemoryPolicyModeType::MpolDefault => MPOL_DEFAULT,
MemoryPolicyModeType::MpolPreferred => MPOL_PREFERRED,
MemoryPolicyModeType::MpolBind => MPOL_BIND,
MemoryPolicyModeType::MpolInterleave => MPOL_INTERLEAVE,
MemoryPolicyModeType::MpolLocal => MPOL_LOCAL,
MemoryPolicyModeType::MpolPreferredMany => MPOL_PREFERRED_MANY,
MemoryPolicyModeType::MpolWeightedInterleave => MPOL_WEIGHTED_INTERLEAVE,
};
You can use the From trait.
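For instance, a sketch of how this could look with a dedicated repr(i32) enum and a From impl; the kernel values follow linux/mempolicy.h, and it assumes the seven MemoryPolicyModeType variants quoted above are the full set:

use oci_spec::runtime::MemoryPolicyModeType;

// Kernel memory-policy modes from linux/mempolicy.h as an enum instead of loose consts.
#[repr(i32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MpolMode {
    Default = 0,
    Preferred = 1,
    Bind = 2,
    Interleave = 3,
    Local = 4,
    PreferredMany = 5,
    WeightedInterleave = 6,
}

impl From<MemoryPolicyModeType> for MpolMode {
    fn from(mode: MemoryPolicyModeType) -> Self {
        match mode {
            MemoryPolicyModeType::MpolDefault => MpolMode::Default,
            MemoryPolicyModeType::MpolPreferred => MpolMode::Preferred,
            MemoryPolicyModeType::MpolBind => MpolMode::Bind,
            MemoryPolicyModeType::MpolInterleave => MpolMode::Interleave,
            MemoryPolicyModeType::MpolLocal => MpolMode::Local,
            MemoryPolicyModeType::MpolPreferredMany => MpolMode::PreferredMany,
            MemoryPolicyModeType::MpolWeightedInterleave => MpolMode::WeightedInterleave,
        }
    }
}

fn main() {
    let base_mode = MpolMode::from(MemoryPolicyModeType::MpolBind) as i32;
    assert_eq!(base_mode, 2); // MPOL_BIND
}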
let mut has_static = false;
let mut has_relative = false;
if let Some(flags) = policy.flags() {
for flag in flags.iter() {
- let mut has_static = false;
- let mut has_relative = false;
- if let Some(flags) = policy.flags() {
- for flag in flags.iter() {
+ if let Some(flags) = policy.flags() {
+ let mut has_static = false;
+ let mut has_relative = false;
+ for flag in flags.iter() {
MemoryPolicyFlagType::MpolFRelativeNodes => {
has_relative = true;
}
MemoryPolicyFlagType::MpolFStaticNodes => {
has_static = true;
}
Using flags.contains(...) makes this logic simpler.
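A sketch of that, assuming MemoryPolicyFlagType implements PartialEq:

use oci_spec::runtime::MemoryPolicyFlagType;

// Replace the manual match-and-set loop with direct membership checks on the flags slice.
fn static_relative_flags(flags: &[MemoryPolicyFlagType]) -> (bool, bool) {
    let has_static = flags.contains(&MemoryPolicyFlagType::MpolFStaticNodes);
    let has_relative = flags.contains(&MemoryPolicyFlagType::MpolFRelativeNodes);
    (has_static, has_relative)
}

fn main() {
    let flags = vec![MemoryPolicyFlagType::MpolFStaticNodes];
    assert_eq!(static_relative_flags(&flags), (true, false));
}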
Description
This PR adds support for the new linux.memoryPolicy field introduced in the OCI Runtime Spec (opencontainers/runtime-spec#1282). With this change, youki can configure NUMA memory policies for containers by calling set_mempolicy(2) during container initialization.
- Mapping of MemoryPolicyModeType and MemoryPolicyFlagType from the spec to the corresponding Linux mempolicy.h constants.
- Validation of mode/flag/node combinations:
  - MPOL_DEFAULT / MPOL_LOCAL require empty nodes and no flags.
  - MPOL_PREFERRED supports empty nodes (local allocation) and forbids STATIC_NODES / RELATIVE_NODES in that case.
  - BIND / INTERLEAVE / PREFERRED_MANY / WEIGHTED_INTERLEAVE require non-empty nodes.
  - MPOL_F_NUMA_BALANCING is only valid with BIND.
  - STATIC_NODES and RELATIVE_NODES are mutually exclusive.
- Parsing of node list strings (e.g. "0,1-3") and conversion to the bitmask representation required by set_mempolicy.
- A wrapper for set_mempolicy, integrated into container setup.
Type of Change
Testing
Related Issues
Fixes #3215
Related to opencontainers/runtime-spec#1282
Additional Context
This implementation depends on the new memory policy types recently added to oci-spec-rs (see youki-dev/oci-spec-rs#290). Although that PR has been merged, no new crate version has been published yet. As a result:
- building and testing currently require the git revision of oci-spec-rs containing the new definitions;
- CI can only run against crates.io once a new oci-spec release is published.
Once oci-spec is released with memory policy support, the dependency in Cargo.toml can be updated and this PR will build and run correctly.