MemPolicy
MemPolicy controls NUMA memory placement for worker processes. It
wraps set_mempolicy(2) and is applied after fork, before the work
loop starts.
pub enum MemPolicy {
Default,
Bind(BTreeSet<usize>),
Preferred(usize),
Interleave(BTreeSet<usize>),
Local,
PreferredMany(BTreeSet<usize>),
WeightedInterleave(BTreeSet<usize>),
}
Variants
Default – inherit the parent process’s memory policy. No
set_mempolicy syscall is made.
Bind(nodes) – allocate only from the specified NUMA nodes
(MPOL_BIND). Allocation fails with ENOMEM if all specified nodes
are exhausted.
Preferred(node) – prefer allocations from the specified node,
falling back to others when the preferred node is full
(MPOL_PREFERRED).
Interleave(nodes) – interleave allocations round-robin across
the specified nodes (MPOL_INTERLEAVE).
Local – prefer the nearest node to the CPU where the allocation
occurs (MPOL_LOCAL). No nodemask.
PreferredMany(nodes) – prefer allocations from any of the
specified nodes, falling back to others when all preferred nodes are
full (MPOL_PREFERRED_MANY, kernel 5.15+).
WeightedInterleave(nodes) – weighted interleave across the
specified nodes. Page distribution is proportional to per-node weights
set via /sys/kernel/mm/mempolicy/weighted_interleave/nodeN
(MPOL_WEIGHTED_INTERLEAVE, kernel 6.9+).
Convenience constructors
MemPolicy::bind([0, 1])
MemPolicy::preferred(0)
MemPolicy::interleave([0, 1])
MemPolicy::preferred_many([0, 1])
MemPolicy::weighted_interleave([0, 1])
Node-set constructors (bind, interleave, preferred_many,
weighted_interleave) accept any IntoIterator<Item = usize> –
arrays, ranges, Vec, BTreeSet. preferred takes a single
usize node ID.
MpolFlags
MpolFlags provides optional mode flags OR’d into the
set_mempolicy(2) mode argument:
| Flag | Value | Description |
|---|---|---|
NONE | 0 | No flags |
STATIC_NODES | 1 << 15 | Nodemask is absolute, not remapped when the task’s cpuset changes |
RELATIVE_NODES | 1 << 14 | Nodemask is relative to the task’s current cpuset |
NUMA_BALANCING | 1 << 13 | Enable NUMA balancing optimization for this policy |
Flags combine with | or MpolFlags::union():
let flags = MpolFlags::STATIC_NODES | MpolFlags::NUMA_BALANCING;
Usage in WorkSpec and CgroupDef
WorkSpec and CgroupDef both expose .mem_policy() and
.mpol_flags() builder methods:
use ktstr::prelude::*;
let w = WorkSpec::default()
.workers(4)
.mem_policy(MemPolicy::bind([0]))
.mpol_flags(MpolFlags::STATIC_NODES);
let def = CgroupDef::named("cg_0")
.with_cpuset(CpusetSpec::numa(0))
.workers(4)
.mem_policy(MemPolicy::bind([0]));
Cpuset validation
When a cgroup has a cpuset, ktstr validates that the MemPolicy’s
node set is covered by the NUMA nodes reachable from that cpuset.
A MemPolicy::Bind([1]) on a cgroup whose cpuset covers only NUMA
node 0 will fail with an error at setup time.
Policies without a node set (Default, Local) skip validation.
node_set()
MemPolicy::node_set() returns the NUMA node IDs referenced by the
policy. Returns the node set for Bind, Interleave,
PreferredMany, and WeightedInterleave; a single-element set for
Preferred; and an empty set for Default/Local.
NUMA checking
Page locality and migration results from workers using MemPolicy are
checked by the NUMA checking
assertions. The expected node set for
locality checks is derived from the worker’s MemPolicy at evaluation
time.
Example: NUMA-aware test
A complete test that checks page locality across two NUMA nodes:
use ktstr::prelude::*;
#[ktstr_test(
numa_nodes = 2, llcs = 4, cores = 4, threads = 1,
min_numa_nodes = 2,
min_page_locality = 0.8,
)]
fn numa_locality(ctx: &Ctx) -> Result<AssertResult> {
execute_defs(ctx, vec![
CgroupDef::named("node0")
.with_cpuset(CpusetSpec::numa(0))
.workers(4)
.mem_policy(MemPolicy::bind([0])),
CgroupDef::named("node1")
.with_cpuset(CpusetSpec::numa(1))
.workers(4)
.mem_policy(MemPolicy::bind([1])),
])
}
Each cgroup’s workers are pinned to a single NUMA node’s CPUs via
CpusetSpec::numa() and their memory allocations are bound to the
same node via MemPolicy::bind(). The min_page_locality threshold
fails the test if less than 80% of pages land on the expected node.