Ops and Steps
The ops system is a composable way to express dynamic cgroup topology
changes. It replaces hand-written Action::Custom functions for most
dynamic scenarios.
Op
An Op is an atomic operation on the cgroup topology. The enum is
#[non_exhaustive], so external match expressions need a wildcard arm
(_ => ...) to stay compatible across ktstr version bumps that add new
variants:
| Op | Description |
|---|---|
| AddCgroup | Create a cgroup |
| RemoveCgroup | Stop workers and remove a cgroup |
| SetCpuset | Set a cgroup’s cpuset via CpusetSpec |
| ClearCpuset | Remove cpuset constraints |
| SwapCpusets | Swap cpusets between two cgroups |
| Spawn | Fork workers into a cgroup |
| StopCgroup | Stop a cgroup’s workers |
| SetAffinity | Set worker affinity via AffinityIntent |
| SpawnHost | Spawn workers in the parent cgroup |
| MoveAllTasks | Move all tasks from one cgroup to another |
| RunPayload | Spawn a binary-kind Payload in the background and track its PayloadHandle under the step’s payload set. Subsequent WaitPayload / KillPayload address it by (payload.name, cgroup). Scheduler-kind payloads are rejected at apply time. |
| WaitPayload | Block until the named payload exits naturally, evaluate its checks, and record metrics to the per-test sidecar. Target lookup is by (name, cgroup) composite key; cgroup: None resolves to the unique live copy. No timeout; pair with a bounded HoldSpec or the payload’s own --runtime for time-boxed runs. |
| KillPayload | SIGKILL the named payload, reap the child, evaluate checks, and record metrics. Same (name, cgroup) lookup rules as WaitPayload. Mirrors step-teardown drain for an explicitly-targeted payload. |
| FreezeCgroup | Freeze every task in the named cgroup via cgroup.freeze (kernel-side asynchronous freeze; not a SIGSTOP). Idempotent for already-frozen cgroups. Pair with UnfreezeCgroup to release; teardown auto-unfreezes. See Snapshots for the observer-cgroup deadlock warning. |
| UnfreezeCgroup | Unfreeze every task in the named cgroup via cgroup.freeze. Inverse of FreezeCgroup. Idempotent. |
| Snapshot | Capture a host-side diagnostic snapshot under name via the freeze coordinator: pauses every vCPU, reads BPF map state, vCPU registers, and per-CPU counters into a FailureDumpReport, then resumes. The report is keyed by name on the active SnapshotBridge. No active bridge is a no-op with tracing::warn!. See Snapshots. |
| WatchSnapshot | Capture a snapshot whenever the guest writes to the named kernel symbol; one fire = one capture tagged with the symbol path. Symbol resolution at op execution time looks the name up by verbatim vmlinux ELF symbol-table match: the requested name must appear in the guest kernel’s static symbol table exactly as written (no path expansion, no BTF descent). Maximum 3 watch ops per scenario (3 hardware watchpoint slots; 1 slot reserved for the error-class exit_kind trigger). See Watch Snapshots. |
Op constructors accept string literals directly (no .into() needed):
Op::add_cgroup("cg_0")
Op::set_cpuset("cg_0", CpusetSpec::disjoint(0, 2))
Op::stop_cgroup("cg_0")
Op::spawn("cg_0", WorkSpec::default().workers(4))
Op::set_affinity("cg_0", AffinityIntent::RandomSubset)
Op::spawn_host(WorkSpec::default().workers(4))
Op::freeze_cgroup("cg_0")
Op::unfreeze_cgroup("cg_0")
Op::snapshot("after_spawn")
Op::watch_snapshot("jiffies_64")
SpawnHost creates workers in the parent cgroup, not in a managed
cgroup. Use this to simulate host-level CPU contention alongside
managed cgroups.
OpKind
OpKind is a payload-free discriminant enum generated from Op via
#[strum_discriminants]. It carries the same variant set as Op
(AddCgroup, RemoveCgroup, …, RunPayload, WaitPayload,
KillPayload, FreezeCgroup, UnfreezeCgroup, Snapshot,
WatchSnapshot) with none of the inner fields, so it is cheap to
copy and use as a map key. Framework code uses OpKind when it
only cares WHICH operation ran (per-op statistics, stimulus-event
tagging, verifier/monitor bookkeeping), not what payload it carried.
Test authors rarely spell OpKind directly. The strum::EnumIter
derive additionally lets tooling enumerate every OpKind variant for
coverage checks.
OpKind shares Op’s #[non_exhaustive] attribute: external
match expressions over OpKind also need a wildcard arm (_ => ...).
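The per-op statistics use case can be sketched as follows. This is a minimal illustration, not ktstr code: the enum is a local stand-in with only three of the variant names, and count_ops and label are hypothetical helpers. The real OpKind is generated by strum and carries the full variant set.

```rust
use std::collections::HashMap;

/// Local stand-in for a payload-free discriminant enum (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[non_exhaustive]
pub enum OpKind {
    AddCgroup,
    Spawn,
    Snapshot,
}

/// Hypothetical helper: tally how often each kind of op ran.
pub fn count_ops(ran: &[OpKind]) -> HashMap<OpKind, usize> {
    let mut counts = HashMap::new();
    for kind in ran {
        // Copying the key is cheap because the enum has no inner fields.
        *counts.entry(*kind).or_insert(0) += 1;
    }
    counts
}

/// Hypothetical helper: the wildcard arm keeps this match compiling
/// when new variants are added to the enum.
pub fn label(kind: OpKind) -> &'static str {
    match kind {
        OpKind::AddCgroup => "topology",
        OpKind::Spawn => "workers",
        _ => "other",
    }
}
```

Because the key type is Copy, Eq, and Hash, no cloning or borrowing of op payloads is needed to keep the tally.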
CpusetSpec
CpusetSpec computes a cpuset from the topology at runtime. The enum
is #[non_exhaustive], so external callers should construct values via the
associated constructor functions (see the list below this snippet)
rather than naming variant literals: a future field addition (e.g.
a stride on Range) can land behind a defaulted parameter without
breaking call sites. External match expressions over CpusetSpec
likewise need a wildcard arm (_ => ...):
pub enum CpusetSpec {
Llc(usize), // All CPUs in an LLC
Numa(usize), // All CPUs in a NUMA node
Range { start_frac: f64, end_frac: f64 }, // Fraction of usable CPUs
Disjoint { index: usize, of: usize }, // Equal disjoint partitions
Overlap { index: usize, of: usize, frac: f64 }, // Overlapping partitions
Exact(BTreeSet<usize>), // Exact CPU set
}
Convenience constructors accept parameters directly:
CpusetSpec::disjoint(0, 2), CpusetSpec::range(0.0, 0.5),
CpusetSpec::exact([0, 1, 2]), CpusetSpec::llc(0),
CpusetSpec::numa(0), CpusetSpec::overlap(0, 2, 0.5).
All fractional specs operate on
usable_cpus().
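To make the fractional and partition specs concrete, here is a minimal sketch of how Range and Disjoint could resolve against an ordered list of usable CPU ids. The Spec enum and resolve function are hypothetical stand-ins, and the floor-based boundary arithmetic is an assumption; the rounding ktstr actually applies is not stated here.

```rust
use std::collections::BTreeSet;

/// Local stand-in covering two of the variants (hypothetical, not ktstr's type).
pub enum Spec {
    Range { start_frac: f64, end_frac: f64 },
    Disjoint { index: usize, of: usize },
}

/// Resolve a spec against an ordered list of usable CPU ids.
/// Assumption: slice boundaries are floored; ktstr may round differently.
pub fn resolve(spec: &Spec, usable: &[usize]) -> BTreeSet<usize> {
    let n = usable.len();
    let (lo, hi) = match *spec {
        Spec::Range { start_frac, end_frac } => (
            (start_frac * n as f64).floor() as usize,
            (end_frac * n as f64).floor() as usize,
        ),
        Spec::Disjoint { index, of } => {
            assert!(of > 0, "partition count must be non-zero");
            (index * n / of, (index + 1) * n / of)
        }
    };
    // Clamp so that out-of-range or inverted inputs yield an empty set
    // instead of a slicing panic.
    let hi = hi.min(n);
    let lo = lo.min(hi);
    usable[lo..hi].iter().copied().collect()
}
```

Under these assumptions, with eight usable CPUs numbered 0 to 7, Disjoint index 0 of 2 and Range 0.0 to 0.5 both select CPUs 0 to 3, and Disjoint index 1 of 2 selects CPUs 4 to 7.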
CgroupDef
CgroupDef bundles three ops that always go together: create cgroup,
set cpuset, spawn workers. It is the primary way to define cgroups in
ops-based scenarios.
let def = CgroupDef::named("cg_0")
.with_cpuset(CpusetSpec::disjoint(0, 2))
.workers(4)
.work_type(WorkType::SpinWait);
Builder methods
- .with_cpuset(CpusetSpec) – set the cpuset (CPU set the cgroup is pinned to).
- .with_cpuset_mems(BTreeSet<usize>) – explicit cpuset.mems override (default derives from the resolved cpuset’s NUMA nodes).
- .workers(n) – set worker count.
- .work_type(WorkType) – set work type (default: SpinWait).
- .sched_policy(SchedPolicy) – set Linux scheduling policy (default: Normal). See WorkSpec Types.
- .work(WorkSpec) – add a work group (multiple calls for concurrent groups).
- .workload(&'static Payload) – attach a binary workload payload to run alongside the worker group; the framework launches it as a child process inside the cgroup. Panics when called with a scheduler-kind Payload (PayloadKind::Scheduler(_)); the scheduler slot is #[ktstr_test(scheduler = ...)] at the test level, not the cgroup-level workload slot. Step-level Op::RunPayload rejects scheduler-kind payloads with an anyhow::Error instead of panicking; the build-time workload call panics because there is no scenario-level recovery path.
- .affinity(AffinityIntent) – set per-worker affinity (default: Inherit).
- .mem_policy(MemPolicy) – set NUMA memory placement policy (default: Default). See MemPolicy.
- .mpol_flags(MpolFlags) – set mode flags for set_mempolicy(2) (default: NONE). See MemPolicy.
- .nice(n) – cgroup-level default per-worker nice value, merged into every WorkSpec whose own nice is unset. See Tutorial: Step 11.
- .comm(name) – cgroup-level default per-worker task->comm via prctl(PR_SET_NAME). Merged into every WorkSpec whose own comm is unset.
- .pcomm(name) – thread-group-leader task->comm for the fork-then-thread spawn path (workers run as threads under one forked leader). Stamps every existing WorkSpec in-place; not order-independent with .work(...).
- .uid(uid) / .gid(gid) – cgroup-level default per-worker effective UID / GID via setresuid / setresgid. Merged into every WorkSpec whose own uid / gid is unset.
- .numa_node(node) – cgroup-level default NUMA-node affinity for every WorkSpec. Merged at apply-setup time.
- .swappable(bool) – opt into gauntlet work type override.
Cgroup controllers
The cgroup-v2 cpu / memory / io / pids controllers are exposed as typed setters (default: unconstrained):
- .cpu_quota_pct(pct) / .cpu_quota(quota, period) / .cpu_unlimited() – write cpu.max (pct is shorthand: 100 = one full CPU). cpu_unlimited resets to the kernel default.
- .cpu_weight(weight) – write cpu.weight (1..=10000, default 100).
- .memory_max(bytes) / .memory_high(bytes) / .memory_low(bytes) / .memory_unlimited() – write memory.max / memory.high / memory.low. memory_unlimited resets memory.max to max.
- .memory_swap_max(bytes) / .memory_swap_unlimited() – write memory.swap.max.
- .io_weight(weight) – write io.weight (1..=10000, default 100).
- .pids_max(n) / .pids_unlimited() – write pids.max.
MemPolicy-cpuset validation
When a cgroup has a cpuset, ktstr validates that the MemPolicy’s
node set is covered by the NUMA nodes reachable from that cpuset. A
MemPolicy::Bind([1]) on a cgroup whose cpuset covers only NUMA
node 0 fails at setup time. Policies without a node set (Default,
Local) skip validation.
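The coverage rule amounts to a subset check. The sketch below is a hypothetical model, not ktstr's implementation: validate_nodes and its error text are invented for illustration, and None stands in for policies without a node set.

```rust
use std::collections::BTreeSet;

/// Hypothetical check: every node named by the policy must be reachable
/// from the cpuset. `None` models policies that carry no node set.
pub fn validate_nodes(
    policy_nodes: Option<&BTreeSet<usize>>,
    cpuset_nodes: &BTreeSet<usize>,
) -> Result<(), String> {
    match policy_nodes {
        // No node set: nothing to validate.
        None => Ok(()),
        Some(nodes) if nodes.is_subset(cpuset_nodes) => Ok(()),
        Some(nodes) => {
            let missing: Vec<usize> = nodes.difference(cpuset_nodes).copied().collect();
            Err(format!("NUMA nodes {missing:?} not reachable from cpuset"))
        }
    }
}
```

With a cpuset that reaches only node 0, a policy naming node 1 is rejected, which mirrors the MemPolicy::Bind([1]) case described above.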
WorkSpec type overrides and swappable
CgroupDef has a swappable flag (default: false). When true
and a work type override is active (Ctx.work_type_override), the
override replaces this def’s work type.
In contrast, the Scenario-level override (in run_scenario()) only
replaces SpinWait work types. The two mechanisms serve different
scopes:
- Scenario-level: replaces SpinWait in WorkSpec.work_type
- CgroupDef-level: replaces the work type when swappable = true
Both skip overrides to grouped work types when num_workers is not
divisible by the work type’s group size.
WorkSpec type overrides apply only to CgroupDef setup, not to raw
Op::Spawn. Op::Spawn always uses the work type as given. Use
CgroupDef with .swappable(true) when the work type should
participate in gauntlet overrides.
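The CgroupDef-level rule can be modeled as a small decision function. Everything below is a hypothetical stand-in: WorkKind has one name taken from the text (SpinWait) plus an invented Grouped variant, and cgroup_def_work_type is not a ktstr function. It only encodes the three conditions described above: an override must be active, the def must be swappable, and a grouped override is skipped when the worker count does not divide evenly.

```rust
/// Stand-in for a work type; only `SpinWait` is a name taken from the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkKind {
    SpinWait,
    /// Hypothetical grouped work type: workers come in multiples of `group`.
    Grouped { group: usize },
}

/// CgroupDef-level rule: a swappable def takes the active override, unless
/// the override is grouped and the worker count does not divide evenly.
pub fn cgroup_def_work_type(
    own: WorkKind,
    swappable: bool,
    num_workers: usize,
    active_override: Option<WorkKind>,
) -> WorkKind {
    match active_override {
        Some(over) if swappable => match over {
            // The `group == 0` test short-circuits before the modulo.
            WorkKind::Grouped { group } if group == 0 || num_workers % group != 0 => own,
            _ => over,
        },
        // No override active, or the def did not opt in.
        _ => own,
    }
}
```

A def with four workers and swappable set accepts a group-of-two override; the same def with three workers keeps its own work type.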
Step
A Step is a sequence of ops with a hold period:
pub struct Step {
pub setup: Setup, // CgroupDefs to create after ops
pub ops: Vec<Op>, // Operations to apply
pub hold: HoldSpec, // How long to wait after
}
Setup is either Defs(Vec<CgroupDef>) or Factory(fn(&Ctx) -> Vec<CgroupDef>).
Vec<CgroupDef> implements Into<Setup>, so you can write
setup: vec![...].into() instead of setup: Setup::Defs(vec![...]).
Constructors
Step::new(ops, hold) – creates a step with ops only (no
CgroupDef setup). Use when the step only applies dynamic operations
to an existing topology.
Step::with_defs(defs, hold) – creates a step with CgroupDef
setup and a hold period. The primary constructor for steps that
create cgroups with workers.
Step::set_ops(self, ops) – REPLACES the ops on a step
(builder method). Chain after with_defs to add dynamic operations
to a step that also creates cgroups.
Naming asymmetry:
Step::set_ops REPLACES; the sibling Backdrop::with_ops APPENDS. The two methods deliberately use different verbs to signal the different semantics. A Step::new(ops, hold).set_ops(more) chain produces a step whose ops vec is exactly more (the original ops is dropped); a Backdrop::new().with_ops(ops_a).with_ops(ops_b) chain produces a backdrop whose ops vec is ops_a + ops_b. If you need to extend a step’s ops vec, build the combined Vec<Op> at the call site and pass it to set_ops, or compose at the Backdrop layer instead.
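The replace-versus-append difference is easy to see in a toy model. StepModel and BackdropModel below are hypothetical stand-ins with ops reduced to string labels and the hold period omitted; they are not ktstr's types and only mirror the two semantics described above.

```rust
/// Stand-in for a step, with ops reduced to string labels (hypothetical).
#[derive(Debug, Default)]
pub struct StepModel {
    pub ops: Vec<&'static str>,
}

/// Stand-in for a backdrop, with ops reduced to string labels (hypothetical).
#[derive(Debug, Default)]
pub struct BackdropModel {
    pub ops: Vec<&'static str>,
}

impl StepModel {
    pub fn new(ops: Vec<&'static str>) -> Self {
        StepModel { ops }
    }

    /// Replaces whatever ops the step already had.
    pub fn set_ops(mut self, ops: Vec<&'static str>) -> Self {
        self.ops = ops;
        self
    }
}

impl BackdropModel {
    pub fn new() -> Self {
        BackdropModel::default()
    }

    /// Appends to the ops already present.
    pub fn with_ops(mut self, ops: Vec<&'static str>) -> Self {
        self.ops.extend(ops);
        self
    }
}
```

In this model, StepModel::new(vec!["a"]).set_ops(vec!["b", "c"]) ends with ops b and c only, while two chained with_ops calls on BackdropModel keep all three.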
HoldSpec
How long to hold after a step completes:
| Variant | Description |
|---|---|
| Frac(f64) | Fraction of the total scenario duration |
| Fixed(Duration) | Fixed time |
| Loop { interval } | Repeat ops at interval until time runs out |
HoldSpec::FULL is a constant for Frac(1.0) (hold for the full
scenario duration).
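As a rough illustration of how the Frac and Fixed variants relate to the scenario duration, consider the sketch below. Hold and hold_duration are hypothetical stand-ins, the Loop variant is left out, and clamping the fraction to the range 0.0 to 1.0 is an assumption rather than documented ktstr behavior.

```rust
use std::time::Duration;

/// Stand-in for two of the hold variants (hypothetical, not ktstr's type).
pub enum Hold {
    Frac(f64),
    Fixed(Duration),
}

/// Turn a hold spec into a concrete wait, given the total scenario duration.
/// Assumption: fractions are clamped to [0.0, 1.0] before scaling.
pub fn hold_duration(hold: &Hold, total: Duration) -> Duration {
    match hold {
        Hold::Frac(f) => total.mul_f64(f.clamp(0.0, 1.0)),
        Hold::Fixed(d) => *d,
    }
}
```

Under this model, a 20 second scenario with Frac(0.25) holds for 5 seconds, and Frac(1.0) holds for the full 20 seconds.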
execute_defs
execute_defs(ctx, defs) is a convenience wrapper for the common
pattern of creating cgroups and running them for the full duration:
execute_defs(ctx, vec![
CgroupDef::named("cg_0").workers(4),
CgroupDef::named("cg_1").workers(4),
])
Equivalent to execute_steps(ctx, vec![Step::with_defs(defs, HoldSpec::FULL)]).
execute_steps
execute_steps(ctx, steps) runs a step sequence:
- For each step: apply ops, then apply setup (create cgroups from CgroupDefs), hold for the specified duration. Ops run first so parent cgroups can be created before children are spawned. Loop steps reverse this: setup runs once before the loop, then ops repeat at the specified interval.
- Check scheduler liveness between steps.
- After all steps: collect worker reports and run checks.
- Writes stimulus events to the SHM ring buffer for timeline analysis.
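The ordering in the list above can be traced with a small model. This is a sketch under stated assumptions, not the real execute_steps: HoldMode and phase_order are hypothetical, a fixed rounds count replaces the time-based loop interval, and stimulus-event recording is left out.

```rust
/// Hold modes reduced to what matters for ordering (hypothetical stand-in).
pub enum HoldMode {
    /// Frac or Fixed: hold once after the step is applied.
    Once,
    /// Loop: ops repeat; `rounds` replaces the real time-based interval.
    Loop { rounds: usize },
}

/// Return the phases of a step sequence in the order the text describes.
pub fn phase_order(steps: &[HoldMode]) -> Vec<String> {
    let mut log = Vec::new();
    for (i, hold) in steps.iter().enumerate() {
        if i > 0 {
            // Liveness is checked between steps, not before the first one.
            log.push("liveness".to_string());
        }
        match hold {
            HoldMode::Once => {
                log.push(format!("step{i}:ops"));
                log.push(format!("step{i}:setup"));
                log.push(format!("step{i}:hold"));
            }
            HoldMode::Loop { rounds } => {
                // Loop steps run setup once, then repeat the ops.
                log.push(format!("step{i}:setup"));
                for _ in 0..*rounds {
                    log.push(format!("step{i}:ops"));
                }
            }
        }
    }
    log.push("collect+checks".to_string());
    log
}
```

Feeding it one ordinary step followed by a two-round loop step shows ops before setup in the first step and setup before ops in the second.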
execute_steps_with
execute_steps_with(ctx, steps, assertions) is the same as
execute_steps but accepts an explicit
Assert for worker checks.
execute_steps is a convenience wrapper that passes None.
use ktstr::prelude::*;
fn my_scenario(ctx: &Ctx) -> Result<AssertResult> {
let assertions = Assert::NO_OVERRIDES
.check_not_starved()
.max_gap_ms(3000);
let steps = vec![/* ... */];
execute_steps_with(ctx, steps, Some(&assertions))
}
When assertions is Some, the provided Assert overrides ctx.assert
for worker checks. When it is None, ctx.assert is used (the merged
three-layer config: default_checks -> scheduler -> per-test).
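One way to picture the three-layer merge is a field-by-field overlay in which later layers win wherever they set a value. The sketch below is only that picture: Checks, merged_with, and three_layer are hypothetical, the two fields are loosely named after the builder calls in the example above, and the actual merge rules of Assert are an assumption here.

```rust
/// Stand-in for a set of optional check settings (hypothetical).
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Checks {
    pub not_starved: Option<bool>,
    pub max_gap_ms: Option<u64>,
}

impl Checks {
    /// Fields set in `over` win; unset fields fall through to `self`.
    pub fn merged_with(self, over: Checks) -> Checks {
        Checks {
            not_starved: over.not_starved.or(self.not_starved),
            max_gap_ms: over.max_gap_ms.or(self.max_gap_ms),
        }
    }
}

/// Apply the layers in the order default_checks -> scheduler -> per-test.
pub fn three_layer(defaults: Checks, scheduler: Checks, per_test: Checks) -> Checks {
    defaults.merged_with(scheduler).merged_with(per_test)
}
```

In this model, a scheduler layer that sets only max_gap_ms changes that one field and leaves the default not_starved setting in place.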