Snapshots
A snapshot is a frozen record of guest BPF map state and scheduler
globals captured at a specific point in a scenario. The freeze
coordinator pauses every vCPU long enough to walk the kernel’s BPF
maps, BTF-render every captured value, and bundle the result into a
FailureDumpReport keyed by a name you choose. Test code then reads
it back via the Snapshot accessor for typed traversal.
Op::snapshot("name") is the on-demand capture trigger. Use it to
ask “what does the scheduler look like right now?” at a precise
point in the scenario. For automatic capture on a kernel write to a
specific symbol, see Watch Snapshots. For
cadenced capture across the workload window without invoking
Op::snapshot from the scenario body, see
Periodic Capture — it produces a time-ordered
SampleSeries that flows into
the temporal-assertion patterns
(nondecreasing, rate_within, steady_within, converges_to,
always_true, ratio_within).
Issuing a snapshot
Op::snapshot(name) is a single op in a Step’s op list. The
executor invokes the active SnapshotBridge’s capture callback,
which performs the freeze rendezvous and returns the report; the
bridge stores the report under name.
use ktstr::prelude::*;

let steps = vec![Step {
    setup: vec![CgroupDef::named("workers").workers(2)].into(),
    ops: vec![
        Op::snapshot("after_spawn"),
        // ... other ops ...
        Op::snapshot("after_workload"),
    ],
    hold: HoldSpec::FULL,
}];
execute_steps(ctx, steps)?;
A scenario may issue any number of Op::snapshot ops with distinct
names. Reusing a name overwrites the prior capture (and emits a
tracing::warn!).
Wiring the bridge
The bridge is what turns an Op::snapshot into stored data. The host
typically wires it before execute_steps runs, but a scenario can
install one inline:
use ktstr::prelude::*;

let cb: CaptureCallback = std::sync::Arc::new(|_name: &str| {
    // Production: freeze the VM and build a real FailureDumpReport.
    // Tests: return a hand-crafted report so the executor + bridge
    // pipeline runs without booting a guest.
    Some(FailureDumpReport::default())
});
let bridge = SnapshotBridge::new(cb);
let bridge_handle = bridge.clone();
let _guard = bridge.set_thread_local();
execute_steps(ctx, steps)?;
let captured = bridge_handle.drain();
let report = captured.get("after_spawn").expect("snapshot recorded");
set_thread_local returns a BridgeGuard that restores the prior
bridge on drop, so a nested scenario inside an outer one cannot leak
its bridge into the outer scope. Bind the guard to an
underscore-prefixed identifier such as _guard so the binding lives
for the scope of the scenario — a bare let _ = bridge.set_thread_local()
drops the guard immediately and clears the bridge before any op runs.
The #[must_use] attribute triggers a compiler warning if the return value is discarded entirely.
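The guard behavior described above can be sketched with plain standard-library Rust. This is a minimal model, not the ktstr implementation: ACTIVE, Guard, install, active, and demo are hypothetical names, and the "bridge" is reduced to a string so the restore-on-drop logic is the only thing on display.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical stand-in for the thread-local bridge slot.
    static ACTIVE: RefCell<Option<String>> = RefCell::new(None);
}

/// Restores the previously installed value when dropped.
#[must_use = "dropping the guard immediately restores the prior value"]
pub struct Guard {
    prior: Option<String>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        let prior = self.prior.take();
        ACTIVE.with(|slot| *slot.borrow_mut() = prior);
    }
}

/// Install `name` as the active value and return a guard holding the old one.
pub fn install(name: &str) -> Guard {
    let prior = ACTIVE.with(|slot| slot.borrow_mut().replace(name.to_string()));
    Guard { prior }
}

/// Read the currently installed value, if any.
pub fn active() -> Option<String> {
    ACTIVE.with(|slot| slot.borrow().clone())
}

/// Returns (value seen while a named guard is held, value seen after `let _ =`).
pub fn demo() -> (Option<String>, Option<String>) {
    let held = {
        let _guard = install("inner"); // lives until the end of this block
        active()
    };
    let _ = install("dropped"); // guard is dropped at the end of this statement
    (held, active())
}
```

In this model, demo() sees the installed value only while the named guard is alive; after the bare let _ = form the slot has already been restored to its prior (empty) state.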
If no bridge is installed, Op::snapshot is a no-op with a
tracing::warn! and the scenario continues. If the capture callback
returns None (capture pipeline unavailable), the bridge stays empty
and the scenario continues. Existing scenarios that never declare
snapshot ops keep working unchanged.
Reading the captured report
Snapshot::new(report) builds a borrowed view over a
FailureDumpReport. The view does not copy the report; accessor
methods walk the report in place and return further borrowed views.
Map-name lookup
let snap = Snapshot::new(report);
let map = snap.map("scx_per_task")?; // SnapshotMap
Snapshot::map(name) returns Result<SnapshotMap, SnapshotError>. A
miss yields SnapshotError::MapNotFound { requested, available } —
the available list enumerates every captured map name so a typo
surfaces in test output.
Top-level globals (.bss / .data / .rodata)
let nr_cpus = snap.var("nr_cpus_onln").as_u64()?;
Snapshot::var(name) walks every *.bss, *.data, and *.rodata
global-section map for a top-level member named name and returns
the unique match as a SnapshotField.
Multiple matches yield
SnapshotError::AmbiguousVar { requested, found_in } —
disambiguate via Snapshot::map(name). A miss yields
SnapshotError::VarNotFound { requested, available } with the
union of every section’s top-level member names.
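The unique-match rule can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the ktstr code: VarError, Sections, and lookup_var are hypothetical names, and each section is reduced to a map from top-level member name to a u64 value.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub enum VarError {
    NotFound { requested: String, available: Vec<String> },
    Ambiguous { requested: String, found_in: Vec<String> },
}

/// Section maps in capture order: (map name, top-level member -> value).
pub type Sections = Vec<(String, BTreeMap<String, u64>)>;

/// Return the value of `name` only if exactly one section declares it.
pub fn lookup_var(sections: &Sections, name: &str) -> Result<u64, VarError> {
    let hits: Vec<(&String, u64)> = sections
        .iter()
        .filter_map(|(map, members)| members.get(name).map(|v| (map, *v)))
        .collect();
    match hits.as_slice() {
        [(_, value)] => Ok(*value),
        [] => {
            // Union of every section's member names, sorted and de-duplicated.
            let mut available: Vec<String> = sections
                .iter()
                .flat_map(|(_, members)| members.keys().cloned())
                .collect();
            available.sort();
            available.dedup();
            Err(VarError::NotFound { requested: name.to_string(), available })
        }
        many => Err(VarError::Ambiguous {
            requested: name.to_string(),
            found_in: many.iter().map(|(map, _)| map.to_string()).collect(),
        }),
    }
}
```

Sorting the available list is a choice made for this sketch; the text above only states that the real error carries the union of member names.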
Entries inside a map
let map = snap.map("scx_per_task")?;
let first = map.at(0); // by ordinal index
let busy = map.find(|e| e.get("tid").as_i64().unwrap_or(-1) == 1234);
let busiest = map.max_by(|e| e.get("runtime_ns").as_u64().unwrap_or(0));
let all_active = map.filter(|e| e.get("runtime_ns").as_u64().unwrap_or(0) > 0);
SnapshotMap exposes:
- at(n): entry at ordinal index n. Out of range returns SnapshotEntry::Missing(SnapshotError::IndexOutOfRange).
- find(predicate): first matching entry. No match returns SnapshotEntry::Missing(SnapshotError::NoMatch { op: "find", ... }).
- filter(predicate): every matching entry collected into a Vec.
- max_by(key_fn): entry whose key_fn produces the maximum u64. Empty map returns Missing with op: "max_by".
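The four accessors listed above map closely onto standard iterator operations. The sketch below is a simplified model: Entry, Miss, and MapView are hypothetical types, and misses are reported through Result rather than the SnapshotEntry::Missing wrapper the real API uses.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub tid: i64,
    pub runtime_ns: u64,
}

#[derive(Debug, PartialEq)]
pub enum Miss {
    IndexOutOfRange { index: usize, len: usize },
    NoMatch { op: &'static str },
}

/// Borrowed view over captured entries; nothing is copied.
pub struct MapView<'a> {
    entries: &'a [Entry],
}

impl<'a> MapView<'a> {
    pub fn new(entries: &'a [Entry]) -> Self {
        Self { entries }
    }

    /// Entry at ordinal index `n`.
    pub fn at(&self, n: usize) -> Result<&'a Entry, Miss> {
        let len = self.entries.len();
        self.entries.get(n).ok_or(Miss::IndexOutOfRange { index: n, len })
    }

    /// First entry matching the predicate.
    pub fn find(&self, pred: impl Fn(&Entry) -> bool) -> Result<&'a Entry, Miss> {
        self.entries.iter().find(|e| pred(*e)).ok_or(Miss::NoMatch { op: "find" })
    }

    /// Every entry matching the predicate.
    pub fn filter(&self, pred: impl Fn(&Entry) -> bool) -> Vec<&'a Entry> {
        self.entries.iter().filter(|e| pred(*e)).collect()
    }

    /// Entry with the largest key; an empty map is a miss.
    pub fn max_by(&self, key: impl Fn(&Entry) -> u64) -> Result<&'a Entry, Miss> {
        self.entries.iter().max_by_key(|e| key(*e)).ok_or(Miss::NoMatch { op: "max_by" })
    }
}
```

How the real max_by breaks ties is not stated in the text; this sketch inherits the behavior of Iterator::max_by_key, which returns the last maximal element.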
Per-CPU maps
BPF_MAP_TYPE_PERCPU_ARRAY / _PERCPU_HASH / _LRU_PERCPU_HASH maps
require narrowing to a CPU before reading individual values:
let map = snap.map("scx_pcpu")?;
let entry = map.cpu(1).at(0); // CPU 1's slot
let value = entry.get("").as_u64()?; // empty path = root
SnapshotMap::cpu(n) narrows subsequent at / find calls to a
specific CPU’s slot. An out-of-range CPU returns Missing with
SnapshotError::PerCpuSlot { unmapped: false, len, ... }; an
unmapped slot (None in the per-CPU vec) returns the same error
variant with unmapped: true.
Calling entry.get(path) on a per-CPU entry without narrowing
first surfaces SnapshotError::PerCpuNotNarrowed { map } — call
.cpu(N) first.
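The three per-CPU outcomes (not narrowed, out of range, unmapped) can be modeled in a few lines. This is an illustrative sketch only: PerCpuError and PerCpuEntry are hypothetical names, and each slot is reduced to an optional u64.

```rust
#[derive(Debug, PartialEq)]
pub enum PerCpuError {
    Slot { cpu: usize, len: usize, unmapped: bool },
    NotNarrowed,
}

/// One per-CPU entry: a slot per CPU, where `None` marks an unmapped slot.
pub struct PerCpuEntry {
    slots: Vec<Option<u64>>,
    narrowed: Option<usize>,
}

impl PerCpuEntry {
    pub fn new(slots: Vec<Option<u64>>) -> Self {
        Self { slots, narrowed: None }
    }

    /// Narrow subsequent reads to CPU `n`.
    pub fn cpu(mut self, n: usize) -> Self {
        self.narrowed = Some(n);
        self
    }

    /// Read the narrowed slot, distinguishing the three failure modes.
    pub fn value(&self) -> Result<u64, PerCpuError> {
        let cpu = self.narrowed.ok_or(PerCpuError::NotNarrowed)?;
        let len = self.slots.len();
        match self.slots.get(cpu) {
            Some(Some(v)) => Ok(*v),
            Some(None) => Err(PerCpuError::Slot { cpu, len, unmapped: true }),
            None => Err(PerCpuError::Slot { cpu, len, unmapped: false }),
        }
    }
}
```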
Field accessors and dotted paths
SnapshotEntry::get(path) and SnapshotField::get(path) walk the
entry’s value side along a dotted path. Each component matches a
struct member; pointer dereferences are followed transparently.
let weight = entry.get("ctx.weight").as_u64()?;
let policy = entry.get("ctx.policy").as_str()?; // enum variant name
let pid = entry.get("leader.pid").as_i64()?; // pointer chase
The dotted-path walker:
- Pointer chase. When a path step lands on RenderedValue::Ptr { deref: Some(...) }, the walker transparently follows the dereference (up to 16 hops) before matching the next component. The test author writes the path the BTF would suggest; pointer indirection is invisible.
- Empty path. get("") returns the current value as a SnapshotField::Value, which is useful for terminal accessors on per-CPU slots that hold a scalar directly.
- Composability. Two-segment paths are equivalent to chained get calls: entry.get("ctx.weight") ≡ entry.get("ctx").get("weight").

  Note that Snapshot::var does not split: it treats the full string as one global name. To walk into a struct, use snap.var("ctx").get("weight").
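The walker rules above (pointer chase before each component, empty path, composability) can be sketched as a small recursive-data walk. Everything here is a hypothetical model: Value, WalkError, chase, and get are illustrative names, and the rendered value tree is reduced to three variants.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Uint(u64),
    Struct(BTreeMap<String, Value>),
    Ptr { addr: u64, deref: Option<Box<Value>> },
}

#[derive(Debug, PartialEq)]
pub enum WalkError {
    EmptyPathComponent,
    FieldNotFound { walked: String, component: String },
    NotAStruct { walked: String, component: String },
}

const MAX_HOPS: usize = 16;

/// Follow resolved pointer dereferences, up to MAX_HOPS of them.
fn chase(mut v: &Value) -> &Value {
    for _ in 0..MAX_HOPS {
        match v {
            Value::Ptr { deref: Some(inner), .. } => v = &**inner,
            _ => break,
        }
    }
    v
}

/// Walk `path` from `root`; an empty path returns `root` itself.
pub fn get<'a>(root: &'a Value, path: &str) -> Result<&'a Value, WalkError> {
    let mut cur = root;
    if path.is_empty() {
        return Ok(cur);
    }
    let mut walked = String::new();
    for component in path.split('.') {
        if component.is_empty() {
            return Err(WalkError::EmptyPathComponent);
        }
        // Pointers are chased before the component is matched.
        let members = match chase(cur) {
            Value::Struct(m) => m,
            _ => return Err(WalkError::NotAStruct { walked, component: component.to_string() }),
        };
        cur = members.get(component).ok_or_else(|| WalkError::FieldNotFound {
            walked: walked.clone(),
            component: component.to_string(),
        })?;
        if !walked.is_empty() {
            walked.push('.');
        }
        walked.push_str(component);
    }
    Ok(cur)
}
```

Because the chase happens only before a component match, a path that ends on a pointer yields the pointer itself in this model, which lines up with as_u64() returning the raw pointer value.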
Terminal accessors
SnapshotField exposes typed terminal reads, all returning
Result<T, SnapshotError>:
| Method | Returns | Accepts |
|---|---|---|
| as_u64() | u64 | Uint, non-negative Int/Enum, Bool (0/1), Char (raw byte), Ptr (pointer value, including cast-recovered pointers; see Cast-recovered pointers), per-CPU array key |
| as_i64() | i64 | Int, Uint ≤ i64::MAX, Bool, Char, Enum, per-CPU array key |
| as_bool() | bool | Bool direct; Int/Uint/Char/Enum/Ptr non-zero is true; per-CPU array key |
| as_f64() | f64 | Float, Int, Uint, Enum, per-CPU array key |
| as_str() | &str | Enum with a resolved variant name |
| rendered() | Option<&RenderedValue> | the underlying value when present |
Type mismatches surface as SnapshotError::TypeMismatch { expected, actual, requested } — for example, as_str() on a Uint reports
expected: "Enum", actual: "Uint".
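A reduced model of the accessor table may help show how the accept rules turn into code. Rendered and Mismatch are hypothetical types covering only four of the rendered variants; the expected labels other than the as_str() case are assumptions of this sketch, not values taken from ktstr.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Rendered {
    Uint(u64),
    Int(i64),
    Bool(bool),
    Enum { value: i64, variant: Option<String> },
}

#[derive(Debug, PartialEq)]
pub struct Mismatch {
    pub expected: &'static str,
    pub actual: &'static str,
}

impl Rendered {
    fn kind(&self) -> &'static str {
        match self {
            Rendered::Uint(_) => "Uint",
            Rendered::Int(_) => "Int",
            Rendered::Bool(_) => "Bool",
            Rendered::Enum { .. } => "Enum",
        }
    }

    /// Accepts Uint, non-negative Int/Enum, and Bool (0/1).
    pub fn as_u64(&self) -> Result<u64, Mismatch> {
        match self {
            Rendered::Uint(v) => Ok(*v),
            Rendered::Int(v) | Rendered::Enum { value: v, .. } if *v >= 0 => Ok(*v as u64),
            Rendered::Bool(b) => Ok(u64::from(*b)),
            _ => Err(Mismatch { expected: "Uint", actual: self.kind() }),
        }
    }

    /// Accepts Int, Enum, Bool, and any Uint that fits in an i64.
    pub fn as_i64(&self) -> Result<i64, Mismatch> {
        match self {
            Rendered::Int(v) | Rendered::Enum { value: v, .. } => Ok(*v),
            Rendered::Uint(v) if *v <= i64::MAX as u64 => Ok(*v as i64),
            Rendered::Uint(_) => Err(Mismatch { expected: "Int", actual: "Uint" }),
            Rendered::Bool(b) => Ok(i64::from(*b)),
        }
    }

    /// Accepts only an Enum whose variant name was resolved.
    pub fn as_str(&self) -> Result<&str, Mismatch> {
        match self {
            Rendered::Enum { variant: Some(name), .. } => Ok(name.as_str()),
            _ => Err(Mismatch { expected: "Enum", actual: self.kind() }),
        }
    }
}
```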
Cast-recovered pointers
Schedulers stash kernel pointers (task_struct *, cgroup *, …)
and arena pointers in BPF map fields whose BTF declares them as
u64 because BTF cannot express a pointer to a per-allocation
type. The host-side
cast analyzer walks the
scheduler’s .bpf.o instruction stream during load, recovers the
target struct for each provable (source_struct, field_offset) → target_struct mapping, and feeds the result into the renderer.
When the renderer encounters a u64 slot the analyzer flagged, it
emits a RenderedValue::Ptr
with cast_annotation set and chases the dereference through the
address-space-appropriate reader. The full set of
cast_annotation values:
| Annotation | Meaning |
|---|---|
| "cast→arena" | Cast analyzer flagged a u64 field; chase resolved to an arena allocation via the BTF-typed pointee. |
| "cast→kernel" | Cast analyzer flagged a u64 field; chase resolved to a kernel slab / vmalloc / per-cpu allocation. |
| "sdt_alloc" | BTF-typed Type::Ptr whose pointee was a BTF_KIND_FWD; the renderer recovered the real payload struct id via the sdt_alloc bridge. No cast-analyzer hit was involved. |
| "cast→arena (sdt_alloc)" | Cast analyzer flagged a u64 field AND the chase target peeled to a Fwd; the bridge recovered the real arena payload struct id. |
| "cast→kernel (sdt_alloc)" | Cast analyzer flagged a u64 field AND the chase target peeled to a Fwd; the bridge recovered the real kernel-side struct id. |
A parallel cross-BTF Fwd resolution path is consulted whenever a
chase target survives the local same-BTF Fwd resolve as a
BTF_KIND_FWD: when the body lives in a sibling embedded BPF
object’s BTF (the multi-.bpf.objs shape), the renderer switches
the recursion to that sibling BTF and renders the full body.
Cross-BTF resolution does NOT add a new annotation — the body is
recovered transparently and the rendered subtree carries whichever
annotation ("cast→arena", "cast→kernel", or None for a
BTF-typed Type::Ptr) it would have had if the same struct lived
in the entry BTF.
From the test author’s perspective:
- as_u64() returns the raw pointer value (matching pre-analysis behavior, so existing tests do not need updating).
- entry.get("ctx.task") and similar dotted-path walks transparently follow the cast-recovered chase; nested struct fields appear under the same path the BTF would suggest for a natively-typed pointer.
- The cast_annotation is visible in failure-dump rendering and diagnostic output so an operator can distinguish cast-recovered pointers from BTF-typed ones; the test API does not require any extra calls to consume them.
Error handling
SnapshotError is the unified error type for every fallible
accessor. Each variant carries the path or available alternatives
needed to fix the call site without re-running the test:
- MapNotFound { requested, available }: Snapshot::map(name) miss.
- VarNotFound { requested, available }: Snapshot::var(name) miss.
- AmbiguousVar { requested, found_in }: more than one *.bss / *.data / *.rodata map exposes a top-level member with the requested name. found_in lists every map (in capture order) where the name was seen; disambiguate via Snapshot::map(name) + .at(0).get(...) against a specific map.
- FieldNotFound { requested, walked, component, available }: a path component did not match any struct member at that depth. walked is the prefix that resolved successfully; component is the failing segment; requested is the original user-supplied path.
- NotAStruct { requested, walked, component, kind }: a path component reached a non-struct value where a struct was expected (e.g. descending into a Uint leaf). kind names the actual variant.
- TypeMismatch { expected, actual, requested }: terminal accessor called on a rendered shape it cannot decode. expected names the scalar type the accessor requires; actual names the rendered variant; requested is the user-supplied lookup string (empty when the accessor was invoked on a leaf without a path walk).
- IndexOutOfRange { map, index, len }: SnapshotMap::at(n) past the entry list end.
- PerCpuSlot { map, cpu, len, unmapped }: out-of-range or unmapped per-CPU slot; unmapped: true distinguishes a None slot from an out-of-range CPU.
- NoMatch { map, op }: predicate-based lookup (find, max_by) found no match. op names the operation.
- EmptyPathComponent { requested }: a path string contained an empty component (e.g. "a..b").
- PerCpuNotNarrowed { map }: entry.get called on a per-CPU entry without cpu(N) first.
- NoRendered { map, side }: entry has no rendered key/value side (BTF type id missing at capture time, leaving hex bytes only).
- PlaceholderSample { tag, reason }: a periodic-capture sample’s underlying FailureDumpReport is a placeholder produced by the freeze-rendezvous timeout fallback. Surfaces when projecting via SampleSeries::bpf; temporal patterns route the variant through their skip path so a placeholder never falsely registers as zero progress against a monotonicity / rate / steady / ratio band. reason carries the rendezvous-timeout cause text.
- MissingStats { tag }: a SampleSeries::stats projection ran on a sample whose stats slot is None (stats client not wired or per-sample stats request failed). Distinct from in-JSON path misses (FieldNotFound / TypeMismatch) so the assertion site can branch on the cause without re-walking the source.
SnapshotError implements std::error::Error and Display, so it
composes with ? and anyhow. The Display impl includes the path
and any available alternatives so a failure message points the test
author at the fix.
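As an illustration of how an error that lists its alternatives composes with ?, here is a minimal standard-library sketch. MapLookupError, map_index, and first_map_is are hypothetical names, and the message format is invented for this example; the real SnapshotError Display output may differ.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
pub enum MapLookupError {
    MapNotFound { requested: String, available: Vec<String> },
}

impl fmt::Display for MapLookupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The message names the miss and every alternative, so a typo is obvious.
            MapLookupError::MapNotFound { requested, available } => write!(
                f,
                "map '{}' not found; available: {}",
                requested,
                available.join(", ")
            ),
        }
    }
}

impl Error for MapLookupError {}

/// Position of `name` among the captured map names.
pub fn map_index(captured: &[&str], name: &str) -> Result<usize, MapLookupError> {
    captured
        .iter()
        .position(|m| *m == name)
        .ok_or_else(|| MapLookupError::MapNotFound {
            requested: name.to_string(),
            available: captured.iter().map(|m| m.to_string()).collect(),
        })
}

/// `?` converts the typed error into a boxed `dyn Error` automatically.
pub fn first_map_is(captured: &[&str], name: &str) -> Result<bool, Box<dyn Error>> {
    let idx = map_index(captured, name)?;
    Ok(idx == 0)
}
```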
Worked example
Capture a snapshot, look up a map, walk into its first entry, and read a nested field:
use ktstr::prelude::*;

fn snapshot_then_inspect(ctx: &Ctx) -> Result<AssertResult> {
    // Wire a bridge for the duration of the scenario.
    let cb: CaptureCallback = std::sync::Arc::new(|_name: &str| {
        // Production: freeze + build a real FailureDumpReport. The
        // host installs this callback in real runs.
        Some(FailureDumpReport::default())
    });
    let bridge = SnapshotBridge::new(cb);
    let handle = bridge.clone();
    let _guard = bridge.set_thread_local();

    // Run the scenario, capturing once after spawn.
    let steps = vec![Step {
        setup: vec![CgroupDef::named("workers").workers(2)].into(),
        ops: vec![Op::snapshot("after_spawn")],
        hold: HoldSpec::FULL,
    }];
    let mut result = execute_steps(ctx, steps)?;

    // Drain the bridge and inspect the captured report.
    let captured = handle.drain();
    let report = captured
        .get("after_spawn")
        .ok_or_else(|| anyhow::anyhow!("snapshot 'after_spawn' missing"))?;
    let snap = Snapshot::new(report);

    // Top-level scalar.
    if let Ok(nr_cpus) = snap.var("nr_cpus_onln").as_u64() {
        result.details.push(AssertDetail::new(
            DetailKind::Other,
            format!("captured nr_cpus_onln = {nr_cpus}"),
        ));
    }
    Ok(result)
}
For the executor + bridge wiring outside a VM, see the host-side
smoke tests in tests/snapshot_e2e.rs — they exercise the same
pipeline against a hand-crafted FailureDumpReport so the assertion
shape is covered without booting a guest.
Composing reads with writes
Snapshots are the read half of the host↔guest interaction. The
write half — pre-seeding a BPF map value before the scenario
starts — is the #[ktstr_test] attribute bpf_map_write = CONST,
which targets a BpfMapWrite constant:
use ktstr::prelude::*;

const TRIGGER_FAULT: BpfMapWrite = BpfMapWrite {
    map_name_suffix: ".bss", // matched against discovered maps
    offset: 42,              // byte offset within the map's value
    value: 1,                // u32 written by the host
};

#[ktstr_test(bpf_map_write = TRIGGER_FAULT, expect_err = true)]
fn fault_then_inspect(ctx: &Ctx) -> Result<AssertResult> {
    // The host has already written `1` at `.bss + 42` before
    // the scenario started. Capture and inspect the resulting
    // scheduler state mid-run.
    /* bridge wiring + Op::snapshot + Snapshot::new as above */
    Ok(AssertResult::pass())
}
The write is event-driven: the host polls for BPF map
discoverability (scheduler loaded), polls the SHM ring for
scenario start, then writes the configured u32 at the configured
offset. Only BPF_MAP_TYPE_ARRAY maps are supported; the framework
finds the map by map_name_suffix (e.g. ".bss") via
BpfMapAccessor::find_map. See Monitor → BPF map writes
for the prerequisites (vmlinux) and the full host-side
contract.
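The suffix match and the u32 poke can be pictured with an in-memory stand-in for a map value. This sketch makes several assumptions that the text does not confirm: MapImage, find_by_suffix, and write_u32 are hypothetical names, the first suffix match wins, and the value is stored little-endian.

```rust
/// Hypothetical in-memory image of one array map's value bytes.
pub struct MapImage {
    pub name: String,
    pub value: Vec<u8>,
}

/// First map whose name ends with `suffix` (first-match is an assumption).
pub fn find_by_suffix<'a>(maps: &'a mut [MapImage], suffix: &str) -> Option<&'a mut MapImage> {
    maps.iter_mut().find(|m| m.name.ends_with(suffix))
}

/// Write `value` as 4 little-endian bytes at `offset`, with bounds checking.
pub fn write_u32(map: &mut MapImage, offset: usize, value: u32) -> Result<(), String> {
    let end = offset.checked_add(4).ok_or("offset overflow")?;
    let len = map.value.len();
    let slot = map
        .value
        .get_mut(offset..end)
        .ok_or_else(|| format!("write {}..{} exceeds value size {}", offset, end, len))?;
    slot.copy_from_slice(&value.to_le_bytes());
    Ok(())
}
```

The bounds check is there because a byte offset such as 42 is only meaningful if at least four bytes of the value remain past it; what the real host does on an out-of-range offset is not described here.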
Read+write workflows then compose naturally: the test pre-seeds
guest state with bpf_map_write, lets the scheduler run, and
asserts on the resulting state with Op::snapshot + the
Snapshot accessor:
- Write (pre-scenario): bpf_map_write flips a .bss flag the scheduler reads.
- Run: the scenario’s ops drive workload behavior; the scheduler reacts to the flag.
- Read (mid-scenario): Op::snapshot("after") captures the scheduler state at the chosen point.
- Assert: Snapshot::var(...).as_u64() / Snapshot::map(...).find(...).get(...).as_*() verifies the reaction. Errors carry the available alternatives so a typo or stale field name surfaces before the test author hand-edits the case.
The write side is a single one-shot poke at scheduler-load time;
there is no Op variant for runtime writes. Ergonomic mid-scenario
state mutation is reserved for cases where the scheduler itself
exports a writable interface (sysfs, debugfs, BPF map command
interface) and the test invokes that interface from a workload
process.