Snapshots

A snapshot is a frozen record of guest BPF map state and scheduler globals captured at a specific point in a scenario. The freeze coordinator pauses every vCPU long enough to walk the kernel’s BPF maps, BTF-render every captured value, and bundle the result into a FailureDumpReport keyed by a name you choose. Test code then reads it back via the Snapshot accessor for typed traversal.

Op::snapshot("name") is the on-demand capture trigger. Use it to ask “what does the scheduler look like right now?” at a precise point in the scenario. For automatic capture on a kernel write to a specific symbol, see Watch Snapshots. For cadenced capture across the workload window without invoking Op::snapshot from the scenario body, see Periodic Capture — it produces a time-ordered SampleSeries that flows into the temporal-assertion patterns (nondecreasing, rate_within, steady_within, converges_to, always_true, ratio_within).

Issuing a snapshot

Op::snapshot(name) is a single op in a Step’s op list. The executor invokes the active SnapshotBridge’s capture callback, which performs the freeze rendezvous and returns the report; the bridge stores the report under name.

use ktstr::prelude::*;

let steps = vec![Step {
    setup: vec![CgroupDef::named("workers").workers(2)].into(),
    ops: vec![
        Op::snapshot("after_spawn"),
        // ... other ops ...
        Op::snapshot("after_workload"),
    ],
    hold: HoldSpec::FULL,
}];
execute_steps(ctx, steps)?;

A scenario may issue any number of Op::snapshot ops with distinct names. Reusing a name overwrites the prior capture (and emits a tracing::warn!).

Wiring the bridge

The bridge is what turns an Op::snapshot into stored data. The host typically wires it before execute_steps runs, but a scenario can install one inline:

use ktstr::prelude::*;

let cb: CaptureCallback = std::sync::Arc::new(|_name: &str| {
    // Production: freeze the VM and build a real FailureDumpReport.
    // Tests: return a hand-crafted report so the executor + bridge
    // pipeline runs without booting a guest.
    Some(FailureDumpReport::default())
});
let bridge = SnapshotBridge::new(cb);
let bridge_handle = bridge.clone();
let _guard = bridge.set_thread_local();

execute_steps(ctx, steps)?;

let captured = bridge_handle.drain();
let report = captured.get("after_spawn").expect("snapshot recorded");

set_thread_local returns a BridgeGuard that restores the prior bridge on drop, so a nested scenario inside an outer one cannot leak its bridge into the outer scope. Bind the guard to an underscore-prefixed identifier such as _guard so the binding lives for the scope of the scenario — a bare let _ = bridge.set_thread_local() drops the guard immediately and clears the bridge before any op runs. The #[must_use] annotation warns if the return value is discarded entirely.

If no bridge is installed, Op::snapshot is a no-op with a tracing::warn! and the scenario continues. If the capture callback returns None (capture pipeline unavailable), the bridge stays empty and the scenario continues. Existing scenarios that never declare snapshot ops keep working unchanged.

Reading the captured report

Snapshot::new(report) builds a borrowed view over a FailureDumpReport. The view does not copy the report; accessor methods walk the report in place and return further borrowed views.

Map-name lookup

let snap = Snapshot::new(report);
let map = snap.map("scx_per_task")?;        // SnapshotMap

Snapshot::map(name) returns Result<SnapshotMap, SnapshotError>. A miss yields SnapshotError::MapNotFound { requested, available } — the available list enumerates every captured map name so a typo surfaces in test output.

Top-level globals (.bss / .data / .rodata)

let nr_cpus = snap.var("nr_cpus_onln").as_u64()?;

Snapshot::var(name) walks every *.bss, *.data, and *.rodata global-section map for a top-level member named name and returns the unique match as a SnapshotField. Multiple matches yield SnapshotError::AmbiguousVar { requested, found_in } — disambiguate via Snapshot::map(name). A miss yields SnapshotError::VarNotFound { requested, available } with the union of every section’s top-level member names.

Entries inside a map

let map = snap.map("scx_per_task")?;
let first = map.at(0);                          // by ordinal index
let busy = map.find(|e| e.get("tid").as_i64().unwrap_or(-1) == 1234);
let busiest = map.max_by(|e| e.get("runtime_ns").as_u64().unwrap_or(0));
let all_active = map.filter(|e| e.get("runtime_ns").as_u64().unwrap_or(0) > 0);

SnapshotMap exposes:

  • at(n) — entry at ordinal index n. Out of range returns SnapshotEntry::Missing(SnapshotError::IndexOutOfRange).
  • find(predicate) — first matching entry. No match returns SnapshotEntry::Missing(SnapshotError::NoMatch { op: "find", ... }).
  • filter(predicate) — every matching entry collected into a Vec.
  • max_by(key_fn) — entry whose key_fn produces the maximum u64. Empty map returns Missing with op: "max_by".

Per-CPU maps

BPF_MAP_TYPE_PERCPU_ARRAY / _PERCPU_HASH / _LRU_PERCPU_HASH maps require narrowing to a CPU before reading individual values:

let map = snap.map("scx_pcpu")?;
let entry = map.cpu(1).at(0);                    // CPU 1's slot
let value = entry.get("").as_u64()?;             // empty path = root

SnapshotMap::cpu(n) narrows subsequent at / find calls to a specific CPU’s slot. An out-of-range CPU returns Missing with SnapshotError::PerCpuSlot { unmapped: false, len, ... }; an unmapped slot (None in the per-CPU vec) returns the same error variant with unmapped: true.

Calling entry.get(path) on a per-CPU entry without narrowing first surfaces SnapshotError::PerCpuNotNarrowed { map } — call .cpu(N) first.

Field accessors and dotted paths

SnapshotEntry::get(path) and SnapshotField::get(path) walk the entry’s value side along a dotted path. Each component matches a struct member; pointer dereferences are followed transparently.

let weight = entry.get("ctx.weight").as_u64()?;
let policy = entry.get("ctx.policy").as_str()?;     // enum variant name
let pid    = entry.get("leader.pid").as_i64()?;     // pointer chase

The dotted-path walker:

  1. Pointer chase. When a path step lands on RenderedValue::Ptr { deref: Some(...) }, the walker transparently follows the dereference (up to 16 hops) before matching the next component. The test author writes the path the BTF would suggest; pointer indirection is invisible.

  2. Empty path. get("") returns the current value as a SnapshotField::Value — useful for terminal accessors on per-CPU slots that hold a scalar directly.

  3. Composability. Two-segment paths are equivalent to chained get calls: entry.get("ctx.weight") resolves the same field as entry.get("ctx").get("weight").

    Note that Snapshot::var does not split — it treats the full string as one global name. To walk into a struct, use snap.var("ctx").get("weight").

Terminal accessors

SnapshotField exposes typed terminal reads, all returning Result<T, SnapshotError>:

  • as_u64() → u64 — Uint; non-negative Int/Enum; Bool (0/1); Char (raw byte); Ptr (pointer value, including cast-recovered pointers — see Cast-recovered pointers); per-CPU array key.
  • as_i64() → i64 — Int; Uint ≤ i64::MAX; Bool; Char; Enum; per-CPU array key.
  • as_bool() → bool — Bool directly; Int/Uint/Char/Enum/Ptr non-zero is true; per-CPU array key.
  • as_f64() → f64 — Float; Int; Uint; Enum; per-CPU array key.
  • as_str() → &str — Enum with a resolved variant name.
  • rendered() → Option<&RenderedValue> — the underlying value when present.

Type mismatches surface as SnapshotError::TypeMismatch { expected, actual, requested } — for example, as_str() on a Uint reports expected: "Enum", actual: "Uint".

Cast-recovered pointers

Schedulers stash kernel pointers (task_struct *, cgroup *, …) and arena pointers in BPF map fields whose BTF declares them as u64 because BTF cannot express a pointer to a per-allocation type. The host-side cast analyzer walks the scheduler’s .bpf.o instruction stream during load, recovers the target struct for each provable (source_struct, field_offset) → target_struct mapping, and feeds the result into the renderer.

When the renderer encounters a u64 slot the analyzer flagged, it emits a RenderedValue::Ptr with cast_annotation set and chases the dereference through the address-space-appropriate reader. The full set of cast_annotation values:

  • "cast→arena" — cast analyzer flagged a u64 field; the chase resolved to an arena allocation via the BTF-typed pointee.
  • "cast→kernel" — cast analyzer flagged a u64 field; the chase resolved to a kernel slab / vmalloc / per-cpu allocation.
  • "sdt_alloc" — BTF-typed Type::Ptr whose pointee was a BTF_KIND_FWD; the renderer recovered the real payload struct id via the sdt_alloc bridge. No cast-analyzer hit was involved.
  • "cast→arena (sdt_alloc)" — cast analyzer flagged a u64 field AND the chase target peeled to a Fwd; the bridge recovered the real arena payload struct id.
  • "cast→kernel (sdt_alloc)" — cast analyzer flagged a u64 field AND the chase target peeled to a Fwd; the bridge recovered the real kernel-side struct id.

A parallel cross-BTF Fwd resolution path is consulted whenever a chase target survives the local same-BTF Fwd resolve as a BTF_KIND_FWD: when the body lives in a sibling embedded BPF object’s BTF (the multi-.bpf.objs shape), the renderer switches the recursion to that sibling BTF and renders the full body. Cross-BTF resolution does NOT add a new annotation — the body is recovered transparently and the rendered subtree carries whichever annotation ("cast→arena", "cast→kernel", or None for a BTF-typed Type::Ptr) it would have had if the same struct lived in the entry BTF.

From the test author’s perspective:

  • as_u64() returns the raw pointer value (matching pre-analysis behavior, so existing tests do not need updating).
  • entry.get("ctx.task") and similar dotted-path walks transparently follow the cast-recovered chase; nested struct fields appear under the same path the BTF would suggest for a natively-typed pointer.
  • The cast_annotation is visible in failure-dump rendering and diagnostic output so an operator can distinguish cast-recovered pointers from BTF-typed ones; the test API does not require any extra calls to consume them.

Error handling

SnapshotError is the unified error type for every fallible accessor. Each variant carries the path or available alternatives needed to fix the call site without re-running the test:

  • MapNotFound { requested, available } — Snapshot::map(name) miss.
  • VarNotFound { requested, available } — Snapshot::var(name) miss.
  • AmbiguousVar { requested, found_in } — more than one *.bss/*.data/*.rodata map exposes a top-level member with the requested name. found_in lists every map (in capture order) where the name was seen; disambiguate via Snapshot::map(name) + .at(0).get(...) against a specific map.
  • FieldNotFound { requested, walked, component, available } — a path component did not match any struct member at that depth. walked is the prefix that resolved successfully; component is the failing segment; requested is the original user-supplied path.
  • NotAStruct { requested, walked, component, kind } — a path component reached a non-struct value where a struct was expected (e.g. descending into a Uint leaf). kind names the actual variant.
  • TypeMismatch { expected, actual, requested } — terminal accessor called on a rendered shape it cannot decode. expected names the scalar type the accessor requires; actual names the rendered variant; requested is the user-supplied lookup string (empty when the accessor was invoked on a leaf without a path walk).
  • IndexOutOfRange { map, index, len } — SnapshotMap::at(n) past the entry list end.
  • PerCpuSlot { map, cpu, len, unmapped } — out-of-range or unmapped per-CPU slot; unmapped: true distinguishes a None slot from an out-of-range CPU.
  • NoMatch { map, op } — predicate-based lookup (find, max_by) found no match. op names the operation.
  • EmptyPathComponent { requested } — a path string contained an empty component (e.g. "a..b").
  • PerCpuNotNarrowed { map } — entry.get called on a per-CPU entry without cpu(N) first.
  • NoRendered { map, side } — entry has no rendered key/value side (BTF type id missing at capture time, leaving hex bytes only).
  • PlaceholderSample { tag, reason } — a periodic-capture sample’s underlying FailureDumpReport is a placeholder produced by the freeze-rendezvous timeout fallback. Surfaces when projecting via SampleSeries::bpf; temporal patterns route the variant through their skip path so a placeholder never falsely registers as zero progress against a monotonicity / rate / steady / ratio band. reason carries the rendezvous-timeout cause text.
  • MissingStats { tag } — a SampleSeries::stats projection ran on a sample whose stats slot is None (stats client not wired or per-sample stats request failed). Distinct from in-JSON path misses (FieldNotFound / TypeMismatch) so the assertion site can branch on the cause without re-walking the source.

SnapshotError implements std::error::Error and Display, so it composes with ? and anyhow. The Display impl includes the path and any available alternatives so a failure message points the test author at the fix.

Worked example

Capture a snapshot, look up a map, walk into its first entry, and read a nested field:

use ktstr::prelude::*;

fn snapshot_then_inspect(ctx: &Ctx) -> Result<AssertResult> {
    // Wire a bridge for the duration of the scenario.
    let cb: CaptureCallback = std::sync::Arc::new(|_name| {
        // Production: freeze + build a real FailureDumpReport. The
        // host installs this callback in real runs.
        Some(FailureDumpReport::default())
    });
    let bridge = SnapshotBridge::new(cb);
    let handle = bridge.clone();
    let _guard = bridge.set_thread_local();

    // Run the scenario, capturing once after spawn.
    let steps = vec![Step {
        setup: vec![CgroupDef::named("workers").workers(2)].into(),
        ops: vec![Op::snapshot("after_spawn")],
        hold: HoldSpec::FULL,
    }];
    let mut result = execute_steps(ctx, steps)?;

    // Drain the bridge and inspect the captured report.
    let captured = handle.drain();
    let report = captured
        .get("after_spawn")
        .ok_or_else(|| anyhow::anyhow!("snapshot 'after_spawn' missing"))?;
    let snap = Snapshot::new(report);

    // Top-level scalar.
    if let Ok(nr_cpus) = snap.var("nr_cpus_onln").as_u64() {
        result.details.push(AssertDetail::new(
            DetailKind::Other,
            format!("captured nr_cpus_onln = {nr_cpus}"),
        ));
    }

    Ok(result)
}

For the executor + bridge wiring outside a VM, see the host-side smoke tests in tests/snapshot_e2e.rs — they exercise the same pipeline against a hand-crafted FailureDumpReport so the assertion shape is covered without booting a guest.

Composing reads with writes

Snapshots are the read half of the host↔guest interaction. The write half — pre-seeding a BPF map value before the scenario starts — is the #[ktstr_test] attribute bpf_map_write = CONST, which targets a BpfMapWrite constant:

use ktstr::prelude::*;

const TRIGGER_FAULT: BpfMapWrite = BpfMapWrite {
    map_name_suffix: ".bss",   // matched against discovered maps
    offset: 42,                // byte offset within the map's value
    value: 1,                  // u32 written by the host
};

#[ktstr_test(bpf_map_write = TRIGGER_FAULT, expect_err = true)]
fn fault_then_inspect(ctx: &Ctx) -> Result<AssertResult> {
    // The host has already written `1` at `.bss + 42` before
    // the scenario started. Capture and inspect the resulting
    // scheduler state mid-run.
    /* bridge wiring + Op::snapshot + Snapshot::new as above */
    Ok(AssertResult::pass())
}

The write is event-driven: the host polls for BPF map discoverability (scheduler loaded), polls the SHM ring for scenario start, then writes the configured u32 at the configured offset. Only BPF_MAP_TYPE_ARRAY maps are supported; the framework finds the map by map_name_suffix (e.g. ".bss") via BpfMapAccessor::find_map. See Monitor → BPF map writes for the prerequisites (vmlinux) and the full host-side contract.

Read+write workflows then compose naturally: the test pre-seeds guest state with bpf_map_write, lets the scheduler run, and asserts on the resulting state with Op::snapshot + the Snapshot accessor:

  1. Write (pre-scenario) — bpf_map_write flips a .bss flag the scheduler reads.
  2. Run — the scenario’s ops drive workload behavior; the scheduler reacts to the flag.
  3. Read (mid-scenario) — Op::snapshot("after") captures the scheduler state at the chosen point.
  4. Assert — Snapshot::var(...).as_u64() / Snapshot::map(...).find(...).get(...).as_*() verifies the reaction. Errors carry the available alternatives so a typo or stale field name surfaces before the test author hand-edits the case.

The write side is a single one-shot poke at scheduler-load time; there is no Op variant for runtime writes. Ergonomic mid-scenario state mutation is reserved for cases where the scheduler itself exports a writable interface (sysfs, debugfs, BPF map command interface) and the test invokes that interface from a workload process.