Watch Snapshots
A watch snapshot registers a hardware data-write watchpoint on a
named kernel symbol. The host arms a watchpoint slot via the guest’s
hardware debug facilities; the produced captures share the
Snapshot accessor surface documented in Snapshots.
Watch snapshots are supported on x86_64 and aarch64 KVM hosts. The slot terminology below is arch-neutral — each architecture’s KVM plumbing maps the slots onto its native hardware-watchpoint facility (debug registers on x86_64, hardware watchpoints on aarch64).
Op::watch_snapshot("symbol") is the write-driven capture
trigger. Use it when the question is “what does the scheduler look
like whenever the kernel touches X?” rather than “what does it
look like at this point in my scenario?”. For time-driven capture,
use Op::snapshot instead.
How it works
The full pipeline is implemented and tested end-to-end:
- Op::watch_snapshot(symbol) registers the symbol via the virtio-console
  port 1 MSG_TYPE_SNAPSHOT_REQUEST TLV frame.
- The freeze coordinator resolves the KVA from the vmlinux ELF,
  validates 4-byte alignment, and arms a free user watchpoint slot via
  KVM_SET_GUEST_DEBUG.
- When the guest writes to the watched address, the corresponding debug
  exit fires and the host identifies which slot tripped.
- The coordinator captures via freeze_and_capture and stores the report
  in the SnapshotBridge under the symbol tag.
- The report is also mirrored to a sidecar JSON file for post-hoc
  inspection.
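The resolution-and-arming decision (the second step above) can be sketched as follows. This is an illustrative stand-in, not the real freeze coordinator: the symbol-table shape, slot bookkeeping, and error strings are assumptions, and the actual arm goes through KVM_SET_GUEST_DEBUG.

```rust
// Sketch of the arming decision: verbatim lookup, alignment check,
// then claim a free user slot. Names and error text are illustrative.

const USER_SLOTS: usize = 3; // slot 0 is reserved for the exit_kind trigger

fn arm_user_watchpoint(
    symtab: &[(&str, u64)],               // (name, kernel virtual address)
    slots_in_use: &mut [bool; USER_SLOTS],
    symbol: &str,
) -> Result<usize, String> {
    // 1. Verbatim symbol lookup — no prefix stripping or fuzzy matching.
    let (_, kva) = symtab
        .iter()
        .copied()
        .find(|(name, _)| *name == symbol)
        .ok_or_else(|| format!("symbol '{symbol}' not found in vmlinux symtab"))?;
    // 2. The 4-byte watch length requires a 4-byte-aligned address.
    if kva & 0x3 != 0 {
        return Err(format!("KVA {kva:#x} for '{symbol}' is not 4-byte aligned"));
    }
    // 3. Claim the first free user slot; user slot i maps to hardware slot i+1.
    let idx = slots_in_use
        .iter()
        .position(|used| !used)
        .ok_or_else(|| "all user watchpoint slots occupied".to_string())?;
    slots_in_use[idx] = true;
    Ok(idx + 1)
}
```

A failed lookup or alignment check returns before any slot is marked used, matching the fail-fast behavior described below.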
The per-scenario cap of MAX_WATCH_SNAPSHOTS (= 3) is enforced
(slot 0 is reserved for the error-class exit_kind trigger; the
remaining 3 slots are available for user watches). A 4th
Op::watch_snapshot fails the step with a “cap exceeded” message.
Symbol-resolution failures bail immediately so a typo surfaces
visibly.
Op::watch_snapshot covers the full pipeline: registration,
cap enforcement, symbol resolution, hardware arming, and
automatic capture on write.
Issuing a watch
```rust
use ktstr::prelude::*;

let steps = vec![Step {
    setup: vec![CgroupDef::named("workers").workers(2)].into(),
    ops: vec![
        Op::watch_snapshot("jiffies_64"),
        Op::watch_snapshot("scx_watchdog_timestamp"),
    ],
    hold: HoldSpec::FULL,
}];
execute_steps(ctx, steps)?;
```
Each Op::watch_snapshot invokes the active SnapshotBridge’s
register_watch callback with the symbol string. On success, the
callback is responsible for arming a hardware watchpoint that will
fire whenever the guest writes to the symbol’s address. Each fire
produces one capture, tagged with the symbol path itself.
Wiring the bridge
In #[ktstr_test] scenarios that boot a VM, the bridge is wired
automatically. Use post_vm to read captures from
VmResult::snapshot_bridge. Do not install a thread-local bridge inside
the scenario function — the in-VM Op::watch_snapshot registers via the
virtio-console port 1 MSG_TYPE_SNAPSHOT_REQUEST TLV frame, the host
coordinator arms the watchpoint and stores captures on the bridge it
owns, and the test reads them after the VM exits. The set_thread_local
pattern below is for host-side unit tests that exercise the executor
in-process without booting a guest.
A watch-capable bridge for host-side unit tests needs both a capture
callback and a register_watch callback:
```rust
use ktstr::prelude::*;

let cb: CaptureCallback = std::sync::Arc::new(|_name| {
    Some(FailureDumpReport::default())
});
let reg: WatchRegisterCallback = std::sync::Arc::new(|symbol: &str| {
    // Host-side unit tests: record the symbol and return Ok. In a
    // booted VM, the host coordinator's pipeline runs instead —
    // see arm_user_watchpoint in src/vmm/freeze_coord.rs.
    println!("would arm watchpoint on {symbol}");
    Ok(())
});
let bridge = SnapshotBridge::new(cb).with_watch_register(reg);
let _guard = bridge.set_thread_local();
```
A bridge built only with SnapshotBridge::new(cb) (no
with_watch_register) rejects every Op::watch_snapshot with an
error pointing the operator at the missing wiring.
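A minimal sketch of that rejection path, assuming the watch-register hook is simply an optional callback on the bridge. The struct, field name, and error message here are illustrative, not SnapshotBridge internals.

```rust
// Sketch: a bridge whose watch-register hook is optional. Without it,
// every watch registration fails loudly instead of silently no-opping.
struct Bridge {
    register_watch: Option<Box<dyn Fn(&str) -> Result<(), String>>>,
}

impl Bridge {
    fn try_register(&self, symbol: &str) -> Result<(), String> {
        match &self.register_watch {
            Some(cb) => cb(symbol),
            None => Err(format!(
                "Op::watch_snapshot(\"{symbol}\") needs a watch-capable bridge; \
                 build it with with_watch_register"
            )),
        }
    }
}
```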
Symbol resolution
Production resolution is a verbatim match against the vmlinux ELF
symbol table. The freeze coordinator walks Elf::syms and accepts
the symbol whose strtab entry equals the requested string byte-for-byte
— there is no prefix stripping, BTF lookup, kallsyms walk, or per-CPU
offset arithmetic. Use the exact name nm vmlinux would print:
- "jiffies_64" — the kernel’s monotonic tick counter.
- "scx_watchdog_timestamp" — sched_ext’s watchdog timestamp.
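To check the exact spelling up front, run nm against the same vmlinux image the test boots (the path here is illustrative); -w makes grep match the whole symbol name rather than any substring:

```shell
# Look up the exact symtab spelling before writing the scenario.
# 'vmlinux' is the uncompressed kernel image the test will boot.
nm vmlinux | grep -w scx_watchdog_timestamp
```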
Warning: high-frequency symbols soft-lock the guest. Watching a symbol
that the kernel writes every jiffy (e.g. jiffies_64 at HZ=1000) fires
1000+ captures per second. Each capture freezes all vCPUs for the full
dump pipeline. The guest spends almost all of its wall time paused,
which is indistinguishable from a soft lock-up — schedulers stall,
watchdogs fire, and the test wedges before any meaningful work runs.
Pick symbols the kernel writes at scenario-relevant cadence (a state
field, a per-event counter), not on every tick.
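As a back-of-envelope check, the duty cycle can be computed directly. Only the HZ=1000 write rate comes from the warning above; the per-capture cost used in the example below is an assumed figure for illustration.

```rust
// Frozen seconds demanded per wall-clock second, clamped at "always
// frozen". fires_per_sec comes from how often the kernel writes the
// symbol; capture_secs is the (assumed) cost of one freeze-and-dump.
fn frozen_fraction(fires_per_sec: f64, capture_secs: f64) -> f64 {
    (fires_per_sec * capture_secs).min(1.0)
}
```

At 1000 fires per second with an assumed 5 ms capture, the demand is 5 seconds of freeze per wall-clock second, so the guest can never catch up; at 10 writes per second the same cost is a tolerable 5% pause.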
The string passed to Op::watch_snapshot must match a vmlinux ELF
symtab entry exactly; otherwise the step fails with
symbol '...' not found in vmlinux symtab. The
register_watch callback on a host-side test bridge can accept any
shape it wants — the e2e tests in tests/snapshot_e2e.rs use
"kernel.a" / "kernel.b" / etc. for the cap-enforcement test and
"exit_kind" for the in-VM test — but the Op::watch_snapshot ops
that flow through the production pipeline (in-VM scenarios with no
host-side bridge override) must use a verbatim ELF symbol.
Maximum of 3 watches per scenario
```rust
pub const MAX_WATCH_SNAPSHOTS: usize = 3;
```
The bridge enforces a per-scenario cap of 3 successfully-registered
watches. The number is tied to the per-vCPU hardware-watchpoint
slots KVM exposes via KVM_SET_GUEST_DEBUG: slot 0 is reserved for
the existing *scx_root->exit_kind watchpoint that drives the
error-class freeze trigger; the remaining three user watchpoint
slots are available for on-demand watches.
A 4th Op::watch_snapshot in the same scenario fails the step with a
“cap exceeded” message:
```rust
let steps = vec![Step {
    setup: vec![CgroupDef::named("cg").workers(2)].into(),
    ops: vec![
        Op::watch_snapshot("kernel.a"),
        Op::watch_snapshot("kernel.b"),
        Op::watch_snapshot("kernel.c"),
        Op::watch_snapshot("kernel.d"), // <-- cap exceeded
    ],
    hold: HoldSpec::FULL,
}];
let result = execute_steps(ctx, steps)?;
assert!(!result.passed);
// One AssertDetail carries the cap-exceeded message:
// "Op::WatchSnapshot cap exceeded: scenario already registered 3
// watchpoints (3 user watchpoint slots occupied; slot 0 reserved for the
// error-class exit_kind trigger)..."
```
A failed register (cap exceeded, callback error, missing
register_watch) does not consume a slot. The bridge rolls the
count back so the scenario can keep trying with different symbols up
to the cap.
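The count-on-success bookkeeping can be sketched like this; WatchCounter and its error text are illustrative stand-ins, not the bridge’s real internals.

```rust
// Sketch of the cap bookkeeping: the count only advances after the arm
// callback succeeds, so a failed attempt never consumes a slot.
const MAX_WATCH_SNAPSHOTS: usize = 3;

struct WatchCounter {
    registered: usize,
}

impl WatchCounter {
    fn register<F>(&mut self, symbol: &str, arm: F) -> Result<(), String>
    where
        F: Fn(&str) -> Result<(), String>,
    {
        if self.registered >= MAX_WATCH_SNAPSHOTS {
            return Err(format!(
                "Op::WatchSnapshot cap exceeded: scenario already registered {} watchpoints",
                MAX_WATCH_SNAPSHOTS
            ));
        }
        // An Err from the arm callback propagates here and leaves
        // `registered` unchanged — the rollback behavior.
        arm(symbol)?;
        self.registered += 1;
        Ok(())
    }
}
```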
Failure modes
The register callback is the single integration point where
production resolution can fail. The reasons documented on
WatchRegisterCallback:
- The symbol does not match any vmlinux ELF symtab entry (typo, symbol
  stripped from the build, or a non-ELF kernel image).
- The resolved KVA is not 4-byte aligned (the 4-byte watch length the
  framework arms requires addr & 0x3 == 0 on every supported
  architecture).
- All three available user watchpoint slots are already allocated inside
  the host’s KVM plumbing.
- KVM_SET_GUEST_DEBUG rejected the arm.
When the callback returns Err(reason), the executor bails the step
immediately with a message containing the symbol and the failure
reason. Silent degradation is deliberately avoided — a watch that
never fires would look identical to a healthy passing run, and the
test author would never notice the captures were missing.
Slot 0 (exit_kind) is separate
The existing error-class freeze trigger watches
*scx_root->exit_kind on slot 0 and is not an
Op::watch_snapshot slot. It is wired by the freeze coordinator
independently to detect SCX_EXIT_ERROR writes and drive the
failure-dump pipeline. That trigger is unrelated to the on-demand
watch surface — it always runs, regardless of whether a scenario
declares any Op::watch_snapshot ops. The cap of 3 reflects the
three remaining user slots after slot 0 is held back.
For tests that want the failure dump produced by SCX_EXIT_ERROR,
nothing needs to be declared; the trigger fires automatically and
the dump is written to {sidecar_dir()}/{test_name}.failure-dump.json.
The watch-snapshot in-VM test in tests/snapshot_e2e.rs reads that
file back and feeds it through the Snapshot accessor as a way
to demonstrate the full read path.
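Post-hoc inspection of the sidecar file might look like the sketch below. Only the {test_name}.failure-dump.json layout comes from the text; the helper name and its signature are made up for illustration.

```rust
use std::fs;
use std::path::Path;

// Read the mirrored report back from the sidecar directory. The real
// framework parses this into a FailureDumpReport; returning the raw
// JSON string keeps the sketch dependency-free.
fn read_sidecar(dir: &Path, test_name: &str) -> std::io::Result<String> {
    fs::read_to_string(dir.join(format!("{test_name}.failure-dump.json")))
}
```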
Reading captures
Once a watchpoint fires, the resulting report is stored on the bridge
under the tag and read back exactly as Op::snapshot captures are.
Every accessor — Snapshot::map, Snapshot::var,
SnapshotMap::at / find / filter / max_by, dotted-path walks,
typed terminal reads — is shared. See Snapshots for
the full surface.
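As a rough mental model of the dotted-path walk, assume a capture is a nested map; Val and walk below are stand-ins for the real Snapshot/SnapshotMap surface documented in Snapshots.

```rust
use std::collections::HashMap;

// Minimal model of a dotted-path walk over a captured report: each
// path segment descends one map level, and descending into a terminal
// value (or a missing key) yields None.
enum Val {
    Num(i64),
    Map(HashMap<String, Val>),
}

fn walk<'a>(root: &'a Val, dotted: &str) -> Option<&'a Val> {
    dotted.split('.').try_fold(root, |cur, key| match cur {
        Val::Map(m) => m.get(key),
        Val::Num(_) => None, // cannot descend past a terminal value
    })
}
```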