
Auto-Repro

When a test fails because the scheduler crashes or exits, auto-repro boots a second VM with BPF probes attached to capture function arguments and struct fields along the scheduling path. Stack functions extracted from the crash output seed the probe list; when no crash stack is available (e.g. a BPF text error or verifier failure with no backtrace), auto-repro falls back to dynamic BPF program discovery in the repro VM.

How it works

  1. First VM – the test runs normally. If the scheduler crashes or exits (BPF error, verifier failure, stall), ktstr captures any stack trace from the scheduler log (COM2) or kernel console (COM1).

  2. Stack extraction – function names are parsed from the crash trace when available. BPF program symbols (bpf_prog_*) are recognized and their short names extracted. Generic functions (scheduler entry points, spinlocks, syscall handlers, sched_ext exit machinery, BPF trampolines, stack dump helpers) are filtered out. When no stack functions are found, the pipeline continues with an empty probe list.

  3. BPF discovery – in the repro VM, ktstr discovers loaded struct_ops programs via libbpf-rs and adds them to the probe list alongside any stack-extracted functions. Their kernel-side callers are added (e.g. enqueue -> do_enqueue_task) for bridge kprobes. This step ensures probes can capture variable states across the scheduler exit call chain even when the crash produced no extractable stack.

  4. BTF resolution – function signatures are resolved from vmlinux BTF (kernel functions) and program BTF (BPF callbacks). Known struct types (task_struct, rq, scx_dispatch_q, etc.) have curated fields resolved to byte offsets. Other struct pointer params have scalar, enum, and cpumask pointer fields auto-discovered from vmlinux or BPF program BTF.

  5. Second VM – ktstr boots a new VM and reruns the scenario with BPF probes:

    • Kprobe skeleton for kernel function entry (uses bpf_get_func_ip)
    • Fentry/fexit skeleton for BPF callbacks and kernel function exit (batched in groups of 4, shares maps via reuse_fd). Fexit re-reads struct fields after the function executes, capturing post-mutation state alongside the entry snapshot.
    • Tracepoint trigger (tp_btf/sched_ext_exit) fires inside scx_claim_exit() when the scheduler exits, in the context of the current task at exit time

  6. Stitching – the task_struct pointer is read from the trigger event’s bpf_get_current_task() value. Events with a task_struct parameter are filtered to that pointer; events without a task_struct parameter are retained if their task_ptr (from bpf_get_current_task() at probe time) matches the triggering task. Events are sorted by timestamp and formatted with decoded field values (cpumask ranges, DSQ names, enqueue flags, etc.) and source locations (DWARF for kernel, line_info for BPF).

  7. Diagnostic tails – the last 40 lines of the repro VM’s scheduler log (COM2, cycle-collapsed), sched_ext dump (COM1), and kernel console (COM1) are appended after the probe output when non-empty. A duration line reports total repro VM wall time. When probe data is absent, a crash reproduction status line indicates whether the crash reproduced.

Requirements

Auto-repro requires a kernel with the sched_ext_exit tracepoint (used as the probe trigger). Kernels built with CONFIG_SCHED_CLASS_EXT and tracepoint support include this. If the tracepoint is unavailable, auto-repro is skipped and the pipeline diagnostics report the cause.

Enabling auto-repro

In #[ktstr_test]:

#[ktstr_test(auto_repro = true)]
fn my_test(ctx: &Ctx) -> Result<AssertResult> { ... }

auto_repro defaults to true in #[ktstr_test]; the explicit flag is optional.

Repro mode

During the second VM run, ktstr sets “repro mode”, which disables the work-conservation watchdog. Workers normally send SIGUSR2 to the scheduler when stuck for more than 2 seconds; in repro mode, the scheduler stays alive so BPF assertion probes can fire.

Example output

The demo_host_crash_auto_repro test triggers a host-initiated crash via a BPF map write and captures the scheduling path. Probe output shows each function with decoded struct fields and source locations. When fexit captures post-mutation state, changed fields show an arrow (→) between entry and exit values:

ktstr_test 'demo_host_crash_auto_repro' [sched=scx-ktstr] failed:
  scheduler died

--- auto-repro ---
=== AUTO-PROBE: scx_exit fired ===

  ktstr_enqueue                                                   main.bpf.c:21
    task_struct *p
      pid         97
      cpus_ptr    0xf(0-3)
      dsq_id      SCX_DSQ_INVALID
      enq_flags   NONE
      slice       0
      vtime       0
      weight      100
      sticky_cpu  -1
      scx_flags   QUEUED|ENABLED
  do_enqueue_task                                               kernel/sched/ext.c:1344
    rq *rq
      cpu         1
    task_struct *p
      pid         97
      cpus_ptr    0xf(0-3)
      dsq_id      SCX_DSQ_INVALID          →  SCX_DSQ_LOCAL
      enq_flags   NONE
      slice       20000000
      vtime       0
      weight      100
      sticky_cpu  -1
      scx_flags   QUEUED|DEQD_FOR_SLEEP    →  QUEUED

After the probe data, the auto-repro section includes the repro VM duration and the last 40 lines of the repro VM’s scheduler log, sched_ext dump, and dmesg (each only when non-empty).

Demo test

A demo test takes this shape (reduced from demo_host_crash_auto_repro in tests/scenario_coverage.rs):

use ktstr::prelude::*;

// Reduced scenario body: one cgroup of four yield-heavy workers,
// held for a fixed 8 seconds.
fn scenario_yield_heavy(ctx: &Ctx) -> Result<AssertResult> {
    let steps = vec![Step::with_defs(
        vec![
            CgroupDef::named("demo_workers")
                .work_type(WorkType::YieldHeavy)
                .workers(4),
        ],
        HoldSpec::Fixed(Duration::from_secs(8)),
    )];
    execute_steps(ctx, steps)
}

Run manually to see full output:

cargo ktstr test --kernel ../linux -- --run-ignored ignored-only -E 'test(demo_host_crash_auto_repro)'