Investigate a Crash
When a scheduler crashes during a test, the failure output and auto-repro pipeline help identify the cause.
First step: enable full diagnostics
Rerun the failing test with RUST_BACKTRACE=1 before digging into
individual sections:
RUST_BACKTRACE=1 cargo ktstr test --kernel ../linux -- -E 'test(my_test)'
Setting RUST_BACKTRACE=1 unconditionally appends the
--- diagnostics --- section (init stage, VM exit code, last lines
of kernel console) to every failure, not only when the scheduler
self-dies. It also enables verbose VM console output (equivalent to
KTSTR_VERBOSE=1).
Reading failure output
A test failure message contains up to eight sections, each present only when relevant:
| Section | Content |
|---|---|
| Error line | Test name, scheduler, failure reason. |
--- stats --- | Per-cgroup worker count, CPU count, spread, gap, migrations, iterations. |
--- diagnostics --- | Init stage classification, VM exit code, last 20 lines of kernel console. |
--- timeline --- | Kernel version, topology, scheduler, scenario duration, phase breakdown with monitor samples. |
--- scheduler log --- | Scheduler process stdout+stderr (cycle-collapsed). |
--- monitor --- | Host-side monitor: sample count, max imbalance ratio, max local-DSQ depth, sustained-violation flag, SCX event counters (select_cpu_fallback, keep_last, skip_exiting, skip_migration_disabled), per-sched_domain load-balance rates, per-BPF-program verified_insns, and the merged threshold verdict. |
--- sched_ext dump --- | sched_ext_dump trace lines from the guest kernel. |
--- auto-repro --- | BPF probe data from a second VM run, plus repro VM duration, scheduler log, sched_ext dump, and dmesg tails. |
--- diagnostics --- appears automatically when the scheduler died
or crashed, or when RUST_BACKTRACE is set to 1 or full.
Auto-repro
auto_repro defaults to true in #[ktstr_test]. When the scheduler
crashes, ktstr automatically:
- Captures the crash stack trace from the scenario output.
- Boots a second VM with BPF kprobes (kernel functions) and fentry
probes (BPF callbacks) on each function in the crash chain, plus
a
tp_btf/sched_ext_exittracepoint trigger. - Reruns the scenario to capture function arguments at each crash point.
Reading auto-repro output
The probe output shows each function in the crash chain with:
- Function signature and argument values during execution of the same workload
- Source file and line number
- Call chain context
After the probe data, the auto-repro section includes the repro VM duration and the last 40 lines of the repro VM’s scheduler log (cycle-collapsed), sched_ext dump, and kernel console (dmesg). These supplement probe data when the crash produces sparse or no probe events. When probe data is absent, a crash reproduction status line replaces it.
See Auto-Repro for details on how the two-VM repro cycle works.