A/B Compare Branches
Compare scheduler behavior between two branches by running the
same #[ktstr_test] suite against each, then using
cargo ktstr stats compare to diff per-metric results with
dual-gate (absolute and relative) significance and exit non-zero
on any regression.
Setup worktrees
The examples below use the scx scheduler crate under
~/opensource/scx; substitute your own scheduler crate’s path and
remote everywhere scx appears.
cd ~/opensource/scx
# Create a worktree for the baseline branch.
git worktree add ~/opensource/scx-main upstream/main
Collect both runs into a shared run root
Each cargo nextest run --workspace writes its sidecars into
target/ktstr/{kernel}-{project_commit}/. The {project_commit}
half is the project tree’s HEAD short hex captured at first
sidecar write (suffixed -dirty when the worktree differs from
HEAD), so two branches with distinct HEADs land in distinct
directories.
Two back-to-back runs of the SAME kernel at the SAME commit
reuse the same directory — the second run pre-clears any prior
sidecars at first write, so each directory is a last-writer-wins
snapshot of (kernel, project commit).
Warning: The two worktrees MUST be at distinct commits for A/B comparison to work. If both checkouts share the same HEAD (e.g. baseline branch and feature branch happen to be even), the second run overwrites the first via the last-writer-wins pre-clear and the comparison degenerates to “identical pool of sidecars.” Confirm distinct commits with
git -C ~/opensource/scx rev-parse HEADandgit -C ~/opensource/scx-main rev-parse HEADbefore invoking the secondcargo nextest run.
Every sidecar also carries its own project_commit field (read
from the project tree’s git HEAD at sidecar-write time), so
the runs from two branches land disjoint values on the commit
dimension regardless of how the directories are named. The
project commit is discovered by walking up from the test
process’s current working directory to find a .git marker
— so the cd ~/opensource/scx-main / cd ~/opensource/scx
steps below are load-bearing, not stylistic. Without them the
probe would walk up from wherever you happened to invoke
cargo, potentially ending at an entirely different repo and
recording the wrong commit on every sidecar. The simplest
collection workflow is to merge both branches’ run
subdirectories under one root and rely on
--a-project-commit / --b-project-commit to partition them:
mkdir -p ~/opensource/scx-runs/ktstr
# Baseline.
cd ~/opensource/scx-main
cargo ktstr test --kernel ../linux
mv target/ktstr/* ~/opensource/scx-runs/ktstr/
# Experimental.
cd ~/opensource/scx
cargo ktstr test --kernel ../linux
mv target/ktstr/* ~/opensource/scx-runs/ktstr/
The {kernel}-{project_commit} subdirectory names are unique per
(kernel, project commit) pair, so two branches with distinct
HEADs coexist under one root without collision. Within a single
branch, two clean back-to-back runs at the same commit reuse
one directory (last-writer-wins via per-process pre-clear);
mark one of them as -dirty (uncommitted change) or commit /
amend between runs to land separate directories.
Do not set KTSTR_SIDECAR_DIR: cargo ktstr stats list
and cargo ktstr stats compare walk
{CARGO_TARGET_DIR or "target"}/ktstr/ by default and would
not see runs written to a custom flat directory unless
--dir DIR is passed.
Discover available dimension values
The framework records the project tree’s git commit (discovered
by walking parents of the test process’s cwd to find the
enclosing .git) on every sidecar via
SidecarResult::project_commit, so two runs from different
commits land disjoint values on the commit dimension and
--a-project-commit / --b-project-commit slice between them
without any per-run directory bookkeeping.
Use
cargo ktstr stats list-values --dir DIR to enumerate the
distinct values of every filterable dimension (kernel,
commit, kernel_commit, source, scheduler, topology,
work_type) present in the pool, so per-side filters target
real values. The commit and source keys map to the
internal SidecarResult::project_commit / run_source fields;
the per-side filter flags spell as --a-project-commit /
--b-project-commit and --a-run-source / --b-run-source
on the compare
subcommand.
cd ~/opensource/scx
CARGO_TARGET_DIR=~/opensource/scx-runs cargo ktstr stats list
CARGO_TARGET_DIR=~/opensource/scx-runs cargo ktstr stats list-values
Compare per-side filter groups
cd ~/opensource/scx
CARGO_TARGET_DIR=~/opensource/scx-runs cargo ktstr stats compare \
--a-project-commit <baseline-short-hex> \
--b-project-commit <current-short-hex>
stats compare is pool-driven: every sidecar under the runs
root is loaded into a single pool, and per-side filter flags
(--a-X / --b-X) partition the pool into the A and B
contrasts. The dimensions on which the A and B filters DIFFER
are the slicing dimensions of the contrast; every other
dimension is part of the dynamic pairing key the comparison
joins on. Slicing on project-commit alone joins each
baseline scenario with its matching experimental counterpart
on every other dimension (kernel, kernel-commit, run-source,
scheduler, topology, work_type).
Other slicing axes work the same way:
# Slice on kernel.
cargo ktstr stats compare --a-kernel 6.14 --b-kernel 7.0
# Slice on scheduler, pin both sides to one kernel.
cargo ktstr stats compare \
--a-scheduler scx_rusty --b-scheduler scx_lavd \
--kernel 6.14
Shared --X flags pin BOTH sides to the same value; per-side
--a-X / --b-X REPLACE the corresponding shared --X for
that side only (“more-specific replaces”). Slicing on more
than one dimension at once prints a stderr warning but is
supported for cohort sweeps.
compare applies the dual-gate significance check from the
unified MetricDef registry to every metric and prints colored
output (red = regression, green = improvement). Rows where
either side has passed=false are dropped from the math and
counted in the summary line; the exit code is non-zero when
any regression is detected, so the command can gate CI
directly. Narrow further with -E SUBSTRING (matches the
joined scenario topology scheduler work_type string),
override the relative gate uniformly with --threshold PCT
or per-metric via --policy FILE. The absolute gate from each
MetricDef is unaffected by --threshold — a delta must
clear both gates to count as significant.
See stats compare
for the full per-side flag table and validation rules, and
stats list-values
for the discovery counterpart.
Cleanup
git worktree remove ~/opensource/scx-main
rm -rf ~/opensource/scx-runs