A/B Compare Branches

Compare scheduler behavior between two branches by running the same #[ktstr_test] suite against each, then diffing the per-metric results with cargo ktstr stats compare. The comparison applies a dual-gate (absolute and relative) significance check and exits non-zero on any regression.

Set up worktrees

The examples below use the scx scheduler crate under ~/opensource/scx; substitute your own scheduler crate’s path and remote everywhere scx appears.

cd ~/opensource/scx

# Create a worktree for the baseline branch.
git worktree add ~/opensource/scx-main upstream/main

Collect both runs into a shared run root

Each cargo nextest run --workspace writes its sidecars into target/ktstr/{kernel}-{project_commit}/. The {project_commit} half is the project tree’s HEAD short hex captured at first sidecar write (suffixed -dirty when the worktree differs from HEAD), so two branches with distinct HEADs land in distinct directories. Two back-to-back runs of the SAME kernel at the SAME commit reuse the same directory — the second run pre-clears any prior sidecars at first write, so each directory is a last-writer-wins snapshot of (kernel, project commit).

Warning: The two worktrees MUST be at distinct commits for the A/B comparison to work. If both checkouts share the same HEAD (e.g. the baseline branch and the feature branch point at the same commit), the second run overwrites the first via the last-writer-wins pre-clear, and the comparison degenerates to an “identical pool of sidecars.” Confirm distinct commits with git -C ~/opensource/scx rev-parse HEAD and git -C ~/opensource/scx-main rev-parse HEAD before starting the second test run.
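The pre-flight check from the warning can be scripted. The following is a minimal sketch, assuming git is on PATH; heads_are_distinct is a hypothetical helper name and is not part of ktstr.

```shell
# Hypothetical helper (not part of ktstr): succeeds only when the two
# checkouts resolve HEAD to different commits.
#   $1, $2 = paths to the two worktrees
# Returns 0 when distinct, 1 when identical, 2 when either path is not a
# usable git checkout.
heads_are_distinct() {
    a=$(git -C "$1" rev-parse HEAD) || return 2
    b=$(git -C "$2" rev-parse HEAD) || return 2
    [ "$a" != "$b" ]
}

# Example use before the second run:
#   heads_are_distinct ~/opensource/scx ~/opensource/scx-main \
#       || echo "worktrees share a HEAD; comparison would be degenerate" >&2
```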

Every sidecar also carries its own project_commit field (read from the project tree’s git HEAD at sidecar-write time), so runs from two branches land disjoint values on the commit dimension regardless of how the directories are named. The project commit is discovered by walking up from the test process’s current working directory until a .git marker is found, so the cd ~/opensource/scx-main and cd ~/opensource/scx steps below are load-bearing, not stylistic. Without them the probe would walk up from wherever you happened to invoke cargo, potentially ending at an entirely different repo and recording the wrong commit on every sidecar. The simplest collection workflow is to merge both branches’ run subdirectories under one root and rely on --a-project-commit / --b-project-commit to partition them:

mkdir -p ~/opensource/scx-runs/ktstr

# Baseline.
cd ~/opensource/scx-main
cargo ktstr test --kernel ../linux
mv target/ktstr/* ~/opensource/scx-runs/ktstr/

# Experimental.
cd ~/opensource/scx
cargo ktstr test --kernel ../linux
mv target/ktstr/* ~/opensource/scx-runs/ktstr/
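To check which repository a run launched from a given directory would be attributed to, the parent walk described above can be approximated in shell. This is a sketch of the described behavior, not the framework's actual probe, and find_git_root is a hypothetical name. It tests for existence rather than for a directory because the .git entry of a linked worktree such as ~/opensource/scx-main is a file, not a directory.

```shell
# Hypothetical helper (not part of ktstr): print the nearest ancestor of
# $1 (default: current directory) that contains a .git entry.
# Returns 1 when no ancestor has one.
find_git_root() {
    dir=$(cd "${1:-.}" && pwd) || return 1
    while :; do
        if [ -e "$dir/.git" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        if [ "$dir" = "/" ]; then
            return 1            # reached the filesystem root
        fi
        dir=$(dirname "$dir")
    done
}

# Example: run from the directory where the tests will be invoked.
#   find_git_root .
```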

The {kernel}-{project_commit} subdirectory names are unique per (kernel, project commit) pair, so two branches with distinct HEADs coexist under one root without collision. Within a single branch, two clean back-to-back runs at the same commit reuse one directory (last-writer-wins via per-process pre-clear). To land them in separate directories, leave an uncommitted change in the tree for one of the runs (its directory gets the -dirty suffix) or commit / amend between runs.
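A small guard can make a name collision under the shared root explicit instead of leaving the outcome to mv. The following is a minimal sketch; merge_runs is a hypothetical helper name, and it moves run subdirectories one at a time and stops at the first one whose name already exists at the destination.

```shell
# Hypothetical helper (not part of ktstr): move every run subdirectory of
# $1 into $2, refusing to touch a destination name that already exists.
merge_runs() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for run in "$src"/*/; do
        run=${run%/}                      # drop the trailing slash
        [ -d "$run" ] || continue         # glob matched nothing
        name=$(basename "$run")
        if [ -e "$dst/$name" ]; then
            echo "refusing to overwrite $dst/$name" >&2
            return 1
        fi
        mv "$run" "$dst/$name" || return 1
    done
}

# Example, in place of the plain mv:
#   merge_runs target/ktstr ~/opensource/scx-runs/ktstr
```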

Do not set KTSTR_SIDECAR_DIR: cargo ktstr stats list and cargo ktstr stats compare walk {CARGO_TARGET_DIR or "target"}/ktstr/ by default and would not see runs written to a custom flat directory unless --dir DIR is passed.

Discover available dimension values

The framework records the project tree’s git commit (discovered by walking parents of the test process’s cwd to find the enclosing .git) on every sidecar via SidecarResult::project_commit, so two runs from different commits land disjoint values on the commit dimension and --a-project-commit / --b-project-commit slice between them without any per-run directory bookkeeping. Use cargo ktstr stats list-values --dir DIR to enumerate the distinct values of every filterable dimension (kernel, commit, kernel_commit, source, scheduler, topology, work_type) present in the pool, so per-side filters target real values. The commit and source keys map to the internal SidecarResult::project_commit / run_source fields; the per-side filter flags spell as --a-project-commit / --b-project-commit and --a-run-source / --b-run-source on the compare subcommand.

cd ~/opensource/scx
CARGO_TARGET_DIR=~/opensource/scx-runs cargo ktstr stats list
CARGO_TARGET_DIR=~/opensource/scx-runs cargo ktstr stats list-values

Compare per-side filter groups

cd ~/opensource/scx
CARGO_TARGET_DIR=~/opensource/scx-runs cargo ktstr stats compare \
    --a-project-commit <baseline-short-hex> \
    --b-project-commit <current-short-hex>

stats compare is pool-driven: every sidecar under the runs root is loaded into a single pool, and per-side filter flags (--a-X / --b-X) partition the pool into the A and B contrasts. The dimensions on which the A and B filters DIFFER are the slicing dimensions of the contrast; every other dimension is part of the dynamic pairing key the comparison joins on. Slicing on project-commit alone joins each baseline scenario with its matching experimental counterpart on every other dimension (kernel, kernel-commit, run-source, scheduler, topology, work_type).
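The partition-then-join idea can be illustrated on a toy pool. The sketch below assumes a hypothetical whitespace-separated text format (commit, scheduler, topology, and one metric value per line), which is not the sidecar format, and it uses only scheduler and topology as the pairing key. Keys present on only one side are skipped; how the real tool treats unpaired rows is not modeled here.

```shell
# Toy model of a pool-driven comparison (pair_by_key is a hypothetical name).
# Input on stdin, one row per line:
#   <commit> <scheduler> <topology> <metric-value>
# $1 and $2 play the role of --a-project-commit and --b-project-commit.
pair_by_key() {
    awk -v a="$1" -v b="$2" '
        { key = $2 "/" $3 }          # pairing key: every non-slicing dimension
        $1 == a { av[key] = $4 }     # row belongs to side A
        $1 == b { bv[key] = $4 }     # row belongs to side B
        END {
            for (k in av)
                if (k in bv) print k, av[k], bv[k]
        }
    ' | sort
}
```

Each output line is a pairing key followed by the A value and the B value, so the commit is the only dimension that differs within a pair.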

Other slicing axes work the same way:

# Slice on kernel.
cargo ktstr stats compare --a-kernel 6.14 --b-kernel 7.0

# Slice on scheduler, pin both sides to one kernel.
cargo ktstr stats compare \
    --a-scheduler scx_rusty --b-scheduler scx_lavd \
    --kernel 6.14

Shared --X flags pin BOTH sides to the same value; a per-side --a-X / --b-X REPLACES the corresponding shared --X for that side only (“more-specific replaces”). Slicing on more than one dimension at once prints a warning to stderr but is supported for cohort sweeps.
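The “more-specific replaces” rule amounts to a per-side fallback. As a minimal sketch (effective_filter is a hypothetical name, and an empty string stands in for a flag that was not passed):

```shell
# Hypothetical model of flag resolution for one dimension on one side.
#   $1 = value of the shared --X flag (empty if not given)
#   $2 = value of the per-side --a-X or --b-X flag (empty if not given)
effective_filter() {
    if [ -n "$2" ]; then
        printf '%s\n' "$2"     # per-side flag replaces the shared one
    else
        printf '%s\n' "$1"     # fall back to the shared value
    fi
}
```

Under this model, passing --kernel 6.14 together with --b-kernel 7.0 leaves side A on 6.14 and moves side B to 7.0, which turns kernel into a slicing dimension.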

compare applies the dual-gate significance check from the unified MetricDef registry to every metric and prints colored output (red = regression, green = improvement). Rows where either side has passed=false are dropped from the math and counted in the summary line; the exit code is non-zero when any regression is detected, so the command can gate CI directly. Narrow further with -E SUBSTRING (matched against the joined scenario topology scheduler work_type string). Override the relative gate uniformly with --threshold PCT, or per metric via --policy FILE. The absolute gate from each MetricDef is unaffected by --threshold; a delta must clear both gates to count as significant.
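The dual-gate rule can be sketched as follows. This is an illustration under stated assumptions, not the registry's implementation: dual_gate is a hypothetical name, the relative change is taken against the A value, both comparisons are inclusive, and a zero baseline is treated as not significant because the relative change is undefined there. Whether a significant delta is a regression or an improvement depends on the metric's polarity, which this sketch does not model.

```shell
# Hypothetical dual-gate check for a single metric.
#   $1 = A value, $2 = B value,
#   $3 = absolute gate (metric units), $4 = relative gate (percent)
dual_gate() {
    awk -v a="$1" -v b="$2" -v abs_gate="$3" -v rel_pct="$4" 'BEGIN {
        delta = b - a
        if (delta < 0) delta = -delta
        base = (a < 0) ? -a : a
        if (base == 0) { print "not-significant"; exit }   # assumption
        rel = 100 * delta / base
        # The delta must clear BOTH gates.
        if (delta >= abs_gate && rel >= rel_pct) print "significant"
        else print "not-significant"
    }'
}
```

With an absolute gate of 5 and a relative gate of 10 percent, a move from 1000 to 1006 clears the absolute gate but not the relative one, so it is not reported as significant.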

See stats compare for the full per-side flag table and validation rules, and stats list-values for the discovery counterpart.

Cleanup

git -C ~/opensource/scx worktree remove ~/opensource/scx-main
rm -rf ~/opensource/scx-runs