Benchmark Strategy

Status

Accepted

Context

Ferroni's performance claims need reproducible evidence. The open questions are what to benchmark, how to measure it, and how to prevent regressions.

Decision

A two-suite benchmark architecture:

Reference suite: battle_bench

Real-world scenarios that produce the numbers we publish. Every comparison pits Ferroni against Oniguruma compiled at -O3.

| Category | What is measured |
| --- | --- |
| Syntax highlighting | Full, unmodified Shiki grammars -- TypeScript (279 patterns), CSS (117 patterns), Rust (81 patterns). Compile time, first-match latency, full-line tokenization. |
| Text search | Literal search, no-match rejection, field extraction, timestamp matching on 10-50 KB log inputs. |
| Pattern matching | One representative pattern per regex feature (quantifiers, lookaround, Unicode, backreferences, alternation, named captures). |
| Compilation | Simple to complex patterns, measuring compile latency. |

Key rule: benchmark against complete, unmodified production grammars -- no cherry-picked subsets. The Shiki grammars are committed as-is in benches/grammars/.
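
A minimal sketch of one battle_bench scenario, assuming a ferroni::Regex::new / Regex::find API (hypothetical names; the crate's real surface may differ) and two stand-in patterns in place of a full Shiki grammar:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn syntax_highlighting(c: &mut Criterion) {
    // Stand-ins for patterns that would be loaded from benches/grammars/.
    let patterns = [r"\b(const|let|var)\b", r"//.*$"];
    let line = "const answer: number = 42; // representative source line";

    // Compile time is part of the workload: highlighters compile at startup.
    c.bench_function("ts_grammar_compile", |b| {
        b.iter(|| {
            for p in &patterns {
                black_box(ferroni::Regex::new(black_box(p)).unwrap()); // assumed API
            }
        })
    });

    // First-match latency against an already-compiled grammar.
    let compiled: Vec<_> = patterns
        .iter()
        .map(|p| ferroni::Regex::new(p).unwrap())
        .collect();
    c.bench_function("ts_line_first_match", |b| {
        b.iter(|| {
            for re in &compiled {
                black_box(re.find(black_box(line))); // assumed API
            }
        })
    });
}

criterion_group!(battle, syntax_highlighting);
criterion_main!(battle);
```

With the ffi feature enabled, the Oniguruma side of the comparison would run the same patterns and inputs through the C bindings.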

Internal suite: codspeed_bench

Ferroni-only micro-benchmarks tracked by CodSpeed in CI. These catch performance regressions before they reach main. They are intentionally internal-facing: useful for optimizing parser, executor, scanner, RegSet, and public API paths, but not meant for README marketing tables.
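
Because codspeed-criterion-compat mirrors Criterion's API, a micro-benchmark is plain Criterion code with a swapped import. ferroni::parse below is an assumed internal entry point, named only for illustration:

```rust
use codspeed_criterion_compat::{black_box, criterion_group, criterion_main, Criterion};

fn parser_micro(c: &mut Criterion) {
    // Ferroni-only: no C comparison here, just regression tracking in CI.
    c.bench_function("parse_nested_groups", |b| {
        b.iter(|| black_box(ferroni::parse(black_box(r"((a|b)+c?){2,5}")))) // assumed API
    });
}

criterion_group!(internal, parser_micro);
criterion_main!(internal);
```

The same file runs locally under plain Criterion for HTML reports and under CodSpeed's runner in CI.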

Tooling

  • Criterion.rs for local measurement and HTML reports (target/criterion/report/index.html).
  • codspeed-criterion-compat for CI integration -- same benchmark code, instrumented for CodSpeed's wall-time tracking.
  • C comparison via the optional ffi feature. The cc crate builds Oniguruma from a pinned local source snapshot, prepared on demand, for head-to-head measurement (see the build-script sketch after this list).
  • Pinned battle inputs in benches/battle_inputs.toml. This records the exact external artifacts behind the publishable suite.
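
A minimal sketch of how the ffi feature could gate the Oniguruma build in build.rs; the snapshot path and source list are illustrative assumptions (the real layout is whatever prepare-oniguruma-sources.sh produces):

```rust
// build.rs -- compiles the pinned Oniguruma snapshot only when `ffi` is on.
fn main() {
    // Cargo exposes enabled features to build scripts via CARGO_FEATURE_*.
    if std::env::var("CARGO_FEATURE_FFI").is_err() {
        return;
    }
    cc::Build::new()
        .include("third_party/oniguruma/src") // assumed snapshot location
        .file("third_party/oniguruma/src/regcomp.c") // assumed file list
        .file("third_party/oniguruma/src/regexec.c")
        // ...plus the rest of the snapshot's sources
        .opt_level(3) // keep the comparison at -O3
        .compile("onig");
}
```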

Build profile

Both release and bench profiles use lto = "thin" to allow cross-crate inlining (especially for memchr) without the compile-time cost of full LTO. This matches realistic deployment conditions.
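
A sketch of the corresponding Cargo.toml profile settings:

```toml
[profile.release]
lto = "thin"  # cross-crate inlining (e.g. memchr) without full-LTO compile cost

[profile.bench]
lto = "thin"  # the built-in bench profile inherits release; set explicitly here
```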

Rationale

  • Real grammars prevent overfitting. Benchmarking against subsets risks optimizing for patterns that don't matter.
  • C comparison keeps claims honest. Every speedup number is relative to the same engine at -O3, not a strawman.
  • Two suites separate concerns. battle_bench is small, stable, and publishable; codspeed_bench is free to optimize for regression coverage and engineering feedback.
  • Compilation is part of the workload. Syntax highlighters compile grammars at startup. Ignoring compile time gives an incomplete picture.
  • README numbers should stay human-scale. The README intentionally rounds values; exact raw numbers live in /perf/benchmark-results.

Consequences

  • Shiki grammar JSON files are committed to the repository (benches/grammars/). These are updated when Shiki releases new grammar versions.
  • The ffi feature adds a C build step. Running ./scripts/prepare-oniguruma-sources.sh && cargo bench --features ffi --bench battle_bench requires a C compiler; cargo bench (without ffi) runs the Rust-only internal suite.
  • Exact external input revisions for battle_bench live in benches/battle_inputs.toml; each published measurement run should also record machine and toolchain details in /perf/benchmark-results.
  • Reference benchmark results are documented in /perf/benchmark-results.md and summarized in the README.
  • New optimizations should be validated against the internal suite first; user-facing claims should cite the reference suite.