
External Benchmarking

ferrocat-bench exposes a manually invoked, reproducible comparison suite for external baselines.

The internal microbenchmarks remain the fast day-to-day performance loop. The external comparison suite is for serious cross-runtime checkpoints on a documented reference host.

Reference Host Rules

  • use one documented benchmark machine for official comparisons
  • keep Rust, Node, Python, and GNU gettext versions fixed across report runs
  • minimize background load and network activity during a run (see the sketch after this list)
  • keep the machine on AC power
  • compare reports only within the same host and toolchain class
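
One concrete way to reduce scheduling noise on Linux reference hosts is to pin the CPU frequency governor for the duration of a run. This is an illustrative sketch; the sysfs path and the set of available governors vary by kernel and hardware:

# Linux-only, illustrative: hold all cores on the performance governor during a run.
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor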

Required Tooling

  • Rust toolchain able to run cargo run -p ferrocat-bench
  • Node.js plus the packages declared in benchmark/node/package.json
  • Python 3 plus the packages declared in benchmark/python/requirements.txt
  • GNU gettext commands msgcat and msgmerge
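
Before running the full environment check, you can confirm the external executables are on PATH:

# Print the resolved path for each required tool, flagging any that are missing.
for tool in cargo node python3 msgcat msgmerge; do
  command -v "$tool" || echo "missing: $tool"
done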

Suggested setup:

./benchmark/setup.sh

If benchmark/python/.venv exists, ferrocat-bench will automatically prefer that interpreter for verify-benchmark-env and compare, so polib does not need to be installed globally.

If you only want the Python side, run:

./benchmark/python/setup.sh
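
If you would rather create that interpreter by hand than use the helper script, a minimal sketch (the script may do more than this):

python3 -m venv benchmark/python/.venv
benchmark/python/.venv/bin/pip install -r benchmark/python/requirements.txt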

Verify The Environment

Run:

cargo run -p ferrocat-bench -- verify-benchmark-env

This checks for the required executables and adapter packages, and it prints the detected tool versions that will be captured in the report metadata.
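
To cross-check the detected versions manually, each external tool exposes a standard version flag:

rustc --version
node --version
python3 --version
msgcat --version | head -n 1
msgmerge --version | head -n 1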

Benchmark Profiles

  • gettext-official-v1
    • the smallest official benchmark profile
    • intentionally benchmark-focused rather than test-focused
    • one conservative primary locale: de
    • a second, typical locale: fr
    • one locale with more complex plural rules: pl
    • one representative large corpus size per scenario
  • gettext-official-quick-v1
    • the fast companion to gettext-official-v1
    • keeps the same fixture and external-tool matrix
    • lowers the minimum sample duration
    • uses fewer warmup and measured runs
    • useful for local iteration and regression checks, but not the publication-grade profile
  • gettext-compat-v1
    • extended external benchmark suite
    • broader gettext-only matrix with additional locale/family coverage
    • useful when you want more detail than the slim official profile
  • gettext-workflows-v1
    • focused workflow suite for classic gettext merge paths
    • compares merge_catalog against msgmerge
    • kept separate from the slim official profile so workflow tuning does not dominate the main benchmark story
  • gettext-workflows-ecosystem-v1
    • extended workflow suite for classic gettext merge paths
    • compares merge_catalog against msgmerge, pofile, pofile-ts, and polib
    • external library numbers are measured as reconstructed parse -> merge -> serialize pipelines
    • useful when you want workflow numbers across the broader gettext ecosystem
  • serious-v1
    • advanced/internal benchmark suite
    • mixed and ICU-heavy workloads
    • useful for ferrocat's broader performance direction, but not the official cross-tool gettext baseline

Run The Official Gettext Suite

Use the checked-in gettext-official-v1 profile and write the report outside the internal performance history:

cargo run --release -p ferrocat-bench -- compare gettext-official-v1 --out benchmark/results/gettext-official-v1-$(date +%Y%m%d-%H%M%S).json

The compare command:

  • validates semantic equivalence for each comparison group before timing
  • calibrates iterations per scenario to a minimum sample duration
  • runs 2 warmups per scenario
  • records 10 measured samples per parse/stringify scenario
  • stores raw samples plus aggregated statistics in JSON
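
The report schema is defined by ferrocat-bench itself. Assuming the JSON exposes per-scenario entries with aggregated statistics, a jq sketch for skimming a report might look like this (field names are hypothetical):

# Hypothetical field names; adjust to the actual report layout.
jq '.scenarios[] | {name, median_ns}' benchmark/results/gettext-official-v1-*.json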

For a quicker checkpoint with the same comparison matrix:

cargo run --release -p ferrocat-bench -- compare gettext-official-quick-v1 --out benchmark/results/gettext-official-quick-v1-$(date +%Y%m%d-%H%M%S).json

That profile currently uses:

  • minimum_sample_millis: 100
  • 1 warmup and 3 measured samples for parse/stringify scenarios
  • 1 warmup and 2 measured samples for workflow scenarios

Use it for faster day-to-day checks. Keep gettext-official-v1 as the primary report for published comparisons.

For GNU gettext CLI scenarios, the report additionally records an empty-cli-run baseline using a minimal header-only input. This adds:

  • baseline_elapsed_ns and adjusted sample fields for msgcat / msgmerge
  • adjusted median statistics alongside the raw end-to-end statistics

The raw timing remains the primary comparison number. The adjusted timing is a secondary estimate for understanding how much of the CLI measurement is fixed overhead versus actual fixture work.
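
As a worked example of that adjustment, assuming the adjusted figure is simply the raw sample minus the empty-cli-run baseline:

# Illustrative numbers only: subtract the empty-cli-run baseline from a raw CLI sample.
raw_ns=42000000       # measured end-to-end msgmerge sample
baseline_ns=6500000   # empty-cli-run baseline on the same host
echo $(( raw_ns - baseline_ns ))   # adjusted estimate: 35500000 ns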

For the workflow-oriented suite:

cargo run --release -p ferrocat-bench -- compare gettext-workflows-v1 --out benchmark/results/gettext-workflows-v1-$(date +%Y%m%d-%H%M%S).json

That profile covers:

  • merge_catalog versus msgmerge

The msgmerge benchmark path runs with --no-fuzzy-matching. This keeps the comparison close to ferrocat's exact-match merge model and avoids measuring msgmerge's heuristic translation carry-over against a library that intentionally does not implement msgid-based fuzzy recovery.
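
To reproduce that invocation by hand, pass the existing translation catalog first and the updated template second, with fuzzy matching disabled:

# Illustrative file names; existing catalog first, then the updated template.
msgmerge --no-fuzzy-matching existing.de.po updated.pot -o merged.de.po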

For the broader workflow ecosystem suite:

cargo run --release -p ferrocat-bench -- compare gettext-workflows-ecosystem-v1 --out benchmark/results/gettext-workflows-ecosystem-v1-$(date +%Y%m%d-%H%M%S).json

That profile extends the workflow comparison with:

  • pofile
  • pofile-ts
  • polib

These are measured as reconstructed workflows, using each library's parse and stringify APIs around the same extracted-message merge step, to keep the comparison fair.

For the broader compatibility/detail suite:

cargo run --release -p ferrocat-bench -- compare gettext-compat-v1 --out benchmark/results/gettext-compat-v1-$(date +%Y%m%d-%H%M%S).json

Use this when you want more fixture variety than the slim official profile provides.

Result Storage

  • Internal microbenchmark history stays in /performance/performance-history
  • External comparison reports should be written under benchmark/results/
  • Do not copy external compare results into the internal performance history tables
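
If the benchmark/results/ directory mentioned above does not exist yet on your checkout, create it before passing --out paths under it:

mkdir -p benchmark/results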

Current gettext-official-v1 Shape

  • gettext-ui-de-10000
  • gettext-saas-fr-10000
  • gettext-commerce-pl-10000

External baselines currently wired:

  • polib, pofile, pofile-ts, and gettext-parser on the classic gettext parse/stringify corpora: gettext-ui-de-10000, gettext-saas-fr-10000, gettext-commerce-pl-10000
  • msgcat on stringify comparisons
  • msgmerge on the conservative merge corpus, with --no-fuzzy-matching
  • ferrocat internal owned vs borrowed parse baselines on de, fr, and pl

Workflow-only baselines currently wired:

  • pofile, pofile-ts, and polib on gettext-workflows-ecosystem-v1
  • each measured as parse -> merge -> serialize pipelines on gettext-ui-de-1000 and gettext-ui-de-10000
  • gettext-parser is intentionally excluded from workflow benchmarking for now because its PO compile/parse path does not preserve obsolete entries in a way that is semantically fair for msgmerge-style workflows
  • update_catalog is intentionally excluded from the public cross-tool benchmark tables because it is a broader catalog-maintenance API without a clean direct equivalent in the external comparison set

This slim shape is intentional. The official profile is meant to answer the small, understandable benchmark question first. The broader gettext-compat-v1 profile is still available when you want more detail, and the advanced mixed-* / ICU-heavy corpora remain separate from the official gettext comparison track.

Reporting Expectations

When you share benchmark results from the external suite, include the environment block from the JSON report together with the throughput table. At minimum, keep these fields visible:

  • system_label
  • os
  • cpu_model
  • memory_bytes
  • rustc_version
  • node_version
  • python_version
  • msgcat_version / msgmerge_version when GNU gettext numbers are shown

This keeps published numbers tied to the machine and toolchain they were measured on without exposing private hostnames.
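
Assuming the report keeps these fields under a top-level environment object, a jq sketch for pulling the block out of a report:

# Hypothetical key name; adjust to the actual report layout.
jq '.environment' benchmark/results/gettext-official-v1-*.json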