Performance History

Append-only benchmark history and profiling checkpoints for Ferrocat.

Rules:

- add rows, do not replace older numbers
- keep command, fixture, build profile, and notes explicit
- prefer comparable reruns over ad-hoc measurements
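
As a minimal sketch of why comparable reruns matter, the relative change between two rows is only meaningful when command, fixture, and build profile match. The helper `percent_change` below is hypothetical and not part of Ferrocat or its bench crate; the two inputs are the 2026-03-14 dev-build `parse realistic 1000` rows from the table.

```rust
/// Relative change from `baseline` to `candidate`, in percent.
/// Returns `None` when either value is non-finite or the baseline is not positive.
/// Hypothetical helper for illustration only.
fn percent_change(baseline: f64, candidate: f64) -> Option<f64> {
    if !baseline.is_finite() || !candidate.is_finite() || baseline <= 0.0 {
        return None;
    }
    Some((candidate - baseline) / baseline * 100.0)
}

fn main() {
    // Same command, fixture, and build profile in both rows, so the
    // iter/s values can be compared directly.
    let before = 20121.0; // iter/s, pre byte-line-scanner baseline
    let after = 29211.6; // iter/s, post byte-line-scanner + memchr refactor

    match percent_change(before, after) {
        Some(pct) => println!("parse realistic (dev): {pct:+.1}% iter/s"),
        None => println!("baseline is not usable for comparison"),
    }
}
```

Rows that differ in fixture, build profile, or iteration count (for example `mixed-1000` versus `mixed-10000`) should not be fed into a comparison like this.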

Date	Area	Build	Command	Fixture	Iterations	Iter/s	MiB/s	Notes
2026-03-14	parse	dev	`cargo run -p ferrocat-bench -- parse realistic 1000`	`realistic.po`	1000	20121.0	15.10	Pre byte-line-scanner baseline
2026-03-14	parse	dev	`cargo run -p ferrocat-bench -- parse realistic 1000`	`realistic.po`	1000	29211.6	21.92	Post byte-line-scanner + `memchr` refactor
2026-03-14	parse	dev	`cargo run -p ferrocat-bench -- parse mixed-1000 200`	generated `mixed-1000`	200	412.1	47.88	1000 entries, mixed features, deterministic corpus
2026-03-14	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-1000 200`	generated `mixed-1000`	200	2830.9	328.91	Release baseline after byte-line-scanner refactor
2026-03-14	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-1000 200`	generated `mixed-1000`	200	2957.0	343.56	Added borrow-or-own fast path for quoted strings
2026-03-14	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-1000 200`	generated `mixed-1000`	200	3041.8	353.41	Centralized scanner classification/helpers without borrowed-item overhead
2026-03-14	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-1000 200`	generated `mixed-1000`	200	3393.1	394.23	Scanner backend helpers added; repeated runs showed noticeable single-run variance
2026-03-15	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-10000 200`	generated `mixed-10000`	200	392.4	463.22	Byte-based quoted extraction plus unchecked UTF-8 conversion on parser-owned slices; profiled run reached 395.5 iter/s
2026-03-15	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-10000 200`	generated `mixed-10000`	200	401.2	473.63	`LineScanner` now trims only leading ASCII whitespace on the hot path and carries a smaller line record.
2026-03-15	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-10000 200`	generated `mixed-10000`	200	406.4	479.70	Singular `msgstr[0]` now stays on a scratch string in parser state and only promotes to a `Vec<String>` for plural cases.
2026-03-15	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-10000 200`	generated `mixed-10000`	200	415.5	490.45	Inlined scanner helpers, specialized keyword classification, and replaced `Plural-Forms` parsing with a byte-based path.
2026-03-15	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-10000 200`	generated `mixed-10000`	200	457.9	540.49	Replaced always-`Vec<String>` `msgstr` storage with a `MsgStr` enum (`None`/`Singular`/`Plural`), removing per-item vector overhead for the common singular case.
2026-03-15	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-10000 200`	generated `mixed-10000`	200	473.3	558.71	`ParserState::reset` now clears fields in place and reuses `PoItem` allocations instead of rebuilding the whole parser state every item.
2026-03-15	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-10000 200`	generated `mixed-10000`	200	483.5	570.72	Split reset paths so the common post-`mem::take` case avoids a full `PoItem::clear_for_reuse`, while the header path still clears in place.
2026-03-15	parse-borrowed	release	`cargo run --release -p ferrocat-bench -- parse-borrowed mixed-10000 200`	generated `mixed-10000`	200	816.0	963.27	First zero-copy parse path with borrowed `Cow`-backed items; header key/value are still materialized today.
2026-03-15	parse-borrowed	release	`cargo run --release -p ferrocat-bench -- parse-borrowed mixed-10000 200`	generated `mixed-10000`	200	835.9	986.75	Borrowed parser now extracts standard header fragments directly from raw `msgstr` lines, so common `...\\n` header entries stay borrowed too.
2026-03-15	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-10000 200`	generated `mixed-10000`	200	478.6	564.91	`unescape_string` now decodes escape-heavy slices via a byte-oriented span copier instead of repeated `String` appends; first post-build run showed noise, repeat settled higher.
2026-03-15	parse-borrowed	release	`cargo run --release -p ferrocat-bench -- parse-borrowed mixed-10000 200`	generated `mixed-10000`	200	858.6	1013.56	Same byte-oriented unescape path modestly improved the borrowed parser slow path for escaped strings.
2026-03-15	merge	release	`cargo run --release -p ferrocat-bench -- merge mixed-10000 100`	generated `merge-mixed-10000`	100	107.7	155.32	First `merge_catalog(existing_po, extracted_messages)` benchmark using borrowed parse of the existing catalog, preserved translations for matching keys, and obsolete-marking for removed keys.
2026-03-15	stringify	release	`cargo run --release -p ferrocat-bench -- stringify mixed-10000 200`	generated `mixed-10000`	200	896.5	1065.16	Same `MsgStr` enum change preserved stringify throughput while using a more compact translation model.
2026-03-14	stringify	release	`cargo run --release -p ferrocat-bench -- stringify mixed-1000 200`	generated `mixed-1000`	200	1268.0	148.29	Baseline before simple-keyword direct-write fast path
2026-03-14	stringify	release	`cargo run --release -p ferrocat-bench -- stringify mixed-1000 200`	generated `mixed-1000`	200	3213.3	375.80	Direct fast path for common single-line keyword writes
2026-03-14	stringify	release	`cargo run --release -p ferrocat-bench -- stringify mixed-1000 200`	generated `mixed-1000`	200	4532.1	530.03	Replaced multiline/folding `Vec<String>` pipeline with direct segmented writes; repeated runs ranged from 4246.4 to 4532.1 iter/s
2026-03-14	stringify	release	`cargo run --release -p ferrocat-bench -- stringify mixed-1000 200`	generated `mixed-1000`	200	7507.4	877.99	Replaced temporary escaped strings with direct buffer writes; scratch buffer reused for multiline segments
2026-03-14	stringify	release	`cargo run --release -p ferrocat-bench -- stringify mixed-10000 200`	generated `mixed-10000`	200	830.1	986.28	Same direct-escape write path confirmed on larger corpus after Time Profiler-guided optimization
2026-03-15	stringify	release	`cargo run --release -p ferrocat-bench -- stringify mixed-10000 200`	generated `mixed-10000`	200	881.6	1047.44	Added `aarch64` NEON escape-byte scan path; repeated `mixed-10000` runs stayed around 868.7-887.2 iter/s
2026-03-15	stringify	release	`cargo run --release -p ferrocat-bench -- stringify mixed-10000 200`	generated `mixed-10000`	200	919.2	1092.16	Reused one scratch buffer across the whole stringify pass instead of recreating multiline escape buffers per keyword
2026-03-16	stringify	release	`cargo run --release -p ferrocat-bench -- stringify mixed-10000 500 --warmup 1 --runs 5`	generated `mixed-10000`	500	1014.0	1204.77	Reused the first escape scan result inside `write_keyword` so the common simple-keyword path no longer re-scans values before escaping; measured with the new median-based harness.
2026-03-15	merge	release	`cargo run --release -p ferrocat-bench -- merge mixed-10000 100`	generated `merge-mixed-10000`	100	114.6	165.29	Fixed obsolete-item roundtrip accounting in both owned and borrowed parsers and reran the realistic merge benchmark.
2026-03-15	merge	release	`cargo run --release -p ferrocat-bench -- merge mixed-10000 100`	generated `merge-mixed-10000`	100	145.2	209.45	Replaced owned string-key bookkeeping with borrowed `msgid` buckets plus a `Vec<bool>` matched map, removing repeated key allocation and `HashSet` churn in the merge loop.
2026-03-15	merge	release	`cargo run --release -p ferrocat-bench -- merge mixed-10000 100`	generated `merge-mixed-10000`	100	164.9	237.92	Replaced `BorrowedPoItem::clone().into_owned()` style materialization with direct owned construction in the merge path, cutting intermediate Vec/Cow clone churn for matched and obsolete items.
2026-03-15	merge	release	`cargo run --release -p ferrocat-bench -- merge mixed-10000 100`	generated `merge-mixed-10000`	100	248.0	357.70	Switched `merge_catalog` to a direct transient render path that writes merged items straight into the output buffer while preserving canonical default `stringify_po` formatting.
2026-03-15	merge	release	`cargo run --release -p ferrocat-bench -- merge mixed-10000 100`	generated `merge-mixed-10000`	100	297.2	428.73	Replaced the general borrowed parser in the merge path with a merge-specialized borrowed parser that stores lighter-weight item data and parses only the structures `merge_catalog` needs.
2026-03-16	merge	release	`cargo run --release -p ferrocat-bench -- merge mixed-10000 500`	generated `merge-mixed-10000`	500	310.4	447.73	Added a merge-local quoted-string fast path so keyword lines, continuations, and header fragments avoid the generic quoted extractor on the common unescaped case; longer runs were used because the 100-iteration merge benchmark showed noticeable noise.
2026-03-16	merge	release	`cargo run --release -p ferrocat-bench -- merge mixed-10000 500 --warmup 1 --runs 5`	generated `merge-mixed-10000`	500	292.7	422.14	Added multi-run median reporting to the benchmark harness and kept the merge parser's common post-`mem::take` reset path specialized; the same setup measured `279.0 iter/s` when falling back to a full `ParserState::reset`, so the split reset remains a real win.
2026-03-16	merge	release	`cargo run --release -p ferrocat-bench -- merge mixed-10000 500 --warmup 1 --runs 5`	generated `merge-mixed-10000`	500	288.7	416.40	Final verification rerun with the same median-based harness settled slightly lower but with a much tighter range (`287.9..290.1 iter/s`), still comfortably above the `279.0 iter/s` full-reset A/B baseline.
2026-03-16	merge	release	`cargo run --release -p ferrocat-bench -- merge mixed-10000 500 --warmup 1 --runs 5`	generated `merge-mixed-10000`	500	311.5	449.40	The same shared `write_keyword` fast-path cleanup also improved merge throughput, which now benefits from reusing the serializer's first escape scan result in its direct render path.
2026-03-16	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog mixed-1000 20 --warmup 1 --runs 3`	generated `merge-mixed-1000`	20	478.8	69.37	First end-to-end benchmark for the new high-level `update_catalog` API, including normalization, canonical catalog merge, ICU export, and final PO serialization.
2026-03-16	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog mixed-1000 20 --warmup 1 --runs 3`	generated `merge-mixed-1000`	20	488.0	70.70	Replaced the high-level API's heuristic plural-category mapping with ICU4X `icu_plurals`, using locale-aware cardinal categories when the locale and `nplurals` agree and falling back to the existing count-based mapping for mismatches or invalid locales.
2026-03-16	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog mixed-1000 20 --warmup 1 --runs 3`	generated `merge-mixed-1000`	20	479.6	69.48	Centralized plural handling behind `PluralProfile`, surfaced parse-time plural diagnostics, and added conservative `Plural-Forms` header generation only for safe 1/2-form gettext locales; throughput stayed effectively flat within normal run-to-run noise.
2026-03-16	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog mixed-1000 20 --warmup 1 --runs 3`	generated `merge-mixed-1000`	20	425.9	61.71	Replaced the ad-hoc ICU plural heuristic with the new `ferrocat-icu` MessageFormat-v1 parser and conservative projection adapter. This materially improved correctness and diagnostics for ICU-backed catalog parsing, but introduced a noticeable first-pass cost that should be profiled in a follow-up optimization round.
2026-03-16	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog mixed-1000 20 --warmup 1 --runs 5`	generated `merge-mixed-1000`	20	464.3	67.27	Added a cheap ICU-shape fast path ahead of `parse_icu`, so obviously non-ICU strings skip the new MessageFormat parser entirely; this recovered most of the first integration regression while keeping the stricter ICU correctness path for strings that actually look like ICU syntax.
2026-03-16	raw-icu-parse	release	`cargo run --release -p ferrocat-bench -- parse-icu icu-nested-1000 50 --warmup 1 --runs 5`	generated `icu-nested-1000`	50	380.5	105.23	First raw `ferrocat-icu` nested-parser baseline after adding dedicated ICU benchmark fixtures. This isolates parser cost from `ferrocat-po` merge and serialization overhead.
2026-03-16	raw-icu-validate	release	`cargo run --release -p ferrocat-bench -- validate-icu icu-plural-1000 50 --warmup 1 --runs 5`	generated `icu-plural-1000`	50	1474.9	99.55	First direct validation benchmark on simple top-level ICU plurals; useful as a low-structure comparison point against the nested parse workload.
2026-03-16	raw-icu-parse	release	`cargo run --release -p ferrocat-bench -- parse-icu icu-args-1000 50 --warmup 1 --runs 5`	generated `icu-args-1000`	50	1777.0	108.28	Raw ICU parser throughput on argument-heavy messages after adding the dedicated ICU benchmark harness.
2026-03-16	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog catalog-icu-heavy 10 --warmup 1 --runs 5`	generated `catalog-icu-heavy`	10	142.8	36.23	First ICU-heavy end-to-end benchmark with a catalog corpus that mixes projectable ICU plurals, unsupported valid ICU structures, tags, and formatter usage.
2026-03-16	raw-icu-parse	release	`cargo run --release -p ferrocat-bench -- parse-icu icu-nested-1000 50 --warmup 1 --runs 5`	generated `icu-nested-1000`	50	440.0	121.69	Shifted the `ferrocat-icu` parser hot path away from repeated `char`/slice probing toward cheaper byte-oriented structural scanning in `parse_nodes`, `parse_identifier`, `skip_whitespace`, and related helpers.
2026-03-16	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog catalog-icu-heavy 10 --warmup 1 --runs 5`	generated `catalog-icu-heavy`	10	143.5	36.41	Follow-up end-to-end rerun after the parser tightening and in-place ICU literal escaping. The raw parser improved materially; the catalog-heavy pipeline moved only slightly because merge and projection/serialization still make up most of the remaining cost.
2026-03-16	raw-icu-extract-variables	release	`cargo run --release -p ferrocat-bench -- extract-icu-variables icu-tags-1000 50 --warmup 1 --runs 3`	generated `icu-tags-1000`	50	2969.4	288.54	Direct helper benchmark for variable extraction on tag-heavy messages, confirming that AST-derived helper passes are much cheaper than full parsing once the message is already parsed.
2026-03-16	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog catalog-icu-unsupported 5 --warmup 1 --runs 3`	generated `catalog-icu-unsupported`	5	114.6	28.06	Baseline for valid-but-unsupported ICU structures that force conservative diagnostics and fallback instead of clean projection into the current catalog plural model.
2026-03-16	raw-icu-parse	release	`cargo run --release -p ferrocat-bench -- parse-icu icu-args-1000 50 --warmup 1 --runs 5`	generated `icu-args-1000`	50	1774.2	108.10	Final parser rerun after the second ICU optimization round. The argument-heavy path stayed essentially flat, which is expected because the latest work primarily targeted nested option parsing and end-to-end ICU-heavy merge overhead.
2026-03-16	raw-icu-parse	release	`cargo run --release -p ferrocat-bench -- parse-icu icu-nested-1000 50 --warmup 1 --runs 5`	generated `icu-nested-1000`	50	442.2	122.28	Final nested-parser measurement after byte-prefix handling for `offset:`/close-tag checks and a cleanup of downstream ICU-heavy merge helpers. This is the current best nested raw-parser result for the M1 parser.
2026-03-16	raw-icu-validate	release	`cargo run --release -p ferrocat-bench -- validate-icu icu-plural-1000 50 --warmup 1 --runs 5`	generated `icu-plural-1000`	50	1821.7	122.97	Final validation benchmark after the same parser tightening. Validation inherits the parser improvements and now clearly outpaces the first raw validation baseline.
2026-03-16	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog catalog-icu-heavy 10 --warmup 1 --runs 5`	generated `catalog-icu-heavy`	10	151.7	38.50	Improved ICU-heavy end-to-end throughput by combining parser byte-prefix cleanups with cheaper large-collection merge dedupe handling for comments, placeholders, and similar small string/origin sets.
2026-03-16	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog catalog-icu-projectable 10 --warmup 1 --runs 5`	generated `catalog-icu-projectable`	10	128.1	27.22	First explicitly recorded benchmark for the fully projectable ICU-heavy catalog path, used to separate clean projection cost from the unsupported/diagnostic-heavy corpus.
2026-03-16	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog catalog-icu-unsupported 5 --warmup 1 --runs 3`	generated `catalog-icu-unsupported`	5	113.5	27.78	Final unsupported ICU corpus rerun after the merge-helper cleanup. Throughput stayed effectively flat, which is a good sign that the optimization favored projectable/common cases without regressing the conservative fallback path.
2026-03-16	update-catalog-file	release	`cargo run --release -p ferrocat-bench -- update-catalog-file mixed-1000 5 --warmup 1 --runs 2`	generated `merge-mixed-1000`	5	402.5	58.31	First file-oriented benchmark for `update_catalog_file`, including fixture reset, file read/write path, and atomic rewrite behavior around the same high-level catalog pipeline.
2026-03-18	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-10000 --runs 3`	generated `mixed-10000`	100	324.0	382.47	Post high-level catalog API borrowing refactor, rustdoc/lint hardening, and internal helper/module cleanup. Compared with the pre-change checkpoint in the same session (`302.1 iter/s`, `356.65 MiB/s`), owned parse improved instead of regressing.
2026-03-18	parse-borrowed	release	`cargo run --release -p ferrocat-bench -- parse-borrowed mixed-10000 --runs 3`	generated `mixed-10000`	100	455.2	537.35	Same refactor checkpoint for the borrowed parser. Compared with the same-session baseline (`423.7 iter/s`, `500.09 MiB/s`), the zero-copy path remained allocation-light and got faster.
2026-03-18	stringify	release	`cargo run --release -p ferrocat-bench -- stringify mixed-10000 --runs 3`	generated `mixed-10000`	100	990.2	1167.04	Serializer throughput stayed comfortably above the earlier same-session baseline (`936.1 iter/s`, `1103.24 MiB/s`), which is a good sign that the API/docs refactor did not spill overhead into PO output hot paths.
2026-03-18	merge	release	`cargo run --release -p ferrocat-bench -- merge gettext-ui-de-1000 --runs 3`	generated `gettext-ui-de-1000`	400	1779.5	358.56	Merge throughput also improved relative to the same-session baseline (`1645.9 iter/s`, `331.64 MiB/s`), confirming that the surrounding catalog API cleanup left the direct merge path healthy.
2026-03-18	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog gettext-ui-de-1000 --runs 3`	generated `gettext-ui-de-1000`	400	342.5	70.71	High-level catalog update stayed essentially flat-to-slightly-better versus the same-session baseline (`340.1 iter/s`, `70.21 MiB/s`). This was the most important guardrail for the borrowing-based request API change.
2026-03-18	parse	release	`cargo run --release -p ferrocat-bench -- parse mixed-10000 --runs 3`	generated `mixed-10000`	100	325.6	384.30	Round-2 catalog API modularization that moved the remaining catalog workflow out of `api.rs` into `api/catalog.rs`. Owned parse stayed slightly above the earlier same-day checkpoint (`324.0 iter/s`, `382.47 MiB/s`).
2026-03-18	parse-borrowed	release	`cargo run --release -p ferrocat-bench -- parse-borrowed mixed-10000 --runs 3`	generated `mixed-10000`	100	458.4	541.12	Same round-2 modularization checkpoint for the borrowed parser. The zero-copy path remained faster than the earlier same-day measurement (`455.2 iter/s`, `537.35 MiB/s`).
2026-03-18	stringify	release	`cargo run --release -p ferrocat-bench -- stringify mixed-10000 --runs 3`	generated `mixed-10000`	100	996.9	1174.95	Serializer throughput remained comfortably ahead of the earlier same-day checkpoint (`990.2 iter/s`, `1167.04 MiB/s`) after the catalog module split.
2026-03-18	merge	release	`cargo run --release -p ferrocat-bench -- merge gettext-ui-de-1000 --runs 3`	generated `gettext-ui-de-1000`	400	1804.2	363.54	Merge throughput ticked up again after the remaining API-layer code moved out of the facade module, which is a good sign that the structural split stayed out of the hot path.
2026-03-18	update-catalog	release	`cargo run --release -p ferrocat-bench -- update-catalog gettext-ui-de-1000 --runs 3`	generated `gettext-ui-de-1000`	400	347.0	71.64	Most important round-2 guardrail: the end-to-end high-level catalog update path improved over the earlier same-day checkpoint (`342.5 iter/s`, `70.71 MiB/s`) while landing the module split.
2026-03-18	parse-catalog-po	release	`cargo run --release -p ferrocat-bench -- parse-catalog-po catalog-modern-de-10000 --runs 3`	generated `catalog-modern-de-10000`	100	101.9	157.39	First fair internal storage-format parse benchmark on the new modern catalog fixture family. This compares `CatalogStorageFormat::Po` against NDJSON on the same ICU-oriented catalog semantics instead of classic gettext plural entries.
2026-03-18	parse-catalog-ndjson	release	`cargo run --release -p ferrocat-bench -- parse-catalog-ndjson catalog-modern-de-10000 --runs 3`	generated `catalog-modern-de-10000`	100	92.2	166.54	Matching NDJSON parse checkpoint for the modern internal storage-format benchmark family. The lower iter/s but higher MiB/s relative to the PO row reflects the larger one-record-per-line JSON representation rather than a different message corpus.
2026-03-18	stringify-catalog-po	release	`cargo run --release -p ferrocat-bench -- stringify-catalog-po catalog-modern-de-10000 --runs 3`	generated `catalog-modern-de-10000`	100	243.8	379.05	First PO render checkpoint on the modern internal storage-format fixture family. This path serializes the same canonical catalog semantics back into PO with ICU strings in `msgid`/`msgstr`, without reintroducing classic gettext plural entries.
2026-03-18	stringify-catalog-ndjson	release	`cargo run --release -p ferrocat-bench -- stringify-catalog-ndjson catalog-modern-de-10000 --runs 3`	generated `catalog-modern-de-10000`	100	277.0	500.28	Matching NDJSON render checkpoint for the same modern catalog corpus. NDJSON currently renders faster in this benchmark-local path because it writes a direct JSON-line representation from the parsed catalog while the PO side reconstructs full PO item structure.
2026-03-18	update-catalog-file	release	`cargo run --release -p ferrocat-bench -- update-catalog-file catalog-modern-de-1000 --runs 3`	generated `catalog-modern-de-1000`	400	285.8	43.90	First file-oriented PO storage benchmark on the modern catalog fixture family. This is the fairer baseline for comparing high-level storage rewrite cost against NDJSON on the same ICU-oriented message mix.
2026-03-18	update-catalog-file-ndjson	release	`cargo run --release -p ferrocat-bench -- update-catalog-file-ndjson catalog-modern-de-1000 --runs 3`	generated `catalog-modern-de-1000`	400	281.2	50.21	Matching file-oriented NDJSON storage benchmark on the modern fixture family. The end-to-end update path stayed very close to PO, which suggests the storage-format choice matters less here than in the raw parse/render microbenchmarks.
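
Several notes above refer to a median-based harness driven by `--warmup` and `--runs`. The following is a rough sketch of how such a measurement could be structured, not the actual `ferrocat-bench` implementation. The function names, the interpretation of a warmup pass as one untimed batch of iterations, and the MiB/s formula (iter/s times fixture bytes divided by 2^20) are all assumptions; the workload is a stand-in newline counter rather than a real parser call.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Median of a set of samples; `None` for an empty slice.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    })
}

/// Converts an iteration rate into MiB/s for a fixture of `bytes` bytes.
/// Assumed formula, not taken from the bench crate.
fn mib_per_sec(iter_per_sec: f64, bytes: usize) -> f64 {
    iter_per_sec * bytes as f64 / (1024.0 * 1024.0)
}

/// Runs `warmup` untimed passes, then `runs` timed passes of `iterations`
/// calls each, and returns the median iter/s across the timed passes.
fn median_iter_per_sec<F: FnMut()>(
    mut work: F,
    warmup: usize,
    runs: usize,
    iterations: usize,
) -> Option<f64> {
    for _ in 0..warmup {
        for _ in 0..iterations {
            work();
        }
    }
    let mut rates = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        for _ in 0..iterations {
            work();
        }
        let secs = start.elapsed().as_secs_f64();
        // Skip passes too short for the timer to resolve.
        if secs > 0.0 {
            rates.push(iterations as f64 / secs);
        }
    }
    median(&rates)
}

fn main() {
    // Synthetic PO-like input; a real run would load a generated fixture.
    let input = "msgid \"hello\"\nmsgstr \"hallo\"\n\n".repeat(1000);
    let rate = median_iter_per_sec(
        || {
            let bytes = black_box(input.as_bytes());
            black_box(bytes.iter().filter(|&&b| b == b'\n').count());
        },
        1,   // warmup passes
        5,   // timed runs
        500, // iterations per run
    );
    if let Some(rate) = rate {
        println!("{rate:.1} iter/s, {:.2} MiB/s", mib_per_sec(rate, input.len()));
    }
}
```

Reporting the median rather than a single run is one way to damp the single-run variance that several earlier rows mention, at the cost of a longer benchmark.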