Performance History

Append-only history for benchmark and profiling checkpoints.

Rules:

  • add rows, do not replace older numbers
  • keep command, fixture, build profile, and notes explicit
  • prefer comparable reruns over ad-hoc measurements
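
Several later rows mention a median-based harness (`--warmup 1 --runs 5`). A minimal sketch of how the Iter/s and MiB/s columns could be derived from timed runs and reduced to a median follows. It assumes MiB/s is iterations per second multiplied by the fixture size in bytes; the `Throughput` type and both function names are hypothetical and are not taken from ferrocat-bench.

```rust
use std::time::Duration;

/// Throughput figures for one timed run.
/// Hypothetical type for illustration, not part of ferrocat-bench.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Throughput {
    pub iter_per_s: f64,
    pub mib_per_s: f64,
}

/// Converts one timed run into the two throughput columns.
/// `bytes_per_iter` is the size of the fixture processed in each iteration
/// (assumption: MiB/s is derived from iter/s and the fixture size).
pub fn throughput(iterations: u64, bytes_per_iter: u64, elapsed: Duration) -> Throughput {
    let secs = elapsed.as_secs_f64();
    let iter_per_s = iterations as f64 / secs;
    let mib_per_s = iter_per_s * bytes_per_iter as f64 / (1024.0 * 1024.0);
    Throughput { iter_per_s, mib_per_s }
}

/// Median of several per-run measurements; returns `None` for an empty slice.
/// An even number of samples yields the mean of the two middle values.
pub fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        Some(sorted[mid])
    } else {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}
```

Reporting the median of a few runs, rather than a single measurement, is one way to keep reruns comparable when single-run variance is noticeable.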

| Date | Area | Build | Command | Fixture | Iterations | Iter/s | MiB/s | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-03-14 | parse | dev | cargo run -p ferrocat-bench -- parse realistic 1000 | realistic.po | 1000 | 20121.0 | 15.10 | Pre byte-line-scanner baseline |
| 2026-03-14 | parse | dev | cargo run -p ferrocat-bench -- parse realistic 1000 | realistic.po | 1000 | 29211.6 | 21.92 | Post byte-line-scanner + memchr refactor |
| 2026-03-14 | parse | dev | cargo run -p ferrocat-bench -- parse mixed-1000 200 | generated mixed-1000 | 200 | 412.1 | 47.88 | 1000 entries, mixed features, deterministic corpus |
| 2026-03-14 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-1000 200 | generated mixed-1000 | 200 | 2830.9 | 328.91 | Release baseline after byte-line-scanner refactor |
| 2026-03-14 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-1000 200 | generated mixed-1000 | 200 | 2957.0 | 343.56 | Added borrow-or-own fast path for quoted strings |
| 2026-03-14 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-1000 200 | generated mixed-1000 | 200 | 3041.8 | 353.41 | Centralized scanner classification/helpers without borrowed-item overhead |
| 2026-03-14 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-1000 200 | generated mixed-1000 | 200 | 3393.1 | 394.23 | Scanner backend helpers added; repeated runs showed noticeable single-run variance |
| 2026-03-15 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-10000 200 | generated mixed-10000 | 200 | 392.4 | 463.22 | Byte-based quoted extraction plus unchecked UTF-8 conversion on parser-owned slices; profiled run reached 395.5 iter/s |
| 2026-03-15 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-10000 200 | generated mixed-10000 | 200 | 401.2 | 473.63 | LineScanner now trims only leading ASCII whitespace on the hot path and carries a smaller line record. |
| 2026-03-15 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-10000 200 | generated mixed-10000 | 200 | 406.4 | 479.70 | Singular msgstr[0] now stays on a scratch string in parser state and only promotes to a Vec<String> for plural cases. |
| 2026-03-15 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-10000 200 | generated mixed-10000 | 200 | 415.5 | 490.45 | Inlined scanner helpers, specialized keyword classification, and replaced Plural-Forms parsing with a byte-based path. |
| 2026-03-15 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-10000 200 | generated mixed-10000 | 200 | 457.9 | 540.49 | Replaced always-Vec<String> msgstr storage with a MsgStr enum (None/Singular/Plural), removing per-item vector overhead for the common singular case. |
| 2026-03-15 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-10000 200 | generated mixed-10000 | 200 | 473.3 | 558.71 | ParserState::reset now clears fields in place and reuses PoItem allocations instead of rebuilding the whole parser state every item. |
| 2026-03-15 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-10000 200 | generated mixed-10000 | 200 | 483.5 | 570.72 | Split reset paths so the common post-mem::take case avoids a full PoItem::clear_for_reuse, while the header path still clears in place. |
| 2026-03-15 | parse-borrowed | release | cargo run --release -p ferrocat-bench -- parse-borrowed mixed-10000 200 | generated mixed-10000 | 200 | 816.0 | 963.27 | First zero-copy parse path with borrowed Cow-backed items; header key/value are still materialized today. |
| 2026-03-15 | parse-borrowed | release | cargo run --release -p ferrocat-bench -- parse-borrowed mixed-10000 200 | generated mixed-10000 | 200 | 835.9 | 986.75 | Borrowed parser now extracts standard header fragments directly from raw msgstr lines, so common ...\\n header entries stay borrowed too. |
| 2026-03-15 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-10000 200 | generated mixed-10000 | 200 | 478.6 | 564.91 | unescape_string now decodes escape-heavy slices via a byte-oriented span copier instead of repeated String appends; first post-build run showed noise, repeat settled higher. |
| 2026-03-15 | parse-borrowed | release | cargo run --release -p ferrocat-bench -- parse-borrowed mixed-10000 200 | generated mixed-10000 | 200 | 858.6 | 1013.56 | Same byte-oriented unescape path modestly improved the borrowed parser slow path for escaped strings. |
| 2026-03-15 | merge | release | cargo run --release -p ferrocat-bench -- merge mixed-10000 100 | generated merge-mixed-10000 | 100 | 107.7 | 155.32 | First merge_catalog(existing_po, extracted_messages) benchmark using borrowed parse of the existing catalog, preserved translations for matching keys, and obsolete-marking for removed keys. |
| 2026-03-15 | stringify | release | cargo run --release -p ferrocat-bench -- stringify mixed-10000 200 | generated mixed-10000 | 200 | 896.5 | 1065.16 | Same MsgStr enum change preserved stringify throughput while using a more compact translation model. |
| 2026-03-14 | stringify | release | cargo run --release -p ferrocat-bench -- stringify mixed-1000 200 | generated mixed-1000 | 200 | 1268.0 | 148.29 | Baseline before simple-keyword direct-write fast path |
| 2026-03-14 | stringify | release | cargo run --release -p ferrocat-bench -- stringify mixed-1000 200 | generated mixed-1000 | 200 | 3213.3 | 375.80 | Direct fast path for common single-line keyword writes |
| 2026-03-14 | stringify | release | cargo run --release -p ferrocat-bench -- stringify mixed-1000 200 | generated mixed-1000 | 200 | 4532.1 | 530.03 | Replaced multiline/folding Vec<String> pipeline with direct segmented writes; repeated runs ranged from 4246.4 to 4532.1 iter/s |
| 2026-03-14 | stringify | release | cargo run --release -p ferrocat-bench -- stringify mixed-1000 200 | generated mixed-1000 | 200 | 7507.4 | 877.99 | Replaced temporary escaped strings with direct buffer writes; scratch buffer reused for multiline segments |
| 2026-03-14 | stringify | release | cargo run --release -p ferrocat-bench -- stringify mixed-10000 200 | generated mixed-10000 | 200 | 830.1 | 986.28 | Same direct-escape write path confirmed on larger corpus after Time Profiler-guided optimization |
| 2026-03-15 | stringify | release | cargo run --release -p ferrocat-bench -- stringify mixed-10000 200 | generated mixed-10000 | 200 | 881.6 | 1047.44 | Added aarch64 NEON escape-byte scan path; repeated mixed-10000 runs stayed around 868.7-887.2 iter/s |
| 2026-03-15 | stringify | release | cargo run --release -p ferrocat-bench -- stringify mixed-10000 200 | generated mixed-10000 | 200 | 919.2 | 1092.16 | Reused one scratch buffer across the whole stringify pass instead of recreating multiline escape buffers per keyword |
| 2026-03-16 | stringify | release | cargo run --release -p ferrocat-bench -- stringify mixed-10000 500 --warmup 1 --runs 5 | generated mixed-10000 | 500 | 1014.0 | 1204.77 | Reused the first escape scan result inside write_keyword so the common simple-keyword path no longer re-scans values before escaping; measured with the new median-based harness. |
| 2026-03-15 | merge | release | cargo run --release -p ferrocat-bench -- merge mixed-10000 100 | generated merge-mixed-10000 | 100 | 114.6 | 165.29 | Fixed obsolete-item roundtrip accounting in both owned and borrowed parsers and reran the realistic merge benchmark. |
| 2026-03-15 | merge | release | cargo run --release -p ferrocat-bench -- merge mixed-10000 100 | generated merge-mixed-10000 | 100 | 145.2 | 209.45 | Replaced owned string-key bookkeeping with borrowed msgid buckets plus a Vec<bool> matched map, removing repeated key allocation and HashSet churn in the merge loop. |
| 2026-03-15 | merge | release | cargo run --release -p ferrocat-bench -- merge mixed-10000 100 | generated merge-mixed-10000 | 100 | 164.9 | 237.92 | Replaced BorrowedPoItem::clone().into_owned() style materialization with direct owned construction in the merge path, cutting intermediate Vec/Cow clone churn for matched and obsolete items. |
| 2026-03-15 | merge | release | cargo run --release -p ferrocat-bench -- merge mixed-10000 100 | generated merge-mixed-10000 | 100 | 248.0 | 357.70 | Switched merge_catalog to a direct transient render path that writes merged items straight into the output buffer while preserving canonical default stringify_po formatting. |
| 2026-03-15 | merge | release | cargo run --release -p ferrocat-bench -- merge mixed-10000 100 | generated merge-mixed-10000 | 100 | 297.2 | 428.73 | Replaced the general borrowed parser in the merge path with a merge-specialized borrowed parser that stores lighter-weight item data and parses only the structures merge_catalog needs. |
| 2026-03-16 | merge | release | cargo run --release -p ferrocat-bench -- merge mixed-10000 500 | generated merge-mixed-10000 | 500 | 310.4 | 447.73 | Added a merge-local quoted-string fast path so keyword lines, continuations, and header fragments avoid the generic quoted extractor on the common unescaped case; longer runs were used because the 100-iteration merge benchmark showed noticeable noise. |
| 2026-03-16 | merge | release | cargo run --release -p ferrocat-bench -- merge mixed-10000 500 --warmup 1 --runs 5 | generated merge-mixed-10000 | 500 | 292.7 | 422.14 | Added multi-run median reporting to the benchmark harness and kept the merge parser's common post-mem::take reset path specialized; the same setup measured 279.0 iter/s when falling back to a full ParserState::reset, so the split reset remains a real win. |
| 2026-03-16 | merge | release | cargo run --release -p ferrocat-bench -- merge mixed-10000 500 --warmup 1 --runs 5 | generated merge-mixed-10000 | 500 | 288.7 | 416.40 | Final verification rerun with the same median-based harness settled slightly lower but with a much tighter range (287.9..290.1 iter/s), still comfortably above the 279.0 iter/s full-reset A/B baseline. |
| 2026-03-16 | merge | release | cargo run --release -p ferrocat-bench -- merge mixed-10000 500 --warmup 1 --runs 5 | generated merge-mixed-10000 | 500 | 311.5 | 449.40 | The same shared write_keyword fast-path cleanup also improved merge throughput, which now benefits from reusing the serializer's first escape scan result in its direct render path. |
| 2026-03-16 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog mixed-1000 20 --warmup 1 --runs 3 | generated merge-mixed-1000 | 20 | 478.8 | 69.37 | First end-to-end benchmark for the new high-level update_catalog API, including normalization, canonical catalog merge, ICU export, and final PO serialization. |
| 2026-03-16 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog mixed-1000 20 --warmup 1 --runs 3 | generated merge-mixed-1000 | 20 | 488.0 | 70.70 | Replaced the high-level API's heuristic plural-category mapping with ICU4X icu_plurals, using locale-aware cardinal categories when the locale and nplurals agree and falling back to the existing count-based mapping for mismatches or invalid locales. |
| 2026-03-16 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog mixed-1000 20 --warmup 1 --runs 3 | generated merge-mixed-1000 | 20 | 479.6 | 69.48 | Centralized plural handling behind PluralProfile, surfaced parse-time plural diagnostics, and added conservative Plural-Forms header generation only for safe 1/2-form gettext locales; throughput stayed effectively flat within normal run-to-run noise. |
| 2026-03-16 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog mixed-1000 20 --warmup 1 --runs 3 | generated merge-mixed-1000 | 20 | 425.9 | 61.71 | Replaced the ad-hoc ICU plural heuristic with the new ferrocat-icu MessageFormat-v1 parser and conservative projection adapter. This materially improved correctness and diagnostics for ICU-backed catalog parsing, but introduced a noticeable first-pass cost that should be profiled in a follow-up optimization round. |
| 2026-03-16 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog mixed-1000 20 --warmup 1 --runs 5 | generated merge-mixed-1000 | 20 | 464.3 | 67.27 | Added a cheap ICU-shape fast path ahead of parse_icu, so obviously non-ICU strings skip the new MessageFormat parser entirely; this recovered most of the first integration regression while keeping the stricter ICU correctness path for strings that actually look like ICU syntax. |
| 2026-03-16 | raw-icu-parse | release | cargo run --release -p ferrocat-bench -- parse-icu icu-nested-1000 50 --warmup 1 --runs 5 | generated icu-nested-1000 | 50 | 380.5 | 105.23 | First raw ferrocat-icu nested-parser baseline after adding dedicated ICU benchmark fixtures. This isolates parser cost from ferrocat-po merge and serialization overhead. |
| 2026-03-16 | raw-icu-validate | release | cargo run --release -p ferrocat-bench -- validate-icu icu-plural-1000 50 --warmup 1 --runs 5 | generated icu-plural-1000 | 50 | 1474.9 | 99.55 | First direct validation benchmark on simple top-level ICU plurals; useful as a low-structure comparison point against the nested parse workload. |
| 2026-03-16 | raw-icu-parse | release | cargo run --release -p ferrocat-bench -- parse-icu icu-args-1000 50 --warmup 1 --runs 5 | generated icu-args-1000 | 50 | 1777.0 | 108.28 | Raw ICU parser throughput on argument-heavy messages after adding the dedicated ICU benchmark harness. |
| 2026-03-16 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog catalog-icu-heavy 10 --warmup 1 --runs 5 | generated catalog-icu-heavy | 10 | 142.8 | 36.23 | First ICU-heavy end-to-end benchmark with a catalog corpus that mixes projectable ICU plurals, unsupported valid ICU structures, tags, and formatter usage. |
| 2026-03-16 | raw-icu-parse | release | cargo run --release -p ferrocat-bench -- parse-icu icu-nested-1000 50 --warmup 1 --runs 5 | generated icu-nested-1000 | 50 | 440.0 | 121.69 | Shifted the ferrocat-icu parser hot path away from repeated char/slice probing toward cheaper byte-oriented structural scanning in parse_nodes, parse_identifier, skip_whitespace, and related helpers. |
| 2026-03-16 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog catalog-icu-heavy 10 --warmup 1 --runs 5 | generated catalog-icu-heavy | 10 | 143.5 | 36.41 | Follow-up end-to-end rerun after the parser tightening and in-place ICU literal escaping. The raw parser improved materially; the catalog-heavy pipeline moved only slightly because merge and projection/serialization still make up most of the remaining cost. |
| 2026-03-16 | raw-icu-extract-variables | release | cargo run --release -p ferrocat-bench -- extract-icu-variables icu-tags-1000 50 --warmup 1 --runs 3 | generated icu-tags-1000 | 50 | 2969.4 | 288.54 | Direct helper benchmark for variable extraction on tag-heavy messages, confirming that AST-derived helper passes are much cheaper than full parsing once the message is already parsed. |
| 2026-03-16 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog catalog-icu-unsupported 5 --warmup 1 --runs 3 | generated catalog-icu-unsupported | 5 | 114.6 | 28.06 | Baseline for valid-but-unsupported ICU structures that force conservative diagnostics and fallback instead of clean projection into the current catalog plural model. |
| 2026-03-16 | raw-icu-parse | release | cargo run --release -p ferrocat-bench -- parse-icu icu-args-1000 50 --warmup 1 --runs 5 | generated icu-args-1000 | 50 | 1774.2 | 108.10 | Final parser rerun after the second ICU optimization round. The argument-heavy path stayed essentially flat, which is expected because the latest work primarily targeted nested option parsing and end-to-end ICU-heavy merge overhead. |
| 2026-03-16 | raw-icu-parse | release | cargo run --release -p ferrocat-bench -- parse-icu icu-nested-1000 50 --warmup 1 --runs 5 | generated icu-nested-1000 | 50 | 442.2 | 122.28 | Final nested-parser measurement after byte-prefix handling for offset:/close-tag checks and a cleanup of downstream ICU-heavy merge helpers. This is the current best nested raw-parser result for the M1 parser. |
| 2026-03-16 | raw-icu-validate | release | cargo run --release -p ferrocat-bench -- validate-icu icu-plural-1000 50 --warmup 1 --runs 5 | generated icu-plural-1000 | 50 | 1821.7 | 122.97 | Final validation benchmark after the same parser tightening. Validation inherits the parser improvements and now clearly outpaces the first raw validation baseline. |
| 2026-03-16 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog catalog-icu-heavy 10 --warmup 1 --runs 5 | generated catalog-icu-heavy | 10 | 151.7 | 38.50 | Improved ICU-heavy end-to-end throughput by combining parser byte-prefix cleanups with cheaper large-collection merge dedupe handling for comments, placeholders, and similar small string/origin sets. |
| 2026-03-16 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog catalog-icu-projectable 10 --warmup 1 --runs 5 | generated catalog-icu-projectable | 10 | 128.1 | 27.22 | First explicitly recorded benchmark for the fully projectable ICU-heavy catalog path, used to separate clean projection cost from the unsupported/diagnostic-heavy corpus. |
| 2026-03-16 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog catalog-icu-unsupported 5 --warmup 1 --runs 3 | generated catalog-icu-unsupported | 5 | 113.5 | 27.78 | Final unsupported ICU corpus rerun after the merge-helper cleanup. Throughput stayed effectively flat, which is a good sign that the optimization favored projectable/common cases without regressing the conservative fallback path. |
| 2026-03-16 | update-catalog-file | release | cargo run --release -p ferrocat-bench -- update-catalog-file mixed-1000 5 --warmup 1 --runs 2 | generated merge-mixed-1000 | 5 | 402.5 | 58.31 | First file-oriented benchmark for update_catalog_file, including fixture reset, file read/write path, and atomic rewrite behavior around the same high-level catalog pipeline. |
| 2026-03-18 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-10000 --runs 3 | generated mixed-10000 | 100 | 324.0 | 382.47 | Post high-level catalog API borrowing refactor, rustdoc/lint hardening, and internal helper/module cleanup. Compared with the pre-change checkpoint in the same session (302.1 iter/s, 356.65 MiB/s), owned parse improved instead of regressing. |
| 2026-03-18 | parse-borrowed | release | cargo run --release -p ferrocat-bench -- parse-borrowed mixed-10000 --runs 3 | generated mixed-10000 | 100 | 455.2 | 537.35 | Same refactor checkpoint for the borrowed parser. Compared with the same-session baseline (423.7 iter/s, 500.09 MiB/s), the zero-copy path remained allocation-light and got faster. |
| 2026-03-18 | stringify | release | cargo run --release -p ferrocat-bench -- stringify mixed-10000 --runs 3 | generated mixed-10000 | 100 | 990.2 | 1167.04 | Serializer throughput stayed comfortably above the earlier same-session baseline (936.1 iter/s, 1103.24 MiB/s), which is a good sign that the API/docs refactor did not spill overhead into PO output hot paths. |
| 2026-03-18 | merge | release | cargo run --release -p ferrocat-bench -- merge gettext-ui-de-1000 --runs 3 | generated gettext-ui-de-1000 | 400 | 1779.5 | 358.56 | Merge throughput also improved relative to the same-session baseline (1645.9 iter/s, 331.64 MiB/s), confirming that the surrounding catalog API cleanup left the direct merge path healthy. |
| 2026-03-18 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog gettext-ui-de-1000 --runs 3 | generated gettext-ui-de-1000 | 400 | 342.5 | 70.71 | High-level catalog update stayed essentially flat-to-slightly-better versus the same-session baseline (340.1 iter/s, 70.21 MiB/s). This was the most important guardrail for the borrowing-based request API change. |
| 2026-03-18 | parse | release | cargo run --release -p ferrocat-bench -- parse mixed-10000 --runs 3 | generated mixed-10000 | 100 | 325.6 | 384.30 | Round-2 catalog API modularization that moved the remaining catalog workflow out of api.rs into api/catalog.rs. Owned parse stayed slightly above the earlier same-day checkpoint (324.0 iter/s, 382.47 MiB/s). |
| 2026-03-18 | parse-borrowed | release | cargo run --release -p ferrocat-bench -- parse-borrowed mixed-10000 --runs 3 | generated mixed-10000 | 100 | 458.4 | 541.12 | Same round-2 modularization checkpoint for the borrowed parser. The zero-copy path remained faster than the earlier same-day measurement (455.2 iter/s, 537.35 MiB/s). |
| 2026-03-18 | stringify | release | cargo run --release -p ferrocat-bench -- stringify mixed-10000 --runs 3 | generated mixed-10000 | 100 | 996.9 | 1174.95 | Serializer throughput remained comfortably ahead of the earlier same-day checkpoint (990.2 iter/s, 1167.04 MiB/s) after the catalog module split. |
| 2026-03-18 | merge | release | cargo run --release -p ferrocat-bench -- merge gettext-ui-de-1000 --runs 3 | generated gettext-ui-de-1000 | 400 | 1804.2 | 363.54 | Merge throughput ticked up again after the remaining API-layer code moved out of the facade module, which is a good sign that the structural split stayed out of the hot path. |
| 2026-03-18 | update-catalog | release | cargo run --release -p ferrocat-bench -- update-catalog gettext-ui-de-1000 --runs 3 | generated gettext-ui-de-1000 | 400 | 347.0 | 71.64 | Most important round-2 guardrail: the end-to-end high-level catalog update path improved over the earlier same-day checkpoint (342.5 iter/s, 70.71 MiB/s) while landing the module split. |
| 2026-03-18 | parse-catalog-po | release | cargo run --release -p ferrocat-bench -- parse-catalog-po catalog-modern-de-10000 --runs 3 | generated catalog-modern-de-10000 | 100 | 101.9 | 157.39 | First fair internal storage-format parse benchmark on the new modern catalog fixture family. This compares CatalogStorageFormat::Po against NDJSON on the same ICU-oriented catalog semantics instead of classic gettext plural entries. |
| 2026-03-18 | parse-catalog-ndjson | release | cargo run --release -p ferrocat-bench -- parse-catalog-ndjson catalog-modern-de-10000 --runs 3 | generated catalog-modern-de-10000 | 100 | 92.2 | 166.54 | Matching NDJSON parse checkpoint for the modern internal storage-format benchmark family. The lower iter/s but higher MiB/s relative to the PO row reflects the larger one-record-per-line JSON representation rather than a different message corpus. |
| 2026-03-18 | stringify-catalog-po | release | cargo run --release -p ferrocat-bench -- stringify-catalog-po catalog-modern-de-10000 --runs 3 | generated catalog-modern-de-10000 | 100 | 243.8 | 379.05 | First PO render checkpoint on the modern internal storage-format fixture family. This path serializes the same canonical catalog semantics back into PO with ICU strings in msgid/msgstr, without reintroducing classic gettext plural entries. |
| 2026-03-18 | stringify-catalog-ndjson | release | cargo run --release -p ferrocat-bench -- stringify-catalog-ndjson catalog-modern-de-10000 --runs 3 | generated catalog-modern-de-10000 | 100 | 277.0 | 500.28 | Matching NDJSON render checkpoint for the same modern catalog corpus. NDJSON currently renders faster in this benchmark-local path because it writes a direct JSON-line representation from the parsed catalog while the PO side reconstructs full PO item structure. |
| 2026-03-18 | update-catalog-file | release | cargo run --release -p ferrocat-bench -- update-catalog-file catalog-modern-de-1000 --runs 3 | generated catalog-modern-de-1000 | 400 | 285.8 | 43.90 | First file-oriented PO storage benchmark on the modern catalog fixture family. This is the fairer baseline for comparing high-level storage rewrite cost against NDJSON on the same ICU-oriented message mix. |
| 2026-03-18 | update-catalog-file-ndjson | release | cargo run --release -p ferrocat-bench -- update-catalog-file-ndjson catalog-modern-de-1000 --runs 3 | generated catalog-modern-de-1000 | 400 | 281.2 | 50.21 | Matching file-oriented NDJSON storage benchmark on the modern fixture family. The end-to-end update path stayed very close to PO, which suggests the storage-format choice matters less here than in the raw parse/render microbenchmarks. |
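
One of the 2026-03-15 parse rows credits a MsgStr enum (None/Singular/Plural) with removing per-item vector overhead for the common singular case. A minimal sketch of that idea follows. Only the enum name and its three variant names come from the log; the methods and the promotion logic are assumptions for illustration, and the actual ferrocat type may differ.

```rust
/// Three-state translation value: no allocation for a missing translation,
/// one String for the singular case, and a Vec only for plural forms.
/// Variant names follow the log entry; everything else is hypothetical.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub enum MsgStr {
    #[default]
    None,
    Singular(String),
    Plural(Vec<String>),
}

impl MsgStr {
    /// Stores `value` at plural index `index`.
    /// Index 0 stays in the Singular variant unless the value is already
    /// Plural; any other index promotes the value to a Vec.
    pub fn set(&mut self, index: usize, value: String) {
        if index == 0 && !matches!(self, MsgStr::Plural(_)) {
            *self = MsgStr::Singular(value);
            return;
        }
        // Take the current value out and reuse its contents when promoting.
        let mut forms = match std::mem::take(self) {
            MsgStr::None => Vec::new(),
            MsgStr::Singular(first) => vec![first],
            MsgStr::Plural(forms) => forms,
        };
        if forms.len() <= index {
            forms.resize(index + 1, String::new());
        }
        forms[index] = value;
        *self = MsgStr::Plural(forms);
    }

    /// Returns the form at `index`, if present.
    pub fn get(&self, index: usize) -> Option<&str> {
        match self {
            MsgStr::None => None,
            MsgStr::Singular(s) => {
                if index == 0 {
                    Some(s.as_str())
                } else {
                    None
                }
            }
            MsgStr::Plural(forms) => forms.get(index).map(String::as_str),
        }
    }
}
```

With this shape, an entry that only ever sets `msgstr[0]` never allocates a vector, which is the saving the log entry describes.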