ADR 0018: Add Machine-Translation Metadata to Catalog Entries
- Status: Accepted
- Date: 2026-05-12
Context
Ferrocat already has source-side semantic message metadata, but automatic translation workflows also need translation-side provenance. In automation-centered catalogs, most translations may be produced by a machine-translation pipeline, so the important question is not review state but whether the machine-translation facts still describe the current translation text.
The metadata should stay compact in PO files, fit naturally in NDJSON records, and not require Ferrocat to call any translation model itself.
Decision
Add optional machine-translation metadata to each high-level catalog entry.
The public model contains:
- model, a caller-defined model identifier such as openai/gpt-5.5-high
- optional modified, the time when the machine-generated translation was last modified
- optional confidence, an integer from 0 to 100
- hash, a change-detection hash for the current translation payload
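As an illustration only, the four fields could be mirrored in a small record type. The class name MachineTranslationMeta, the field types, and the range check are assumptions made for this sketch. They are not Ferrocat's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MachineTranslationMeta:
    """Hypothetical mirror of the four public fields; not Ferrocat's real type."""

    model: str                           # caller-defined, e.g. "openai/gpt-5.5-high"
    hash: str                            # change-detection hash of the translation payload
    modified: Optional[datetime] = None  # optional: last machine modification time
    confidence: Optional[int] = None     # optional: integer from 0 to 100

    def __post_init__(self) -> None:
        # Assumed validation: reject confidence values outside 0..100.
        if self.confidence is not None and not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")


meta = MachineTranslationMeta(
    model="openai/gpt-5.5-high",
    hash="placeholder-hash",  # placeholder value, not a real hash
    modified=datetime(2026, 5, 12, 10, 30, tzinfo=timezone.utc),
    confidence=95,
)
```

Modelling modified and confidence as Optional keeps the two optional fields distinguishable from a present-but-empty value.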
Ferrocat does not store provider, review, or source-message digest fields.
If machine-translation metadata is absent, the translation is no longer
machine-provenance-tracked.
The hash contract is:
- SHA-256 over a canonical, length-delimited translation payload
- prefixed with the fixed ferrocat:mt:v1 namespace
- truncated to 128 bits
- encoded as unpadded Base64URL
The namespace avoids a plain hash of only the translated string, but it is not secret. The hash is a change-detection marker, not a signature or tamper-proof security boundary.
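The hash contract can be sketched as follows. The exact canonical payload layout is not spelled out in this ADR, so the 4-byte big-endian length prefix per part, the unframed namespace prefix, and the function name mt_hash are assumptions for illustration. Hashes produced by this sketch should not be expected to match Ferrocat's.

```python
import base64
import hashlib
import struct

NAMESPACE = b"ferrocat:mt:v1"  # fixed, public namespace from the hash contract


def mt_hash(parts: list[str]) -> str:
    """Change-detection hash over translation strings (assumed payload layout)."""
    digest = hashlib.sha256()
    digest.update(NAMESPACE)
    for part in parts:
        data = part.encode("utf-8")
        # Assumption: each part is framed by its byte length (4 bytes, big-endian),
        # so ["ab", "c"] and ["a", "bc"] produce different payloads.
        digest.update(struct.pack(">I", len(data)))
        digest.update(data)
    truncated = digest.digest()[:16]  # keep the first 128 bits
    # Base64URL without "=" padding: 16 bytes encode to 22 characters.
    return base64.urlsafe_b64encode(truncated).rstrip(b"=").decode("ascii")
```

Passing a list of parts lets the same helper cover a single translation string or several plural forms. How Ferrocat actually orders and frames those parts is not stated here.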
PO stores the metadata in one compact line:
#@ ferrocat-mt model=openai/gpt-5.5-high modified=2026-05-12T10:30:00Z confidence=95 hash=...
NDJSON stores it under the optional mt field in format: ferrocat.ndjson.v1. The format is still new enough that this field can be part of the v1 record contract without introducing a v2 marker.
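A minimal sketch of reading the compact PO line and carrying the result into an NDJSON record might look like this. It assumes values contain no whitespace. The function name parse_mt_comment, the msgid and msgstr keys, and the hash value abc123 are placeholders invented for the example. Only the #@ ferrocat-mt prefix, the key names, and the mt field come from this ADR.

```python
import json

PO_PREFIX = "#@ ferrocat-mt "


def parse_mt_comment(line: str) -> dict[str, str]:
    """Split a '#@ ferrocat-mt key=value ...' line into a dict of strings."""
    if not line.startswith(PO_PREFIX):
        raise ValueError("not a ferrocat-mt comment")
    fields: dict[str, str] = {}
    for token in line[len(PO_PREFIX):].split():
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {token!r}")
        fields[key] = value
    return fields


line = (
    "#@ ferrocat-mt model=openai/gpt-5.5-high "
    "modified=2026-05-12T10:30:00Z confidence=95 hash=abc123"  # placeholder hash
)
mt = parse_mt_comment(line)

# Hypothetical NDJSON record: one JSON object per line, metadata under "mt".
record = json.dumps({"msgid": "Hello", "msgstr": "Hallo", "mt": mt})
```

All parsed values stay strings in this sketch. Whether the real NDJSON record stores confidence as a number is not specified here.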
Parsing preserves stale metadata. High-level catalog writers verify the hash against the current translation and drop the whole metadata block when it no longer matches.
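The writer-side rule can be sketched as a small function that keeps the metadata only when its hash still matches. The name refresh_mt, the dict representation, and the toy_hash stand-in are assumptions for this example. toy_hash deliberately does not implement the real hash contract.

```python
import hashlib
from typing import Callable, Optional


def refresh_mt(
    translation: str,
    mt: Optional[dict],
    compute_hash: Callable[[str], str],
) -> Optional[dict]:
    """Return mt if its hash matches the current translation, else None."""
    if mt is None:
        return None
    if mt.get("hash") != compute_hash(translation):
        # Stale metadata: drop the whole block, not just the hash field.
        return None
    return mt


def toy_hash(text: str) -> str:
    """Stand-in hash for the demo only; not the ferrocat:mt:v1 contract."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


mt = {"model": "openai/gpt-5.5-high", "hash": toy_hash("Hallo Welt")}

kept = refresh_mt("Hallo Welt", mt, toy_hash)      # unchanged text: metadata kept
dropped = refresh_mt("Hallo, Welt!", mt, toy_hash)  # edited text: metadata dropped
```

Dropping the full block on a mismatch is what lets a human edit show up simply as missing machine-translation metadata.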
Consequences
Positive:
- machine translation provenance becomes first-class catalog data
- human edits are represented by removing stale machine-translation metadata
- PO remains compact and readable
- NDJSON gets a versioned storage contract for the new field
Negative:
- the high-level catalog model and NDJSON storage contract grow a new public field
- callers must recompute the hash when they update machine-generated translations
- the fixed namespace can be reproduced from open source code, so it must not be treated as a secret
Alternatives Considered
Add Review State
Rejected for this phase. The current workflow treats missing or stale machine-translation metadata as sufficient evidence that the translation is no longer machine-provenance-tracked.
Track Provider Separately
Rejected because the model string can include provider information without
requiring Ferrocat to standardize provider naming.
Store One PO Metadata Line Per Field
Rejected because the compact single-line #@ ferrocat-mt ... form is easier to
scan in ordinary PO review.