Gettext Task Landscape

A workflow-level map of gettext-inspired catalog jobs, ecosystem patterns, and Ferrocat's product direction.

This is an advanced reference page. You do not need to know GNU gettext to use Ferrocat; start with the Guide if you want the product-level path first.

Ferrocat is not trying to be a command-line clone of GNU gettext. It is a new catalog engine that learns from the jobs that established translation tooling has already proven useful: extract messages, update catalogs, preserve translator context, check release readiness, and produce runtime data that applications can load.

That distinction matters because catalog jobs are broader than one tool's command-line flags. Some projects use stable IDs, some use real source strings as `msgid`, and many need to preserve translator-facing PO metadata either way. Ferrocat's core promise is simpler: keep catalog identity explicit, preserve translations when catalogs evolve, surface conflicts clearly, and compile the result into something applications can load efficiently.

In the Palamedes ecosystem, that means Ferrocat is the catalog and semantics layer while Palamedes is the JS/TS i18n framework layer. The two projects should be read together: one handles the catalog jobs, the other handles application authoring, extraction, adapters, and runtime integration.

Task Matrix

Task	Gettext idea	Common ecosystem pattern	Ferrocat today	Product direction
Extract source messages	`xgettext` scans code and writes POT files	Babel uses configurable extraction methods for Python apps and templates	Not part of the core crate	Keep extraction host-specific; define clean adapter inputs instead of embedding every language parser
Integrate app frameworks	GNU gettext leaves this mostly to language bindings and application code	Framework packages usually own macros, hooks, loaders, and routing conventions	Ferrocat stays host-neutral; Palamedes owns JS/TS framework integration	Keep Ferrocat focused on catalog semantics and let Palamedes bind those semantics into application frameworks
Parse and write PO files	PO is the human-editable catalog format	`pofile`, `gettext-parser`, and `polib` expose editable object models	`parse_po`, `parse_po_borrowed`, `stringify_po`	Add richer source positions only when the public error model is ready
Update a catalog from new source	`msgmerge` carries existing translations forward into a new template	Many projects script this around gettext binaries or PO libraries	`merge_catalog`, `update_catalog`, `update_catalog_file`	Prefer exact source-string identity and explicit obsolete strategies over fuzzy-first behavior
Combine catalogs	`msgcat`, `msgcomm`, and `msguniq` merge or select messages by occurrence	Usually handled by shell pipelines or ad hoc scripts	`combine_catalogs` with explicit conflict and selection rules	Expand selection and reporting when real workflows need it
Reduce Git merge pain	Usually handled outside gettext through file layout, merge drivers, or process rules	Large teams often rely on line-based review and hosted pull request merges	ICU-native NDJSON stores one message per line	Treat storage format as a collaboration feature, not just an interchange detail
Track AI translation metadata	Usually modeled in platform-specific sidecars or external TMS state	AI-assisted workflows often need model, confidence, timestamp, and stale-edit detection	`MachineTranslationMetadata` on catalog entries, stored in PO or NDJSON	Keep machine-translation provenance with the catalog while clearing it when the translated text changes
Filter or edit message attributes	`msgattrib` and `msggrep` select fuzzy, untranslated, obsolete, or matching messages	`polib`-style object traversal makes this easy but unstructured	Low-level parsed data is available; no high-level predicate API yet	Add structured filtering and transform APIs for CI and translator workflows
Check completeness	`msgcmp` checks whether a translation catalog covers a template	CI often shells out or writes custom checks	`audit_catalogs` reports missing locales, missing translations, empty translations, and target-only messages	Expand reporting only from concrete release workflows
Validate translations	`msgfmt` checks and compiles catalogs; Weblate, FormatJS, and Translate Toolkit add workflow checks	`gettext-parser` and `polib` can compile PO/MO; app stacks add their own checks	`audit_catalogs` reports ICU syntax, ICU compatibility, semantic metadata conflicts, obsolete entries, and visible `fuzzy` flags	Keep validation reports structured before considering full MO parity
Build runtime payloads	`msgfmt` produces MO files for libintl-style runtime lookup	Node/Python libraries often parse or compile PO/MO directly	`compile_catalog_artifact` emits host-neutral runtime maps	Keep MO support optional; prioritize modern bundle and runtime artifacts
Handle plural semantics	Gettext uses `Plural-Forms`, `msgid_plural`, and `msgstr[n]`	Libraries vary in how much plural logic they understand	`GettextCompat` for classic PO, `IcuNative` for ICU messages	Keep semantic mode choices explicit instead of guessing
Work with non-UTF-8 catalogs	GNU gettext has deep charset conversion support	`gettext-parser` handles charset conversion for Buffer input	Ferrocat is UTF-8-oriented today	Document charset boundaries before adding conversion complexity
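
The "update a catalog from new source" row can be made concrete with a minimal sketch of exact-identity updating and an explicit obsolete strategy. This is not Ferrocat's `update_catalog` API: every name here (`MessageKey`, `Entry`, `ObsoleteStrategy`, `update_exact`) is hypothetical and only illustrates the idea of carrying translations forward by exact key instead of fuzzy matching.

```rust
use std::collections::BTreeMap;

/// Hypothetical identity key: optional context plus source string, matched exactly.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct MessageKey {
    pub msgctxt: Option<String>,
    pub msgid: String,
}

/// Hypothetical catalog entry, reduced to what the sketch needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub translation: String,
    pub obsolete: bool,
}

/// The caller decides what happens to entries missing from the new template.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObsoleteStrategy {
    /// Keep the old entry and flag it as obsolete.
    Mark,
    /// Drop the old entry from the updated catalog.
    Remove,
}

pub type Catalog = BTreeMap<MessageKey, Entry>;

/// Convenience constructor for a key without context.
pub fn key(msgid: &str) -> MessageKey {
    MessageKey { msgctxt: None, msgid: msgid.to_string() }
}

/// Builds an updated catalog from the template keys. A translation is carried
/// forward only when the key matches exactly; a changed source string starts
/// with an empty translation rather than a fuzzy guess.
pub fn update_exact(
    template: &[MessageKey],
    existing: &Catalog,
    strategy: ObsoleteStrategy,
) -> Catalog {
    let mut updated = Catalog::new();
    for k in template {
        let translation = existing
            .get(k)
            .map(|e| e.translation.clone())
            .unwrap_or_default();
        updated.insert(k.clone(), Entry { translation, obsolete: false });
    }
    if strategy == ObsoleteStrategy::Mark {
        for (k, e) in existing {
            if !updated.contains_key(k) {
                let kept = Entry { translation: e.translation.clone(), obsolete: true };
                updated.insert(k.clone(), kept);
            }
        }
    }
    updated
}
```

Making the strategy a required argument is the point of the sketch: the caller states what happens to vanished messages, so nothing is kept or dropped by an implicit default.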

Ecosystem Lessons

Project	Main role	Useful lesson for Ferrocat
GNU gettext	Full reference toolchain for extraction, merge, manipulation, validation, MO compilation, and runtime lookup	Treat the tools as a catalog of jobs, not as an API shape to copy
FormatJS CLI	Extraction and verification for ICU-heavy JavaScript projects	A verify-style report is useful, but Ferrocat should keep it host-neutral
Weblate checks	Hosted translation checks with many catalog-quality rules	Checks should be visible, configurable diagnostics rather than hidden runtime fallback
Translate Toolkit `pofilter`	Predicate-style PO QA tests	Start with a small report API before adding a broad predicate language
Lingui	Application workflow tooling for message extraction, compile, and catalog status	Workflow ergonomics matter, but host-specific extraction belongs outside Ferrocat
Paraglide JS	Generated JS runtime modules from message catalogs	Runtime code generation belongs in host adapters; shared catalog semantics belong in Ferrocat
Tolgee CLI	Sync-oriented extraction and translation workflow tooling	Sync and repair flows are separate from a read-only shippability report
`pofile`	JavaScript PO parse and serialize library	A small editable PO object model is valuable when callers want direct control
`gettext-parser`	Node.js PO/MO parse and compile library	MO support, stream parsing, and charset handling are useful comparison points
Python `polib`	Mature PO/POT/MO manipulation library	Entry-level mutation and filtering are important everyday workflows
Babel	Python localization workflow with pluggable extraction	Extraction belongs near the host language and framework
Rust `polib`	Rust PO/MO load, manipulate, and save library	UTF-8-only PO/MO support is a pragmatic baseline to compare against
`gettext-rs`	Safe Rust bindings for gettext runtime APIs	Runtime lookup and catalog authoring are different layers and should stay separate

Product Direction

Ferrocat should keep moving toward task-oriented catalog APIs with exact catalog identity, explicit diagnostics, and practical PO interoperability:

Predictable catalog maintenance. Whether `msgid` is a stable key or real source copy, updates should use exact identity and clear conflicts. Fuzzy matching can be considered later, but it should not hide source changes or silently rewrite translator intent.
Structured diagnostics over terminal text. Library users need data they can show in CI, editors, dashboards, and release tooling.
Read-only audit before repair. Completeness, ICU, metadata, stale-message, obsolete, and fuzzy checks should be available as a report before Ferrocat grows any repair or fuzzy-matching behavior.
Explicit conflict policy. Combine and update workflows should make precedence and failure behavior visible in options.
Collaboration-friendly storage. NDJSON should be positioned as a practical large-team catalog format: one message per line, cleaner pull request diffs, narrower conflicts, and no dependency on custom merge handlers for everyday hosted Git workflows.
AI translation metadata with cleanup. Machine-generated translations should carry compact model, confidence, modified-time, and hash metadata in the catalog itself, and stale metadata should disappear when the translation text changes.
Palamedes as the application layer. Ferrocat should expose the catalog behavior Palamedes needs without absorbing framework-specific extraction, bundling, routing, or runtime concerns.
Host-specific extraction. JavaScript, Rust, Python, Markdown, templates, and framework conventions deserve adapters, not one oversized parser in the core crate.
Runtime output for modern apps. MO compatibility is useful to evaluate, but host-neutral compiled artifacts are a better default for bundlers, edge runtimes, and multi-language product stacks.
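
The "AI translation metadata with cleanup" direction above can be sketched as a hash comparison: provenance records a hash of the text it describes and is dropped once the text no longer matches. The types and function names below are hypothetical and are not Ferrocat's `MachineTranslationMetadata`. The sketch also assumes the standard library's `DefaultHasher`, whose output is not guaranteed to be stable across Rust releases, so a persisted catalog would need a stable hash instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical provenance record with the fields named above:
/// model, confidence, modified time, and a hash of the translated text.
#[derive(Debug, Clone, PartialEq)]
pub struct MtProvenance {
    pub model: String,
    pub confidence: f32,
    pub modified_at: String,
    pub text_hash: u64,
}

/// Hypothetical entry that carries its provenance alongside the translation.
#[derive(Debug, Clone, PartialEq)]
pub struct TranslatedEntry {
    pub translation: String,
    pub provenance: Option<MtProvenance>,
}

/// Illustration only: DefaultHasher is not a stable on-disk hash.
fn hash_text(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

impl TranslatedEntry {
    /// Records a machine translation together with a hash of its text.
    pub fn machine_translated(
        text: &str,
        model: &str,
        confidence: f32,
        modified_at: &str,
    ) -> Self {
        TranslatedEntry {
            translation: text.to_string(),
            provenance: Some(MtProvenance {
                model: model.to_string(),
                confidence,
                modified_at: modified_at.to_string(),
                text_hash: hash_text(text),
            }),
        }
    }

    /// Drops the provenance when the current text no longer matches the
    /// recorded hash, for example after a translator edited the string.
    pub fn clear_stale_provenance(&mut self) {
        let stale = match &self.provenance {
            Some(p) => p.text_hash != hash_text(&self.translation),
            None => false,
        };
        if stale {
            self.provenance = None;
        }
    }
}
```

Running the cleanup on load or on write keeps the rule local to the catalog: an edited translation loses its machine-translation claim without any external state.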

Near-Term Backlog

The most promising gettext-inspired additions are:

catalog filtering and attribute transforms inspired by `msgattrib` and `msggrep`
filtering and transforms driven by audit results, once the report vocabulary is stable enough
optional MO read/write evaluation against existing Rust and Node libraries
clearer extractor adapter contracts for both ID-based and source-as-msgid tools
tighter Palamedes documentation links for extraction, runtime artifacts, and future chunk-aware sidecars
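
As one possible shape for the filtering and attribute-transform item, the sketch below selects entries by attribute and clears a flag, in the spirit of `msgattrib`. It is a hypothetical illustration, not a planned Ferrocat API: `PoEntry`, `Selector`, `select`, and `clear_fuzzy` are invented names.

```rust
/// Hypothetical flattened PO entry with the attributes the selectors need.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PoEntry {
    pub msgid: String,
    pub msgstr: String,
    pub fuzzy: bool,
    pub obsolete: bool,
}

/// Structured predicates instead of free-form closures, so the same
/// vocabulary can be reused by CI checks and translator tooling.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Selector {
    Fuzzy,
    Untranslated,
    Obsolete,
}

impl Selector {
    /// Returns true when the entry has the attribute this selector names.
    pub fn matches(self, entry: &PoEntry) -> bool {
        match self {
            Selector::Fuzzy => entry.fuzzy,
            Selector::Untranslated => entry.msgstr.is_empty(),
            Selector::Obsolete => entry.obsolete,
        }
    }
}

/// Returns references to the entries matching the selector, in catalog order.
pub fn select(entries: &[PoEntry], selector: Selector) -> Vec<&PoEntry> {
    entries.iter().filter(|e| selector.matches(e)).collect()
}

/// Attribute transform: clears the fuzzy flag everywhere and reports how
/// many entries changed, so callers can log or gate on the count.
pub fn clear_fuzzy(entries: &mut [PoEntry]) -> usize {
    let mut changed = 0;
    for entry in entries.iter_mut() {
        if entry.fuzzy {
            entry.fuzzy = false;
            changed += 1;
        }
    }
    changed
}
```

An enum of selectors is deliberately narrower than a predicate language, which matches the lesson drawn from `pofilter` above: start small and grow the vocabulary from real workflows.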